Bug #12449

Updated by Kefu Chai almost 9 years ago

We have set up cache tiering under the guidance of "this link":http://ceph.com/docs/master/rados/operations/cache-tiering/, the only difference being that we have also set a quota on the *cache* pool.
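
For reference, the tier and the quota were configured with roughly the following commands. This is a sketch reconstructed from the pool listing below (pool names and byte values are taken from that output); the exact invocation may have differed slightly.

<pre>
# cache tiering as described in the documentation linked above
ceph osd tier add base cache
ceph osd tier cache-mode cache writeback
ceph osd tier set-overlay base cache
ceph osd pool set cache hit_set_type bloom
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache target_max_bytes 100000000
# the extra step: a quota on the cache pool itself
ceph osd pool set-quota cache max_bytes 100000000
</pre>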

 <pre> 
 [root@hust17 /home/runsisi]# ceph osd pool ls detail
 pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 52 flags hashpspool max_bytes 20000000 stripe_width 0 
 pool 1 'cache' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 flags hashpspool,incomplete_clones max_bytes 100000000 tier_of 2 cache_mode writeback target_bytes 100000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0 
 pool 2 'base' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 63 lfor 63 flags hashpspool tiers 1 read_tier 1 write_tier 1 stripe_width 0
 </pre> 

As you can see, the cache pool, named "cache", has a *100MB* (100,000,000-byte) quota set on it. When we then use the rados utility to put a *200MB* object into the backing pool, one of the three OSDs core dumps. The core dump does not happen if we clear the quota on the cache pool.
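
For the record, "clearing the quota" here simply means setting it back to 0, which disables it:

<pre>
ceph osd pool set-quota cache max_bytes 0
</pre>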

This can be reproduced easily as follows; the attached file is the log of the OSD that core dumped.

 <pre> 
 [root@hust17 /home/runsisi]# ceph -s
     cluster cbc99ef9-fbc3-41ad-a726-47359f8d84b3 
      health HEALTH_OK 
      monmap e3: 3 mons at {ceph0=192.168.133.10:6789/0,ceph1=192.168.133.11:6789/0,ceph2=192.168.133.12:6789/0} 
             election epoch 6, quorum 0,1,2 ceph0,ceph1,ceph2 
      osdmap e132: 3 osds: 3 up, 3 in 
       pgmap v413: 192 pgs, 3 pools, 4096 kB data, 1 objects 
             323 MB used, 131 GB / 131 GB avail 
                  192 active+clean 
 [root@hust17 /home/runsisi]#  
 [root@hust17 /home/runsisi]# ceph osd pool ls detail 
 pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 52 flags hashpspool max_bytes 20000000 stripe_width 0 
 pool 1 'cache' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 flags hashpspool,incomplete_clones max_bytes 100000000 tier_of 2 cache_mode writeback target_bytes 100000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0 
 pool 2 'base' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 63 lfor 63 flags hashpspool tiers 1 read_tier 1 write_tier 1 stripe_width 0 

 [root@hust17 /home/runsisi]# rados -p base ls 
 [root@hust17 /home/runsisi]# rados -p cache ls 
 [root@hust17 /home/runsisi]# ll -h 200m.dat  
 -rw-r--r-- 1 root root 200M Jul 23 10:53 200m.dat 
 [root@hust17 /home/runsisi]#  
 [root@hust17 /home/runsisi]# rados -p base put x1 200m.dat  
 [root@hust17 /home/runsisi]# rados -p base put x1 200m.dat  
 error putting base/x1: (28) No space left on device 
 [root@hust17 /home/runsisi]# rados -p base put x1 200m.dat  
 2015-07-23 20:59:03.262865 7f937a0d7700    0 -- 192.168.133.1:0/1019253 >> 192.168.133.11:6800/20555 pipe(0x3009c90 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x3003170).fault 
 [root@hust17 /home/runsisi]# ceph osd tree 
 ID WEIGHT    TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY  
 -1 0.11998 root default                                      
 -2 0.03999       host ceph2                                    
  0 0.03999           osd.0         up    1.00000            1.00000  
 -3 0.03999       host ceph1                                    
  1 0.03999           osd.1       down    1.00000            1.00000  
 -4 0.03999       host ceph0                                    
  2 0.03999           osd.2         up    1.00000            1.00000  
 [root@hust17 /home/runsisi]# ceph --version 
 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 </pre>
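
The 200m.dat file used above is just an arbitrary 200MB file; its creation is not shown in the transcript, but something like the following would produce an equivalent one:

<pre>
dd if=/dev/zero of=200m.dat bs=1M count=200
</pre>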
