Bug #20221

Updated by Loïc Dachary almost 7 years ago

h3. Description

When the OSD that holds the last surviving copy of an object is killed in a size 1 (single replica) cluster, I/O is blocked.
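
For reference, the reproducer below lowers the size of one pool with @ceph osd pool set rbd size 1@. The same single-copy behavior can be made the cluster-wide default with the pool options below — a minimal sketch, assuming standard option names, and noting that vstart.sh normally generates its own ceph.conf:

```ini
[global]
# every new pool keeps a single copy of each object
osd pool default size = 1
# a PG stays writeable as long as one copy is available
osd pool default min_size = 1
```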

h3. Reproducer

From the build directory:

<pre>
source ../src/test/detect-build-env-vars.sh
../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0
ceph osd pool set rbd size 1
rados put BAR3 /etc/group --pool rbd
pkill -f 'ceph-osd -i 0'
sleep 10
ceph osd out osd.0
sleep 10
rados --debug-objecter 20 --debug-rados 20 put BAR2 /etc/group --pool rbd
(hang)
</pre>

It does not hang when osd.1 is killed instead (replacing *-i 0* with *-i 1* in the pkill command).
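
The asymmetry can be checked before killing anything: @ceph osd map@ reports which PG an object name hashes to and which OSDs are acting for it. A sketch, assuming the vstart cluster from the reproducer is up; it prints a notice and exits cleanly otherwise:

```shell
#!/bin/sh
# Show which placement group and OSDs serve a given object name.
# BAR2 is the object the reproducer writes; per the sample run below,
# it hashes to raw pgid 0.39aeb7e2, i.e. pg 0.2, whose acting set is
# [0] only -- so killing osd.0 removes its last possible destination.
if command -v ceph >/dev/null 2>&1; then
    ceph osd map rbd BAR2   # e.g. ... -> pg 0.39aeb7e2 (0.2) -> up ([0], p0) acting ([0], p0)
    ceph osd map rbd BAR3
else
    echo "ceph CLI not found: run from the vstart build environment"
fi
```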

h3. Sample run

 <pre> 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ ../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph osd pool set rbd size 1 
 set pool 0 size to 1 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump 
 dumped all 
 version 8 
 stamp 2017-06-08 11:37:27.021636 
 last_osdmap_epoch 0 
 last_pg_scan 0 
 full_ratio 0 
 nearfull_ratio 0 
 PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE          STATE_STAMP                  VERSION REPORTED UP    UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                  LAST_DEEP_SCRUB DEEP_SCRUB_STAMP            
 0.7             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.100644       0'0      16:32 [1]            1      [1]                1          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.6             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.100645       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.5             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201360       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.4             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201217       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.0             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201318       0'0      16:36 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.1             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.756707       0'0      16:11 [2]            2      [2]                2          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.2             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201392       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.3             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.756706       0'0      16:11 [2]            2      [2]                2          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
                  
 0 0 0 0 0 0 0 0 0  
                    
 sum 0 0 0 0 0 0 0 0  
 OSD_STAT USED AVAIL    TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM  
 2          174G    8076M    182G      [0,1]        2                2  
 1          174G    8076M    182G      [0,2]        1                1  
 0          174G    8076M    182G      [1,2]        5                5  
 sum        523G 24229M    547G                                 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ rados put BAR3 /etc/group --pool rbd 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump 
 dumped all 
 version 14 
 stamp 2017-06-08 11:37:39.022935 
 last_osdmap_epoch 0 
 last_pg_scan 0 
 full_ratio 0 
 nearfull_ratio 0 
 PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE          STATE_STAMP                  VERSION REPORTED UP    UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                  LAST_DEEP_SCRUB DEEP_SCRUB_STAMP            
 0.7             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.100644       0'0      16:32 [1]            1      [1]                1          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.6             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.100645       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.5             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201360       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.4             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201217       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.0             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201318       0'0      16:36 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.1             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.756707       0'0      16:11 [2]            2      [2]                2          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.2             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.201392       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.3             0                    0          0           0         0       0     0          0 active+clean 2017-06-08 11:37:23.756706       0'0      16:11 [2]            2      [2]                2          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
                  
 0 0 0 0 0 0 0 0 0  
                    
 sum 0 0 0 0 0 0 0 0  
 OSD_STAT USED AVAIL    TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM  
 2          174G    8076M    182G      [0,1]        2                2  
 1          174G    8076M    182G      [0,2]        1                1  
 0          174G    8076M    182G      [1,2]        5                5  
 sum        523G 24229M    547G                                 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ ps fauwwx | grep ceph-osd 
 loic       18093    0.0    0.0    15796    1008 pts/24     S+     11:38     0:00            |     |         |     \_ grep ceph-osd 
 loic       17433    0.1    0.2 762160 33584 ?          Ssl    11:37     0:00            \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 0 -c /home/loic/software/ceph/ceph/build/ceph.conf 
 loic       17641    0.1    0.2 761136 32700 ?          Ssl    11:37     0:00            \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 1 -c /home/loic/software/ceph/ceph/build/ceph.conf 
 loic       17848    0.1    0.1 761140 31720 ?          Ssl    11:37     0:00            \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 2 -c /home/loic/software/ceph/ceph/build/ceph.conf 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ kill 17433 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump 
 dumped all 
 version 40 
 stamp 2017-06-08 11:38:29.028347 
 last_osdmap_epoch 0 
 last_pg_scan 0 
 full_ratio 0 
 nearfull_ratio 0 
 PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE                STATE_STAMP                  VERSION REPORTED UP    UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                  LAST_DEEP_SCRUB DEEP_SCRUB_STAMP            
 0.7             0                    0          0           0         0       0     0          0         active+clean 2017-06-08 11:37:23.100644       0'0      16:32 [1]            1      [1]                1          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.6             0                    0          0           0         0       0     0          0 stale+active+clean 2017-06-08 11:37:23.100645       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.5             0                    0          0           0         0       0     0          0 stale+active+clean 2017-06-08 11:37:23.201360       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.4             0                    0          0           0         0       0     0          0 stale+active+clean 2017-06-08 11:37:23.201217       0'0      16:41 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.0             0                    0          0           0         0       0     0          0 stale+active+clean 2017-06-08 11:37:23.201318       0'0      16:36 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.1             0                    0          0           0         0       0     0          0         active+clean 2017-06-08 11:37:23.756707       0'0      16:11 [2]            2      [2]                2          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.2             1                    0          0           0         0    1499     1          1 stale+active+clean 2017-06-08 11:37:23.201392      16'1      16:42 [0]            0      [0]                0          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
 0.3             0                    0          0           0         0       0     0          0         active+clean 2017-06-08 11:37:23.756706       0'0      16:11 [2]            2      [2]                2          0'0 2017-06-08 11:37:01.091443               0'0 2017-06-08 11:37:01.091443  
                     
 0 1 0 0 0 0 1499 1 1  
                       
 sum 1 0 0 0 0 1499 1 1  
 OSD_STAT USED AVAIL    TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM  
 2          174G    8076M    182G      [0,1]        2                2  
 1          174G    8076M    182G      [0,2]        1                1  
 0          174G    8076M    182G      [1,2]        5                5  
 sum        523G 24228M    547G                                 
 (virtualenv) loic@fold:~/software/ceph/ceph/build$ rados --debug-objecter 20 --debug-rados 20    put BAR2 /etc/group --pool rbd 
 2017-06-08 11:41:52.928208 7f1e52998e00    1 librados: starting msgr at - 
 2017-06-08 11:41:52.928224 7f1e52998e00    1 librados: starting objecter 
 2017-06-08 11:41:52.928500 7f1e52998e00    1 librados: setting wanted keys 
 2017-06-08 11:41:52.928501 7f1e52998e00    1 librados: calling monclient init 
 2017-06-08 11:41:52.929470 7f1e3debb700 10 client.?.objecter ms_handle_connect 0x5597ace69ec0 
 2017-06-08 11:41:52.929489 7f1e3debb700 10 client.?.objecter resend_mon_ops 
 2017-06-08 11:41:52.930829 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 2017-06-08 11:41:52.930870 7f1e52998e00    1 librados: init done 
 2017-06-08 11:41:52.930877 7f1e52998e00 10 librados: wait_for_osdmap waiting 
 2017-06-08 11:41:52.931078 7f1e3debb700 10 client.4120.objecter ms_dispatch 0x5597acd413a0 osd_map(18..18 src has 1..18) v3 
 2017-06-08 11:41:52.931087 7f1e3debb700    3 client.4120.objecter handle_osd_map got epochs [18,18] > 0 
 2017-06-08 11:41:52.931088 7f1e3debb700    3 client.4120.objecter handle_osd_map decoding full epoch 18 
 2017-06-08 11:41:52.931207 7f1e3debb700 20 client.4120.objecter dump_active .. 0 homeless 
 2017-06-08 11:41:52.931227 7f1e52998e00 10 librados: wait_for_osdmap done waiting 
 2017-06-08 11:41:52.931265 7f1e52998e00 10 librados: writefull oid=BAR2 nspace= 
 2017-06-08 11:41:52.931278 7f1e52998e00 10 client.4120.objecter _op_submit op 0x5597ace6dac0 
 2017-06-08 11:41:52.931281 7f1e52998e00 20 client.4120.objecter _calc_target epoch 18 base BAR2 @0 precalc_pgid 0 pgid 0.0 is_write 
 2017-06-08 11:41:52.931293 7f1e52998e00 20 client.4120.objecter _calc_target target BAR2 @0 -> pgid 0.39aeb7e2 
 2017-06-08 11:41:52.931297 7f1e3debb700 10 client.4120.objecter ms_handle_connect 0x7f1e28009690 
 2017-06-08 11:41:52.931313 7f1e52998e00 10 client.4120.objecter _calc_target    raw pgid 0.39aeb7e2 -> actual 0.2 acting [] primary -1 
 2017-06-08 11:41:52.931319 7f1e52998e00 20 client.4120.objecter _get_session osd=-1 returning homeless 
 2017-06-08 11:41:52.931322 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 2017-06-08 11:41:52.931343 7f1e52998e00 10 client.4120.objecter _op_submit oid BAR2 '@0' '@0' [writefull 0~1499] tid 1 osd.-1 
 2017-06-08 11:41:52.931351 7f1e52998e00 15 client.4120.objecter _session_op_assign -1 1 
 2017-06-08 11:41:52.931352 7f1e52998e00    5 client.4120.objecter 1 in flight 
 2017-06-08 11:41:57.930919 7f1e3e70c700 10 client.4120.objecter tick 
 2017-06-08 11:41:57.930935 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 2017-06-08 11:42:02.931046 7f1e3e70c700 10 client.4120.objecter tick 
 2017-06-08 11:42:02.931057 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 2017-06-08 11:42:07.931192 7f1e3e70c700 10 client.4120.objecter tick 
 2017-06-08 11:42:07.931208 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 2017-06-08 11:42:12.931317 7f1e3e70c700 10 client.4120.objecter tick 
 2017-06-08 11:42:12.931334 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 2017-06-08 11:42:17.931406 7f1e3e70c700 10 client.4120.objecter tick 
 2017-06-08 11:42:17.931423 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 2017-06-08 11:42:22.931567 7f1e3e70c700 10 client.4120.objecter tick 
 2017-06-08 11:42:22.931586 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map 
 </pre> 
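
After osd.0 is killed, the PGs it exclusively served go *stale* (visible in the last @ceph pg dump@ above, including pg 0.2 where BAR2 maps). With no acting primary, the objecter parks the write on its homeless session (@_get_session osd=-1 returning homeless@) and retries on every tick. A quick way to see the stuck PGs — again a sketch assuming the vstart cluster is up:

```shell
#!/bin/sh
# List PGs stuck in the stale state: PGs whose only acting OSD is gone
# can no longer report to the monitors, so clients have no primary to
# send I/O to. Prints a notice if no cluster is reachable.
if command -v ceph >/dev/null 2>&1; then
    ceph pg dump_stuck stale
    ceph health detail
else
    echo "ceph CLI not found: run from the vstart build environment"
fi
```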
