Bug #20221
kill osd + osd out leads to stale PGs
Description
When the OSD is killed before ceph osd out, the PGs stay in stale state.
Reproducer
From the build directory:

source ../src/test/detect-build-env-vars.sh
../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0
ceph osd pool set rbd size 1
pkill -f 'ceph-osd -i 0'
sleep 10
ceph osd out osd.0
sleep 10
ceph pg dump

=> osd.0 still mapped to stale PGs
All is well if ceph osd out osd.0 is done before killing the OSD.
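The "osd.0 still mapped to stale PGs" check can be done mechanically. Below is a minimal sketch (not part of Ceph; the function name is made up) that scans the plain-text output of ceph pg dump for PGs whose state contains "stale" and whose ACTING_PRIMARY column is the given OSD id, using the column layout shown in the sample run:

```python
def stale_pgs_on_osd(pg_dump_text, osd_id):
    """Return pgids from `ceph pg dump` plain output that are stale and
    whose acting primary is osd_id. Hypothetical helper, not a Ceph API."""
    stale = []
    for line in pg_dump_text.splitlines():
        fields = line.split()
        # PG_STAT rows start with a pgid like "0.6"; STATE is column 10
        # and ACTING_PRIMARY column 18 (STATE_STAMP spans two tokens).
        if len(fields) >= 18 and "." in fields[0] and "stale" in fields[9]:
            if int(fields[17]) == osd_id:
                stale.append(fields[0])
    return stale
```

Running it over the third pg dump in the sample run with osd_id=0 would report 0.6, 0.5, 0.4, 0.0 and 0.2 as still mapped to the killed OSD.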
Sample run
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph osd pool set rbd size 1
set pool 0 size to 1
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 8
stamp 2017-06-08 11:37:27.021636
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
0.7 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.100644 0'0 16:32 [1] 1 [1] 1 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.6 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.100645 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.5 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201360 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.4 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201217 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.0 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201318 0'0 16:36 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.1 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.756707 0'0 16:11 [2] 2 [2] 2 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.2 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201392 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.3 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.756706 0'0 16:11 [2] 2 [2] 2 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0 0 0 0 0 0 0 0 0
sum 0 0 0 0 0 0 0 0
OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
2 174G 8076M 182G [0,1] 2 2
1 174G 8076M 182G [0,2] 1 1
0 174G 8076M 182G [1,2] 5 5
sum 523G 24229M 547G
(virtualenv) loic@fold:~/software/ceph/ceph/build$ rados put BAR3 /etc/group --pool rbd
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 14
stamp 2017-06-08 11:37:39.022935
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
0.7 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.100644 0'0 16:32 [1] 1 [1] 1 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.6 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.100645 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.5 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201360 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.4 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201217 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.0 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201318 0'0 16:36 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.1 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.756707 0'0 16:11 [2] 2 [2] 2 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.2 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.201392 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.3 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.756706 0'0 16:11 [2] 2 [2] 2 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0 0 0 0 0 0 0 0 0
sum 0 0 0 0 0 0 0 0
OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
2 174G 8076M 182G [0,1] 2 2
1 174G 8076M 182G [0,2] 1 1
0 174G 8076M 182G [1,2] 5 5
sum 523G 24229M 547G
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ps fauwwx | grep ceph-osd
loic 18093 0.0 0.0 15796 1008 pts/24 S+ 11:38 0:00 | | | \_ grep ceph-osd
loic 17433 0.1 0.2 762160 33584 ? Ssl 11:37 0:00 \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 0 -c /home/loic/software/ceph/ceph/build/ceph.conf
loic 17641 0.1 0.2 761136 32700 ? Ssl 11:37 0:00 \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 1 -c /home/loic/software/ceph/ceph/build/ceph.conf
loic 17848 0.1 0.1 761140 31720 ? Ssl 11:37 0:00 \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 2 -c /home/loic/software/ceph/ceph/build/ceph.conf
(virtualenv) loic@fold:~/software/ceph/ceph/build$ kill 17433
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 40
stamp 2017-06-08 11:38:29.028347
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
0.7 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.100644 0'0 16:32 [1] 1 [1] 1 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.6 0 0 0 0 0 0 0 0 stale+active+clean 2017-06-08 11:37:23.100645 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.5 0 0 0 0 0 0 0 0 stale+active+clean 2017-06-08 11:37:23.201360 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.4 0 0 0 0 0 0 0 0 stale+active+clean 2017-06-08 11:37:23.201217 0'0 16:41 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.0 0 0 0 0 0 0 0 0 stale+active+clean 2017-06-08 11:37:23.201318 0'0 16:36 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.1 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.756707 0'0 16:11 [2] 2 [2] 2 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.2 1 0 0 0 0 1499 1 1 stale+active+clean 2017-06-08 11:37:23.201392 16'1 16:42 [0] 0 [0] 0 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0.3 0 0 0 0 0 0 0 0 active+clean 2017-06-08 11:37:23.756706 0'0 16:11 [2] 2 [2] 2 0'0 2017-06-08 11:37:01.091443 0'0 2017-06-08 11:37:01.091443
0 1 0 0 0 0 1499 1 1
sum 1 0 0 0 0 1499 1 1
OSD_STAT USED AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
2 174G 8076M 182G [0,1] 2 2
1 174G 8076M 182G [0,2] 1 1
0 174G 8076M 182G [1,2] 5 5
sum 523G 24228M 547G
(virtualenv) loic@fold:~/software/ceph/ceph/build$ rados --debug-objecter 20 --debug-rados 20 put BAR2 /etc/group --pool rbd
2017-06-08 11:41:52.928208 7f1e52998e00 1 librados: starting msgr at -
2017-06-08 11:41:52.928224 7f1e52998e00 1 librados: starting objecter
2017-06-08 11:41:52.928500 7f1e52998e00 1 librados: setting wanted keys
2017-06-08 11:41:52.928501 7f1e52998e00 1 librados: calling monclient init
2017-06-08 11:41:52.929470 7f1e3debb700 10 client.?.objecter ms_handle_connect 0x5597ace69ec0
2017-06-08 11:41:52.929489 7f1e3debb700 10 client.?.objecter resend_mon_ops
2017-06-08 11:41:52.930829 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:41:52.930870 7f1e52998e00 1 librados: init done
2017-06-08 11:41:52.930877 7f1e52998e00 10 librados: wait_for_osdmap waiting
2017-06-08 11:41:52.931078 7f1e3debb700 10 client.4120.objecter ms_dispatch 0x5597acd413a0 osd_map(18..18 src has 1..18) v3
2017-06-08 11:41:52.931087 7f1e3debb700 3 client.4120.objecter handle_osd_map got epochs [18,18] > 0
2017-06-08 11:41:52.931088 7f1e3debb700 3 client.4120.objecter handle_osd_map decoding full epoch 18
2017-06-08 11:41:52.931207 7f1e3debb700 20 client.4120.objecter dump_active .. 0 homeless
2017-06-08 11:41:52.931227 7f1e52998e00 10 librados: wait_for_osdmap done waiting
2017-06-08 11:41:52.931265 7f1e52998e00 10 librados: writefull oid=BAR2 nspace=
2017-06-08 11:41:52.931278 7f1e52998e00 10 client.4120.objecter _op_submit op 0x5597ace6dac0
2017-06-08 11:41:52.931281 7f1e52998e00 20 client.4120.objecter _calc_target epoch 18 base BAR2 @0 precalc_pgid 0 pgid 0.0 is_write
2017-06-08 11:41:52.931293 7f1e52998e00 20 client.4120.objecter _calc_target target BAR2 @0 -> pgid 0.39aeb7e2
2017-06-08 11:41:52.931297 7f1e3debb700 10 client.4120.objecter ms_handle_connect 0x7f1e28009690
2017-06-08 11:41:52.931313 7f1e52998e00 10 client.4120.objecter _calc_target raw pgid 0.39aeb7e2 -> actual 0.2 acting [] primary -1
2017-06-08 11:41:52.931319 7f1e52998e00 20 client.4120.objecter _get_session osd=-1 returning homeless
2017-06-08 11:41:52.931322 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:41:52.931343 7f1e52998e00 10 client.4120.objecter _op_submit oid BAR2 '@0' '@0' [writefull 0~1499] tid 1 osd.-1
2017-06-08 11:41:52.931351 7f1e52998e00 15 client.4120.objecter _session_op_assign -1 1
2017-06-08 11:41:52.931352 7f1e52998e00 5 client.4120.objecter 1 in flight
2017-06-08 11:41:57.930919 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:41:57.930935 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:02.931046 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:02.931057 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:07.931192 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:07.931208 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:12.931317 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:12.931334 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:17.931406 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:17.931423 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:22.931567 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:22.931586 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
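The log above ends with the op assigned to osd.-1 and the objecter re-subscribing to the next osd map on every tick: the write to 0.2 cannot be sent anywhere because the PG's acting set is empty. A toy sketch (not librados code; all names here are illustrative) of that homeless-session behaviour:

```python
def submit_op(op, primary, homeless_queue, osd_queues):
    """Park an op whose PG has no primary; otherwise queue it to the primary.
    Illustrative model only, not the actual Objecter implementation."""
    if primary == -1:
        # no live primary for this PG: hold the op and wait for a new osdmap
        homeless_queue.append(op)
        return "subscribed to next osd map"
    # normal path: hand the op to the session for the primary OSD
    osd_queues.setdefault(primary, []).append(op)
    return "sent to osd.%d" % primary
```

With the acting set of 0.2 empty (primary -1), every resubmission lands back on the homeless queue, which matches the client hanging while it ticks and requests maps.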
Updated by Loïc Dachary almost 7 years ago
- Subject changed from "killing the OSD that has the last surviving copy blocks I/O" to "kill osd + osd out leads to stale PGs"
- Description updated (diff)
Updated by Greg Farnum almost 7 years ago
I don't understand what behavior you're looking for. Hanging is the expected behavior when data is unavailable.
Updated by Loïc Dachary almost 7 years ago
The reproducer works as expected on 12.0.3. The behavior changed somewhere in master after 12.0.3 was released.
Updated by Loïc Dachary almost 7 years ago
@Greg Farnum: the first reproducer was not trying to rados put the same object; it was trying to rados put another object. I narrowed the reproducer down to "kill osd ; ceph osd out" not working (it did work in 12.0.3).
Updated by Greg Farnum almost 7 years ago
Right, but what you've said here is that if you have a pool of size one and kill the only OSD hosting it, then no other OSD takes over for that PG. That's expected: there is no longer a live OSD hosting the data; nobody will notify another OSD to take over; no OSD will send pg stats to the mon/mgr, so the PG just goes stale with its previous state. Is this some unit test? If so, I think the test is broken.
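That argument can be illustrated with a toy model (not Ceph code; the mapping and state names are simplified): a PG can only be re-peered or reported if some OSD in its acting set is still alive, so size-1 PGs whose only OSD dies keep their last reported state, flagged stale.

```python
def pg_states(acting_sets, dead_osds):
    """Toy model: acting_sets maps pgid -> list of OSD ids.
    A PG with a live acting member stays reportable; one with none goes stale."""
    states = {}
    for pgid, acting in acting_sets.items():
        survivors = [o for o in acting if o not in dead_osds]
        if survivors:
            states[pgid] = "active+clean"       # a live replica can report/peer
        else:
            states[pgid] = "stale+active+clean"  # nobody left to report
    return states
```

With size-1 acting sets like those in the sample run, killing osd.0 makes exactly the PGs acting on [0] stale while PGs on osd.1 and osd.2 stay active+clean, which is the behaviour Greg describes.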
Updated by Loïc Dachary almost 7 years ago
@Greg Farnum the original bug description was updated with a simpler reproducer which does not involve copying objects. I believe this is an actual bug.