Bug #20221

open

kill osd + osd out leads to stale PGs

Added by Loïc Dachary almost 7 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

description

When the OSD is killed before running ceph osd out, its PGs stay in the stale state.

reproducer

From the build directory


source ../src/test/detect-build-env-vars.sh

../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0
ceph osd pool set rbd size 1
pkill -f 'ceph-osd -i 0'
sleep 10
ceph osd out osd.0
sleep 10
ceph pg dump    # => osd.0 still mapped to stale PGs

All is well if ceph osd out osd.0 is done before killing the OSD.
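For comparison, a minimal sketch of that working ordering, assuming the same vstart cluster as in the reproducer above (the sleep durations are arbitrary):

ceph osd pool set rbd size 1
ceph osd out osd.0            # mark the OSD out while it is still running
sleep 10                      # give the cluster time to remap PGs away from osd.0
pkill -f 'ceph-osd -i 0'      # now kill it
sleep 10
ceph pg dump                  # expected: no PG mapped to osd.0, nothing stale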

Sample run

(virtualenv) loic@fold:~/software/ceph/ceph/build$ ../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph osd pool set rbd size 1
set pool 0 size to 1
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 8
stamp 2017-06-08 11:37:27.021636
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE        STATE_STAMP                VERSION REPORTED UP  UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           
0.7           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100644     0'0    16:32 [1]          1    [1]              1        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.6           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100645     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.5           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201360     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.4           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201217     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.0           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201318     0'0    16:36 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.1           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756707     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.2           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201392     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.3           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756706     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 

0 0 0 0 0 0 0 0 0 

sum 0 0 0 0 0 0 0 0 
OSD_STAT USED AVAIL  TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM 
2        174G  8076M  182G    [0,1]      2              2 
1        174G  8076M  182G    [0,2]      1              1 
0        174G  8076M  182G    [1,2]      5              5 
sum      523G 24229M  547G                                
(virtualenv) loic@fold:~/software/ceph/ceph/build$ rados put BAR3 /etc/group --pool rbd
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 14
stamp 2017-06-08 11:37:39.022935
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE        STATE_STAMP                VERSION REPORTED UP  UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           
0.7           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100644     0'0    16:32 [1]          1    [1]              1        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.6           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100645     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.5           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201360     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.4           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201217     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.0           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201318     0'0    16:36 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.1           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756707     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.2           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201392     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.3           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756706     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 

0 0 0 0 0 0 0 0 0 

sum 0 0 0 0 0 0 0 0 
OSD_STAT USED AVAIL  TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM 
2        174G  8076M  182G    [0,1]      2              2 
1        174G  8076M  182G    [0,2]      1              1 
0        174G  8076M  182G    [1,2]      5              5 
sum      523G 24229M  547G                                
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ps fauwwx | grep ceph-osd
loic     18093  0.0  0.0  15796  1008 pts/24   S+   11:38   0:00          |   |       |   \_ grep ceph-osd
loic     17433  0.1  0.2 762160 33584 ?        Ssl  11:37   0:00          \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 0 -c /home/loic/software/ceph/ceph/build/ceph.conf
loic     17641  0.1  0.2 761136 32700 ?        Ssl  11:37   0:00          \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 1 -c /home/loic/software/ceph/ceph/build/ceph.conf
loic     17848  0.1  0.1 761140 31720 ?        Ssl  11:37   0:00          \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 2 -c /home/loic/software/ceph/ceph/build/ceph.conf
(virtualenv) loic@fold:~/software/ceph/ceph/build$ kill 17433
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 40
stamp 2017-06-08 11:38:29.028347
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE              STATE_STAMP                VERSION REPORTED UP  UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           
0.7           0                  0        0         0       0     0   0        0       active+clean 2017-06-08 11:37:23.100644     0'0    16:32 [1]          1    [1]              1        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.6           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.100645     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.5           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.201360     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.4           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.201217     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.0           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.201318     0'0    16:36 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.1           0                  0        0         0       0     0   0        0       active+clean 2017-06-08 11:37:23.756707     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.2           1                  0        0         0       0  1499   1        1 stale+active+clean 2017-06-08 11:37:23.201392    16'1    16:42 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.3           0                  0        0         0       0     0   0        0       active+clean 2017-06-08 11:37:23.756706     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 

0 1 0 0 0 0 1499 1 1 

sum 1 0 0 0 0 1499 1 1 
OSD_STAT USED AVAIL  TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM 
2        174G  8076M  182G    [0,1]      2              2 
1        174G  8076M  182G    [0,2]      1              1 
0        174G  8076M  182G    [1,2]      5              5 
sum      523G 24228M  547G                                
(virtualenv) loic@fold:~/software/ceph/ceph/build$ rados --debug-objecter 20 --debug-rados 20  put BAR2 /etc/group --pool rbd
2017-06-08 11:41:52.928208 7f1e52998e00  1 librados: starting msgr at -
2017-06-08 11:41:52.928224 7f1e52998e00  1 librados: starting objecter
2017-06-08 11:41:52.928500 7f1e52998e00  1 librados: setting wanted keys
2017-06-08 11:41:52.928501 7f1e52998e00  1 librados: calling monclient init
2017-06-08 11:41:52.929470 7f1e3debb700 10 client.?.objecter ms_handle_connect 0x5597ace69ec0
2017-06-08 11:41:52.929489 7f1e3debb700 10 client.?.objecter resend_mon_ops
2017-06-08 11:41:52.930829 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:41:52.930870 7f1e52998e00  1 librados: init done
2017-06-08 11:41:52.930877 7f1e52998e00 10 librados: wait_for_osdmap waiting
2017-06-08 11:41:52.931078 7f1e3debb700 10 client.4120.objecter ms_dispatch 0x5597acd413a0 osd_map(18..18 src has 1..18) v3
2017-06-08 11:41:52.931087 7f1e3debb700  3 client.4120.objecter handle_osd_map got epochs [18,18] > 0
2017-06-08 11:41:52.931088 7f1e3debb700  3 client.4120.objecter handle_osd_map decoding full epoch 18
2017-06-08 11:41:52.931207 7f1e3debb700 20 client.4120.objecter dump_active .. 0 homeless
2017-06-08 11:41:52.931227 7f1e52998e00 10 librados: wait_for_osdmap done waiting
2017-06-08 11:41:52.931265 7f1e52998e00 10 librados: writefull oid=BAR2 nspace=
2017-06-08 11:41:52.931278 7f1e52998e00 10 client.4120.objecter _op_submit op 0x5597ace6dac0
2017-06-08 11:41:52.931281 7f1e52998e00 20 client.4120.objecter _calc_target epoch 18 base BAR2 @0 precalc_pgid 0 pgid 0.0 is_write
2017-06-08 11:41:52.931293 7f1e52998e00 20 client.4120.objecter _calc_target target BAR2 @0 -> pgid 0.39aeb7e2
2017-06-08 11:41:52.931297 7f1e3debb700 10 client.4120.objecter ms_handle_connect 0x7f1e28009690
2017-06-08 11:41:52.931313 7f1e52998e00 10 client.4120.objecter _calc_target  raw pgid 0.39aeb7e2 -> actual 0.2 acting [] primary -1
2017-06-08 11:41:52.931319 7f1e52998e00 20 client.4120.objecter _get_session osd=-1 returning homeless
2017-06-08 11:41:52.931322 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:41:52.931343 7f1e52998e00 10 client.4120.objecter _op_submit oid BAR2 '@0' '@0' [writefull 0~1499] tid 1 osd.-1
2017-06-08 11:41:52.931351 7f1e52998e00 15 client.4120.objecter _session_op_assign -1 1
2017-06-08 11:41:52.931352 7f1e52998e00  5 client.4120.objecter 1 in flight
2017-06-08 11:41:57.930919 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:41:57.930935 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:02.931046 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:02.931057 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:07.931192 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:07.931208 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:12.931317 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:12.931334 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:17.931406 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:17.931423 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:22.931567 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:22.931586 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
Actions #1

Updated by Loïc Dachary almost 7 years ago

  • Description updated (diff)
Actions #2

Updated by Loïc Dachary almost 7 years ago

  • Description updated (diff)
Actions #3

Updated by Loïc Dachary almost 7 years ago

  • Subject changed from killing the OSD that has the last surviving copy blocks I/O to kill osd + osd out leads to stale PGs
  • Description updated (diff)
Actions #4

Updated by Greg Farnum almost 7 years ago

I don't understand what behavior you're looking for. Hanging is the expected behavior when data is unavailable.

Actions #5

Updated by Loïc Dachary almost 7 years ago

The reproducer works as expected on 12.0.3. The behavior changed somewhere in master after 12.0.3 was released.

Actions #6

Updated by Loïc Dachary almost 7 years ago

@Greg Farnum the first reproducer was not trying to rados put the same object. It was trying to rados put another object. I narrowed down the reproducer to "kill osd ; ceph osd out" not working (it did work in 12.0.3).

Actions #7

Updated by Loïc Dachary almost 7 years ago

FWIW it was reproduced by badone.

Actions #8

Updated by Greg Farnum almost 7 years ago

Right, but what you've said here is that if you have pool size one, and kill the only OSD hosting it, then no other OSD takes over for that PG. That's expected: there is no longer a live OSD hosting the data; nobody will notify another OSD to take over; no OSD will send in pg stats to the mon/mgr so the PG just goes stale with its previous state. Is this some unit test? If so I think the test is broken.
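For reference, a hedged sketch of how to list the PGs in this state on the vstart cluster from the description (ceph pg dump_stuck only reports PGs stuck longer than mon_pg_stuck_threshold, so it can take a while before they show up):

ceph pg dump_stuck stale      # lists the PGs whose primary OSD stopped reporting (0.0, 0.2, 0.4, 0.5, 0.6 above)
ceph health detail            # the same stale PGs are called out in the health output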

Actions #9

Updated by Loïc Dachary almost 7 years ago

@Greg Farnum the original bug description was updated with a simpler reproducer which does not involve copying objects. I believe this is an actual bug.

Actions #10

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
Actions #11

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New