Project

General

Profile

Actions

Bug #20221

open

kill osd + osd out leads to stale PGs

Added by Loïc Dachary almost 7 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

description

When the OSD is killed before ceph osd out, the PGs stay in stale state.

reproducer

From the build directory


source ../src/test/detect-build-env-vars.sh

../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0
ceph osd pool set rbd size 1
pkill -f 'ceph-osd -i 0'
sleep 10
ceph osd out osd.0
sleep 10
ceph pg dump => osd.0 still mapped to stale PGs

All is well if ceph osd out osd.0 is done before killing the OSD.

Sample run

(virtualenv) loic@fold:~/software/ceph/ceph/build$ ../src/stop.sh ; rm -fr out dev ; rm -f ceph.conf ; bash ../src/vstart.sh -n -l --short --nolockdep --mon_num 1 --osd_num 3 --mds_num 0
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph osd pool set rbd size 1
set pool 0 size to 1
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 8
stamp 2017-06-08 11:37:27.021636
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE        STATE_STAMP                VERSION REPORTED UP  UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           
0.7           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100644     0'0    16:32 [1]          1    [1]              1        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.6           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100645     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.5           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201360     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.4           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201217     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.0           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201318     0'0    16:36 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.1           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756707     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.2           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201392     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.3           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756706     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 

0 0 0 0 0 0 0 0 0 

sum 0 0 0 0 0 0 0 0 
OSD_STAT USED AVAIL  TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM 
2        174G  8076M  182G    [0,1]      2              2 
1        174G  8076M  182G    [0,2]      1              1 
0        174G  8076M  182G    [1,2]      5              5 
sum      523G 24229M  547G                                
(virtualenv) loic@fold:~/software/ceph/ceph/build$ rados put BAR3 /etc/group --pool rbd
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 14
stamp 2017-06-08 11:37:39.022935
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE        STATE_STAMP                VERSION REPORTED UP  UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           
0.7           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100644     0'0    16:32 [1]          1    [1]              1        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.6           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.100645     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.5           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201360     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.4           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201217     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.0           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201318     0'0    16:36 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.1           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756707     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.2           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.201392     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.3           0                  0        0         0       0     0   0        0 active+clean 2017-06-08 11:37:23.756706     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 

0 0 0 0 0 0 0 0 0 

sum 0 0 0 0 0 0 0 0 
OSD_STAT USED AVAIL  TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM 
2        174G  8076M  182G    [0,1]      2              2 
1        174G  8076M  182G    [0,2]      1              1 
0        174G  8076M  182G    [1,2]      5              5 
sum      523G 24229M  547G                                
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ps fauwwx | grep ceph-osd
loic     18093  0.0  0.0  15796  1008 pts/24   S+   11:38   0:00          |   |       |   \_ grep ceph-osd
loic     17433  0.1  0.2 762160 33584 ?        Ssl  11:37   0:00          \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 0 -c /home/loic/software/ceph/ceph/build/ceph.conf
loic     17641  0.1  0.2 761136 32700 ?        Ssl  11:37   0:00          \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 1 -c /home/loic/software/ceph/ceph/build/ceph.conf
loic     17848  0.1  0.1 761140 31720 ?        Ssl  11:37   0:00          \_ /home/loic/software/ceph/ceph/build/bin/ceph-osd -i 2 -c /home/loic/software/ceph/ceph/build/ceph.conf
(virtualenv) loic@fold:~/software/ceph/ceph/build$ kill 17433
(virtualenv) loic@fold:~/software/ceph/ceph/build$ ceph pg dump
dumped all
version 40
stamp 2017-06-08 11:38:29.028347
last_osdmap_epoch 0
last_pg_scan 0
full_ratio 0
nearfull_ratio 0
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE              STATE_STAMP                VERSION REPORTED UP  UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP                LAST_DEEP_SCRUB DEEP_SCRUB_STAMP           
0.7           0                  0        0         0       0     0   0        0       active+clean 2017-06-08 11:37:23.100644     0'0    16:32 [1]          1    [1]              1        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.6           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.100645     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.5           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.201360     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.4           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.201217     0'0    16:41 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.0           0                  0        0         0       0     0   0        0 stale+active+clean 2017-06-08 11:37:23.201318     0'0    16:36 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.1           0                  0        0         0       0     0   0        0       active+clean 2017-06-08 11:37:23.756707     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.2           1                  0        0         0       0  1499   1        1 stale+active+clean 2017-06-08 11:37:23.201392    16'1    16:42 [0]          0    [0]              0        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 
0.3           0                  0        0         0       0     0   0        0       active+clean 2017-06-08 11:37:23.756706     0'0    16:11 [2]          2    [2]              2        0'0 2017-06-08 11:37:01.091443             0'0 2017-06-08 11:37:01.091443 

0 1 0 0 0 0 1499 1 1 

sum 1 0 0 0 0 1499 1 1 
OSD_STAT USED AVAIL  TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM 
2        174G  8076M  182G    [0,1]      2              2 
1        174G  8076M  182G    [0,2]      1              1 
0        174G  8076M  182G    [1,2]      5              5 
sum      523G 24228M  547G                                
(virtualenv) loic@fold:~/software/ceph/ceph/build$ rados --debug-objecter 20 --debug-rados 20  put BAR2 /etc/group --pool rbd
2017-06-08 11:41:52.928208 7f1e52998e00  1 librados: starting msgr at -
2017-06-08 11:41:52.928224 7f1e52998e00  1 librados: starting objecter
2017-06-08 11:41:52.928500 7f1e52998e00  1 librados: setting wanted keys
2017-06-08 11:41:52.928501 7f1e52998e00  1 librados: calling monclient init
2017-06-08 11:41:52.929470 7f1e3debb700 10 client.?.objecter ms_handle_connect 0x5597ace69ec0
2017-06-08 11:41:52.929489 7f1e3debb700 10 client.?.objecter resend_mon_ops
2017-06-08 11:41:52.930829 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:41:52.930870 7f1e52998e00  1 librados: init done
2017-06-08 11:41:52.930877 7f1e52998e00 10 librados: wait_for_osdmap waiting
2017-06-08 11:41:52.931078 7f1e3debb700 10 client.4120.objecter ms_dispatch 0x5597acd413a0 osd_map(18..18 src has 1..18) v3
2017-06-08 11:41:52.931087 7f1e3debb700  3 client.4120.objecter handle_osd_map got epochs [18,18] > 0
2017-06-08 11:41:52.931088 7f1e3debb700  3 client.4120.objecter handle_osd_map decoding full epoch 18
2017-06-08 11:41:52.931207 7f1e3debb700 20 client.4120.objecter dump_active .. 0 homeless
2017-06-08 11:41:52.931227 7f1e52998e00 10 librados: wait_for_osdmap done waiting
2017-06-08 11:41:52.931265 7f1e52998e00 10 librados: writefull oid=BAR2 nspace=
2017-06-08 11:41:52.931278 7f1e52998e00 10 client.4120.objecter _op_submit op 0x5597ace6dac0
2017-06-08 11:41:52.931281 7f1e52998e00 20 client.4120.objecter _calc_target epoch 18 base BAR2 @0 precalc_pgid 0 pgid 0.0 is_write
2017-06-08 11:41:52.931293 7f1e52998e00 20 client.4120.objecter _calc_target target BAR2 @0 -> pgid 0.39aeb7e2
2017-06-08 11:41:52.931297 7f1e3debb700 10 client.4120.objecter ms_handle_connect 0x7f1e28009690
2017-06-08 11:41:52.931313 7f1e52998e00 10 client.4120.objecter _calc_target  raw pgid 0.39aeb7e2 -> actual 0.2 acting [] primary -1
2017-06-08 11:41:52.931319 7f1e52998e00 20 client.4120.objecter _get_session osd=-1 returning homeless
2017-06-08 11:41:52.931322 7f1e52998e00 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:41:52.931343 7f1e52998e00 10 client.4120.objecter _op_submit oid BAR2 '@0' '@0' [writefull 0~1499] tid 1 osd.-1
2017-06-08 11:41:52.931351 7f1e52998e00 15 client.4120.objecter _session_op_assign -1 1
2017-06-08 11:41:52.931352 7f1e52998e00  5 client.4120.objecter 1 in flight
2017-06-08 11:41:57.930919 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:41:57.930935 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:02.931046 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:02.931057 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:07.931192 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:07.931208 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:12.931317 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:12.931334 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:17.931406 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:17.931423 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
2017-06-08 11:42:22.931567 7f1e3e70c700 10 client.4120.objecter tick
2017-06-08 11:42:22.931586 7f1e3e70c700 10 client.4120.objecter _maybe_request_map subscribing (onetime) to next osd map
Actions

Also available in: Atom PDF