Bug #26971
failed to become clean before timeout expired
Status: Closed
Description
1.8 0 0 0 0 0 0 0 0 active+undersized 2018-08-17 11:30:11.506468 0'0 617:557 [5,7] 5 [5,7] 5 0'0 2018-08-17 11:26:39.848677 0'0 2018-08-17 11:25:05.348213 0
2018-08-17T12:01:15.935 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 861, in wrapper
return func(self)
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 980, in do_thrash
self.test_map_discontinuity()
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 779, in test_map_discontinuity
timeout=self.config.get('timeout')
File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 2112, in wait_for_clean
'failed to become clean before timeout expired'
AssertionError: failed to become clean before timeout expired
Updated by Greg Farnum over 5 years ago
Looks like a PG is in the active+undersized state. Maybe the balancer screwed up?
Updated by David Zafman about 5 years ago
Seen recently in luminous.
yuriw-2019-02-28_14:42:05-rados-wip-yuri4-testing-2019-02-27-2159-luminous-distro-basic-smithi/3650479
I'm retesting on master, filtering for the closest equivalent:
--suite rados:thrash --filter '0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/upmap.yaml msgr-failures/osd-delay.yaml msgr/random.yaml objectstore/filestore-xfs.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml} thrashers/mapgap.yaml thrashosds-health.yaml workloads/radosbench.yaml'
Updated by David Zafman about 5 years ago
- Status changed from New to 12
- Priority changed from Normal to High
Reproduced on master in 1 of 10 duplicate runs:
dzafman-2019-03-05_10:43:39-rados:thrash-master-distro-basic-smithi/3671598
Updated by David Zafman about 5 years ago
I'm not sure what this means, but pg 1.0 (size 3) needs to pick another one of the 2 remaining OSDs (4 OSDs are in) to recover to. However, we hit this during peering.
2019-03-05 19:30:13.102 7f90e1c87700 10 osd.3 pg_epoch: 601 pg[1.0( empty local-lis/les=576/577 n=0 ec=14/14 lis/c 576/576 les/c/f 577/577/0 601/601/601) [3,6] r=0 lpr=601 pi=[576,601)/1 crt=0'0 mlcod 0'0 peering mbc={}] up_thru 585 < same_since 601, must notify monitor
2019-03-05 19:30:13.106 7f90e5c8f700 10 osd.3 pg_epoch: 601 pg[1.0( empty local-lis/les=576/577 n=0 ec=14/14 lis/c 576/576 les/c/f 577/577/0 601/601/601) [3,6] r=0 lpr=601 pi=[576,601)/1 crt=0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetMissing>: still need up_thru update before going active
2019-03-05 19:30:14.110 7f90e1c87700 10 osd.3 pg_epoch: 602 pg[1.0( empty local-lis/les=576/577 n=0 ec=14/14 lis/c 576/576 les/c/f 577/577/0 601/601/601) [3,6] r=0 lpr=601 pi=[576,601)/1 crt=0'0 mlcod 0'0 peering mbc={}] adjust_need_up_thru now 601, need_up_thru now false
Updated by David Zafman about 5 years ago
dzafman-2019-03-26_16:39:54-rados:thrash-wip-zafman-26971-diag-distro-basic-smithi/3776762
Another run with diagnostics added to calc_replicated_acting():
This shows that OSDs 6 and 7 are available for these other PGs.
1.4 0 0 0 0 0 0 0 0 0 0 active+undersized 2019-03-27 00:14:02.954625 0'0 659:716 [3,0] 3 [3,0] 3 0'0 2019-03-27 00:12:55.474184 0'0 2019-03-27 00:07:35.533921 0
1.3 0 0 0 0 0 0 0 0 0 0 active+clean 2019-03-27 00:54:22.082685 0'0 660:526 [7,0,3] 7 [7,0,3] 7 0'0 2019-03-27 00:54:22.082626 0'0 2019-03-27 00:54:22.082626 0
1.2 0 0 0 0 0 0 0 0 0 0 active+clean 2019-03-27 00:53:56.778513 0'0 660:761 [0,3,6] 0 [0,3,6] 0 0'0 2019-03-27 00:53:56.778380 0'0 2019-03-27 00:52:17.282249 0
I added a message for the "want->size() >= size of" case that did not output. So "up" must NOT include 6 and 7. Also, the candidate_by_last_update is empty, so 6 and 7 aren't strays and weren't in the acting set to be rejected (no message for that).
I guess I could have assumed that up/acting [3,0] was coming in here. Maybe the "up" set is the problem before we try to compute "acting".
2019-03-27 00:14:01.945 7f6ec2a23700 10 osd.3 pg_epoch: 398 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 peering mbc={}] calc_replicated_acting
newest update on osd.3 with 1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14) up_primary: 3) selected as primary
primary is osd.3 with 1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14)
osd.0 (up) accepted 1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14)
candidate_by_last_update is empty
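The failure mode being described can be illustrated with a toy model. This is a hypothetical sketch, not the real calc_replicated_acting(): the acting set is seeded from "up", and stray peers (candidate_by_last_update) are only consulted if it is still short of pool size. With up=[3,0] and no stray candidates, there is nothing to add:

```python
# Hypothetical sketch of the selection logic being diagnosed above
# (not the actual Ceph calc_replicated_acting() implementation).
def calc_acting_sketch(up, candidates_by_last_update, size):
    """Seed the acting set from 'up', then fill from strays if short."""
    want = list(up)                        # everyone in "up" is accepted
    for osd in candidates_by_last_update:  # strays, newest-update first
        if len(want) >= size:
            break
        want.append(osd)
    return want

# With up=[3,0] and candidate_by_last_update empty, as in the log:
print(calc_acting_sketch([3, 0], [], 3))  # prints [3, 0] -> undersized
```

This matches the observation in the comment: since candidate_by_last_update is empty, the acting set can never grow beyond whatever "up" already contains.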
Updated by Neha Ojha about 5 years ago
The up set seems to be the problem here.
This is the point where we find out that osd.5 is down:
2019-03-27 00:14:01.941 7f6ebea1b700 10 osd.3 pg_epoch: 398 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 peering mbc={}] build_prior prior osd.5 is down
Beyond that, newup/newacting is always reported as [3,0]/[3,0]:
2019-03-27 00:14:01.941 7f6ebea1b700 10 osd.3 pg_epoch: 397 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 232/232/14) [3,0,5] r=0 lpr=232 crt=0'0 mlcod 0'0 active+clean] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:02.941 7f6ec2a23700 10 osd.3 pg_epoch: 398 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 peering mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:03.945 7f6ec2a23700 10 osd.3 pg_epoch: 399 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:04.949 7f6ebea1b700 10 osd.3 pg_epoch: 400 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:06.957 7f6ebea1b700 10 osd.3 pg_epoch: 401 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:09.981 7f6ebea1b700 10 osd.3 pg_epoch: 402 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:12.749 7f6ec2a23700 10 osd.3 pg_epoch: 403 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:13.769 7f6ebea1b700 10 osd.3 pg_epoch: 404 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:16.785 7f6ec2a23700 10 osd.3 pg_epoch: 405 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:19.821 7f6ec2a23700 10 osd.3 pg_epoch: 406 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:22.725 7f6ebea1b700 10 osd.3 pg_epoch: 407 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:23.829 7f6ec2a23700 10 osd.3 pg_epoch: 408 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:24.849 7f6ec2a23700 10 osd.3 pg_epoch: 409 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:26.857 7f6ebea1b700 10 osd.3 pg_epoch: 410 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:27.717 7f6ebea1b700 10 osd.3 pg_epoch: 411 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:30.893 7f6ebea1b700 10 osd.3 pg_epoch: 412 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:32.725 7f6ebea1b700 10 osd.3 pg_epoch: 413 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:33.925 7f6ec2a23700 10 osd.3 pg_epoch: 414 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:34.961 7f6ebea1b700 10 osd.3 pg_epoch: 415 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
...
Updated by Sage Weil about 5 years ago
This is just CRUSH failing. I extracted the osdmap from the data/mon.a.tgz and verified with osdmaptool that it's failing to find a third available OSD after 50 tries.
In theory the probability of that happening is about .00000056, or (roughly) one in a million. I think it's coming up in QA quite a bit more often than that (even considering there are 10s of PGs in each run)... probably because the hash that CRUSH is using isn't as strong as it could be?
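As a back-of-envelope check on that figure (an assumption on my part, not from the ticket): if each of CRUSH's 50 total tries fails independently with some probability q, then all 50 failing has probability q**50, and working backwards from ~.00000056 implies q is roughly 0.75 per try:

```python
import math

# Back-of-envelope sanity check (assumption: the 50 tries are
# modeled as independent, each failing with probability q).
TRIES = 50  # CRUSH's choose_total_tries default

def p_all_tries_fail(q, tries=TRIES):
    """Probability that every one of `tries` independent attempts fails."""
    return q ** tries

# Work backwards from the ~.00000056 figure quoted above:
q = math.exp(math.log(5.6e-7) / TRIES)
print(f"implied per-try failure probability: {q:.3f}")  # prints 0.750
print(f"check: {p_all_tries_fail(q):.2e}")              # prints 5.60e-07
```

So the one-in-a-million number is consistent with each individual try failing about three times out of four under this simple independence model.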
for the record,
$ ~/src/ceph/build/bin/osdmaptool 640 --test-map-pg 1.4
...
crush_bucket_choose -5 x=1039746536 r=51
weight 0xdf76 item 1
weight 0xdf76 item 0
weight 0xdf76 item 3
weight 0xdf76 item 2
item 0 type 0
reject 0 collide 1 ftotal 50 flocal 1
crush_bucket_choose -1 x=1039746536 r=52
weight 0x5964 item -3
weight 0x37dd8 item -5
item -5 type 1
crush_bucket_choose -5 x=1039746536 r=52
weight 0xdf76 item 1
weight 0xdf76 item 0
weight 0xdf76 item 3
weight 0xdf76 item 2
item 1 type 0
reject 1 collide 0 ftotal 51 flocal 1
skip rep
CHOOSE returns 2
1.4 raw ([3,0], p3) up ([3,0], p3) acting ([3,0], p3)
with this patch applied:

diff --git a/src/crush/mapper.c b/src/crush/mapper.c
index 73f92a77d9f..c0ade62839e 100644
--- a/src/crush/mapper.c
+++ b/src/crush/mapper.c
@@ -25,7 +25,7 @@
 #include "crush_ln_table.h"
 #include "mapper.h"
 
-#define dprintk(args...) /* printf(args) */
+#define dprintk(args...) printf(args)
 
 /*
  * Implement the core CRUSH mapping algorithm.
Updated by Sage Weil about 5 years ago
Oh, it's because there's also only 1/10th the probability of choosing the second host:
ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
-1       3.84076 root default
-5       3.49158     host smithi025
 0   ssd 0.87289         osd.0          up  1.00000 1.00000
 1   ssd 0.87289         osd.1          up        0 1.00000
 2   ssd 0.87289         osd.2          up        0 1.00000
 3   ssd 0.87289         osd.3          up  1.00000 1.00000
-3       0.34918     host smithi124
 4   ssd 0.08730         osd.4          up        0 1.00000
 5   ssd 0.08730         osd.5        down        0 0.43417
 6   ssd 0.08730         osd.6          up  1.00000 1.00000
 7   ssd 0.08730         osd.7          up  1.00000 1.00000
because,
teuthology:3820599 10:15 PM $ ssh smithi124 lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda              8:0    0 931.5G  0 disk
└─sda1           8:1    0 931.5G  0 part /
nvme0n1        259:0    0 372.6G  0 disk
├─vg_nvme-lv_5 253:0    0  14.9G  0 lvm  /var/lib/ceph
├─vg_nvme-lv_4 253:1    0  89.4G  0 lvm
├─vg_nvme-lv_3 253:2    0  89.4G  0 lvm
├─vg_nvme-lv_2 253:3    0  89.4G  0 lvm
└─vg_nvme-lv_1 253:4    0  89.4G  0 lvm
teuthology:3820599 10:15 PM $ ssh smithi025 lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda              8:0    0 931.5G  0 disk
`-sda1           8:1    0 931.5G  0 part /
nvme0n1        259:0    0   3.7T  0 disk
|-vg_nvme-lv_5 253:0    0   149G  0 lvm  /var/lib/ceph
|-vg_nvme-lv_4 253:1    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-4
|-vg_nvme-lv_3 253:2    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-7
|-vg_nvme-lv_2 253:3    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-6
`-vg_nvme-lv_1 253:4    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-5
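The LV-size mismatch shows up directly in the CRUSH weights in the tree above. A quick sketch of the host-selection bias (under the simplifying assumption that hosts are picked roughly in proportion to their CRUSH weight):

```python
# Rough illustration (assumption: straw2 picks hosts roughly in
# proportion to CRUSH weight) of how the 10x LV size skew biases
# host selection toward smithi025. Weights from `ceph osd tree`.
w_smithi025 = 3.49158
w_smithi124 = 0.34918

p_124 = w_smithi124 / (w_smithi025 + w_smithi124)
print(f"weight ratio: {w_smithi025 / w_smithi124:.1f}x")  # prints 10.0x
print(f"P(choose smithi124) ~ {p_124:.3f}")               # prints 0.091
```

So smithi124 is only offered as the second host about 1 try in 11, which compounds with the rejection of its down/out OSDs to exhaust the 50 tries.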
Updated by Sage Weil about 5 years ago
Updated by David Zafman about 5 years ago
- Is duplicate of Bug #39149: smithi025 has LVs that are 10x the others added