Bug #26971 (closed): failed to become clean before timeout expired

Added by David Zafman over 5 years ago. Updated about 5 years ago.

Status: Duplicate
Priority: High
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/dzafman-2018-08-16_17:35:08-rados:thrash-wip-zafman-testing4-distro-basic-smithi/2911870

1.8 0 0 0 0 0 0 0 0 active+undersized 2018-08-17 11:30:11.506468 0'0 617:557 [5,7] 5 [5,7] 5 0'0 2018-08-17 11:26:39.848677 0'0 2018-08-17 11:25:05.348213 0

2018-08-17T12:01:15.935 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 861, in wrapper
    return func(self)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 980, in do_thrash
    self.test_map_discontinuity()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 779, in test_map_discontinuity
    timeout=self.config.get('timeout')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-zafman-testing4/qa/tasks/ceph_manager.py", line 2112, in wait_for_clean
    'failed to become clean before timeout expired'
AssertionError: failed to become clean before timeout expired
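
The assertion comes from the QA-side check that polls PG state until everything is active+clean; a minimal sketch of that logic (simplified, not the actual ceph_manager.py wait_for_clean, whose progress tracking is more involved):

import time

def wait_for_clean(get_pg_states, timeout=1200, poll_interval=5):
    # get_pg_states is a caller-supplied callable returning the current list of
    # PG state strings (e.g. parsed from 'ceph pg dump'); the timeout and poll
    # interval defaults here are illustrative only.
    deadline = time.time() + timeout
    while True:
        if all(s == 'active+clean' for s in get_pg_states()):
            return
        if time.time() > deadline:
            # The assertion seen in the traceback above: one PG was still
            # active+undersized when the timer ran out.
            raise AssertionError('failed to become clean before timeout expired')
        time.sleep(poll_interval)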


Related issues: 1 (0 open, 1 closed)

Is duplicate of sepia - Bug #39149: smithi025 has LVs that are 10x the others (Resolved, David Galloway, 04/08/2019)

Actions #1

Updated by Greg Farnum over 5 years ago

Looks like a PG is stuck in the active+undersized state. Maybe the balancer screwed up?

Actions #2

Updated by David Zafman about 5 years ago

  • Assignee set to David Zafman
Actions #3

Updated by David Zafman about 5 years ago

Seen recently in luminous.

yuriw-2019-02-28_14:42:05-rados-wip-yuri4-testing-2019-02-27-2159-luminous-distro-basic-smithi/3650479

I'm retesting on master, filtering for the closest equivalent:

--suite rados:thrash --filter '0-size-min-size-overrides/3-size-2-min-size.yaml 1-pg-log-overrides/short_pg_log.yaml 2-recovery-overrides/{more-active-recovery.yaml} backoff/peering_and_degraded.yaml ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-balancer/upmap.yaml msgr-failures/osd-delay.yaml msgr/random.yaml objectstore/filestore-xfs.yaml rados.yaml supported-random-distro$/{ubuntu_latest.yaml} thrashers/mapgap.yaml thrashosds-health.yaml workloads/radosbench.yaml'

Actions #4

Updated by David Zafman about 5 years ago

  • Status changed from New to 12
  • Priority changed from Normal to High

Reproduced on master in 1 of 10 duplicate runs:

dzafman-2019-03-05_10:43:39-rados:thrash-master-distro-basic-smithi/3671598

Actions #5

Updated by David Zafman about 5 years ago

I'm not sure what this means, but pg 1.0 (size 3) needs to pick one of the 2 remaining OSDs (4 OSDs are in) to recover to. However, we hit this during Peering.

2019-03-05 19:30:13.102 7f90e1c87700 10 osd.3 pg_epoch: 601 pg[1.0( empty local-lis/les=576/577 n=0 ec=14/14 lis/c 576/576 les/c/f 577/577/0 601/601/601) [3,6] r=0 lpr=601 pi=[576,601)/1 crt=0'0 mlcod 0'0 peering mbc={}] up_thru 585 < same_since 601, must notify monitor
2019-03-05 19:30:13.106 7f90e5c8f700 10 osd.3 pg_epoch: 601 pg[1.0( empty local-lis/les=576/577 n=0 ec=14/14 lis/c 576/576 les/c/f 577/577/0 601/601/601) [3,6] r=0 lpr=601 pi=[576,601)/1 crt=0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetMissing>: still need up_thru update before going active
2019-03-05 19:30:14.110 7f90e1c87700 10 osd.3 pg_epoch: 602 pg[1.0( empty local-lis/les=576/577 n=0 ec=14/14 lis/c 576/576 les/c/f 577/577/0 601/601/601) [3,6] r=0 lpr=601 pi=[576,601)/1 crt=0'0 mlcod 0'0 peering mbc={}] adjust_need_up_thru now 601, need_up_thru now false
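
For reference, the gate in these messages reduces to comparing the OSD's recorded up_thru with the epoch the current interval started; a rough sketch of that condition (not the actual PeeringState code):

def need_up_thru_update(up_thru, same_interval_since):
    # The primary cannot go active until the monitor has recorded an up_thru
    # at or beyond the epoch this interval began. Mirrors the messages above:
    # "up_thru 585 < same_since 601, must notify monitor" and later
    # "adjust_need_up_thru now 601, need_up_thru now false".
    return up_thru < same_interval_since

assert need_up_thru_update(585, 601)        # must notify the monitor first
assert not need_up_thru_update(601, 601)    # clears once up_thru catches up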

Actions #6

Updated by David Zafman about 5 years ago

dzafman-2019-03-26_16:39:54-rados:thrash-wip-zafman-26971-diag-distro-basic-smithi/3776762

Another run with diagnostics added to calc_replicated_acting():

This shows that OSDs 6 and 7 are available for these other PGs.

1.4           0                  0        0         0       0     0           0          0   0        0 active+undersized 2019-03-27 00:14:02.954625     0'0  659:716   [3,0]          3   [3,0]              3        0'0 2019-03-27 00:12:55.474184             0'0 2019-03-27 00:07:35.533921             0
1.3           0                  0        0         0       0     0           0          0   0        0      active+clean 2019-03-27 00:54:22.082685     0'0  660:526 [7,0,3]          7 [7,0,3]              7        0'0 2019-03-27 00:54:22.082626             0'0 2019-03-27 00:54:22.082626             0
1.2           0                  0        0         0       0     0           0          0   0        0      active+clean 2019-03-27 00:53:56.778513     0'0  660:761 [0,3,6]          0 [0,3,6]              0        0'0 2019-03-27 00:53:56.778380             0'0 2019-03-27 00:52:17.282249             0

I added a message for the "want->size() >= size of" case, and it did not output, so "up" must NOT include 6 and 7. Also, candidate_by_last_update is empty, so 6 and 7 aren't strays and weren't in acting to be rejected (no message for that either).

I guess I could have assumed that, with up/acting already [3,0] coming in here, the "up" set is the problem before we even try to compute "acting".

2019-03-27 00:14:01.945 7f6ec2a23700 10 osd.3 pg_epoch: 398 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 peering mbc={}] calc_replicated_acting newest update on osd.3 with 1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14)
up_primary: 3) selected as primary
primary is osd.3 with 1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14)
osd.0 (up) accepted 1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14)
candidate_by_last_update is empty
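
To make the diagnostic output above easier to read, here is a rough model of the selection it instruments: take the up member with the newest update as primary, accept the remaining up members, and only then fill from strays ordered by last update. This is a simplification of calc_replicated_acting with made-up inputs, but it shows why osd.6 and osd.7 never appear: they are not in "up" and candidate_by_last_update is empty.

def choose_acting_sketch(up, peer_last_update, size):
    # peer_last_update maps osd id -> last_update for peers we have info from.
    primary = max(up, key=lambda o: peer_last_update[o])
    want = [primary] + [o for o in up if o != primary]

    # Strays: peers with info that are not in up, newest first.
    candidate_by_last_update = sorted(
        (o for o in peer_last_update if o not in up),
        key=lambda o: peer_last_update[o], reverse=True)
    for osd in candidate_by_last_update:
        if len(want) >= size:
            break
        want.append(osd)
    return want

# With up=[3,0] and no stray peer info, a size-3 pool still ends up with [3, 0].
print(choose_acting_sketch([3, 0], {3: 100, 0: 100}, size=3))
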
Actions #7

Updated by Neha Ojha about 5 years ago

The up set seems to be the problem here.

This is the point at which we find out that osd.5 is down:

2019-03-27 00:14:01.941 7f6ebea1b700 10 osd.3 pg_epoch: 398 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 peering mbc={}] build_prior  prior osd.5 is down

Beyond that, newup/newacting is always reported as [3,0]/[3,0]:

2019-03-27 00:14:01.941 7f6ebea1b700 10 osd.3 pg_epoch: 397 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 232/232/14) [3,0,5] r=0 lpr=232 crt=0'0 mlcod 0'0 active+clean] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:02.941 7f6ec2a23700 10 osd.3 pg_epoch: 398 pg[1.4( empty local-lis/les=232/233 n=0 ec=14/14 lis/c 232/232 les/c/f 233/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 peering mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:03.945 7f6ec2a23700 10 osd.3 pg_epoch: 399 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:04.949 7f6ebea1b700 10 osd.3 pg_epoch: 400 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:06.957 7f6ebea1b700 10 osd.3 pg_epoch: 401 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:09.981 7f6ebea1b700 10 osd.3 pg_epoch: 402 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:12.749 7f6ec2a23700 10 osd.3 pg_epoch: 403 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:13.769 7f6ebea1b700 10 osd.3 pg_epoch: 404 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:16.785 7f6ec2a23700 10 osd.3 pg_epoch: 405 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:19.821 7f6ec2a23700 10 osd.3 pg_epoch: 406 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:22.725 7f6ebea1b700 10 osd.3 pg_epoch: 407 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:23.829 7f6ec2a23700 10 osd.3 pg_epoch: 408 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:24.849 7f6ec2a23700 10 osd.3 pg_epoch: 409 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:26.857 7f6ebea1b700 10 osd.3 pg_epoch: 410 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:27.717 7f6ebea1b700 10 osd.3 pg_epoch: 411 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:30.893 7f6ebea1b700 10 osd.3 pg_epoch: 412 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:32.725 7f6ebea1b700 10 osd.3 pg_epoch: 413 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:33.925 7f6ec2a23700 10 osd.3 pg_epoch: 414 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
2019-03-27 00:14:34.961 7f6ebea1b700 10 osd.3 pg_epoch: 415 pg[1.4( empty local-lis/les=398/399 n=0 ec=14/14 lis/c 398/232 les/c/f 399/233/0 398/398/14) [3,0] r=0 lpr=398 pi=[232,398)/1 crt=0'0 mlcod 0'0 active+undersized mbc={}] handle_advance_map [3,0]/[3,0] -- 3/3
...
Actions #8

Updated by Sage Weil about 5 years ago

This is just CRUSH failing. I extracted the osdmap from the data/mon.a.tgz and verified with osdmaptool that it's just failing to find a third available osd after 50 tries.

In theory the probability of that happening is about .00000056, or (roughly) one in a million. I think it's coming up in QA quite a bit more often than that (even considering there are 10s of PGs in each run)... probably because the hash that CRUSH is using isn't as strong as it could be?
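
For what it's worth, the .00000056 figure matches a simple model in which each of the 50 retries independently has about a 1-in-4 chance of landing on a usable third OSD (assuming, roughly, equal host weights and two usable OSDs out of four on the second host):

# Per retry: ~0.5 chance of picking the second host, ~0.5 chance of a usable
# OSD within it, so ~0.25 chance of success per attempt.
p_success_per_try = 0.5 * 0.5
p_all_50_fail = (1 - p_success_per_try) ** 50
print(p_all_50_fail)   # ~5.7e-07, i.e. roughly the .00000056 quoted above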

for the record,

$ ~/src/ceph/build/bin/osdmaptool  640 --test-map-pg 1.4
...
 crush_bucket_choose -5 x=1039746536 r=51
weight 0xdf76 item 1
weight 0xdf76 item 0
weight 0xdf76 item 3
weight 0xdf76 item 2
  item 0 type 0
  reject 0  collide 1  ftotal 50  flocal 1
 crush_bucket_choose -1 x=1039746536 r=52
weight 0x5964 item -3
weight 0x37dd8 item -5
  item -5 type 1
 crush_bucket_choose -5 x=1039746536 r=52
weight 0xdf76 item 1
weight 0xdf76 item 0
weight 0xdf76 item 3
weight 0xdf76 item 2
  item 1 type 0
  reject 1  collide 0  ftotal 51  flocal 1
skip rep
CHOOSE returns 2
1.4 raw ([3,0], p3) up ([3,0], p3) acting ([3,0], p3)

with this patch applied:
diff --git a/src/crush/mapper.c b/src/crush/mapper.c
index 73f92a77d9f..c0ade62839e 100644
--- a/src/crush/mapper.c
+++ b/src/crush/mapper.c
@@ -25,7 +25,7 @@
 #include "crush_ln_table.h" 
 #include "mapper.h" 

-#define dprintk(args...) /* printf(args) */
+#define dprintk(args...)  printf(args) 

 /*
  * Implement the core CRUSH mapping algorithm.

Actions #9

Updated by Sage Weil about 5 years ago

Oh, it's because there's also only a 1/10th probability of choosing the second host:

ID CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF 
-1       3.84076 root default                               
-5       3.49158     host smithi025                         
 0   ssd 0.87289         osd.0          up  1.00000 1.00000 
 1   ssd 0.87289         osd.1          up        0 1.00000 
 2   ssd 0.87289         osd.2          up        0 1.00000 
 3   ssd 0.87289         osd.3          up  1.00000 1.00000 
-3       0.34918     host smithi124                         
 4   ssd 0.08730         osd.4          up        0 1.00000 
 5   ssd 0.08730         osd.5        down        0 0.43417 
 6   ssd 0.08730         osd.6          up  1.00000 1.00000 
 7   ssd 0.08730         osd.7          up  1.00000 1.00000 

because the LVs on smithi025 are 10x the size of those on smithi124:

teuthology:3820599  10:15 PM $ ssh smithi124 lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda              8:0    0 931.5G  0 disk 
└─sda1           8:1    0 931.5G  0 part /
nvme0n1        259:0    0 372.6G  0 disk 
├─vg_nvme-lv_5 253:0    0  14.9G  0 lvm  /var/lib/ceph
├─vg_nvme-lv_4 253:1    0  89.4G  0 lvm  
├─vg_nvme-lv_3 253:2    0  89.4G  0 lvm  
├─vg_nvme-lv_2 253:3    0  89.4G  0 lvm  
└─vg_nvme-lv_1 253:4    0  89.4G  0 lvm  
teuthology:3820599  10:15 PM $ ssh smithi025 lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda              8:0    0 931.5G  0 disk 
`-sda1           8:1    0 931.5G  0 part /
nvme0n1        259:0    0   3.7T  0 disk 
|-vg_nvme-lv_5 253:0    0   149G  0 lvm  /var/lib/ceph
|-vg_nvme-lv_4 253:1    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-4
|-vg_nvme-lv_3 253:2    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-7
|-vg_nvme-lv_2 253:3    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-6
`-vg_nvme-lv_1 253:4    0 894.2G  0 lvm  /var/lib/ceph/osd/ceph-5
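
Plugging the actual ~10:1 host weights from the tree above into the same rough model shows why this reproduces so often in QA (the real CRUSH retry behaviour is more subtle, so treat this as an estimate):

# The second host is chosen with probability ~0.35 / (3.49 + 0.35) ~= 0.09 per
# retry, and only 2 of its 4 OSDs (osd.6, osd.7) are usable.
p_second_host = 0.34918 / (3.49158 + 0.34918)
p_success_per_try = p_second_host * 0.5
p_all_50_fail = (1 - p_success_per_try) ** 50
print(p_all_50_fail)   # ~0.1 -- roughly one PG in ten fails to find a third OSD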

Actions #11

Updated by David Zafman about 5 years ago

  • Is duplicate of Bug #39149: smithi025 has LVs that are 10x the others added
Actions #12

Updated by David Zafman about 5 years ago

  • Status changed from 12 to Duplicate