Bug #43597

stuck waiting for pg to advance to epoch

Added by Sage Weil about 4 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(gdb) bt
#0  0x00007f541f1599f3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5607978c1604) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x5607978c1548, cond=0x5607978c15d8) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x5607978c15d8, mutex=0x5607978c1548) at pthread_cond_wait.c:655
#3  0x00007f541e83086c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x000056078d844268 in std::condition_variable::wait<OSDShard::wait_min_pg_epoch(epoch_t)::<lambda()> > (__p=..., __lock=..., this=0x5607978c15d8) at ./src/osd/OSD.cc:9993
#5  OSDShard::wait_min_pg_epoch(unsigned int) () at ./src/osd/OSD.cc:9987
#6  0x000056078d8445f7 in OSD::<lambda(int)>::operator() (__closure=0x5607978bfaf8, r=<optimized out>) at ./src/osd/OSDMap.h:670
#7  LambdaContext<OSD::_preboot(unsigned int, unsigned int)::{lambda(int)#9}>::finish(int) () at ./src/include/Context.h:129
#8  0x000056078d869b89 in Context::complete (this=0x5607978bfaf0, r=<optimized out>) at ./src/include/Context.h:77
#9  0x000056078de101ad in Finisher::finisher_thread_entry() () at ./src/common/Finisher.cc:66
#10 0x00007f541f1536db in start_thread (arg=0x7f5415c56700) at pthread_create.c:463
#11 0x00007f541def388f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) f 5
#5  OSDShard::wait_min_pg_epoch(unsigned int) () at ./src/osd/OSD.cc:9987
9987    ./src/osd/OSD.cc: No such file or directory.
(gdb) p need
$2 = 326

Meanwhile, the OSD is at epoch 5000 or so.
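For readers unfamiliar with this kind of wait, here is a minimal sketch of an "oldest tracked PG must reach epoch N" gate. It is a simplified model and not the Ceph source: `EpochGate` and all of its members are hypothetical names, and the predicate is an assumption about the general shape of such a wait. It only illustrates how a single tracked entry left below `need` keeps the waiter blocked no matter how far everything else has advanced; it is not a diagnosis of this bug.

```cpp
// Simplified model of a "wait until the oldest tracked PG reaches an epoch"
// gate. All names are hypothetical; this is NOT the Ceph implementation.
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <set>

using epoch_t = unsigned;

class EpochGate {
public:
  // Start tracking a PG that is currently at epoch e.
  void register_pg(epoch_t e) {
    std::lock_guard<std::mutex> l(lock_);
    epochs_.insert(e);
    cond_.notify_all();
  }

  // Move one tracked PG from epoch `from` to epoch `to`.
  void advance_pg(epoch_t from, epoch_t to) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = epochs_.find(from);
    assert(it != epochs_.end());
    epochs_.erase(it);  // erase by iterator removes exactly one entry
    epochs_.insert(to);
    cond_.notify_all();
  }

  // Stop tracking one PG at epoch e (for example, after it goes away).
  void unregister_pg(epoch_t e) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = epochs_.find(e);
    assert(it != epochs_.end());
    epochs_.erase(it);
    cond_.notify_all();
  }

  // Block until no PG is tracked or the oldest tracked PG is at >= need.
  // Returns true if the condition was met, false if the timeout expired.
  bool wait_min_epoch(epoch_t need, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    return cond_.wait_for(l, timeout, [&] {
      return epochs_.empty() || *epochs_.begin() >= need;
    });
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::multiset<epoch_t> epochs_;  // ordered, so begin() is the minimum
};
```

The sketch takes a timeout only so that it can be exercised without hanging. The wait in the backtrace above has `abstime=0x0`, i.e. no timeout, so in that situation a stale entry would block the finisher thread indefinitely.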

/a/sage-2020-01-12_21:37:03-rados-wip-sage-testing-2020-01-12-0621-distro-basic-smithi/4660838

History

#1 Updated by Sage Weil about 4 years ago

1.c

2020-01-14T06:17:28.100+0000 7f53fa219700 20 osd.1 646 advance_pg 1.4 is merge target, sources are 1.c
2020-01-14T06:17:28.100+0000 7f53fa219700 10 osd.1 pg_epoch: 621 pg[1.4( empty local-lis/les=313/314 n=0 ec=15/15 lis/c=313/313 les/c/f=314/314/0 sis=620) [3,4]/[3,4,2] r=-1 lpr=620 pi=[313,620)/6 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] merge_from from {1.c=0x560799640000} split_bits 3
2020-01-14T06:17:28.116+0000 7f540ba3c700 20 osd.1:4.identify_splits_and_merges identify_splits_and_merges slot 1.c has no pg and waiting_for_split 
2020-01-14T06:17:28.120+0000 7f540ba3c700 20 osd.1:4.consume_map consume_map 1.c
2020-01-14T06:17:28.120+0000 7f540ba3c700 20 osd.1:4.consume_map consume_map  1.c empty, pruning

1.10
2020-01-14T06:17:28.088+0000 7f53fc21d700 10 osd.1 pg_epoch: 605 pg[1.0( empty local-lis/les=305/306 n=0 ec=15/15 lis/c=305/305 les/c/f=306/306/0 sis=604) [2,0]/[2,0,6] r=-1 lpr=604 pi=[305,604)/3 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] merge_from from {1.10=0x56079cc28000} split_bits 4
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:0.identify_splits_and_merges identify_splits_and_merges slot 1.10 has no pg and waiting_for_split 
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:0.consume_map consume_map 1.10
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:0.consume_map consume_map  1.10 empty, pruning

1.14
2020-01-14T06:17:28.076+0000 7f53fe221700 10 osd.1 pg_epoch: 569 pg[1.4( empty local-lis/les=313/314 n=0 ec=15/15 lis/c=313/313 les/c/f=314/314/0 sis=568) [3,6]/[3,6,2] r=-1 lpr=568 pi=[313,568)/5 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] merge_from from {1.14=0x56079cc2b400} split_bits 4
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:4.identify_splits_and_merges identify_splits_and_merges slot 1.14 has no pg and waiting_for_split 
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:4.consume_map consume_map 1.14
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:4.consume_map consume_map  1.14 empty, pruning

1.13
2020-01-14T06:17:28.076+0000 7f53faa1a700 10 osd.1 pg_epoch: 573 pg[1.13( empty local-lis/les=317/318 n=0 ec=534/15 lis/c=317/317 les/c/f=318/318/0 sis=572) [6,0]/[6,0,5] r=-1 lpr=572 pi=[317,572)/3 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] cancel_recovery
2020-01-14T06:17:28.076+0000 7f53faa1a700 10 osd.1 pg_epoch: 573 pg[1.13( empty local-lis/les=317/318 n=0 ec=534/15 lis/c=317/317 les/c/f=318/318/0 sis=572) [6,0]/[6,0,5] r=-1 lpr=572 pi=[317,572)/3 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] clear_recovery_state
2020-01-14T06:17:28.076+0000 7f53faa1a700 10 osd.1:3._detach_pg 1.13 0x56079cd9e000
2020-01-14T06:17:28.076+0000 7f53faa1a700 10 osd.1 606 add_merge_waiter added merge_waiter 1.13 for 1.3, have 1/1
2020-01-14T06:17:28.076+0000 7f53faa1a700 20 osd.1 606 advance_pg 1.3 is merge target, sources are 1.13
2020-01-14T06:17:28.076+0000 7f53faa1a700 10 osd.1 pg_epoch: 573 pg[1.3( empty local-lis/les=317/318 n=0 ec=15/15 lis/c=317/317 les/c/f=318/318/0 sis=572) [6,0]/[6,0,5] r=-1 lpr=572 pi=[317,572)/2 crt=0'0 mlcod 0'0 remapped NOTIFY mbc={}] merge_from from {1.13=0x56079cd9e000} split_bits 4
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:3.identify_splits_and_merges identify_splits_and_merges slot 1.13 has no pg and waiting_for_split 
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:3.consume_map consume_map 1.13
2020-01-14T06:17:28.096+0000 7f540ba3c700 20 osd.1:3.consume_map consume_map  1.13 empty, pruning
