Bug #1925

osd: segfault during _scan_list

Added by Sage Weil almost 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: High
Assignee: -
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description


(gdb) bt
#0  0x00007f0038dd2a0b in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#1  0x0000000000a39753 in reraise_fatal (signum=11) at global/signal_handler.cc:59
#2  0x0000000000a39960 in handle_fatal_signal (signum=11) at global/signal_handler.cc:109
#3  <signal handler called>
#4  0x00000000007a8750 in OSDMap::get_epoch() const ()
#5  0x00000000009692aa in PG::gen_prefix (this=0x2378200) at osd/PG.cc:67
#6  0x00000000009690e5 in _prefix (_dout=0x1f0f010, pg=0x2378200) at osd/PG.cc:37
#7  0x000000000097fecd in PG::_scan_list (this=0x2378200, map=..., ls=...) at osd/PG.cc:2389
#8  0x0000000000981c1d in PG::build_scrub_map (this=0x2378200, map=...) at osd/PG.cc:2545
#9  0x0000000000983564 in PG::scrub (this=0x2378200) at osd/PG.cc:2755
#10 0x00000000008601e6 in OSD::ScrubWQ::_process(PG*) ()
#11 0x00000000008a4748 in ThreadPool::WorkQueue<PG>::_void_process(void*) ()
#12 0x00000000008d4f93 in ThreadPool::worker (this=0x1f36668) at common/WorkQueue.cc:54
#13 0x000000000085db56 in ThreadPool::WorkThread::entry() ()
#14 0x00000000008ae061 in Thread::_entry_func (arg=0x1f047e0) at common/Thread.cc:41
#15 0x00007f0038dca971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
#16 0x00007f003745592d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#17 0x0000000000000000 in ?? ()

log

2012-01-11 06:53:05.441658 7f0029f39700 osd.0 17 dequeue_op osd_sub_op_reply(unknown.0.0:0 0.31 0//0 [scrub-reserve] ack, result = 0) v1 pg pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean]
2012-01-11 06:53:05.441678 7f0029f39700 osd.0 17 _share_map_outgoing osd.7 10.3.14.140:6801/1932 already has epoch 17
2012-01-11 06:53:05.441696 7f0029f39700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean] sub_op_scrub_reserve_reply
2012-01-11 06:53:05.441716 7f0029f39700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean]  osd.7 scrub reserve = success
2012-01-11 06:53:05.441735 7f0029f39700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean] sched_scrub: success, reserved self and replicas
2012-01-11 06:53:05.441754 7f0029f39700 osd.0 17 dequeue_op 0x25a4c80 finish
2012-01-11 06:53:05.441783 7f0028736700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] scrub start
2012-01-11 06:53:05.441809 7f0028736700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] update_stats 5'235
2012-01-11 06:53:05.441830 7f0028736700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] scrub  requesting scrubmap from osd.7
2012-01-11 06:53:05.441848 7f0028736700 -- 10.3.14.139:6801/1938 --> osd.7 10.3.14.140:6801/1932 -- replica scrub(pg: 0.31,from:0'0,to:17'200epoch:17) v1 -- ?+0 0x2563700
2012-01-11 06:53:05.441881 7f0028736700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] build_scrub_map
2012-01-11 06:53:05.443617 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list scanning 12 objects
2012-01-11 06:53:05.443883 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  ca1be331/obj-eVtQKVv2FzTvvSJ/head
2012-01-11 06:53:05.444129 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  6a212831/obj-N2MfKPEgY2Di7-r/head
2012-01-11 06:53:05.444372 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  2c33da31/obj-A8oN5R_EicDizSq/head
2012-01-11 06:53:05.444616 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  1a8b3c31/obj-eMqcNbaIwwHJPr6/head
2012-01-11 06:53:05.444860 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  b164e31/obj-XLvQY6Cdv7PVJfC/head
2012-01-11 06:53:05.445104 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  89996271/obj-wDV5psNVNtPo1FP/head
2012-01-11 06:53:05.445347 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  5e609b71/obj-Fnao7nEFeadGnBB/head
2012-01-11 06:53:05.445589 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  1044ef71/obj-jbQAaC2RRpZi5om/head
2012-01-11 06:53:05.445832 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  a7c3c3b1/obj-qtKm7Jr0ctAR_sr/head
2012-01-11 06:53:05.446078 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  b6ad07b1/obj-hJTHiCwY3hTQNV1/head
2012-01-11 06:53:05.446341 7f0028736700 osd.0 0 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] _scan_list  202950f1/obj-M88KFzmC1hTEcKX/head
2012-01-11 06:53:05.446499 7f002c940700 -- 10.3.14.139:6801/1938 <== osd.7 10.3.14.140:6801/1932 242 ==== osd_sub_op(unknown.0.0:0 0.31 0//0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 ==== 499+0+25461 (2967384482 0 690399351) 0x24f4b80 con 0x1f558c0
2012-01-11 06:53:05.446517 7f002c940700 osd.0 17 _dispatch 0x24f4b80 osd_sub_op(unknown.0.0:0 0.31 0//0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
2012-01-11 06:53:05.446531 7f002c940700 osd.0 17 handle_sub_op osd_sub_op(unknown.0.0:0 0.31 0//0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3 epoch 17
2012-01-11 06:53:05.446541 7f002c940700 osd.0 17 require_same_or_newer_map 17 (i am 17) 0x24f4b80
2012-01-11 06:53:05.446556 7f002c940700 osd.0 17 _share_map_incoming osd.7 10.3.14.140:6801/1932 17
2012-01-11 06:53:05.446567 7f002c940700 osd.0 17 note_peer_epoch osd.7 has 17 >= 17
2012-01-11 06:53:05.446592 7f002c940700 osd.0 17 pg[0.31( v 17'200 (17'198,17'200] n=12 ec=1 les/c 4/17 5/5/5) [0,7] r=0 lpr=5 mlcod 17'199 active+clean+scrubbing] enqueue_op 0x24f4b80 osd_sub_op(unknown.0.0:0 0.31 0//0 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) v3
*** Caught signal (Segmentation fault) **
 in thread 7f0028736700
 ceph version 0.39-401-gd4815e5 (commit:d4815e5bd4727f499643f2bebe2715cc85faaa42)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x8bba37]
 2: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0xa398dd]
 3: (()+0xfb40) [0x7f0038dd2b40]
 4: (OSDMap::get_epoch() const+0xc) [0x7a8750]
 5: (PG::gen_prefix() const+0x70) [0x9692aa]
 6: /tmp/cephtest/binary/usr/local/bin/ceph-osd() [0x9690e5]
 7: (PG::_scan_list(ScrubMap&, std::vector<hobject_t, std::allocator<hobject_t> >&)+0x38b) [0x97fecd]
 8: (PG::build_scrub_map(ScrubMap&)+0x211) [0x981c1d]
 9: (PG::scrub()+0x4fa) [0x983564]
 10: (OSD::ScrubWQ::_process(PG*)+0x1c) [0x8601e6]
 11: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x2e) [0x8a4748]
 12: (ThreadPool::worker()+0x4af) [0x8d4f93]
 13: (ThreadPool::WorkThread::entry()+0x1c) [0x85db56]
 14: (Thread::_entry_func(void*)+0x23) [0x8ae061]
 15: (()+0x7971) [0x7f0038dca971]
 16: (clone()+0x6d) [0x7f003745592d]

full log at metropolis:/home/sage/osd.scrub.log

History

#1 Updated by Samuel Just almost 8 years ago

  • Status changed from New to Closed

Commit b93bf285c9f05ab943e8e506ea2125af0f1f97ad should fix it.
