Actions
Bug #4748
closedmon: failed assert in OSDMonitor::build_incremental
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-1> 2013-04-18 02:42:17.325016 7fa4f1d9d700 5 mon.a@0(synchronizing sync( requester state start )).osd e1679 send_incremental [1679..1679] to client.4120 10.214.131.36:0/1017075 0> 2013-04-18 02:42:17.451620 7fa4f1d9d700 -1 mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7fa4f1d9d700 time 2013-04-18 02:42:17.450143 mon/OSDMonitor.cc: 1475: FAILED assert(0) ceph version 0.60-523-ga0e457a (a0e457ae18624cc1bf0e48ec5060aec98fd94764) 1: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x12a3) [0x4f75f3] 2: (OSDMonitor::send_incremental(unsigned int, entity_inst_t&, bool)+0x1a8) [0x4f7818] 3: (OSDMonitor::check_sub(Subscription*)+0xa0) [0x4f7c00] 4: (Monitor::handle_subscribe(MMonSubscribe*)+0x9dd) [0x4a5c7d] 5: (Monitor::_ms_dispatch(Message*)+0x10e3) [0x4c0003] 6: (Monitor::ms_dispatch(Message*)+0x32) [0x4d70a2] 7: (DispatchQueue::entry()+0x341) [0x6ab701] 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x63997d] 9: (()+0x7e9a) [0x7fa4f667ae9a] 10: (clone()+0x6d) [0x7fa4f4e97cbd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
job was
ubuntu@teuthology:/a/teuthology-2013-04-18_01:00:07-rados-next-testing-basic/14978$ cat orig.config.yaml kernel: kdb: true sha1: 625d6ec696a70600fac15c79c5ef83f706c9db21 nuke-on-error: true overrides: ceph: conf: global: ms inject socket failures: 5000 fs: ext4 log-whitelist: - slow request sha1: a0e457ae18624cc1bf0e48ec5060aec98fd94764 s3tests: branch: next workunit: sha1: a0e457ae18624cc1bf0e48ec5060aec98fd94764 roles: - - mon.a - mon.c - osd.0 - osd.1 - osd.2 - - mon.b - mds.a - osd.3 - osd.4 - osd.5 - client.0 tasks: - chef: null - clock: null - install: null - ceph: log-whitelist: - wrongly marked me down - objects unfound and apparently lost - thrashosds: chance_pgnum_grow: 1 chance_pgpnum_fix: 1 timeout: 1200 - rados: clients: - client.0 objects: 50 op_weights: delete: 50 read: 100 rollback: 50 snap_create: 50 snap_remove: 50 write: 100 ops: 4000
also
ubuntu@teuthology:/a/teuthology-2013-04-18_01:00:07-rados-next-testing-basic/14984
Updated by Joao Eduardo Luis about 11 years ago
- Assignee set to Joao Eduardo Luis
Updated by Joao Eduardo Luis about 11 years ago
- Status changed from New to In Progress
Updated by Sage Weil about 11 years ago
kicked of job sage-e1 to try to reproduce this with logs
Updated by Sage Weil about 11 years ago
- Assignee changed from Joao Eduardo Luis to Sage Weil
got logs, ubuntu@teuthology:/a/sage-e1/313
Updated by Sage Weil about 11 years ago
last little bit of log:
2013-04-24 10:38:53.906498 7f99a1ffb700 10 mon.a@0(probing) e1 sync_start entity( mon.2 10.214.131.27:6790/0 ) 2013-04-24 10:38:54.082770 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).monmap v1 get_monmap ver 0 2013-04-24 10:38:54.113164 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 sync_store_init backup current monmap 2013-04-24 10:38:54.132404 7f99a1ffb700 1 -- 10.214.131.27:6789/0 --> mon.2 10.214.131.27:6790/0 -- mon_sync( start ) v1 -- ?+0 0x7f9988332770 2013-04-24 10:38:54.132466 7f99a1ffb700 1 -- 10.214.131.27:6789/0 <== mon.1 10.214.131.28:6789/0 26693 ==== forward(osd_alive(want up_thru 1693 have 1693) v1) to leader v1 ==== 292+0+0 (2477513207 0 0) 0x7f995801c040 con 0x1483700 2013-04-24 10:38:54.132477 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection 2013-04-24 10:38:54.132483 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: mon.1 10.214.131.28:6789/0 is openallow * for mon.1 10.214.131.28:6789/0 2013-04-24 10:38:54.132489 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 caps allow * 2013-04-24 10:38:54.132491 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 received forwarded message from osd.1 10.214.131.27:6807/9377 via mon.1 10.214.131.28:6789/0 2013-04-24 10:38:54.132500 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 mesg 0x7f995801ead0 from 10.214.131.28:6789/0 2013-04-24 10:38:54.132504 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection 2013-04-24 10:38:54.132506 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: osd.1 :/0 is openallow * for osd.1 :/0 2013-04-24 10:38:54.132509 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 caps allow * 2013-04-24 10:38:54.132512 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap) dispatch osd_alive(want up_thru 1693 have 1693) v1 from osd.1 10.214.131.27:6807/9377 2013-04-24 10:38:54.159939 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap) waiting for paxos -> readable (v1693) 2013-04-24 10:38:54.159988 7f99a1ffb700 1 -- 10.214.131.27:6789/0 <== mon.2 10.214.131.27:6790/0 25940 ==== forward(osd_pgtemp(e1691 {1.c=[5,1],1.10=[5,1],1.13=[5,1],1.15=[5,1],1.16=[5,1],1.17=[5,1],2.b=[5,1],2.c=[5,1],2.f=[5,1],2.12=[5,1],2.14=[5,1],2.15=[5,1],2.16=[5,1],3.a=[1,5],3.b=[1,5],3.e=[5,1],3.24=[1,5],3 .28=[1,5],3.2c=[5,1],3.33=[1,5],3.46=[1,5],3.47=[1,5],3.49=[1,5],3.4f=[1,5],3.55=[5,1],3.5c=[5,1],3.69=[5,1],3.6a=[1,5],3.70=[1,5],3.7b=[5,1],3.7d=[5,1],3.81=[5,1],3.85=[1,5],3.8e=[1,5],3.93=[1,5],3.94=[1,5],3.96=[1,5],3.a6=[5,1],3.a7=[1,5],3.ac=[1,5],3.ad=[1,5],3.b1=[1,5],3.b3=[5,1],3.b8=[1,5],3.b9=[1,5],3.bd=[1,5 ],3.bf=[1,5],3.c5=[1,5],3.c8=[5,1],3.d3=[1,5],3.d9=[1,5],3.dc=[5,1],3.dd=[5,1],3.e9=[5,1],3.ea=[1,5],3.f0=[1,5]} v1691) v1) to leader v1 ==== 1920+0+0 (2818135077 0 0) 0x7f994010cc60 con 0x147eed0 2013-04-24 10:38:54.160039 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection 2013-04-24 10:38:54.160043 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: mon.2 10.214.131.27:6790/0 is openallow * for mon.2 10.214.131.27:6790/0 2013-04-24 10:38:54.160050 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 caps allow * 2013-04-24 10:38:54.160052 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 received forwarded message from osd.5 10.214.131.28:6801/3534 via mon.2 10.214.131.27:6790/0 2013-04-24 10:38:54.160061 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 mesg 0x7f994010cf90 from 10.214.131.27:6790/0 2013-04-24 10:38:54.160065 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection 2013-04-24 10:38:54.160067 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: osd.5 :/0 is openallow * for osd.5 :/0 2013-04-24 10:38:54.160070 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 caps allow * 2013-04-24 10:38:54.160072 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap) dispatch osd_pgtemp(e1691 {1.c=[5,1],1.10=[5,1],1.13=[5,1],1.15=[5,1],1.16=[5,1],1.17=[5,1],2.b=[5,1],2.c=[5,1],2.f=[5,1],2.12=[5,1],2.14=[5,1],2.15=[5,1],2.16=[5,1],3.a=[1,5],3.b=[1,5],3.e=[5,1],3.2 4=[1,5],3.28=[1,5],3.2c=[5,1],3.33=[1,5],3.46=[1,5],3.47=[1,5],3.49=[1,5],3.4f=[1,5],3.55=[5,1],3.5c=[5,1],3.69=[5,1],3.6a=[1,5],3.70=[1,5],3.7b=[5,1],3.7d=[5,1],3.81=[5,1],3.85=[1,5],3.8e=[1,5],3.93=[1,5],3.94=[1,5],3.96=[1,5],3.a6=[5,1],3.a7=[1,5],3.ac=[1,5],3.ad=[1,5],3.b1=[1,5],3.b3=[5,1],3.b8=[1,5],3.b9=[1,5], 3.bd=[1,5],3.bf=[1,5],3.c5=[1,5],3.c8=[5,1],3.d3=[1,5],3.d9=[1,5],3.dc=[5,1],3.dd=[5,1],3.e9=[5,1],3.ea=[1,5],3.f0=[1,5]} v1691) v1 from osd.5 10.214.131.28:6801/3534 2013-04-24 10:38:54.187198 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap) waiting for paxos -> readable (v1691) 2013-04-24 10:38:54.187244 7f99a1ffb700 1 -- 10.214.131.27:6789/0 <== osd.3 10.214.131.28:6807/2695 25 ==== mon_subscribe({monmap=2+,osd_pg_creates=0,osdmap=1690}) v2 ==== 69+0+0 (4248948145 0 0) 0x7f9944001000 con 0x7f9994003340 2013-04-24 10:38:54.187262 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection 2013-04-24 10:38:54.187266 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: osd.3 10.214.131.28:6807/2695 is openallow * for osd.3 10.214.131.28:6807/2695 2013-04-24 10:38:54.187273 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 caps allow * 2013-04-24 10:38:54.187276 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 handle_subscribe mon_subscribe({monmap=2+,osd_pg_creates=0,osdmap=1690}) v2 2013-04-24 10:38:54.187282 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 check_sub monmap next 2 have 1 2013-04-24 10:38:54.187291 7f99a1ffb700 5 mon.a@0(synchronizing sync( requester state start )).osd e1693 send_incremental [1690..1693] to osd.3 10.214.131.28:6807/2695 2013-04-24 10:38:54.217461 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).osd e1693 build_incremental [1690..1693] 2013-04-24 10:38:54.324263 7f99a1ffb700 -1 mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7f99a1ffb700 time 2013-04-24 10:38:54.305016 mon/OSDMonitor.cc: 1475: FAILED assert(0) ceph version 0.60-641-gc7a0477 (c7a0477bad6bfbec4ef325295ca0489ec1977926) 1: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x12e3) [0x4f96e3] 2: (OSDMonitor::send_incremental(unsigned int, entity_inst_t&, bool)+0x1a8) [0x4f9908] 3: (OSDMonitor::check_sub(Subscription*)+0xa0) [0x4f9cd0] 4: (Monitor::handle_subscribe(MMonSubscribe*)+0x9dd) [0x4a7edd] 5: (Monitor::_ms_dispatch(Message*)+0x10e3) [0x4c21a3] 6: (Monitor::ms_dispatch(Message*)+0x32) [0x4d8e12] 7: (DispatchQueue::entry()+0x3f1) [0x6aec41] 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x63cbad] 9: (()+0x7e9a) [0x7f99aa654e9a] 10: (clone()+0x6d) [0x7f99a8e71cbd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions