Bug #4748

mon: failed assert in OSDMonitor::build_incremental

Added by Sage Weil almost 11 years ago. Updated almost 11 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Monitor
Target version:
-
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

    -1> 2013-04-18 02:42:17.325016 7fa4f1d9d700  5 mon.a@0(synchronizing sync( requester state start )).osd e1679 send_incremental [1679..1679] to client.4120 10.214.131.36:0/1017075
     0> 2013-04-18 02:42:17.451620 7fa4f1d9d700 -1 mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7fa4f1d9d700 time 2013-04-18 02:42:17.450143
mon/OSDMonitor.cc: 1475: FAILED assert(0)

 ceph version 0.60-523-ga0e457a (a0e457ae18624cc1bf0e48ec5060aec98fd94764)
 1: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x12a3) [0x4f75f3]
 2: (OSDMonitor::send_incremental(unsigned int, entity_inst_t&, bool)+0x1a8) [0x4f7818]
 3: (OSDMonitor::check_sub(Subscription*)+0xa0) [0x4f7c00]
 4: (Monitor::handle_subscribe(MMonSubscribe*)+0x9dd) [0x4a5c7d]
 5: (Monitor::_ms_dispatch(Message*)+0x10e3) [0x4c0003]
 6: (Monitor::ms_dispatch(Message*)+0x32) [0x4d70a2]
 7: (DispatchQueue::entry()+0x341) [0x6ab701]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x63997d]
 9: (()+0x7e9a) [0x7fa4f667ae9a]
 10: (clone()+0x6d) [0x7fa4f4e97cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The job was:
ubuntu@teuthology:/a/teuthology-2013-04-18_01:00:07-rados-next-testing-basic/14978$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 625d6ec696a70600fac15c79c5ef83f706c9db21
nuke-on-error: true
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
    fs: ext4
    log-whitelist:
    - slow request
    sha1: a0e457ae18624cc1bf0e48ec5060aec98fd94764
  s3tests:
    branch: next
  workunit:
    sha1: a0e457ae18624cc1bf0e48ec5060aec98fd94764
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
  - client.0
tasks:
- chef: null
- clock: null
- install: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 1
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    objects: 50
    op_weights:
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 100
    ops: 4000

Also seen in:
ubuntu@teuthology:/a/teuthology-2013-04-18_01:00:07-rados-next-testing-basic/14984

Related issues

Related to Ceph - Bug #4813: pgs stuck creating Resolved 04/25/2013

History

#1 Updated by Joao Eduardo Luis almost 11 years ago

Possibly related to #4521

#2 Updated by Joao Eduardo Luis almost 11 years ago

  • Assignee set to Joao Eduardo Luis

#3 Updated by Joao Eduardo Luis almost 11 years ago

  • Status changed from New to In Progress

#4 Updated by Sage Weil almost 11 years ago

Kicked off job sage-e1 to try to reproduce this with logs.

#5 Updated by Sage Weil almost 11 years ago

  • Assignee changed from Joao Eduardo Luis to Sage Weil

Got logs: ubuntu@teuthology:/a/sage-e1/313

#6 Updated by Sage Weil almost 11 years ago

Last little bit of the log:

2013-04-24 10:38:53.906498 7f99a1ffb700 10 mon.a@0(probing) e1 sync_start entity( mon.2 10.214.131.27:6790/0 )
2013-04-24 10:38:54.082770 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).monmap v1 get_monmap ver 0
2013-04-24 10:38:54.113164 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 sync_store_init backup current monmap
2013-04-24 10:38:54.132404 7f99a1ffb700  1 -- 10.214.131.27:6789/0 --> mon.2 10.214.131.27:6790/0 -- mon_sync( start ) v1 -- ?+0 0x7f9988332770
2013-04-24 10:38:54.132466 7f99a1ffb700  1 -- 10.214.131.27:6789/0 <== mon.1 10.214.131.28:6789/0 26693 ==== forward(osd_alive(want up_thru 1693 have 1693) v1) to leader v1 ==== 292+0+0 (2477513207 0 0) 0x7f995801c040 con 0x1483700
2013-04-24 10:38:54.132477 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection
2013-04-24 10:38:54.132483 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: mon.1 10.214.131.28:6789/0 is openallow * for mon.1 10.214.131.28:6789/0
2013-04-24 10:38:54.132489 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1  caps allow *
2013-04-24 10:38:54.132491 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 received forwarded message from osd.1 10.214.131.27:6807/9377 via mon.1 10.214.131.28:6789/0
2013-04-24 10:38:54.132500 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1  mesg 0x7f995801ead0 from 10.214.131.28:6789/0
2013-04-24 10:38:54.132504 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection
2013-04-24 10:38:54.132506 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: osd.1 :/0 is openallow * for osd.1 :/0
2013-04-24 10:38:54.132509 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1  caps allow *
2013-04-24 10:38:54.132512 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap) dispatch osd_alive(want up_thru 1693 have 1693) v1 from osd.1 10.214.131.27:6807/9377
2013-04-24 10:38:54.159939 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap)  waiting for paxos -> readable (v1693)
2013-04-24 10:38:54.159988 7f99a1ffb700  1 -- 10.214.131.27:6789/0 <== mon.2 10.214.131.27:6790/0 25940 ==== forward(osd_pgtemp(e1691 {1.c=[5,1],1.10=[5,1],1.13=[5,1],1.15=[5,1],1.16=[5,1],1.17=[5,1],2.b=[5,1],2.c=[5,1],2.f=[5,1],2.12=[5,1],2.14=[5,1],2.15=[5,1],2.16=[5,1],3.a=[1,5],3.b=[1,5],3.e=[5,1],3.24=[1,5],3
.28=[1,5],3.2c=[5,1],3.33=[1,5],3.46=[1,5],3.47=[1,5],3.49=[1,5],3.4f=[1,5],3.55=[5,1],3.5c=[5,1],3.69=[5,1],3.6a=[1,5],3.70=[1,5],3.7b=[5,1],3.7d=[5,1],3.81=[5,1],3.85=[1,5],3.8e=[1,5],3.93=[1,5],3.94=[1,5],3.96=[1,5],3.a6=[5,1],3.a7=[1,5],3.ac=[1,5],3.ad=[1,5],3.b1=[1,5],3.b3=[5,1],3.b8=[1,5],3.b9=[1,5],3.bd=[1,5
],3.bf=[1,5],3.c5=[1,5],3.c8=[5,1],3.d3=[1,5],3.d9=[1,5],3.dc=[5,1],3.dd=[5,1],3.e9=[5,1],3.ea=[1,5],3.f0=[1,5]} v1691) v1) to leader v1 ==== 1920+0+0 (2818135077 0 0) 0x7f994010cc60 con 0x147eed0
2013-04-24 10:38:54.160039 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection
2013-04-24 10:38:54.160043 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: mon.2 10.214.131.27:6790/0 is openallow * for mon.2 10.214.131.27:6790/0
2013-04-24 10:38:54.160050 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1  caps allow *
2013-04-24 10:38:54.160052 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 received forwarded message from osd.5 10.214.131.28:6801/3534 via mon.2 10.214.131.27:6790/0
2013-04-24 10:38:54.160061 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1  mesg 0x7f994010cf90 from 10.214.131.27:6790/0
2013-04-24 10:38:54.160065 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection
2013-04-24 10:38:54.160067 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: osd.5 :/0 is openallow * for osd.5 :/0
2013-04-24 10:38:54.160070 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1  caps allow *
2013-04-24 10:38:54.160072 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap) dispatch osd_pgtemp(e1691 {1.c=[5,1],1.10=[5,1],1.13=[5,1],1.15=[5,1],1.16=[5,1],1.17=[5,1],2.b=[5,1],2.c=[5,1],2.f=[5,1],2.12=[5,1],2.14=[5,1],2.15=[5,1],2.16=[5,1],3.a=[1,5],3.b=[1,5],3.e=[5,1],3.2
4=[1,5],3.28=[1,5],3.2c=[5,1],3.33=[1,5],3.46=[1,5],3.47=[1,5],3.49=[1,5],3.4f=[1,5],3.55=[5,1],3.5c=[5,1],3.69=[5,1],3.6a=[1,5],3.70=[1,5],3.7b=[5,1],3.7d=[5,1],3.81=[5,1],3.85=[1,5],3.8e=[1,5],3.93=[1,5],3.94=[1,5],3.96=[1,5],3.a6=[5,1],3.a7=[1,5],3.ac=[1,5],3.ad=[1,5],3.b1=[1,5],3.b3=[5,1],3.b8=[1,5],3.b9=[1,5],
3.bd=[1,5],3.bf=[1,5],3.c5=[1,5],3.c8=[5,1],3.d3=[1,5],3.d9=[1,5],3.dc=[5,1],3.dd=[5,1],3.e9=[5,1],3.ea=[1,5],3.f0=[1,5]} v1691) v1 from osd.5 10.214.131.28:6801/3534
2013-04-24 10:38:54.187198 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).paxosservice(osdmap)  waiting for paxos -> readable (v1691)
2013-04-24 10:38:54.187244 7f99a1ffb700  1 -- 10.214.131.27:6789/0 <== osd.3 10.214.131.28:6807/2695 25 ==== mon_subscribe({monmap=2+,osd_pg_creates=0,osdmap=1690}) v2 ==== 69+0+0 (4248948145 0 0) 0x7f9944001000 con 0x7f9994003340
2013-04-24 10:38:54.187262 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 have connection
2013-04-24 10:38:54.187266 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1 ms_dispatch existing session MonSession: osd.3 10.214.131.28:6807/2695 is openallow * for osd.3 10.214.131.28:6807/2695
2013-04-24 10:38:54.187273 7f99a1ffb700 20 mon.a@0(synchronizing sync( requester state start )) e1  caps allow *
2013-04-24 10:38:54.187276 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 handle_subscribe mon_subscribe({monmap=2+,osd_pg_creates=0,osdmap=1690}) v2
2013-04-24 10:38:54.187282 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )) e1 check_sub monmap next 2 have 1
2013-04-24 10:38:54.187291 7f99a1ffb700  5 mon.a@0(synchronizing sync( requester state start )).osd e1693 send_incremental [1690..1693] to osd.3 10.214.131.28:6807/2695
2013-04-24 10:38:54.217461 7f99a1ffb700 10 mon.a@0(synchronizing sync( requester state start )).osd e1693 build_incremental [1690..1693]
2013-04-24 10:38:54.324263 7f99a1ffb700 -1 mon/OSDMonitor.cc: In function 'MOSDMap* OSDMonitor::build_incremental(epoch_t, epoch_t)' thread 7f99a1ffb700 time 2013-04-24 10:38:54.305016
mon/OSDMonitor.cc: 1475: FAILED assert(0)

 ceph version 0.60-641-gc7a0477 (c7a0477bad6bfbec4ef325295ca0489ec1977926)
 1: (OSDMonitor::build_incremental(unsigned int, unsigned int)+0x12e3) [0x4f96e3]
 2: (OSDMonitor::send_incremental(unsigned int, entity_inst_t&, bool)+0x1a8) [0x4f9908]
 3: (OSDMonitor::check_sub(Subscription*)+0xa0) [0x4f9cd0]
 4: (Monitor::handle_subscribe(MMonSubscribe*)+0x9dd) [0x4a7edd]
 5: (Monitor::_ms_dispatch(Message*)+0x10e3) [0x4c21a3]
 6: (Monitor::ms_dispatch(Message*)+0x32) [0x4d8e12]
 7: (DispatchQueue::entry()+0x3f1) [0x6aec41]
 8: (DispatchQueue::DispatchThread::entry()+0xd) [0x63cbad]
 9: (()+0x7e9a) [0x7f99aa654e9a]
 10: (clone()+0x6d) [0x7f99a8e71cbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

#7 Updated by Sage Weil almost 11 years ago

  • Status changed from In Progress to 7

#8 Updated by Sage Weil almost 11 years ago

  • Status changed from 7 to Resolved
