Bug #19639

mon crash on shutdown

Added by John Spray almost 7 years ago. Updated almost 7 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The mon crashed during shutdown in a CephFS test run.

Assertion: /mnt/jenkins/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.0-2683-g1f1f8e9/rpm/el7/BUILD/ceph-12.0.0-2683-g1f1f8e9/src/mon/Monitor.cc: 1610: FAILED assert(is_probing() || is_synchronizing())
ceph version 12.0.0-2683-g1f1f8e9 (1f1f8e953e708883a2551bf6fb19ca524d2946af)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x64ac80]
 2: (Monitor::probe_timeout(int)+0x96) [0x447476]
 3: (Context::complete(int)+0x9) [0x45a109]
 4: (SafeTimer::timer_thread()+0x104) [0x6451c4]
 5: (SafeTimerThread::entry()+0xd) [0x646bed]
 6: (()+0x7dc5) [0xa66cdc5]
 7: (clone()+0x6d) [0xd09a73d]

/a/jspray-2017-04-16_16:07:43-multimds-wip-jcsp-testing-20170415b-multimds-testing-basic-smithi/1032808
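
For readers unfamiliar with this class of failure, the following is a minimal, hypothetical C++ sketch of one way a timer callback can find its state precondition violated: the timer is armed while probing, shutdown changes the state without cancelling the timer, and the timer then fires. It is an illustration only, not Ceph's code and not a diagnosis of this ticket. ToyMonitor, State, Fired and every member name are invented for the example, and the check returns a result instead of aborting so both outcomes can be observed.

```cpp
#include <cassert>
#include <functional>

// Hypothetical states, loosely named after the predicates in the assert.
enum class State { Probing, Synchronizing, Shutdown };

// Outcome of simulating one timer tick.
enum class Fired { NothingPending, PreconditionHeld, PreconditionViolated };

// Toy stand-in for a monitor with a single pending probe timer.
// Single-threaded on purpose: the "race" is modelled as call ordering.
class ToyMonitor {
public:
  ToyMonitor() = default;
  ToyMonitor(const ToyMonitor&) = delete;             // callback captures `this`
  ToyMonitor& operator=(const ToyMonitor&) = delete;

  // Arm the timer: store the callback the timer thread would run later.
  void arm_probe_timer() {
    pending_ = [this] { return probe_timeout_precondition(); };
  }

  // Shutdown always changes state; cancel_timer decides whether the
  // pending callback is dropped before it can fire.
  void shutdown(bool cancel_timer) {
    state_ = State::Shutdown;
    if (cancel_timer) pending_ = nullptr;
  }

  // Simulate the timer thread firing the pending callback, if any.
  Fired fire_timer() {
    if (!pending_) return Fired::NothingPending;
    std::function<bool()> cb;
    cb.swap(pending_);  // pending_ is now empty, cb holds the callback
    return cb() ? Fired::PreconditionHeld : Fired::PreconditionViolated;
  }

private:
  // The condition that the failed assert checked, returned instead of asserted.
  bool probe_timeout_precondition() const {
    return state_ == State::Probing || state_ == State::Synchronizing;
  }

  State state_ = State::Probing;
  std::function<bool()> pending_;
};
```

In this toy model, calling arm_probe_timer() and then shutdown(false) makes fire_timer() report PreconditionViolated, whereas shutdown(true) leaves nothing to fire.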

History

#1 Updated by Sage Weil almost 7 years ago

  • Priority changed from Normal to Urgent

#2 Updated by John Spray almost 7 years ago

  • Subject changed from mon crashes on shutdown to mon crashe on shutdown
  • Description updated (diff)

Split out the propose_pending one into http://tracker.ceph.com/issues/19738 with a candidate fix; the other one is still a mystery to me.

#3 Updated by John Spray almost 7 years ago

  • Subject changed from mon crashe on shutdown to mon crash on shutdown

#4 Updated by Sage Weil almost 7 years ago

  • Status changed from New to Need More Info

What is the "other one" (besides probe_timeout #19738)?

#5 Updated by John Spray almost 7 years ago

Sorry, I made the history confusing by editing the description. The "other one" is the one that is now the only one in the description of this ticket (i.e. the probe_timeout backtrace).

#6 Updated by Greg Farnum almost 7 years ago

Is it reproducing? Wouldn't surprise me if these were linked.

#7 Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)
  • Priority changed from Urgent to High
  • Component(RADOS) Monitor added

Turning the priority down; we should close this if it doesn't happen again.

#8 Updated by John Spray almost 7 years ago

I haven't seen this happen again in recent memory.

#9 Updated by Greg Farnum almost 7 years ago

  • Status changed from Need More Info to Can't reproduce
