Bug #17837

closed

ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3

Added by alexander walker over 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a cluster of three nodes:

ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.45993 root default
-2 1.81998     host ceph1
 0 0.90999         osd.0       up  1.00000          1.00000
 1 0.90999         osd.1       up  1.00000          1.00000
-3 1.81998     host ceph2
 2 0.90999         osd.2       up  1.00000          1.00000
 3 0.90999         osd.3       up  1.00000          1.00000
-4 1.81998     host ceph3
 4 0.90999         osd.4       up  1.00000          1.00000
 5 0.90999         osd.5       up  1.00000          1.00000

I upgraded the ceph3 node first, and now I can't start the monitor daemon on it. It crashes on startup:


cephus@ceph3:~$ sudo /usr/bin/ceph-mon --cluster=ceph -i ceph3 -f --setuser ceph --setgroup ceph --debug_mon 10
starting mon.ceph3 rank 2 at 192.168.49.103:6789/0 mon_data /var/lib/ceph/mon/ceph-ceph3 fsid 3c58a184-bf27-4273-8000-405513006a7b
mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7fb0cf4564c0 time 2016-11-09 14:58:58.437225
mds/FSMap.cc: 628: FAILED assert(i.second.state == MDSMap::STATE_STANDBY)
 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5606d480b1eb]
 2: (FSMap::sanity() const+0x932) [0x5606d4730112]
 3: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x5606d455b160]
 4: (PaxosService::refresh(bool*)+0x19a) [0x5606d44ceb4a]
 5: (Monitor::refresh_from_paxos(bool*)+0x143) [0x5606d446b433]
 6: (Monitor::init_paxos()+0x85) [0x5606d446b845]
 7: (Monitor::preinit()+0x925) [0x5606d447bec5]
 8: (main()+0x236d) [0x5606d4409e9d]
 9: (__libc_start_main()+0xf5) [0x7fb0cc9d5f45]
 10: (()+0x26106a) [0x5606d445c06a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-11-09 14:58:58.440166 7fb0cf4564c0 -1 mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7fb0cf4564c0 time 2016-11-09 14:58:58.437225
mds/FSMap.cc: 628: FAILED assert(i.second.state == MDSMap::STATE_STANDBY)

 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5606d480b1eb]
 2: (FSMap::sanity() const+0x932) [0x5606d4730112]
 3: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x5606d455b160]
 4: (PaxosService::refresh(bool*)+0x19a) [0x5606d44ceb4a]
 5: (Monitor::refresh_from_paxos(bool*)+0x143) [0x5606d446b433]
 6: (Monitor::init_paxos()+0x85) [0x5606d446b845]
 7: (Monitor::preinit()+0x925) [0x5606d447bec5]
 8: (main()+0x236d) [0x5606d4409e9d]
 9: (__libc_start_main()+0xf5) [0x7fb0cc9d5f45]
 10: (()+0x26106a) [0x5606d445c06a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2016-11-09 14:58:58.440166 7fb0cf4564c0 -1 mds/FSMap.cc: In function 'void FSMap::sanity() const' thread 7fb0cf4564c0 time 2016-11-09 14:58:58.437225
mds/FSMap.cc: 628: FAILED assert(i.second.state == MDSMap::STATE_STANDBY)

 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5606d480b1eb]
 2: (FSMap::sanity() const+0x932) [0x5606d4730112]
 3: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x5606d455b160]
 4: (PaxosService::refresh(bool*)+0x19a) [0x5606d44ceb4a]
 5: (Monitor::refresh_from_paxos(bool*)+0x143) [0x5606d446b433]
 6: (Monitor::init_paxos()+0x85) [0x5606d446b845]
 7: (Monitor::preinit()+0x925) [0x5606d447bec5]
 8: (main()+0x236d) [0x5606d4409e9d]
 9: (__libc_start_main()+0xf5) [0x7fb0cc9d5f45]
 10: (()+0x26106a) [0x5606d445c06a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

*** Caught signal (Aborted) **
 in thread 7fb0cf4564c0 thread_name:ceph-mon
 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (()+0x4f6222) [0x5606d46f1222]
 2: (()+0x10330) [0x7fb0ce764330]
 3: (gsignal()+0x37) [0x7fb0cc9eac37]
 4: (abort()+0x148) [0x7fb0cc9ee028]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x5606d480b3c5]
 6: (FSMap::sanity() const+0x932) [0x5606d4730112]
 7: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x5606d455b160]
 8: (PaxosService::refresh(bool*)+0x19a) [0x5606d44ceb4a]
 9: (Monitor::refresh_from_paxos(bool*)+0x143) [0x5606d446b433]
 10: (Monitor::init_paxos()+0x85) [0x5606d446b845]
 11: (Monitor::preinit()+0x925) [0x5606d447bec5]
 12: (main()+0x236d) [0x5606d4409e9d]
 13: (__libc_start_main()+0xf5) [0x7fb0cc9d5f45]
 14: (()+0x26106a) [0x5606d445c06a]
2016-11-09 14:58:58.442973 7fb0cf4564c0 -1 *** Caught signal (Aborted) **
 in thread 7fb0cf4564c0 thread_name:ceph-mon

 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (()+0x4f6222) [0x5606d46f1222]
 2: (()+0x10330) [0x7fb0ce764330]
 3: (gsignal()+0x37) [0x7fb0cc9eac37]
 4: (abort()+0x148) [0x7fb0cc9ee028]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x5606d480b3c5]
 6: (FSMap::sanity() const+0x932) [0x5606d4730112]
 7: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x5606d455b160]
 8: (PaxosService::refresh(bool*)+0x19a) [0x5606d44ceb4a]
 9: (Monitor::refresh_from_paxos(bool*)+0x143) [0x5606d446b433]
 10: (Monitor::init_paxos()+0x85) [0x5606d446b845]
 11: (Monitor::preinit()+0x925) [0x5606d447bec5]
 12: (main()+0x236d) [0x5606d4409e9d]
 13: (__libc_start_main()+0xf5) [0x7fb0cc9d5f45]
 14: (()+0x26106a) [0x5606d445c06a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2016-11-09 14:58:58.442973 7fb0cf4564c0 -1 *** Caught signal (Aborted) **
 in thread 7fb0cf4564c0 thread_name:ceph-mon

 ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
 1: (()+0x4f6222) [0x5606d46f1222]
 2: (()+0x10330) [0x7fb0ce764330]
 3: (gsignal()+0x37) [0x7fb0cc9eac37]
 4: (abort()+0x148) [0x7fb0cc9ee028]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x5606d480b3c5]
 6: (FSMap::sanity() const+0x932) [0x5606d4730112]
 7: (MDSMonitor::update_from_paxos(bool*)+0x450) [0x5606d455b160]
 8: (PaxosService::refresh(bool*)+0x19a) [0x5606d44ceb4a]
 9: (Monitor::refresh_from_paxos(bool*)+0x143) [0x5606d446b433]
 10: (Monitor::init_paxos()+0x85) [0x5606d446b845]
 11: (Monitor::preinit()+0x925) [0x5606d447bec5]
 12: (main()+0x236d) [0x5606d4409e9d]
 13: (__libc_start_main()+0xf5) [0x7fb0cc9d5f45]
 14: (()+0x26106a) [0x5606d445c06a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
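The output above repeats the same assertion and backtrace several times, so it can help to reduce it to the unique failed assertion and a single call stack before comparing it with other reports. The following is a minimal Python sketch, not part of Ceph: the function name `summarize_crash` and both regular expressions are assumptions derived only from the log lines shown in this report, and other Ceph versions may format these lines differently.

```python
import re

# Assumed format of the assertion line, taken from the log above, e.g.
# "mds/FSMap.cc: 628: FAILED assert(i.second.state == MDSMap::STATE_STANDBY)"
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)

# Assumed format of one backtrace frame, e.g.
# " 2: (FSMap::sanity() const+0x932) [0x5606d4730112]"
FRAME_RE = re.compile(
    r"^\s*\d+: \((?P<sym>.*)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$"
)


def summarize_crash(log_text):
    """Return (assertions, frames).

    assertions: unique (file, line, expression) tuples, in order of appearance.
    frames: symbol names of the first backtrace only; later copies are skipped.
    """
    assertions = []
    frames = []
    collecting = True  # becomes False once the first backtrace has ended
    for line in log_text.splitlines():
        m = ASSERT_RE.match(line)
        if m:
            item = (m.group("file"), int(m.group("line")), m.group("expr"))
            if item not in assertions:
                assertions.append(item)
        f = FRAME_RE.match(line)
        if f:
            if collecting:
                frames.append(f.group("sym"))
        elif frames:
            # First non-frame line after the first backtrace ends collection.
            collecting = False
    return assertions, frames


if __name__ == "__main__":
    import sys

    # Usage: python summarize_crash.py < ceph-mon.log
    found, stack = summarize_crash(sys.stdin.read())
    for path, lineno, expr in found:
        print(f"{path}:{lineno}: assert({expr})")
    for depth, sym in enumerate(stack, start=1):
        print(f"  {depth}: {sym}")
```

Run against the log in this report, the helper should report a single assertion in `mds/FSMap.cc` and one stack that passes through `FSMap::sanity()` and `MDSMonitor::update_from_paxos()` during `Monitor::preinit()`.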


Files

mdsmap.bin.local (1.09 KB), alexander walker, 11/17/2016 06:26 AM

Related issues 2 (1 open, 1 closed)

Is duplicate of CephFS - Bug #16592: Jewel: monitor asserts on "mon/MDSMonitor.cc: 2796: FAILED assert(info.state == MDSMap::STATE_STANDBY)" (Need More Info, 11/09/2016)

Copied to CephFS - Backport #18100: jewel: ceph-mon crashed after upgrade from hammer 0.94.7 to jewel 10.2.3 (Resolved, John Spray)