Bug #19606

closed

monitors crash on incorrect OSD UUID (and bad uuid following reboot?)

Added by George Shuklin about 7 years ago. Updated over 6 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source:
Tags:
Backport: jewel
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): Monitor, OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I restarted a host with a few OSDs, and all three monitors crashed.

version: 10.2.6-0ubuntu0.16.04.1

Trace:

-4> 2017-04-13 12:34:39.204645 7fd2857aa700  5 -- op tracker -- seq: 22, time: 2017-04-13 12:34:39.204645, event: osdmap:preprocess_query, op: osd_boot(osd.21 booted 0 features 576460752032874495 v2022)
-3> 2017-04-13 12:34:39.204655 7fd2857aa700  5 -- op tracker -- seq: 22, time: 2017-04-13 12:34:39.204655, event: osdmap:preprocess_boot, op: osd_boot(osd.21 booted 0 features 576460752032874495 v2022)
-2> 2017-04-13 12:34:39.204681 7fd2857aa700  5 -- op tracker -- seq: 22, time: 2017-04-13 12:34:39.204681, event: osdmap:prepare_update, op: osd_boot(osd.21 booted 0 features 576460752032874495 v2022)
-1> 2017-04-13 12:34:39.204693 7fd2857aa700  5 -- op tracker -- seq: 22, time: 2017-04-13 12:34:39.204692, event: osdmap:prepare_boot, op: osd_boot(osd.21 booted 0 features 576460752032874495 v2022)
0> 2017-04-13 12:34:39.213266 7fd2857aa700 -1 mon/OSDMonitor.cc: In function 'bool OSDMonitor::prepare_boot(MonOpRequestRef)' thread 7fd2857aa700 time 2017-04-13 12:34:39.204709
mon/OSDMonitor.cc: 2105: FAILED assert(osdmap.get_uuid(from) == m->sb.osd_fsid)
ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55f6c17cc260]
2: (OSDMonitor::prepare_boot(std::shared_ptr<MonOpRequest>)+0x1bd2) [0x55f6c1477e82]
3: (OSDMonitor::prepare_update(std::shared_ptr<MonOpRequest>)+0x28b) [0x55f6c14aaa1b]
4: (PaxosService::dispatch(std::shared_ptr<MonOpRequest>)+0xb4f) [0x55f6c145b84f]
5: (PaxosService::C_RetryMessage::_finish(int)+0x58) [0x55f6c145ce38]
6: (C_MonOp::finish(int)+0x82) [0x55f6c14250c2]
7: (Context::complete(int)+0x9) [0x55f6c14241a9]
8: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x1fb) [0x55f6c142a8db]
9: (Paxos::finish_round()+0x287) [0x55f6c14512b7]
10: (Paxos::handle_last(std::shared_ptr<MonOpRequest>)+0xe19) [0x55f6c1452499]
11: (Paxos::dispatch(std::shared_ptr<MonOpRequest>)+0x250) [0x55f6c1452cc0]
12: (Monitor::dispatch_op(std::shared_ptr<MonOpRequest>)+0xa38) [0x55f6c141e5c8]
13: (Monitor::_ms_dispatch(Message*)+0x554) [0x55f6c141edc4]
14: (Monitor::ms_dispatch(Message*)+0x23) [0x55f6c1441e93]
15: (DispatchQueue::entry()+0xf2b) [0x55f6c18c1fab]
16: (DispatchQueue::DispatchThread::entry()+0xd) [0x55f6c17b25ad]
17: (()+0x76fa) [0x7fd28de0e6fa]
18: (clone()+0x6d) [0x7fd28c0c8b5d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 newstore
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
1/ 5 kinetic
1/ 5 fuse
2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mon.mon3.log
--- end dump of recent events ---
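
For readers unfamiliar with the failed assertion, the check in the trace compares the UUID recorded in the OSD map for the booting OSD with the fsid reported in that OSD's superblock. The following is only a toy model of that comparison, not Ceph code: `ToyOSDMap`, `prepare_boot_strict`, and `prepare_boot_lenient` are hypothetical names invented for illustration, and the lenient variant is shown purely for contrast, not as a proposed fix.

```python
import uuid


class ToyOSDMap:
    """Hypothetical stand-in for an OSD map: OSD id -> recorded UUID."""

    def __init__(self):
        self._uuids = {}

    def set_uuid(self, osd_id, osd_uuid):
        self._uuids[osd_id] = osd_uuid

    def get_uuid(self, osd_id):
        # Returns None when no UUID is recorded for this OSD id.
        return self._uuids.get(osd_id)


def prepare_boot_strict(osdmap, osd_id, sb_osd_fsid):
    # Same shape as the assertion in the trace: a mismatch raises
    # AssertionError, which takes down the caller if it is not caught.
    assert osdmap.get_uuid(osd_id) == sb_osd_fsid, "OSD uuid mismatch"
    return True


def prepare_boot_lenient(osdmap, osd_id, sb_osd_fsid):
    # For contrast only (an assumption, not Ceph behavior): report the
    # mismatch to the caller instead of aborting.
    return osdmap.get_uuid(osd_id) == sb_osd_fsid


if __name__ == "__main__":
    osdmap = ToyOSDMap()
    recorded = uuid.UUID(int=1)
    reported = uuid.UUID(int=2)  # a different fsid, as after a bad reboot
    osdmap.set_uuid(21, recorded)
    print(prepare_boot_lenient(osdmap, 21, reported))
    try:
        prepare_boot_strict(osdmap, 21, reported)
    except AssertionError as exc:
        print("strict check aborted:", exc)
```

In this sketch a single boot request carrying an unexpected fsid is enough to trip the strict check, which is consistent with the subject line of this report (monitors crashing on an incorrect OSD UUID).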

A copy of the monitor database is attached.

Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1682424


Files

mon.tar.gz (673 KB) George Shuklin, 04/13/2017 01:02 PM
Actions #1

Updated by Ken Dreyer about 7 years ago

  • Backport set to jewel
Actions #2

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Subject changed from "All moonitors have been crashing after OSD host had been rebooted" to "monitors crash on incorrect OSD UUID (and bad uuid following reboot?)"
  • Category changed from Monitor to Correctness/Safety
  • Priority changed from Normal to High
  • Component(RADOS) Monitor, OSD added
Actions #3

Updated by Sage Weil over 6 years ago

  • Status changed from New to Can't reproduce