Bug #214

closed

don't fail on assertion when mkcephfs is mis-used

Added by ar Fred almost 14 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Category: OSD
Target version: -
% Done: 0%

Description

3 boxes, each with 1 mon, 1 mds, and 1 osd.

I wanted a clean base for further testing, so on each box I ran:

mkcephfs -c /etc/ceph/ceph.conf --mkbtrfs --clobber_old_data -k /etc/ceph/keyring.bin

It apparently worked fine.

After starting the cluster, every daemon runs, but there seems to be a problem with the osds joining in. ceph -w reports:

pg v2: 792 pgs: 792 creating; 0 KB data, 0 KB used, 0 KB / 0 KB avail
mds e5: 1/1/1 up {0=up:creating}, 2 up:standby
osd e1: 0 osds: 0 up, 0 in
mon e1: 3 mons at 172.16.20.9:6789/0 172.16.20.10:6789/0 172.167.20.11:6789/0

I then tried to restart osd0 by issuing:

/etc/init.d/ceph restart osd

That osd crashes after being restarted, and the two non-leader mons crash as well.

Stack trace for the osd:

#0  0x00007f9dd1983a75 in raise () from /lib/libc.so.6
#1  0x00007f9dd19875c0 in abort () from /lib/libc.so.6
#2  0x00007f9dd22388e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007f9dd2236d16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007f9dd2236d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007f9dd2236e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x00000000005b39f8 in ceph::__ceph_assert_fail (assertion=0x5e3440 "ceph_fsid_compare(&inc.fsid, &fsid) == 0", file=<value optimized out>, line=482, func=<value optimized out>)
    at common/assert.cc:30
#7  0x0000000000514228 in OSDMap::apply_incremental(OSDMap::Incremental&) ()
#8  0x00000000004dc3a1 in OSD::handle_osd_map (this=0x18cf6b0, m=<value optimized out>) at osd/OSD.cc:2175
#9  0x00000000004e7c20 in OSD::_dispatch (this=0x18cf6b0, m=0x7f9dc000ac70) at osd/OSD.cc:1837
#10 0x00000000004e8619 in OSD::ms_dispatch (this=0x18cf6b0, m=0x7f9dc000ac70) at osd/OSD.cc:1728
#11 0x0000000000460769 in Messenger::ms_deliver_dispatch (this=<value optimized out>) at msg/Messenger.h:97
#12 SimpleMessenger::dispatch_entry (this=<value optimized out>) at msg/SimpleMessenger.cc:332
#13 0x00000000004567cc in SimpleMessenger::DispatchThread::entry (this=0x18ccb30) at msg/SimpleMessenger.h:497
#14 0x0000000000469a4a in Thread::_entry_func (arg=0x7ab9) at ./common/Thread.h:39
#15 0x00007f9dd28169ca in start_thread () from /lib/libpthread.so.0
#16 0x00007f9dd1a366cd in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()

And for the monitors:

#0  0x00007fc350d7da75 in raise () from /lib/libc.so.6
#1  0x00007fc350d815c0 in abort () from /lib/libc.so.6
#2  0x00007fc3516328e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007fc351630d16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007fc351630d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007fc351630e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x0000000000535ae8 in ceph::__ceph_assert_fail (assertion=0x56b668 "ceph_fsid_compare(&inc.fsid, &fsid) == 0", file=<value optimized out>, line=482, func=<value optimized out>)
    at common/assert.cc:30
#7  0x000000000049d5ef in OSDMap::apply_incremental(OSDMap::Incremental&) ()
#8  0x000000000048bbdc in OSDMonitor::update_from_paxos (this=0x1b79740) at mon/OSDMonitor.cc:97
#9  0x000000000046b261 in Monitor::_ms_dispatch (this=<value optimized out>, m=0x6038010) at mon/Monitor.cc:717
#10 0x0000000000476a0b in Monitor::ms_dispatch(Message*) ()
#11 0x0000000000450d39 in Messenger::ms_deliver_dispatch (this=<value optimized out>) at msg/Messenger.h:97
#12 SimpleMessenger::dispatch_entry (this=<value optimized out>) at msg/SimpleMessenger.cc:332
#13 0x000000000044793c in SimpleMessenger::DispatchThread::entry (this=0x1b70bf0) at msg/SimpleMessenger.h:497
#14 0x000000000045a02a in Thread::_entry_func (arg=0x4e9a) at ./common/Thread.h:39
#15 0x00007fc351c109ca in start_thread () from /lib/libpthread.so.0
#16 0x00007fc350e306cd in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()

So, as discussed with Sage on IRC, I mis-used mkcephfs:

<sage> ah.  mkcephfs doesn't currently support running independnetly on differnt hosts.. it has to be run from one host with -a (--all-hosts)
<sage> otherwise the fsid/shared data won't match.  we still need to make a mode that will allow it to be run in parallel

Maybe an error message instead of an assert would be a good thing?

Actions #1

Updated by Sage Weil almost 14 years ago

  • Category set to OSD
  • Assignee set to Greg Farnum

handle_osd_map should log an error and return if the fsid doesn't match

Actions #2

Updated by Greg Farnum almost 14 years ago

  • Status changed from New to Resolved

The OSD will now log a warning and shut down on a bad fsid. (Map updates can only come from trusted sources, so a mismatched fsid needs attention.)
Fixed in 9bbeec4745fa6f04835587654492fc371fcfdbeb.
