Bug #5037

Ceph-MDS asserts after upgrade 0.56.2 -> 0.56.6

Added by Christopher Kunz almost 11 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After upgrading our Ceph setup from 0.56.2 to 0.56.6, the MDS processes hit an assert() on startup and do not run.
This is the assert() from the log:

2013-05-10 14:43:26.910355 7f97d7eff700 1 mds.0.15 waiting for osdmap 54251 (which blacklists prior instance)
2013-05-10 14:43:26.910402 7f97d7eff700 1 mds.0.cache handle_mds_failure mds.0 : recovery peers are
2013-05-10 14:43:26.912222 7f97d66fc700 -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7f97d66fc700 time 2013-05-10 14:43:26.911096
mds/MDSTable.cc: 150: FAILED assert(0)

ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)
1: (MDSTable::load_2(int, ceph::buffer::list&, Context*)+0x421) [0x679b91]
2: (Context::complete(int)+0xa) [0x4ab36a]
3: (Objecter::check_op_pool_dne(Objecter::Op*)+0xdd) [0x6c6ccd]
4: (Objecter::C_Op_Map_Latest::finish(int)+0x27a) [0x6c725a]
5: (Finisher::finisher_thread_entry()+0x1c0) [0x7ed100]
6: (()+0x7e9a) [0x7f97dcaede9a]
7: (clone()+0x6d) [0x7f97db98eccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

This is the backtrace:

root@fcmsmon0:/# gdb `which ceph-mds` ./core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /usr/bin/ceph-mds...(no debugging symbols found)...done.
[New LWP 10822]
[New LWP 10826]
[New LWP 10816]
[New LWP 10814]
[New LWP 10817]
[New LWP 10818]
[New LWP 10815]
[New LWP 10820]
[New LWP 10819]
[New LWP 10813]
[New LWP 10821]
[New LWP 10823]
[New LWP 10827]
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/ceph-mds -i 0 --pid-file /var/run/ceph/mds.0.pid -c /etc/ceph/ceph.con'.
Program terminated with signal 6, Aborted.
#0 0x00007f97dcaf5b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) thr app all bt
Thread 13 (Thread 0x7f97d5cf9700 (LWP 10827)):
#0 0x00007f97dcaf20fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000077632b in SafeTimer::timer_thread() ()
#2 0x0000000000776cdd in SafeTimerThread::entry() ()
#3 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000000000 in ?? ()
Thread 12 (Thread 0x7f97d5efb700 (LWP 10823)):
#0 0x00007f97dcaf1d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000080577d in Pipe::writer() ()
#2 0x000000000080f86d in Pipe::Writer::entry() ()
#3 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000000000 in ?? ()
Thread 11 (Thread 0x7f97d6efd700 (LWP 10821)):
#0 0x00007f97dcaf20fe in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000077632b in SafeTimer::timer_thread() ()
#2 0x0000000000776cdd in SafeTimerThread::entry() ()
#3 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000000000 in ?? ()
Thread 10 (Thread 0x7f97dd45b780 (LWP 10813)):
#0 0x00007f97dcaef148 in pthread_join () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000007712e2 in Thread::join(void**) ()
#2 0x000000000076a1e8 in SimpleMessenger::wait() ()
#3 0x00000000004a4d2a in main ()
Thread 9 (Thread 0x7f97d7eff700 (LWP 10819)):
#0 0x00007f97dcaf1d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000007e7118 in DispatchQueue::entry() ()
#2 0x000000000076f3fd in DispatchQueue::DispatchThread::entry() ()
#3 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000000000 in ?? ()
Thread 8 (Thread 0x7f97d76fe700 (LWP 10820)):
#0 0x00007f97db983313 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00000000007caeb9 in Accepter::entry() ()
#2 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 7 (Thread 0x7f97d9f03700 (LWP 10815)):
#0 0x00007f97dcaf40c1 in sem_timedwait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000078b538 in CephContextServiceThread::entry() ()
#2 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 6 (Thread 0x7f97d8700700 (LWP 10818)):
#0 0x00007f97db988033 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000817e84 in SignalHandler::entry() ()
#2 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 5 (Thread 0x7f97d8f01700 (LWP 10817)):
#0 0x00007f97dcaf1d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000076b4b1 in SimpleMessenger::reaper_entry() ()
#2 0x000000000076fced in SimpleMessenger::ReaperThread::entry() ()
#3 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7f97dae4d700 (LWP 10814)):
#0 0x00007f97dcaf1d84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00000000006fe32b in ceph::log::Log::entry() ()
#2 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f97d9702700 (LWP 10816)):
#0 0x00007f97db983313 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000778fab in AdminSocket::entry() ()
#2 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f97d5dfa700 (LWP 10826)):
#0 0x00007f97db983313 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00000000007f9b12 in Pipe::tcp_read_wait() ()
#2 0x00000000007f9e30 in Pipe::tcp_read(char*, int) ()
#3 0x000000000080c744 in Pipe::reader() ()
#4 0x000000000080f88d in Pipe::Reader::entry() ()
#5 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f97d66fc700 (LWP 10822)):
#0 0x00007f97dcaf5b7b in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000081720e in ?? ()
#2 <signal handler called>
#3 0x00007f97db8d1425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007f97db8d4b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007f97dc22369d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007f97dc221846 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007f97dc221873 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x00007f97dc22196e in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x000000000077f91f in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
#10 0x0000000000679b91 in MDSTable::load_2(int, ceph::buffer::list&, Context*) ()
#11 0x00000000004ab36a in Context::complete(int) ()
#12 0x00000000006c6ccd in Objecter::check_op_pool_dne(Objecter::Op*) ()
#13 0x00000000006c725a in Objecter::C_Op_Map_Latest::finish(int) ()
#14 0x00000000007ed100 in Finisher::finisher_thread_entry() ()
#15 0x00007f97dcaede9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007f97db98eccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000000000000000 in ?? ()

This is the

History

#1 Updated by Sage Weil almost 11 years ago

  • Priority changed from Normal to High

#2 Updated by Greg Farnum almost 11 years ago

  • Project changed from Ceph to CephFS
  • Category changed from 1 to 47

It couldn't find the actual table object in RADOS. We've seen this pop up a few times, but I believe it's always been on new/unused servers. Does yours have any data, Christopher?
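To illustrate the failure mode described here, the following is a minimal Python sketch and not the Ceph code. It assumes the table read completes with a negative errno (such as -ENOENT) when the object cannot be found, and that the load callback treats any negative result as fatal, which would match the call chain in the trace (Objecter::check_op_pool_dne -> Context::complete -> MDSTable::load_2 -> assert). The names ToyObjectStore, ToyTable, TableLoadError, and "example_table" are hypothetical.

```python
import errno


class TableLoadError(AssertionError):
    """Hypothetical stand-in for the failed assert in the MDS."""


class ToyObjectStore:
    """Hypothetical in-memory stand-in for RADOS: maps object names to bytes."""

    def __init__(self, objects=None):
        self.objects = dict(objects or {})

    def read(self, name, on_finish):
        # Complete the callback with (result code, data), the way an
        # asynchronous read completion would.
        if name in self.objects:
            on_finish(0, self.objects[name])
        else:
            # Assumption: a missing object is reported as -ENOENT.
            on_finish(-errno.ENOENT, b"")


class ToyTable:
    """Toy model of a table that must be loaded before the daemon can start."""

    def __init__(self, store, object_name):
        self.store = store
        self.object_name = object_name
        self.state = "opening"
        self.data = None

    def load(self):
        self.store.read(self.object_name, self._load_done)

    def _load_done(self, r, data):
        # Any negative result is treated as unrecoverable, so a missing
        # table object stops startup instead of being recreated.
        if r < 0:
            raise TableLoadError(
                f"could not read table object {self.object_name!r}: {r}"
            )
        self.data = data
        self.state = "active"
```

In this model, a store that contains "example_table" lets load() finish with the table in the "active" state, while an empty store makes load() raise TableLoadError, which mirrors an MDS that aborts on every start until the table object is readable again.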

#3 Updated by Christopher Kunz almost 11 years ago

Our Ceph cluster is in production, yes. We are only using RBD, though, not CephFS or RadosGW. SJust and Sage are familiar with our cluster, and there should be a documentation PDF floating around your offices somewhere.

#4 Updated by Sage Weil over 10 years ago

  • Status changed from New to Can't reproduce

#5 Updated by Greg Farnum over 7 years ago

  • Component(FS) MDS added
