Bug #696
closed
osd: _put_pool, assert(p->num_pg > 0)
Description
Yesterday I tried to remove some pools from my Ceph system on 'noisy', which was running unstable at commit eace4398cb163a670ff6bbd657de9aa5c917fcb9.
The pools were the ones created by the RADOS gateway (.rgw, .users, .users.email). While I was removing them, one of my 4 OSDs went down with:
#0 0x00007f6dc69897bb in raise () from /lib/libpthread.so.0
#1 0x000000000061a96b in handle_fatal_signal (signum=6) at config.cc:253
#2 <signal handler called>
#3 0x00007f6dc5559a75 in raise () from /lib/libc.so.6
#4 0x00007f6dc555d5c0 in abort () from /lib/libc.so.6
#5 0x00007f6dc5e0f8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#6 0x00007f6dc5e0dd16 in ?? () from /usr/lib/libstdc++.so.6
#7 0x00007f6dc5e0dd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#8 0x00007f6dc5e0de3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9 0x0000000000604992 in ceph::__ceph_assert_fail (assertion=0x63d077 "p->num_pg > 0", file=<value optimized out>, line=874, func=<value optimized out>) at common/assert.cc:25
#10 0x00000000004e9b27 in OSD::_put_pool (this=0x151c000, p=0x293b000) at osd/OSD.cc:874
#11 0x00000000004eba93 in OSD::_remove_pg (this=0x151c000, pg=<value optimized out>) at osd/OSD.cc:4609
#12 0x0000000000607a76 in ThreadPool::worker (this=0x151c5e8) at common/WorkQueue.cc:44
#13 0x0000000000523b7d in ThreadPool::WorkThread::entry() ()
#14 0x0000000000481f0a in Thread::_entry_func (arg=0x6185) at ./common/Thread.h:39
#15 0x00007f6dc69809ca in start_thread () from /lib/libpthread.so.0
#16 0x00007f6dc560c70d in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()
These are the last few log lines (the logging level was fairly low):
2011-01-08 19:47:10.922490 7f6dba9f7700 osd3 149 pg[17.1( v 143'1 (0'0,143'1] n=1 ec=141 les=143 141/141/141) [3,1,0] r=0 mlc od 0'0 active+clean] oi.user_version=143'1 is_modify=0
2011-01-08 19:47:10.928918 7f6dbcafc700 osd3 149 OSD::ms_handle_reset()
2011-01-08 19:47:10.928960 7f6dbcafc700 osd3 149 OSD::ms_handle_reset() s=0x2d4eea0
osd/OSD.cc: In function 'void OSD::_put_pool(PGPool*)', In thread 7f6db91f4700
osd/OSD.cc:874: FAILED assert(p->num_pg > 0)
ceph version 0.25~rc (commit:eace4398cb163a670ff6bbd657de9aa5c917fcb9)
1: (OSD::_put_pool(PGPool*)+0x267) [0x4e9b27]
2: (OSD::_remove_pg(PG*)+0x1653) [0x4eba93]
3: (ThreadPool::worker()+0x556) [0x607a76]
4: (ThreadPool::WorkThread::entry()+0xd) [0x523b7d]
5: (Thread::_entry_func(void*)+0xa) [0x481f0a]
6: (()+0x69ca) [0x7f6dc69809ca]
7: (clone()+0x6d) [0x7f6dc560c70d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) ***
in thread 7f6db91f4700
ceph version 0.25~rc (commit:eace4398cb163a670ff6bbd657de9aa5c917fcb9)
1: (handle_fatal_signal(int)+0x13c) [0x61a92c]
2: (()+0xf8f0) [0x7f6dc69898f0]
3: (gsignal()+0x35) [0x7f6dc5559a75]
4: (abort()+0x180) [0x7f6dc555d5c0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f6dc5e0f8e5]
6: (()+0xcad16) [0x7f6dc5e0dd16]
7: (()+0xcad43) [0x7f6dc5e0dd43]
8: (()+0xcae3e) [0x7f6dc5e0de3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x2f2) [0x604992]
a: (OSD::_put_pool(PGPool*)+0x267) [0x4e9b27]
b: (OSD::_remove_pg(PG*)+0x1653) [0x4eba93]
c: (ThreadPool::worker()+0x556) [0x607a76]
d: (ThreadPool::WorkThread::entry()+0xd) [0x523b7d]
e: (Thread::_entry_func(void*)+0xa) [0x481f0a]
f: (()+0x69ca) [0x7f6dc69809ca]
10: (clone()+0x6d) [0x7f6dc560c70d]
I was able to start the OSD again afterwards, and it recovered cleanly.
I removed the pools in quick succession; all of them were removed within a few seconds.
I've collected the data and uploaded it to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_put_pool