Bug #696 (closed): osd: _put_pool, assert(p->num_pg > 0)

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Category: OSD
% Done: 0%

Description

Yesterday I tried to remove some pools from my Ceph system on 'noisy', which is running the unstable branch at commit eace4398cb163a670ff6bbd657de9aa5c917fcb9.

I removed the pools created by the RADOS gateway (.rgw, .users, .users.email), and one of my 4 OSDs went down with:

#0  0x00007f6dc69897bb in raise () from /lib/libpthread.so.0
#1  0x000000000061a96b in handle_fatal_signal (signum=6) at config.cc:253
#2  <signal handler called>
#3  0x00007f6dc5559a75 in raise () from /lib/libc.so.6
#4  0x00007f6dc555d5c0 in abort () from /lib/libc.so.6
#5  0x00007f6dc5e0f8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#6  0x00007f6dc5e0dd16 in ?? () from /usr/lib/libstdc++.so.6
#7  0x00007f6dc5e0dd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#8  0x00007f6dc5e0de3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#9  0x0000000000604992 in ceph::__ceph_assert_fail (assertion=0x63d077 "p->num_pg > 0", file=<value optimized out>, 
    line=874, func=<value optimized out>) at common/assert.cc:25
#10 0x00000000004e9b27 in OSD::_put_pool (this=0x151c000, p=0x293b000) at osd/OSD.cc:874
#11 0x00000000004eba93 in OSD::_remove_pg (this=0x151c000, pg=<value optimized out>) at osd/OSD.cc:4609
#12 0x0000000000607a76 in ThreadPool::worker (this=0x151c5e8) at common/WorkQueue.cc:44
#13 0x0000000000523b7d in ThreadPool::WorkThread::entry() ()
#14 0x0000000000481f0a in Thread::_entry_func (arg=0x6185) at ./common/Thread.h:39
#15 0x00007f6dc69809ca in start_thread () from /lib/libpthread.so.0
#16 0x00007f6dc560c70d in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()

The last few log lines (the log level was set rather low):

2011-01-08 19:47:10.922490 7f6dba9f7700 osd3 149 pg[17.1( v 143'1 (0'0,143'1] n=1 ec=141 les=143 141/141/141) [3,1,0] r=0 mlcod 0'0 active+clean] oi.user_version=143'1 is_modify=0
2011-01-08 19:47:10.928918 7f6dbcafc700 osd3 149 OSD::ms_handle_reset()
2011-01-08 19:47:10.928960 7f6dbcafc700 osd3 149 OSD::ms_handle_reset() s=0x2d4eea0
osd/OSD.cc: In function 'void OSD::_put_pool(PGPool*)', In thread 7f6db91f4700
osd/OSD.cc:874: FAILED assert(p->num_pg > 0)
 ceph version 0.25~rc (commit:eace4398cb163a670ff6bbd657de9aa5c917fcb9)
 1: (OSD::_put_pool(PGPool*)+0x267) [0x4e9b27]
 2: (OSD::_remove_pg(PG*)+0x1653) [0x4eba93]
 3: (ThreadPool::worker()+0x556) [0x607a76]
 4: (ThreadPool::WorkThread::entry()+0xd) [0x523b7d]
 5: (Thread::_entry_func(void*)+0xa) [0x481f0a]
 6: (()+0x69ca) [0x7f6dc69809ca]
 7: (clone()+0x6d) [0x7f6dc560c70d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (Aborted) ***
in thread 7f6db91f4700
 ceph version 0.25~rc (commit:eace4398cb163a670ff6bbd657de9aa5c917fcb9)
 1: (handle_fatal_signal(int)+0x13c) [0x61a92c]
 2: (()+0xf8f0) [0x7f6dc69898f0]
 3: (gsignal()+0x35) [0x7f6dc5559a75]
 4: (abort()+0x180) [0x7f6dc555d5c0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f6dc5e0f8e5]
 6: (()+0xcad16) [0x7f6dc5e0dd16]
 7: (()+0xcad43) [0x7f6dc5e0dd43]
 8: (()+0xcae3e) [0x7f6dc5e0de3e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x2f2) [0x604992]
 a: (OSD::_put_pool(PGPool*)+0x267) [0x4e9b27]
 b: (OSD::_remove_pg(PG*)+0x1653) [0x4eba93]
 c: (ThreadPool::worker()+0x556) [0x607a76]
 d: (ThreadPool::WorkThread::entry()+0xd) [0x523b7d]
 e: (Thread::_entry_func(void*)+0xa) [0x481f0a]
 f: (()+0x69ca) [0x7f6dc69809ca]
 10: (clone()+0x6d) [0x7f6dc560c70d]

I could start the OSD again afterwards and it recovered nicely.

I removed the pools in rapid succession; all three were gone within a few seconds.
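For what it's worth, the assert looks like a plain reference-count underflow: the OSD keeps a per-pool count of the PGs that still reference the pool, and _remove_pg's call to _put_pool found it already at zero. Below is a minimal standalone sketch of that bookkeeping (simplified, hypothetical names; not the actual Ceph code) showing how a doubled decrement, or a lost increment during back-to-back pool deletions, would trip exactly this assert:

// pool_refcount_sketch.cpp -- illustrative only, not Ceph source.
// Build: g++ -std=c++11 pool_refcount_sketch.cpp
#include <cassert>
#include <iostream>

// Stand-in for Ceph's PGPool: the OSD keeps one per pool and counts
// how many local PGs still reference it.
struct PGPool {
  int id;
  int num_pg;  // PGs on this OSD that still belong to the pool

  explicit PGPool(int id_) : id(id_), num_pg(0) {}
};

// Registering a PG against its pool bumps the count.
void get_pool(PGPool *p) {
  p->num_pg++;
}

// Removing a PG drops the count; the assert that fired in this report
// guards against more puts than gets (osd/OSD.cc:874).
void put_pool(PGPool *p) {
  assert(p->num_pg > 0);  // FAILED assert(p->num_pg > 0)
  if (--p->num_pg == 0) {
    std::cout << "pool " << p->id << ": no PGs left, freeing" << std::endl;
    delete p;
  }
}

int main() {
  PGPool *p = new PGPool(17);
  get_pool(p);  // one PG references the pool
  put_pool(p);  // PG removed; count hits 0 and the pool is freed

  // A second removal of the same PG (e.g. two worker threads racing
  // while several pools are deleted in quick succession) would call
  // put_pool() on a count that is already 0 and hit the assert --
  // the failure mode described above.
  return 0;
}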

I've collected the data and uploaded it to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_put_pool


Related issues (1): 0 open, 1 closed

Related to Ceph - Bug #629: cosd segfaults when deleting a pool containing degraded objects (Resolved; Colin McCabe; 12/02/2010)

