Bug #1992

closed

OSD::get_or_create_pg

Added by Wido den Hollander over 12 years ago. Updated about 12 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%

Description

I've just upgraded my cluster from 0.39 to 0.40, and it did not go well.

The whole cluster started bouncing, and eventually 50% of the OSDs crashed with:

2012-01-27 16:11:07.278037 7f53f6e06700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:0/15043 <== osd.7 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6811/20485 170 ==== osd_ping(heartbeat e0 as_of 12473) v1 ==== 61+0+0 (3540073830 0 0) 0x52924c40 con 0x51a53640
2012-01-27 16:11:07.336807 7f53f6e06700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:0/15043 <== osd.5 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6805/20353 170 ==== osd_ping(heartbeat e0 as_of 12473) v1 ==== 61+0+0 (1234559449 0 0) 0x52316a80 con 0x51d2c640
2012-01-27 16:11:07.343286 7f53f6e06700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:0/15043 <== osd.2 [2a00:f10:11b:cef0:225:90ff:fe33:49fe]:6808/22634 172 ==== osd_ping(heartbeat e0 as_of 12473) v1 ==== 61+0+0 (1234559449 0 0) 0x529a6e00 con 0x512f9140
2012-01-27 16:11:07.455950 7f53f6e06700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:0/15043 <== osd.3 [2a00:f10:11b:cef0:225:90ff:fe33:49fe]:6811/22821 173 ==== osd_ping(heartbeat e0 as_of 12473) v1 ==== 61+0+0 (1234559449 0 0) 0x4b611c40 con 0x51a5d640
2012-01-27 16:11:07.474723 7f53f6e06700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:0/15043 <== osd.6 [2a00:f10:11b:cef0:225:90ff:fe32:cf64]:6808/20419 176 ==== osd_ping(heartbeat e0 as_of 12473) v1 ==== 61+0+0 (1234559449 0 0) 0x14ec0e00 con 0x512f98c0
2012-01-27 16:11:07.500584 7f53f7607700 -- [2a00:f10:11b:cef0:225:90ff:fe33:49a4]:6804/15042 <== osd.24 [2a00:f10:11b:cef0:225:90ff:fe33:49ca]:6801/28963 32 ==== pg_log(2.ee epoch 12531 query_epoch 12531) v2 ==== 779+0+0 (2196340623 0 0) 0x9c95b00 con 0x55b0a500
osd/OSD.cc: In function 'PG* OSD::get_or_create_pg(const PG::Info&, epoch_t, int, int&, bool, ObjectStore::Transaction**, C_Contexts**)', in thread '7f53f7607700'
osd/OSD.cc: 1242: FAILED assert(!info.dne())
 ceph version 0.40 (commit:7eea40ea37fb3a68a2042a2218c9b8c9c40a843e)
 1: (OSD::get_or_create_pg(PG::Info const&, unsigned int, int, int&, bool, ObjectStore::Transaction**, C_Contexts**)+0xbb1) [0x54b2d1]
 2: (OSD::handle_pg_log(MOSDPGLog*)+0x1d0) [0x54bae0]
 3: (OSD::_dispatch(Message*)+0x5c8) [0x553c98]
 4: (OSD::ms_dispatch(Message*)+0x11e) [0x5549de]
 5: (SimpleMessenger::dispatch_entry()+0x84b) [0x5bc0db]
 6: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4b237c]
 7: (()+0x7efc) [0x7f5403ae6efc]
 8: (clone()+0x6d) [0x7f540211789d]
*** Caught signal (Aborted) **
 in thread 7f53f7607700
 ceph version 0.40 (commit:7eea40ea37fb3a68a2042a2218c9b8c9c40a843e)
 1: /usr/bin/ceph-osd() [0x5fd926]
 2: (()+0x10060) [0x7f5403aef060]
 3: (gsignal()+0x35) [0x7f540206c3a5]
 4: (abort()+0x17b) [0x7f540206fb0b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f540292ad7d]
 6: (()+0xb9f26) [0x7f5402928f26]
 7: (()+0xb9f53) [0x7f5402928f53]
 8: (()+0xba04e) [0x7f540292904e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x193) [0x5cfd33]
 10: (OSD::get_or_create_pg(PG::Info const&, unsigned int, int, int&, bool, ObjectStore::Transaction**, C_Contexts**)+0xbb1) [0x54b2d1]
 11: (OSD::handle_pg_log(MOSDPGLog*)+0x1d0) [0x54bae0]
 12: (OSD::_dispatch(Message*)+0x5c8) [0x553c98]
 13: (OSD::ms_dispatch(Message*)+0x11e) [0x5549de]
 14: (SimpleMessenger::dispatch_entry()+0x84b) [0x5bc0db]
 15: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4b237c]
 16: (()+0x7efc) [0x7f5403ae6efc]
 17: (clone()+0x6d) [0x7f540211789d]

Eventually all OSDs went down.

Is there anything I should test?
