Bug #17570

open

rgw: segfault on shutdown after failure to start meta_sync_processor_thread

Added by Casey Bodley over 7 years ago. Updated over 6 years ago.

Status:
In Progress
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2016-10-11 15:47:12.836764 7fb43d6b9700  1 -- 10.17.151.111:0/4067878165 <== osd.0 10.17.151.111:6804/10462 87 ==== osd_op_reply(88 mdlog.sync-status [read 0~0] v0'0 uv3 ondisk = 0) v7 ==== 137+0+0 (2172530144 0 0) 0x563f64bec000 con 0x563f64bdf800
2016-10-11 15:47:12.836835 7fb43d6b9700  1 -- 10.17.151.111:0/4067878165 <== osd.0 10.17.151.111:6804/10462 88 ==== osd_op_reply(89 gc.15 [call] v0'0 uv3 ondisk = 0) v7 ==== 125+0+11 (462209934 0 1993775135) 0x563f64bec000 con 0x563f64bdf800
2016-10-11 15:47:12.837036 7fb437eae700  1 -- 10.17.151.111:0/4067878165 --> 10.17.151.111:6804/10462 -- osd_op(unknown.0.0:90 4.685c6f7 gc.15 [call lock.unlock] snapc 0=[] ondisk+write+known_if_redirected e12) v7 -- 0x563f64bec9c0 con 0
2016-10-11 15:47:12.837237 7fb4366ab700 20 rados->read r=0 bl.length=0
2016-10-11 15:47:12.837321 7fb45369fac0 20 cr:s=0x563f64a7b0c0:op=0x563f64c4a000:26RGWReadSyncStatusCoroutine: operate()
2016-10-11 15:47:12.837380 7fb43d6b9700  1 -- 10.17.151.111:0/4067878165 <== osd.0 10.17.151.111:6804/10462 89 ==== osd_op_reply(87 obj_delete_at_hint.0000000003 [call] v12'8 uv8 ondisk = 0) v7 ==== 149+0+0 (126494742 0 0) 0x563f64bec000 con 0x563f64bdf800
2016-10-11 15:47:12.837544 7fb4376ad700  1 -- 10.17.151.111:0/4067878165 --> 10.17.151.111:6804/10462 -- osd_op(unknown.0.0:91 6.ed5c9b88 obj_delete_at_hint.0000000003 [call timeindex.list] snapc 0=[] ack+read+known_if_redirected e12) v7 -- 0x563f64becd00 con 0
2016-10-11 15:47:12.837587 7fb45369fac0 20 cr:s=0x563f64a7b0c0:op=0x563f64c4a000:26RGWReadSyncStatusCoroutine: operate() returned r=-5
2016-10-11 15:47:12.837599 7fb45369fac0 20 stack->operate() returned ret=-5
2016-10-11 15:47:12.837604 7fb45369fac0 20 run: stack=0x563f64a7b0c0 is done
2016-10-11 15:47:12.837613 7fb45369fac0 20 run(stacks) returned r=-5
2016-10-11 15:47:12.837626 7fb45369fac0 -1 meta sync: ERROR: failed to read sync status, r=-5
2016-10-11 15:47:12.837642 7fb45369fac0  0 ERROR: sync.init() returned -5
2016-10-11 15:47:12.837651 7fb45369fac0  0 ERROR: failed to initialize meta sync thread
2016-10-11 15:47:12.838569 7fb43d6b9700  1 -- 10.17.151.111:0/4067878165 <== osd.0 10.17.151.111:6804/10462 90 ==== osd_op_reply(91 obj_delete_at_hint.0000000003 [call] v0'0 uv8 ondisk = 0) v7 ==== 149+0+15 (2152002425 0 2149983739) 0x563f64bec000 con 0x563f64bdf800
2016-10-11 15:47:12.838725 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:6804/10462 conn(0x563f64bdf800 :-1 s=STATE_OPEN pgs=7 cs=1 l=1).mark_down
2016-10-11 15:47:12.839264 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:6840/0 conn(0x563f64bde000 :-1 s=STATE_OPEN pgs=10 cs=1 l=1).mark_down
2016-10-11 15:47:12.839399 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 shutdown_connections
2016-10-11 15:47:12.840933 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 shutdown_connections
2016-10-11 15:47:12.841025 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 wait complete.
2016-10-11 15:47:12.841062 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:0/4067878165 conn(0x563f64bc4000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
2016-10-11 15:47:12.842456 7fb45369fac0 -1 Couldn't init storage provider (RADOS)
2016-10-11 15:47:12.845468 7fb4376ad700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fb4376ad700 thread_name:rgw_obj_expirer

 ceph version Development (no_version)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x563f59f336eb]
 2: (()+0xe097ea) [0x563f59f327ea]
 3: (()+0x10a00) [0x7fb447999a00]
 4: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, int)+0x64) [0x7fb44a889e82]
 5: (librados::IoCtx::operate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, librados::ObjectWriteOperation*)+0x6d) [0x7fb44a848c0d]
 6: (rados::cls::lock::unlock(librados::IoCtx*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x62) [0x563f59f118df]
 7: (rados::cls::lock::Lock::unlock(librados::IoCtx*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x30) [0x563f59f1259c]
 8: (RGWObjectExpirer::process_single_shard(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, utime_t const&, utime_t const&)+0x544) [0x563f59ecf6ca]
 9: (RGWObjectExpirer::inspect_all_shards(utime_t const&, utime_t const&)+0x1f3) [0x563f59ecf98f]
 10: (RGWObjectExpirer::OEWorker::entry()+0x161) [0x563f59ecfca9]
 11: (Thread::entry_wrapper()+0xc1) [0x563f59f9a82d]
 12: (Thread::_entry_func(void*)+0x18) [0x563f59f9a762]
 13: (()+0x761a) [0x7fb44799061a]
 14: (clone()+0x6d) [0x7fb44618259d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
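One reading of the log above: the main thread (7fb45369fac0) fails meta sync init, marks the connections down, and reports "Couldn't init storage provider (RADOS)", while the rgw_obj_expirer thread (7fb4376ad700) is still running and calls into librados a few milliseconds later. The sketch below shows the general teardown ordering that avoids this kind of use-after-shutdown: stop and join background workers before releasing what they depend on. It is not Ceph code. `FakeBackend`, `Worker`, and `init_store` are hypothetical names made up for illustration, and the actual fix in rgw may look quite different.

```cpp
#include <atomic>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for the client handle a background worker uses.
struct FakeBackend {
  std::atomic<int> ops{0};
  void operate() { ++ops; }
};

// Hypothetical stand-in for a background worker such as an object expirer.
class Worker {
 public:
  explicit Worker(FakeBackend* backend) : backend_(backend) {}
  ~Worker() { stop(); }

  void start() {
    thread_ = std::thread([this] {
      // Keep touching the backend until asked to stop.
      while (!stopping_.load()) {
        backend_->operate();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
  }

  // Safe to call more than once: the second call finds nothing to join.
  void stop() {
    stopping_.store(true);
    if (thread_.joinable()) thread_.join();
  }

 private:
  FakeBackend* backend_;
  std::atomic<bool> stopping_{false};
  std::thread thread_;
};

// Simulates an init sequence in which a step after the worker has started
// may fail. Returns false on that failure, true otherwise.
bool init_store(bool later_step_fails) {
  auto backend = std::make_unique<FakeBackend>();
  Worker worker(backend.get());
  worker.start();

  if (later_step_fails) {
    worker.stop();    // join the worker first...
    backend.reset();  // ...and only then release what it was using
    return false;
  }

  worker.stop();
  return true;
}
```

If the order in the failure branch were reversed, the worker could call `operate()` on a destroyed backend, which is the shape of crash the backtrace suggests.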

Related issues 1 (0 open, 1 closed)

Related to rgw - Bug #17568: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors (Resolved, Casey Bodley, 10/13/2016)

Actions #1

Updated by Casey Bodley over 7 years ago

  • Related to Bug #17568: multisite: race between ReadSyncStatus and InitSyncStatus leads to EIO errors added
Actions #2

Updated by Yehuda Sadeh over 7 years ago

  • Priority changed from Normal to High
Actions #3

Updated by Casey Bodley over 7 years ago

  • Assignee set to Casey Bodley
Actions #4

Updated by Ken Dreyer over 7 years ago

  • Backport set to jewel
Actions #5

Updated by Yehuda Sadeh over 6 years ago

  • Subject changed from multisite: segfault on shutdown after failure to start meta_sync_processor_thread to rgw: segfault on shutdown after failure to start meta_sync_processor_thread

Probably a general issue with initialization failures rather than one specific to multisite.

Actions #6

Updated by Matt Benjamin over 6 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Matt Benjamin over 6 years ago

  • Priority changed from High to Normal