Bug #17638
radosgw does not gracefully handle errors during initialization
Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
If initialization fails during RGWRados::init_complete(), or later when starting frontends, resources that were already initialized are not cleaned up properly. Currently this manifests as a crash in the RGWObjectExpirer thread: a segfault in the first trace below and an abort in the second.
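The cleanup ordering this report asks for can be sketched roughly as follows. This is a minimal illustration, not Ceph code: `Store`, `init()`, `shutdown()`, and the `late_step_fails` flag are hypothetical names, the atomic counter stands in for whatever backend state the worker uses, and `-5` and `-16` only mirror the usual EIO and EBUSY values. The idea is that a failure after a background thread has started should stop and join that thread before the state it uses is released.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <thread>

// Hypothetical stand-in for a storage provider; none of these names are Ceph's.
class Store {
public:
  ~Store() { shutdown(); }

  // Returns 0 on success or a negative error code. 'late_step_fails'
  // simulates a step that fails after the worker thread is already running.
  int init(bool late_step_fails) {
    if (worker_.joinable()) {
      return -16;  // already initialized (mirrors -EBUSY)
    }
    backend_ = std::make_unique<std::atomic<long>>(0);
    stop_ = false;
    worker_ = std::thread([this] {
      while (!stop_) {
        ++(*backend_);  // the worker touches the backend on every pass
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
    });
    if (late_step_fails) {
      shutdown();  // roll back everything that was started above
      return -5;   // mirrors -EIO, as in the log output
    }
    return 0;
  }

  // Stop and join the worker first, then release what it was using.
  // Safe to call more than once.
  void shutdown() {
    stop_ = true;
    if (worker_.joinable()) {
      worker_.join();
    }
    assert(!worker_.joinable());
    backend_.reset();
  }

  bool worker_running() const { return worker_.joinable(); }
  bool has_backend() const { return backend_ != nullptr; }

private:
  std::unique_ptr<std::atomic<long>> backend_;
  std::atomic<bool> stop_{false};
  std::thread worker_;
};
```

With this ordering, a caller that sees `init(true)` return an error is left with no running worker and no backend, so nothing can touch freed state afterwards. Whether radosgw should roll back inside the init path or in a dedicated finalize step is a design choice this sketch does not settle.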
In this stack trace, the meta sync thread fails to start:
2016-10-11 15:47:12.837626 7fb45369fac0 -1 meta sync: ERROR: failed to read sync status, r=-5
2016-10-11 15:47:12.837642 7fb45369fac0 0 ERROR: sync.init() returned -5
2016-10-11 15:47:12.837651 7fb45369fac0 0 ERROR: failed to initialize meta sync thread
2016-10-11 15:47:12.838569 7fb43d6b9700 1 -- 10.17.151.111:0/4067878165 <== osd.0 10.17.151.111:6804/10462 90 ==== osd_op_reply(91 obj_delete_at_hint.0000000003 [call] v0'0 uv8 ondisk = 0) v7 ==== 149+0+15 (2152002425 0 2149983739) 0x563f64bec000 con 0x563f64bdf800
2016-10-11 15:47:12.838725 7fb45369fac0 1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:6804/10462 conn(0x563f64bdf800 :-1 s=STATE_OPEN pgs=7 cs=1 l=1).mark_down
2016-10-11 15:47:12.839264 7fb45369fac0 1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:6840/0 conn(0x563f64bde000 :-1 s=STATE_OPEN pgs=10 cs=1 l=1).mark_down
2016-10-11 15:47:12.839399 7fb45369fac0 1 -- 10.17.151.111:0/4067878165 shutdown_connections
2016-10-11 15:47:12.840933 7fb45369fac0 1 -- 10.17.151.111:0/4067878165 shutdown_connections
2016-10-11 15:47:12.841025 7fb45369fac0 1 -- 10.17.151.111:0/4067878165 wait complete.
2016-10-11 15:47:12.841062 7fb45369fac0 1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:0/4067878165 conn(0x563f64bc4000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
2016-10-11 15:47:12.842456 7fb45369fac0 -1 Couldn't init storage provider (RADOS)
2016-10-11 15:47:12.845468 7fb4376ad700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fb4376ad700 thread_name:rgw_obj_expirer
 ceph version Development (no_version)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x563f59f336eb]
 2: (()+0xe097ea) [0x563f59f327ea]
 3: (()+0x10a00) [0x7fb447999a00]
 4: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, int)+0x64) [0x7fb44a889e82]
 5: (librados::IoCtx::operate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, librados::ObjectWriteOperation*)+0x6d) [0x7fb44a848c0d]
 6: (rados::cls::lock::unlock(librados::IoCtx*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x62) [0x563f59f118df]
 7: (rados::cls::lock::Lock::unlock(librados::IoCtx*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x30) [0x563f59f1259c]
 8: (RGWObjectExpirer::process_single_shard(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, utime_t const&, utime_t const&)+0x544) [0x563f59ecf6ca]
 9: (RGWObjectExpirer::inspect_all_shards(utime_t const&, utime_t const&)+0x1f3) [0x563f59ecf98f]
 10: (RGWObjectExpirer::OEWorker::entry()+0x161) [0x563f59ecfca9]
 11: (Thread::entry_wrapper()+0xc1) [0x563f59f9a82d]
 12: (Thread::_entry_func(void*)+0x18) [0x563f59f9a762]
 13: (()+0x761a) [0x7fb44799061a]
 14: (clone()+0x6d) [0x7fb44618259d]
And here a frontend fails to run (from downstream testing):
2016-10-18 09:30:14.109302 7efdcf2889c0 0 ceph version 10.2.3-7.el7cp (f69f9569b426f45d948df4be635aa92f4d656654), process radosgw, pid 8145
2016-10-18 09:30:14.185347 7efd4ffff700 0 RGWGC::process() failed to acquire lock on gc.4
2016-10-18 09:30:14.185884 7efd4ffff700 0 RGWGC::process() failed to acquire lock on gc.5
2016-10-18 09:30:14.186366 7efd4ffff700 0 RGWGC::process() failed to acquire lock on gc.6
2016-10-18 09:30:14.187322 7efdcf2889c0 0 starting handler: civetweb
2016-10-18 09:30:14.187432 7efdcf2889c0 0 civetweb: 0x7efdcf4e2dc0: load_dll: cannot load libssl.so
2016-10-18 09:30:14.187475 7efdcf2889c0 0 civetweb: 0x7efdcf4e2dc0: load_dll: cannot load libcrypto.so
2016-10-18 09:30:14.187485 7efdcf2889c0 -1 ERROR: failed run
2016-10-18 09:30:14.189308 7efd4f7fe700 -1 *** Caught signal (Aborted) **
 in thread 7efd4f7fe700 thread_name:rgw_obj_expirer
 ceph version 10.2.3-7.el7cp (f69f9569b426f45d948df4be635aa92f4d656654)
 1: (()+0x56f89a) [0x7efdc59a189a]
 2: (()+0xf370) [0x7efdc4db1370]
 3: (gsignal()+0x37) [0x7efdc42f41d7]
 4: (abort()+0x148) [0x7efdc42f58c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7efdc48f6ab5]
 6: (()+0x5ea26) [0x7efdc48f4a26]
 7: (()+0x5ea53) [0x7efdc48f4a53]
 8: (()+0x5ec73) [0x7efdc48f4c73]
 9: (operator new(unsigned long)+0x7d) [0x7efdc48f520d]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7efdc4953ce9]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x1b) [0x7efdc49548fb]
 12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)+0x5c) [0x7efdc4954fcc]
 13: (RGWObjectExpirer::process_single_shard(std::string const&, utime_t const&, utime_t const&)+0x133) [0x7efdc57d9943]
 14: (RGWObjectExpirer::inspect_all_shards(utime_t const&, utime_t const&)+0xb2) [0x7efdc57d9fb2]
 15: (RGWObjectExpirer::OEWorker::entry()+0x7f) [0x7efdc57da25f]
 16: (()+0x7dc5) [0x7efdc4da9dc5]
 17: (clone()+0x6d) [0x7efdc43b673d]