Bug #17638

open

radosgw does not gracefully handle errors during initialization

Added by Casey Bodley over 7 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

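If initialization fails during RGWRados::init_complete(), or later when starting frontends, resources that were already initialized are not cleaned up properly. Currently this manifests as a crash in the RGWObjectExpirer thread (thread_name:rgw_obj_expirer), which is still running while shutdown proceeds: a segmentation fault in the first trace below and an abort in the second.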

In this stack trace, the meta sync thread fails to start:

2016-10-11 15:47:12.837626 7fb45369fac0 -1 meta sync: ERROR: failed to read sync status, r=-5
2016-10-11 15:47:12.837642 7fb45369fac0  0 ERROR: sync.init() returned -5
2016-10-11 15:47:12.837651 7fb45369fac0  0 ERROR: failed to initialize meta sync thread
2016-10-11 15:47:12.838569 7fb43d6b9700  1 -- 10.17.151.111:0/4067878165 <== osd.0 10.17.151.111:6804/10462 90 ==== osd_op_reply(91 obj_delete_at_hint.0000000003 [call] v0'0 uv8 ondisk = 0) v7 ==== 149+0+15 (2152002425 0 2149983739) 0x563f64bec000 con 0x563f64bdf800
2016-10-11 15:47:12.838725 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:6804/10462 conn(0x563f64bdf800 :-1 s=STATE_OPEN pgs=7 cs=1 l=1).mark_down
2016-10-11 15:47:12.839264 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:6840/0 conn(0x563f64bde000 :-1 s=STATE_OPEN pgs=10 cs=1 l=1).mark_down
2016-10-11 15:47:12.839399 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 shutdown_connections
2016-10-11 15:47:12.840933 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 shutdown_connections
2016-10-11 15:47:12.841025 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 wait complete.
2016-10-11 15:47:12.841062 7fb45369fac0  1 -- 10.17.151.111:0/4067878165 >> 10.17.151.111:0/4067878165 conn(0x563f64bc4000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
2016-10-11 15:47:12.842456 7fb45369fac0 -1 Couldn't init storage provider (RADOS)
2016-10-11 15:47:12.845468 7fb4376ad700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fb4376ad700 thread_name:rgw_obj_expirer

 ceph version Development (no_version)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x563f59f336eb]
 2: (()+0xe097ea) [0x563f59f327ea]
 3: (()+0x10a00) [0x7fb447999a00]
 4: (librados::IoCtxImpl::operate(object_t const&, ObjectOperation*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, int)+0x64) [0x7fb44a889e82]
 5: (librados::IoCtx::operate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, librados::ObjectWriteOperation*)+0x6d) [0x7fb44a848c0d]
 6: (rados::cls::lock::unlock(librados::IoCtx*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x62) [0x563f59f118df]
 7: (rados::cls::lock::Lock::unlock(librados::IoCtx*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x30) [0x563f59f1259c]
 8: (RGWObjectExpirer::process_single_shard(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, utime_t const&, utime_t const&)+0x544) [0x563f59ecf6ca]
 9: (RGWObjectExpirer::inspect_all_shards(utime_t const&, utime_t const&)+0x1f3) [0x563f59ecf98f]
 10: (RGWObjectExpirer::OEWorker::entry()+0x161) [0x563f59ecfca9]
 11: (Thread::entry_wrapper()+0xc1) [0x563f59f9a82d]
 12: (Thread::_entry_func(void*)+0x18) [0x563f59f9a762]
 13: (()+0x761a) [0x7fb44799061a]
 14: (clone()+0x6d) [0x7fb44618259d]

And here a frontend fails to run (from downstream testing):

2016-10-18 09:30:14.109302 7efdcf2889c0  0 ceph version 10.2.3-7.el7cp (f69f9569b426f45d948df4be635aa92f4d656654), process radosgw, pid 8145
2016-10-18 09:30:14.185347 7efd4ffff700  0 RGWGC::process() failed to acquire lock on gc.4
2016-10-18 09:30:14.185884 7efd4ffff700  0 RGWGC::process() failed to acquire lock on gc.5
2016-10-18 09:30:14.186366 7efd4ffff700  0 RGWGC::process() failed to acquire lock on gc.6
2016-10-18 09:30:14.187322 7efdcf2889c0  0 starting handler: civetweb
2016-10-18 09:30:14.187432 7efdcf2889c0  0 civetweb: 0x7efdcf4e2dc0: load_dll: cannot load libssl.so
2016-10-18 09:30:14.187475 7efdcf2889c0  0 civetweb: 0x7efdcf4e2dc0: load_dll: cannot load libcrypto.so
2016-10-18 09:30:14.187485 7efdcf2889c0 -1 ERROR: failed run
2016-10-18 09:30:14.189308 7efd4f7fe700 -1 *** Caught signal (Aborted) **
 in thread 7efd4f7fe700 thread_name:rgw_obj_expirer

 ceph version 10.2.3-7.el7cp (f69f9569b426f45d948df4be635aa92f4d656654)
 1: (()+0x56f89a) [0x7efdc59a189a]
 2: (()+0xf370) [0x7efdc4db1370]
 3: (gsignal()+0x37) [0x7efdc42f41d7]
 4: (abort()+0x148) [0x7efdc42f58c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7efdc48f6ab5]
 6: (()+0x5ea26) [0x7efdc48f4a26]
 7: (()+0x5ea53) [0x7efdc48f4a53]
 8: (()+0x5ec73) [0x7efdc48f4c73]
 9: (operator new(unsigned long)+0x7d) [0x7efdc48f520d]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7efdc4953ce9]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x1b) [0x7efdc49548fb]
 12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)+0x5c) [0x7efdc4954fcc]
 13: (RGWObjectExpirer::process_single_shard(std::string const&, utime_t const&, utime_t const&)+0x133) [0x7efdc57d9943]
 14: (RGWObjectExpirer::inspect_all_shards(utime_t const&, utime_t const&)+0xb2) [0x7efdc57d9fb2]
 15: (RGWObjectExpirer::OEWorker::entry()+0x7f) [0x7efdc57da25f]
 16: (()+0x7dc5) [0x7efdc4da9dc5]
 17: (clone()+0x6d) [0x7efdc43b673d]
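
Both traces show the same pattern: the main thread hits an error and starts tearing things down while rgw_obj_expirer is still running. A minimal sketch of one way to avoid this class of crash is shown below. It is not the RGW code and not a proposed patch; Store, Worker and Service are hypothetical stand-ins, and the only assumption is that each background thread is owned by an object whose destructor stops and joins it before the resources it uses are released.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the storage handle that workers depend on.
struct Store {
  std::atomic<int> ops{0};
};

// Hypothetical background worker (loosely analogous to an expirer thread).
// The destructor stops and joins the thread, so the thread cannot outlive
// the Store it references.
class Worker {
 public:
  explicit Worker(Store& s) : store_(s), thread_([this] { run(); }) {}
  ~Worker() {
    {
      std::lock_guard<std::mutex> l(m_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    while (!stop_) {
      ++store_.ops;  // touches the Store; unsafe once the Store is gone
      cv_.wait_for(l, std::chrono::milliseconds(1));
    }
  }

  Store& store_;
  std::mutex m_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread thread_;  // declared last: starts after the members above exist
};

// Hypothetical service with multi-stage initialization.
class Service {
 public:
  // Returns 0 on success or a negative errno value on failure.
  int init(bool fail_late_stage) {
    assert(!store_ && !worker_);  // init() is expected to run only once
    store_ = std::make_unique<Store>();
    worker_ = std::make_unique<Worker>(*store_);
    if (fail_late_stage) {  // e.g. a sync thread or frontend failed to start
      shutdown();           // unwind everything that was already started
      return -EIO;
    }
    return 0;
  }

  void shutdown() {
    worker_.reset();  // stop and join the thread first
    store_.reset();   // only then release what the thread was using
  }

  bool worker_running() const { return worker_ != nullptr; }

 private:
  // Destroyed in reverse declaration order: worker_ first, then store_.
  std::unique_ptr<Store> store_;
  std::unique_ptr<Worker> worker_;
};
```

With this shape, a caller that sees init() return an error can exit without a worker thread still touching freed state. The same ordering holds if Service is simply destroyed, because worker_ is declared after store_.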
