Bug #54397
openRadosGW Crash on Start
0%
252a97ceebcd30e1396dbdf2f6872a71c40bcfa4a58e684a1ddcb07a1248237c
5ba32bd2dae20e6377ebba271c5bfaaef8d2f7fdce2467c88d25c2bf64fd7fca
89af7ee16dcaec4566fbb4c4c6c79d25d9f81901435eb76eacb3c7ee8e92064f
Description
Some of our gateways crash if restartet via systemctl restart ceph-radosgw@rgw07.service
. After trying several times they started again. Restarting it again can work but not always.
I found a similar sounding crash: https://tracker.ceph.com/issues/52744
2022-02-24T10:13:36.378+0100 7f8ff0951700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0 2022-02-24T10:13:36.378+0100 7f8ff0951700 1 handle_sigterm 2022-02-24T10:13:36.378+0100 7f8ff0951700 1 handle_sigterm set alarm for 120 2022-02-24T10:13:36.378+0100 7f8ff452c540 -1 shutting down 2022-02-24T10:15:06.514+0100 7ff2c58a1540 0 deferred set uid:gid to 64045:64045 (ceph:ceph) 2022-02-24T10:15:06.514+0100 7ff2c58a1540 0 ceph version 15.2.15 (efc1e7ef13525f02d58d97ccd62fec325323c618) octopus (stable), process radosgw, pid 990334 2022-02-24T10:15:06.514+0100 7ff2c58a1540 0 framework: beast 2022-02-24T10:15:06.514+0100 7ff2c58a1540 0 framework conf key: port, val: 7480 2022-02-24T10:15:06.514+0100 7ff2c58a1540 1 radosgw_Main not setting numa affinity 2022-02-24T10:15:16.754+0100 7ff2c58a1540 1 int RGWSI_Notify::robust_notify(RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):394 Notify failed on object .log:meta.history: (110) Connection timed out 2022-02-24T10:15:16.754+0100 7ff2c58a1540 1 int RGWSI_Notify::robust_notify(RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):397 Backtrace: : ceph version 15.2.15 (efc1e7ef13525f02d58d97ccd62fec325323c618) octopus (stable) 1: (RGWSI_Notify::distribute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWCacheNotifyInfo const&, optional_yield)+0xa7) [0x7ff2d0846fc7] 2: (RGWSI_SysObj_Cache::distribute_cache(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw_raw_obj const&, ObjectCacheInfo&, int, optional_yield)+0x378) [0x7ff2d08536f8] 3: (RGWSI_SysObj_Cache::write(rgw_raw_obj const&, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >&, bool, ceph::buffer::v15_2_0::list const&, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, optional_yield)+0x39c) [0x7ff2d0855bec] 4: (RGWSI_SysObj::Obj::WOp::write(ceph::buffer::v15_2_0::list&, optional_yield)+0x3a) [0x7ff2d03c18fa] 5: (rgw_put_system_obj(RGWSysObjectCtx&, rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&, bool, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, optional_yield, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x11f) [0x7ff2d07770df] 6: (rgw_put_system_obj(RGWSysObjectCtx&, rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&, bool, RGWObjVersionTracker*, std::chrono::time_point<ceph::time_detail::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x19) [0x7ff2d0777169] 7: (RGWSI_MDLog::write_history(RGWMetadataLogHistory const&, RGWObjVersionTracker*, bool)+0x139) [0x7ff2d03bace9] 8: (RGWSI_MDLog::init_oldest_log_period()+0x6c5) [0x7ff2d03bd825] 9: (RGWSI_MDLog::do_start()+0xec) [0x7ff2d03bdedc] 10: (RGWServiceInstance::start()+0x1a) [0x7ff2d03e5c7a] 11: (RGWServices_Def::init(ceph::common::CephContext*, bool, bool, bool)+0xab3) [0x7ff2d03e76f3] 12: (RGWServices::do_init(ceph::common::CephContext*, bool, bool, bool)+0x1a) [0x7ff2d03e915a] 13: (RGWRados::initialize()+0xed) [0x7ff2d06daf8d] 14: (RGWStoreManager::init_storage_provider(ceph::common::CephContext*, bool, bool, bool, bool, bool, bool)+0x9f) [0x7ff2d076ea7f] 15: (radosgw_Main(int, char const**)+0xdde) [0x7ff2d0389cee] 16: (__libc_start_main()+0xea) [0x7ff2cfc9dd0a] 17: (_start()+0x2a) [0x55aab100509a] 2022-02-24T10:15:16.754+0100 7ff2c58a1540 1 int RGWSI_Notify::robust_notify(RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):413 Invalidating obj=.log:meta.history tries=0 2022-02-24T10:15:26.758+0100 7ff2c58a1540 1 int RGWSI_Notify::robust_notify(RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):418 invalidation attempt 0 failed: (110) Connection timed out 2022-02-24T10:15:26.758+0100 7ff2c58a1540 1 int RGWSI_Notify::robust_notify(RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):413 Invalidating obj=.log:meta.history tries=1
Files
Updated by Casey Bodley about 2 years ago
rgw's initialization failed because of this error:
int RGWSI_Notify::robust_notify(RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):394 Notify failed on object .log:meta.history: (110) Connection timed out
it shouldn't crash here, but the error would prevent a successful startup anyway
Updated by Casey Bodley about 2 years ago
- Related to Bug #51911: crash: RGWSI_Notify::unwatch added
Updated by Casey Bodley about 2 years ago
- Status changed from New to Triaged
- Assignee set to Mark Kogan
- Tags set to startup services
Updated by Casey Bodley about 2 years ago
- Has duplicate Bug #54739: crash: RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long) added
Updated by Casey Bodley about 2 years ago
- Has duplicate Bug #54722: crash: virtual ceph::common::RefCountedObject::~RefCountedObject(): assert(nref == 0) added
Updated by Casey Bodley about 2 years ago
- Has duplicate Bug #54719: crash: RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long) added
Updated by Casey Bodley about 2 years ago
- Has duplicate Bug #54699: crash: RGWSI_Notify::finalize_watch() added
Updated by Casey Bodley about 2 years ago
- Has duplicate Bug #54694: crash: virtual ceph::common::RefCountedObject::~RefCountedObject(): assert(nref == 0) added
Updated by Casey Bodley about 2 years ago
- Has duplicate Bug #54683: crash: tcmalloc::allocate_full_cpp_throw_oom(unsigned long) added
Updated by Casey Bodley about 2 years ago
- Has duplicate Bug #54637: crash: virtual ceph::common::RefCountedObject::~RefCountedObject(): assert(nref == 0) added
Updated by Tony Fontenay over 1 year ago
Hello,
I encounter the same problem in Pacific v16.2.10. Is there a workaround to get the container started.
2022-12-19T10:48:51.825+0000 7fc2e7da05c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):402 Notify failed on object fr-ws.rgw.log:meta.history: (110) Connection timed out 2022-12-19T10:48:51.825+0000 7fc2e7da05c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):418 Invalidating obj=fr-ws.rgw.log:meta.history tries=0 2022-12-19T10:49:01.829+0000 7fc2e7da05c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):423 invalidation attempt 0 failed: (110) Connection timed out 2022-12-19T10:49:01.829+0000 7fc2e7da05c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):418 Invalidating obj=fr-ws.rgw.log:meta.history tries=1 2022-12-19T10:53:40.905+0000 7fc2d3242700 -1 Initialization timeout, failed to initialize
Updated by Tony Fontenay over 1 year ago
Tony Fontenay wrote:
Hello,
I encounter the same problem in Pacific v16.2.10. Is there a workaround to get the container started.
[...]
Ultimately, After several attempts, the container was able to start
Updated by Peter Razumovsky about 1 year ago
- File rgw logs.rtf rgw logs.rtf added
Faced the same issue on v16.2.11:
debug 2023-02-14T15:10:41.558+0000 7f3f28ed85c0 0 deferred set uid:gid to 167:167 (ceph:ceph)
debug 2023-02-14T15:10:41.558+0000 7f3f28ed85c0 0 ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific (stable), process radosgw, pid 1
debug 2023-02-14T15:10:41.558+0000 7f3f28ed85c0 0 framework: beast
debug 2023-02-14T15:10:41.558+0000 7f3f28ed85c0 0 framework conf key: port, val: 80
debug 2023-02-14T15:10:41.558+0000 7f3f28ed85c0 0 framework conf key: ssl_port, val: 8443
debug 2023-02-14T15:10:41.558+0000 7f3f28ed85c0 0 framework conf key: ssl_certificate, val: /etc/ceph/private/rgw-cert.pem
debug 2023-02-14T15:10:41.558+0000 7f3f28ed85c0 1 radosgw_Main not setting numa affinity
debug 2023-02-14T15:10:51.973+0000 7f3f28ed85c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):402 Notify failed on object openstack-store.rgw.log:meta.history: (110) Connection timed out
debug 2023-02-14T15:10:51.973+0000 7f3f28ed85c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):418 Invalidating obj=openstack-store.rgw.log:meta.history tries=0
debug 2023-02-14T15:11:01.977+0000 7f3f28ed85c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):423 invalidation attempt 0 failed: (110) Connection timed out
debug 2023-02-14T15:11:01.977+0000 7f3f28ed85c0 1 rgw main: int RGWSI_Notify::robust_notify(const DoutPrefixProvider*, RGWSI_RADOS::Obj&, const RGWCacheNotifyInfo&, optional_yield):418 Invalidating obj=openstack-store.rgw.log:meta.history tries=1
after restart logs are in the attached file.
Updated by Telemetry Bot 12 months ago
- Crash signature (v1) updated (diff)
- Crash signature (v2) updated (diff)
- Affected Versions v16.2.6, v16.2.7, v17.2.5, v17.2.6 added
Sanitized backtrace:
RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long) RGWWatcher::reinit() Context::complete(int) Finisher::finisher_thread_entry()
Crash dump sample:
{ "archived": "2023-04-26 15:27:58.174335", "backtrace": [ "/lib64/libpthread.so.0(+0x12cf0) [0x7f8f9b1f8cf0]", "/lib64/librados.so.2(+0xa523a) [0x7f8f9ea5523a]", "/lib64/librados.so.2(+0xaf938) [0x7f8f9ea5f938]", "(RGWSI_Notify::unwatch(RGWSI_RADOS::Obj&, unsigned long)+0x30) [0x7f8f9dfbf520]", "(RGWWatcher::reinit()+0x3a) [0x7f8f9dfc653a]", "(Context::complete(int)+0xd) [0x7f8f9d86963d]", "(Finisher::finisher_thread_entry()+0x18d) [0x7f8f9bbf956d]", "/lib64/libpthread.so.0(+0x81ca) [0x7f8f9b1ee1ca]", "clone()" ], "ceph_version": "17.2.6", "crash_id": "2023-04-26T15:20:28.684265Z_8cf2caba-3178-4615-9916-7df375f669ab", "entity_name": "client.2d8b922bc596d7a7534981e8107ac04735617f84", "os_id": "centos", "os_name": "CentOS Stream", "os_version": "8", "os_version_id": "8", "process_name": "radosgw", "stack_sig": "252a97ceebcd30e1396dbdf2f6872a71c40bcfa4a58e684a1ddcb07a1248237c", "timestamp": "2023-04-26T15:20:28.684265Z", "utsname_machine": "x86_64", "utsname_release": "5.17.0-1020-oem", "utsname_sysname": "Linux", "utsname_version": "#21-Ubuntu SMP PREEMPT Fri Oct 14 09:33:24 UTC 2022" }