Bug #56770
crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots.end())
Status: New
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Telemetry
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1): 3f66289ef3dde58c31697747e3918bff9113cac5235150605391ecc28e8a95e3
Crash signature (v2):
Description
Assert condition: p != pg_slots.end()
Assert function: void OSDShard::register_and_wake_split_child(PG*)
Sanitized backtrace:
OSDShard::register_and_wake_split_child(PG*)
OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)
Context::complete(int)
OSD::ShardedOpWQ::handle_oncommits(std::list<Context*, std::allocator<Context*> >&)
OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)
ShardedThreadPool::shardedthreadpool_worker(unsigned int)
ShardedThreadPool::WorkThreadSharded::entry()
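For readers unfamiliar with this kind of failure, the assert text suggests a lookup in a container named pg_slots whose result is compared against end(). The following is only a minimal sketch of that pattern, not the Ceph source: ShardSketch, SlotSketch, prime_split and register_split_child are hypothetical names, and the idea that a slot is created ahead of time for the split child is an assumption made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a per-PG slot; not the real Ceph type.
struct SlotSketch {
  bool waiting_for_split = false;
  bool has_pg = false;
};

// Hypothetical stand-in for a shard that owns a pg_slots map.
struct ShardSketch {
  std::unordered_map<std::string, std::unique_ptr<SlotSketch>> pg_slots;

  // Assumption: a slot for the child is created before the split finishes.
  void prime_split(const std::string& child) {
    auto& slot = pg_slots[child];
    if (!slot) {
      slot = std::make_unique<SlotSketch>();
    }
    slot->waiting_for_split = true;
  }

  // Returns false in the situation where the reported code would instead
  // fail ceph_assert(p != pg_slots.end()).
  bool register_split_child(const std::string& child) {
    auto p = pg_slots.find(child);
    if (p == pg_slots.end()) {
      return false;  // no slot was ever created (or it was removed)
    }
    p->second->waiting_for_split = false;
    p->second->has_pg = true;
    return true;
  }
};
```

In this sketch, calling register_split_child for a child that was never primed returns false; under the stated assumption, that is the state in which the assert in this report would fire.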
Crash dump sample:
{
    "assert_condition": "p != pg_slots.end()",
    "assert_file": "osd/OSD.cc",
    "assert_func": "void OSDShard::register_and_wake_split_child(PG*)",
    "assert_line": 10685,
    "assert_msg": "osd/OSD.cc: In function 'void OSDShard::register_and_wake_split_child(PG*)' thread 7f848c1d5700 time 2022-07-25T21:07:35.537660+0100\nosd/OSD.cc: 10685: FAILED ceph_assert(p != pg_slots.end())",
    "assert_thread_name": "tp_osd_tp",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f84ac363140]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17e) [0x5615e8ee0abc]",
        "/usr/bin/ceph-osd(+0xc21c00) [0x5615e8ee0c00]",
        "(OSDShard::register_and_wake_split_child(PG*)+0x9c3) [0x5615e8fa67b3]",
        "(OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x14e) [0x5615e8faac1e]",
        "(Context::complete(int)+0x9) [0x5615e8fddf89]",
        "(OSD::ShardedOpWQ::handle_oncommits(std::__cxx11::list<Context*, std::allocator<Context*> >&)+0x24) [0x5615e8fef424]",
        "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x73f) [0x5615e8fcae5f]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x5615e96aa1aa]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5615e96ac780]",
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f84ac357ea7]",
        "clone()"
    ],
    "ceph_version": "17.2.1",
    "crash_id": "2022-07-25T20:07:35.666784Z_6faaa4f1-d8d8-4e75-8264-c32662b68cec",
    "entity_name": "osd.80d14b2c0bf175fbe1c342ee221d7eca80db0e5f",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "ceph-osd",
    "stack_sig": "3f66289ef3dde58c31697747e3918bff9113cac5235150605391ecc28e8a95e3",
    "timestamp": "2022-07-25T20:07:35.666784Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.39-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.15.39-1 (Wed, 22 Jun 2022 17:22:00 +0200)"
}
Related issues
History
#1 Updated by Telemetry Bot over 1 year ago
#2 Updated by Laura Flores 12 months ago
- Related to Bug #54589: OSD crash during node boot added
#3 Updated by Laura Flores 12 months ago
/a/yuriw-2022-12-08_15:36:34-rados-wip-yuri2-testing-2022-12-07-0821-pacific-distro-default-smithi/7108558
2022-12-08T20:27:41.468 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10-1451-gccaf42af/rpm/el8/BUILD/ceph-16.2.10-1451-gccaf42af/src/osd/OSD.cc: 10830: FAILED ceph_assert(p != pg_slots.end())
... repeats many times...
2022-12-08T20:27:41.468 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: debug 2022-12-08T20:27:40.939+0000 7f3209de9700 -1 osd.6 532 failed to load OSD map for epoch 528, got 0 bytes
...
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: ceph version 16.2.10-1451-gccaf42af (ccaf42af7ffbf9ad0e95ca63f5b2bfc8de2ef11a) pacific (stable)
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f323290bcf0]
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 2: gsignal()
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 3: abort()
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x560dd40ef271]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 5: /usr/bin/ceph-osd(+0x58343a) [0x560dd40ef43a]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 6: (OSDShard::register_and_wake_split_child(PG*)+0x810) [0x560dd4228f30]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 7: (OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x2ad) [0x560dd422923d]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 8: (Context::complete(int)+0xd) [0x560dd4234d0d]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xcac) [0x560dd4216cdc]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x560dd489abf4]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560dd489dad4]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 12: /lib64/libpthread.so.0(+0x81ca) [0x7f32329011ca]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: 13: clone()
#4 Updated by Laura Flores 12 months ago
- Tags set to test-failure
- Priority changed from Normal to High
We should address this crash: it has been seen more than 100 times across 7 clusters.
#5 Updated by Telemetry Bot 7 months ago
- Affected Versions v16.2.9, v17.2.4, v17.2.5 added