Bug #56770 (open)

crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots.end())

Added by Telemetry Bot over 1 year ago. Updated 11 months ago.

Status: New
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Telemetry
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1): 3f66289ef3dde58c31697747e3918bff9113cac5235150605391ecc28e8a95e3


Description

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=d9289f1067de7f0cc0e374ff0f760daa4519e04cfa1b7d05fab55e79a19192b3

Assert condition: p != pg_slots.end()
Assert function: void OSDShard::register_and_wake_split_child(PG*)

Sanitized backtrace:

    OSDShard::register_and_wake_split_child(PG*)
    OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)
    Context::complete(int)
    OSD::ShardedOpWQ::handle_oncommits(std::list<Context*, std::allocator<Context*> >&)
    OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)
    ShardedThreadPool::shardedthreadpool_worker(unsigned int)
    ShardedThreadPool::WorkThreadSharded::entry()

Crash dump sample:
{
    "assert_condition": "p != pg_slots.end()",
    "assert_file": "osd/OSD.cc",
    "assert_func": "void OSDShard::register_and_wake_split_child(PG*)",
    "assert_line": 10685,
    "assert_msg": "osd/OSD.cc: In function 'void OSDShard::register_and_wake_split_child(PG*)' thread 7f848c1d5700 time 2022-07-25T21:07:35.537660+0100\nosd/OSD.cc: 10685: FAILED ceph_assert(p != pg_slots.end())",
    "assert_thread_name": "tp_osd_tp",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f84ac363140]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17e) [0x5615e8ee0abc]",
        "/usr/bin/ceph-osd(+0xc21c00) [0x5615e8ee0c00]",
        "(OSDShard::register_and_wake_split_child(PG*)+0x9c3) [0x5615e8fa67b3]",
        "(OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x14e) [0x5615e8faac1e]",
        "(Context::complete(int)+0x9) [0x5615e8fddf89]",
        "(OSD::ShardedOpWQ::handle_oncommits(std::__cxx11::list<Context*, std::allocator<Context*> >&)+0x24) [0x5615e8fef424]",
        "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x73f) [0x5615e8fcae5f]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x5615e96aa1aa]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5615e96ac780]",
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f84ac357ea7]",
        "clone()" 
    ],
    "ceph_version": "17.2.1",
    "crash_id": "2022-07-25T20:07:35.666784Z_6faaa4f1-d8d8-4e75-8264-c32662b68cec",
    "entity_name": "osd.80d14b2c0bf175fbe1c342ee221d7eca80db0e5f",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "ceph-osd",
    "stack_sig": "3f66289ef3dde58c31697747e3918bff9113cac5235150605391ecc28e8a95e3",
    "timestamp": "2022-07-25T20:07:35.666784Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.39-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.15.39-1 (Wed, 22 Jun 2022 17:22:00 +0200)" 
}


Related issues (1 open, 0 closed)

Related to Ceph - Bug #54589: OSD crash during node boot (Status: New)

Actions #1

Updated by Telemetry Bot over 1 year ago

  • Crash signature (v1) updated (diff)
  • Crash signature (v2) updated (diff)
  • Affected Versions v16.2.5, v16.2.6, v16.2.7, v17.2.1 added
Actions #2

Updated by Laura Flores over 1 year ago

  • Related to Bug #54589: OSD crash during node boot added
Actions #3

Updated by Laura Flores over 1 year ago

/a/yuriw-2022-12-08_15:36:34-rados-wip-yuri2-testing-2022-12-07-0821-pacific-distro-default-smithi/7108558

2022-12-08T20:27:41.468 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.10-1451-gccaf42af/rpm/el8/BUILD/ceph-16.2.10-1451-gccaf42af/src/osd/OSD.cc: 10830: FAILED ceph_assert(p != pg_slots.end())

... repeats many times...
2022-12-08T20:27:41.468 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]: debug 2022-12-08T20:27:40.939+0000 7f3209de9700 -1 osd.6 532 failed to load OSD map for epoch 528, got 0 bytes
...

2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  ceph version 16.2.10-1451-gccaf42af (ccaf42af7ffbf9ad0e95ca63f5b2bfc8de2ef11a) pacific (stable)
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  1: /lib64/libpthread.so.0(+0x12cf0) [0x7f323290bcf0]
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  2: gsignal()
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  3: abort()
2022-12-08T20:27:41.512 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x560dd40ef271]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  5: /usr/bin/ceph-osd(+0x58343a) [0x560dd40ef43a]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  6: (OSDShard::register_and_wake_split_child(PG*)+0x810) [0x560dd4228f30]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  7: (OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x2ad) [0x560dd422923d]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  8: (Context::complete(int)+0xd) [0x560dd4234d0d]
2022-12-08T20:27:41.513 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xcac) [0x560dd4216cdc]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x560dd489abf4]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  11: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x560dd489dad4]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  12: /lib64/libpthread.so.0(+0x81ca) [0x7f32329011ca]
2022-12-08T20:27:41.514 INFO:journalctl@ceph.osd.6.smithi142.stdout:Dec 08 20:27:40 smithi142 ceph-0044fcce-7734-11ed-8441-001a4aab830c-osd-6[205355]:  13: clone()

Actions #4

Updated by Laura Flores over 1 year ago

  • Tags set to test-failure
  • Priority changed from Normal to High

We should address this crash: it has been seen over 100 times across 7 clusters.

Actions #5

Updated by Telemetry Bot 11 months ago

  • Affected Versions v16.2.9, v17.2.4, v17.2.5 added