Bug #51278
mds: "FAILED ceph_assert(!segments.empty())"
Description
2021-06-17T00:38:52.017 INFO:tasks.ceph.mds.f.smithi175.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-5307-g06fb5cf0/rpm/el8/BUILD/ceph-17.0.0-5307-g06fb5cf0/src/mds/MDLog.h: In function 'LogSegment* MDLog::get_current_segment()' thread 7f196e925700 time 2021-06-17T00:38:52.017144+0000
2021-06-17T00:38:52.018 INFO:tasks.ceph.mds.f.smithi175.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-5307-g06fb5cf0/rpm/el8/BUILD/ceph-17.0.0-5307-g06fb5cf0/src/mds/MDLog.h: 99: FAILED ceph_assert(!segments.empty())
2021-06-17T00:38:52.019 INFO:tasks.ceph.mds.f.smithi175.stderr: ceph version 17.0.0-5307-g06fb5cf0 (06fb5cf0031e099ece537a86a27543dc4010ce0c) quincy (dev)
2021-06-17T00:38:52.019 INFO:tasks.ceph.mds.f.smithi175.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f1978b7e00c]
2021-06-17T00:38:52.020 INFO:tasks.ceph.mds.f.smithi175.stderr: 2: /usr/lib64/ceph/libceph-common.so.2(+0x27d214) [0x7f1978b7e214]
2021-06-17T00:38:52.020 INFO:tasks.ceph.mds.f.smithi175.stderr: 3: ceph-mds(+0x271483) [0x5562dbfa2483]
2021-06-17T00:38:52.020 INFO:tasks.ceph.mds.f.smithi175.stderr: 4: (MDCache::dispatch_fragment_dir(boost::intrusive_ptr<MDRequestImpl>&)+0x4c4) [0x5562dc02c094]
2021-06-17T00:38:52.020 INFO:tasks.ceph.mds.f.smithi175.stderr: 5: (MDCache::fragment_frozen(boost::intrusive_ptr<MDRequestImpl>&, int)+0x2cc) [0x5562dc02ce1c]
2021-06-17T00:38:52.020 INFO:tasks.ceph.mds.f.smithi175.stderr: 6: (MDCache::fragment_mark_and_complete(boost::intrusive_ptr<MDRequestImpl>&)+0xa2c) [0x5562dc03d65c]
2021-06-17T00:38:52.021 INFO:tasks.ceph.mds.f.smithi175.stderr: 7: (MDCache::merge_dir(CInode*, frag_t)+0x5a3) [0x5562dc03e483]
2021-06-17T00:38:52.021 INFO:tasks.ceph.mds.f.smithi175.stderr: 8: ceph-mds(+0x3ed2a2) [0x5562dc11e2a2]
2021-06-17T00:38:52.022 INFO:tasks.ceph.mds.f.smithi175.stderr: 9: (Context::complete(int)+0xd) [0x5562dbeec07d]
2021-06-17T00:38:52.022 INFO:tasks.ceph.mds.f.smithi175.stderr: 10: (SafeTimer::timer_thread()+0x1c0) [0x7f1978c830b0]
2021-06-17T00:38:52.022 INFO:tasks.ceph.mds.f.smithi175.stderr: 11: (SafeTimerThread::entry()+0x11) [0x7f1978c85c51]
2021-06-17T00:38:52.022 INFO:tasks.ceph.mds.f.smithi175.stderr: 12: (Thread::_entry_func(void*)+0xd) [0x7f1978c74d2d]
2021-06-17T00:38:52.023 INFO:tasks.ceph.mds.f.smithi175.stderr: 13: /lib64/libpthread.so.0(+0x814a) [0x7f197791914a]
2021-06-17T00:38:52.023 INFO:tasks.ceph.mds.f.smithi175.stderr: 14: clone()
2021-06-17T00:38:52.023 INFO:tasks.ceph.mds.f.smithi175.stderr:*** Caught signal (Aborted) **
From: /ceph/teuthology-archive/pdonnell-2021-06-16_21:26:55-fs-wip-pdonnell-testing-20210616.191804-distro-basic-smithi/6175747/teuthology.log
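For readers unfamiliar with the failing invariant: the assertion fires in `MDLog::get_current_segment()` when the MDS journal's segment map is empty. The sketch below is a simplified, hypothetical illustration of that pattern (`MDLogSketch` is not the actual Ceph class; the real code calls `ceph_assert(!segments.empty())` and aborts, which the sketch models with an exception so it can be exercised safely):

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>

// Hypothetical, simplified stand-ins for Ceph's LogSegment / MDLog.
struct LogSegment {
    uint64_t seq = 0;
};

class MDLogSketch {
    // Journal segments keyed by their start sequence number.
    std::map<uint64_t, LogSegment> segments;

public:
    void add_segment(uint64_t seq) { segments[seq] = LogSegment{seq}; }

    // Mirrors the shape of MDLog::get_current_segment(): the newest
    // segment is segments.rbegin(), but that is only valid when the
    // map is non-empty -- the invariant violated in this crash.
    LogSegment* get_current_segment() {
        if (segments.empty())
            throw std::logic_error("FAILED assert(!segments.empty())");
        return &segments.rbegin()->second;
    }

    bool empty() const { return segments.empty(); }
};
```

The backtrace above shows this accessor being reached from the dirfrag merge path (`MDCache::merge_dir` → `fragment_mark_and_complete` → `dispatch_fragment_dir`) at a point where no segment exists, e.g. around an MDS state where the journal has not (re)opened a segment.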
Updated by Patrick Donnelly almost 3 years ago
- Related to Bug #50821: qa: untar_snap_rm failure during mds thrashing added
Updated by Patrick Donnelly almost 3 years ago
- Backport set to pacific,octopus
Updated by Patrick Donnelly almost 3 years ago
- Status changed from New to Triaged
- Assignee set to Patrick Donnelly
Updated by Venky Shankar over 2 years ago
Might be related to: https://tracker.ceph.com/issues/51589
Updated by Venky Shankar over 2 years ago
- Related to Bug #53753: mds: crash (assert hit) when merging dirfrags added
Updated by Venky Shankar almost 2 years ago
Latest occurrence with similar backtrace - https://pulpito.ceph.com/vshankar-2022-06-03_10:03:27-fs-wip-vshankar-testing1-20220603-134300-testing-default-smithi/6862102/
-1> 2022-06-03T11:49:41.369+0000 7f3ed4ff3700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-12791-g44457532/rpm/el8/BUILD/ceph-17.0.0-12791-g44457532/src/mds/MDLog.h: In function 'uint64_t MDLog::get_last_segment_seq() const' thread 7f3ed4ff3700 time 2022-06-03T11:49:41.370159+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-12791-g44457532/rpm/el8/BUILD/ceph-17.0.0-12791-g44457532/src/mds/MDLog.h: 247: FAILED ceph_assert(!segments.empty())
ceph version 17.0.0-12791-g44457532 (44457532553f59daa44f901930401f299f7ef2a5) quincy (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f3ee68c1b14]
2: /usr/lib64/ceph/libceph-common.so.2(+0x2c1d35) [0x7f3ee68c1d35]
3: (MDLog::trim_all()+0x94d) [0x55caacd13b7d]
4: (MDCache::shutdown_pass()+0xa6d) [0x55caacb3a1bd]
5: (MDSRankDispatcher::tick()+0x2d0) [0x55caac9f0ca0]
6: (Context::complete(int)+0xd) [0x55caac9cd2fd]
7: (CommonSafeTimer<ceph::fair_mutex>::timer_thread()+0x18b) [0x7f3ee69d967b]
8: (CommonSafeTimerThread<ceph::fair_mutex>::entry()+0x11) [0x7f3ee69db681]
9: /lib64/libpthread.so.0(+0x817a) [0x7f3ee500817a]
10: clone()
Patrick, did you root cause this?
Updated by Venky Shankar almost 2 years ago
- Target version changed from v17.0.0 to v18.0.0
- Backport changed from pacific,octopus to quincy,pacific
- Severity changed from 3 - minor to 2 - major
Updated by Venky Shankar almost 2 years ago
- Category set to Correctness/Safety
- Assignee changed from Patrick Donnelly to Venky Shankar
Updated by Patrick Donnelly over 1 year ago
- Has duplicate Bug #57204: MDLog.h: 99: FAILED ceph_assert(!segments.empty()) added
Updated by Stephen Cuppett over 1 year ago
Venky Shankar wrote:
Latest occurrence with similar backtrace - https://pulpito.ceph.com/vshankar-2022-06-03_10:03:27-fs-wip-vshankar-testing1-20220603-134300-testing-default-smithi/6862102/
[...]
Patrick, did you root cause this?
If it helps, I hit a similar failure on Rook 1.10.3 with Ceph 17.2.3:
[scuppett@nzxt ~]$ kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
  cluster:
    id:     5564c7f9-ff56-4641-8d2e-99978fcfd257
    health: HEALTH_WARN
            1 daemons have recently crashed
  services:
    mon: 3 daemons, quorum b,c,d (age 4d)
    mgr: a(active, since 4d)
    mds: 1/1 daemons up, 1 hot standby
    osd: 5 osds: 5 up (since 29m), 5 in (since 4d)
  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 303.69k objects, 44 GiB
    usage:   137 GiB used, 503 GiB / 640 GiB avail
    pgs:     65 active+clean
  io:
    client: 20 KiB/s rd, 18 KiB/s wr, 3 op/s rd, 5 op/s wr

[scuppett@nzxt ~]$ kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph crash ls
ID                                                                ENTITY      NEW
2022-10-26T19:51:15.385675Z_d6cd885c-2141-4562-b1a2-2c88f26d690d  mds.myfs-b  *

[scuppett@nzxt ~]$ kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph crash info 2022-10-26T19:51:15.385675Z_d6cd885c-2141-4562-b1a2-2c88f26d690d
{
    "assert_condition": "!segments.empty()",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/arm64/AVAILABLE_ARCH/arm64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.h",
    "assert_func": "LogSegment* MDLog::get_current_segment()",
    "assert_line": 99,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/arm64/AVAILABLE_ARCH/arm64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.h: In function 'LogSegment* MDLog::get_current_segment()' thread ffff9970b000 time 2022-10-26T19:51:15.383031+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/arm64/AVAILABLE_ARCH/arm64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.3/rpm/el8/BUILD/ceph-17.2.3/src/mds/MDLog.h: 99: FAILED ceph_assert(!segments.empty())\n",
    "assert_thread_name": "ms_dispatch",
    "backtrace": [
        "__kernel_rt_sigreturn()",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1b4) [0xffffa00202cc]",
        "(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0xffffa002046c]",
        "(Server::journal_close_session(Session*, int, Context*)+0x6cc) [0xaaaac0c2108c]",
        "(Server::kill_session(Session*, Context*)+0x218) [0xaaaac0c21748]",
        "(Server::apply_blocklist()+0xf8) [0xaaaac0c21a00]",
        "(MDSRank::apply_blocklist(std::set<entity_addr_t, std::less<entity_addr_t>, std::allocator<entity_addr_t> > const&, unsigned int)+0x44) [0xaaaac0be3954]",
        "(MDSRankDispatcher::handle_osd_map()+0xd0) [0xaaaac0be3c70]",
        "(MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x348) [0xaaaac0bcd6b8]",
        "(MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xd4) [0xaaaac0bce0d4]",
        "(DispatchQueue::entry()+0x11f8) [0xffffa0242188]",
        "(DispatchQueue::DispatchThread::entry()+0x10) [0xffffa02eba70]",
        "/lib64/libpthread.so.0(+0x78b8) [0xffff9f9068b8]",
        "/lib64/libc.so.6(+0x23a7c) [0xffff9f492a7c]"
    ],
    "ceph_version": "17.2.3",
    "crash_id": "2022-10-26T19:51:15.385675Z_d6cd885c-2141-4562-b1a2-2c88f26d690d",
    "entity_name": "mds.myfs-b",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mds",
    "stack_sig": "46495e15d9854b9436e341bda663a9d0a5b88afbd4c94dad07c353304c72cdc8",
    "timestamp": "2022-10-26T19:51:15.385675Z",
    "utsname_hostname": "rook-ceph-mds-myfs-b-8578b5f859-cq9zz",
    "utsname_machine": "aarch64",
    "utsname_release": "5.4.209-116.367.amzn2.aarch64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Wed Aug 31 00:10:02 UTC 2022"
}
Updated by Leonid Usov 6 months ago
- Related to Bug #63281: src/mds/MDLog.h: 100: FAILED ceph_assert(!segments.empty()) added
Updated by Leonid Usov 6 months ago
Venky, the instances you added in the comments hit the same assert but have different call stacks. I think they are all separate issues, while the call stack in the original report here is the same as the one in the related Bug #63281. I'll address the issue there; you can decide whether you'd like this ticket marked as a duplicate of that one.