Bug #56012

closed

mds: src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay())

Added by Venky Shankar almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Category: Correctness/Safety
Target version:
% Done: 100%
Source: Q/A
Tags:
Backport: quincy, pacific
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen with fs:workload - https://pulpito.ceph.com/vshankar-2022-06-10_01:04:46-fs-wip-vshankar-testing-20220609-175550-testing-default-smithi/6871692/

Description: fs/workload/{0-rhel_8 begin/{0-install 1-cephadm 2-logrotate} clusters/1a11s-mds-1c-client-3node conf/{client mds mon osd} mount/kclient/{base/{mount-syntax/{v2} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} ms_mode/secure wsync/no} objectstore-ec/bluestore-ec-root omap_limit/10000 overrides/{frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts session_timeout} ranks/5 scrub/yes standby-replay subvolume/{with-namespace-isolated-and-quota} tasks/{0-check-counter workunit/suites/blogbench}}

This crash is seen when running fs:workload on a CephFS subvolume with the isolated-namespace and quota options.

Backtrace from ./remote/smithi059/log/190c0084-e866-11ec-8422-001a4aab830c/ceph-mds.g.log.gz

2022-06-10T05:53:50.740+0000 7f3762fc6700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-12928-gf8d2b950/rpm/el8/BUILD/ceph-17.0.0-12928-gf8d2b950/src/mds/MDLog.cc: In function 'void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)' thread 7f3762fc6700 time 2022-06-10T05:53:50.739247+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-12928-gf8d2b950/rpm/el8/BUILD/ceph-17.0.0-12928-gf8d2b950/src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay())

 ceph version 17.0.0-12928-gf8d2b950 (f8d2b9504a9cd99497a832388bcd728a71152663) quincy (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f376e8a2b14]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x2c1d35) [0x7f376e8a2d35]
 3: (MDLog::_submit_entry(LogEvent*, MDSLogContextBase*)+0x3f) [0x55eb9cd0fb8f]
 4: (Server::journal_close_session(Session*, int, Context*)+0x78c) [0x55eb9ca2274c]
 5: (Server::kill_session(Session*, Context*)+0x212) [0x55eb9ca22ea2]
 6: (Server::apply_blocklist()+0x10d) [0x55eb9ca2315d]
 7: (MDSRank::apply_blocklist(std::set<entity_addr_t, std::less<entity_addr_t>, std::allocator<entity_addr_t> > const&, unsigned int)+0x34) [0x55eb9c9e12e4]
 8: (MDSRankDispatcher::handle_osd_map()+0xf6) [0x55eb9c9e1626]
 9: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x39b) [0x55eb9c9ca87b]
 10: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0xc3) [0x55eb9c9cb233]
 11: (DispatchQueue::entry()+0x14fa) [0x7f376eb30e5a]
 12: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f376ebe93b1]
 13: /lib64/libpthread.so.0(+0x81ca) [0x7f376d83b1ca]
 14: clone()
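
The call chain in the backtrace can be sketched as a toy model. This is not Ceph code: the ToyMds type, the State enum and the blocklist_trips_assert helper are hypothetical, the function names only mirror the frames above, and the model assumes is_any_replay() is true in both the replay and standby-replay states. The assert is modelled as an exception so that the path can be exercised without aborting.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Toy model only: names mirror the backtrace frames, none of this is Ceph code.
enum class State { Replay, StandbyReplay, Active };

struct ToyMds {
  State state = State::Active;
  std::set<std::string> sessions;  // client addresses with open sessions
  int journaled = 0;               // number of entries submitted to the toy journal

  // Assumption: both replay states count as "any replay".
  bool is_any_replay() const {
    return state == State::Replay || state == State::StandbyReplay;
  }

  // Stands in for MDLog::_submit_entry (frame 3): refuses to journal during replay.
  void submit_entry() {
    if (is_any_replay())
      throw std::logic_error("FAILED assert(!mds->is_any_replay())");
    ++journaled;
  }

  // Stands in for kill_session / journal_close_session (frames 4-5).
  void kill_session(const std::string& addr) {
    submit_entry();
    sessions.erase(addr);
  }

  // Stands in for handle_osd_map / apply_blocklist (frames 6-8):
  // every blocklisted address with an open session gets its session killed.
  void handle_osd_map(const std::set<std::string>& blocklist) {
    for (const auto& addr : blocklist)
      if (sessions.count(addr))
        kill_session(addr);
  }
};

// Returns true if applying a blocklist in state `s` trips the toy assertion.
bool blocklist_trips_assert(State s) {
  ToyMds mds;
  mds.state = s;
  mds.sessions = {"client.a"};
  try {
    mds.handle_osd_map({"client.a"});
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this model blocklist_trips_assert(State::StandbyReplay) returns true and blocklist_trips_assert(State::Active) returns false, which matches the shape of the backtrace: an OSD map carrying a blocklist reaches the journaling path while the daemon is still in a replay state.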

Related issues 3 (0 open, 3 closed)

Has duplicate CephFS - Bug #56802: crash: void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*): assert(!mds->is_any_replay()) (Duplicate)
Copied to CephFS - Backport #56526: quincy: mds: src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay()) (Resolved, Kotresh Hiremath Ravishankar)
Copied to CephFS - Backport #56527: pacific: mds: src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay()) (Resolved, Kotresh Hiremath Ravishankar)

Actions #1

Updated by Venky Shankar almost 2 years ago

Another instance of the crash, but this time with plain vanilla subvolume - https://pulpito.ceph.com/vshankar-2022-06-10_01:04:46-fs-wip-vshankar-testing-20220609-175550-testing-default-smithi/6871700/

Description: fs/workload/{0-rhel_8 begin/{0-install 1-cephadm 2-logrotate} clusters/1a11s-mds-1c-client-3node conf/{client mds mon osd} mount/kclient/{base/{mount-syntax/{v2} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} ms_mode/crc wsync/no} objectstore-ec/bluestore-comp-ec-root omap_limit/10000 overrides/{frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts session_timeout} ranks/3 scrub/yes standby-replay subvolume/{no-subvolume} tasks/{0-check-counter workunit/suites/blogbench}}
Actions #2

Updated by Venky Shankar almost 2 years ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
  • Priority changed from Normal to High
  • Source set to Q/A
Actions #3

Updated by Patrick Donnelly almost 2 years ago

/ceph/teuthology-archive/pdonnell-2022-06-12_05:08:12-fs:workload-wip-pdonnell-testing-20220612.004943-distro-default-smithi/6875258/teuthology.log
/ceph/teuthology-archive/pdonnell-2022-06-12_05:08:12-fs:workload-wip-pdonnell-testing-20220612.004943-distro-default-smithi/6875303/teuthology.log

Actions #4

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 46833
Actions #5

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56526: quincy: mds: src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay()) added
Actions #7

Updated by Backport Bot almost 2 years ago

  • Copied to Backport #56527: pacific: mds: src/mds/MDLog.cc: 283: FAILED ceph_assert(!mds->is_any_replay()) added
Actions #8

Updated by Patrick Donnelly over 1 year ago

  • Has duplicate Bug #56802: crash: void MDLog::_submit_entry(LogEvent*, MDSLogContextBase*): assert(!mds->is_any_replay()) added
Actions #9

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #10

Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Pending Backport to Resolved
  • % Done changed from 0 to 100
  • Tags deleted (backport_processed)