Bug #44680
closed
mds/Mutation.h: 128: FAILED ceph_assert(num_auth_pins == 0)
Description
{
  "assert_condition": "num_auth_pins == 0",
  "assert_file": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.1.1-91-g126c444/rpm/el8/BUILD/ceph-15.1.1-91-g126c444/src/mds/Mutation.h",
  "assert_func": "virtual MutationImpl::~MutationImpl()",
  "assert_line": 128,
  "assert_msg": "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.1.1-91-g126c444/rpm/el8/BUILD/ceph-15.1.1-91-g126c444/src/mds/Mutation.h: In function 'virtual MutationImpl::~MutationImpl()' thread 7f061e8a0700 time 2020-03-19T02:09:57.010882+0000\n/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.1.1-91-g126c444/rpm/el8/BUILD/ceph-15.1.1-91-g126c444/src/mds/Mutation.h: 128: FAILED ceph_assert(num_auth_pins == 0)\n",
  "assert_thread_name": "MR_Finisher",
  "backtrace": [
    "(()+0x12dc0) [0x7f062ba3adc0]",
    "(pthread_getname_np()+0x48) [0x7f062ba3c038]",
    "(ceph::logging::Log::dump_recent()+0x428) [0x7f062cf49b28]",
    "(()+0x4ab4cb) [0x560a9267a4cb]",
    "(()+0x12dc0) [0x7f062ba3adc0]",
    "(gsignal()+0x10f) [0x7f062a4fe8df]",
    "(abort()+0x127) [0x7f062a4e8cf5]",
    "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f062cc05c21]",
    "(()+0x27adea) [0x7f062cc05dea]",
    "(MutationImpl::~MutationImpl()+0x205) [0x560a923e0e55]",
    "(TrackedOp::put()+0x71) [0x560a923c9a91]",
    "(C_Locker_FileUpdate_finish::~C_Locker_FileUpdate_finish()+0x32) [0x560a924e58a2]",
    "(MDSIOContextBase::complete(int)+0xfa) [0x560a925e6c7a]",
    "(MDSLogContextBase::complete(int)+0x44) [0x560a925e7044]",
    "(Finisher::finisher_thread_entry()+0x1a5) [0x7f062cc96385]",
    "(()+0x82de) [0x7f062ba302de]",
    "(clone()+0x43) [0x7f062a5c3133]"
  ],
  "ceph_version": "15.1.1-91-g126c444",
  "crash_id": "2020-03-19T02:09:57.042400Z_0d2ae68f-e51b-4eee-8baa-b6186684d079",
  "entity_name": "mds.cephfs.reesi002.euduff",
  "os_id": "centos",
  "os_name": "CentOS Linux",
  "os_version": "8 (Core)",
  "os_version_id": "8",
  "process_name": "ceph-mds",
  "stack_sig": "87c07aac5002b1b764575dc3e6e6411c5eac461da93d0097c1a0f4ef3d1bfd5e",
  "timestamp": "2020-03-19T02:09:57.042400Z",
  "utsname_hostname": "reesi002",
  "utsname_machine": "x86_64",
  "utsname_release": "4.4.0-116-generic",
  "utsname_sysname": "Linux",
  "utsname_version": "#140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018"
}
Two instances of this on the lab cluster this morning, while upgrading from yesterday's octopus build to today's.
Updated by Greg Farnum about 4 years ago
Do we have any logs or more detail about what happened?
The only thing this flags in my head is https://github.com/ceph/ceph/pull/33291, but that's in the Migrator.
Alternatively, Patrick merged a commit that slightly changes how we handle Contexts on shutdown, which I was uneasy about, but I have no solid evidence that it is related.
Updated by Greg Farnum about 4 years ago
[13:55:18] <@sage> it was triggered by the upgrade... i'm guessing when the old container was stopped and got blacklisted?
[13:55:55] <@sage> i almost didn't notice because every upgrade i've been seeing 2 crashes on the lab cluster due to the blacklist error code from rados triggering an assert. but iiuc that is fixed/cleaned up now
Okay, if this happens on shutdown, then all I can think of is the Context shutdown handling.
Updated by Greg Farnum about 4 years ago
Yeah, this is definitely the fault of https://github.com/ceph/ceph/pull/33538, which was trying to prevent us from asserting on EBLACKLIST errors on shutdown, but it simply deletes any pending MDSIOContextBase on shutdown instead of letting them complete.
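To illustrate the failure mode, here is a minimal toy model, not Ceph's actual code. `ToyMutation`, `ToyContext`, and `shutdown_leaks_pins` are hypothetical names invented for this sketch. It assumes, as the backtrace suggests, that the completion context holds the last reference to the mutation and that releasing auth pins happens only when the context is finished. Instead of aborting like `ceph_assert`, the destructor sets a flag so the sketch can be run safely.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for MutationImpl: it counts auth pins and expects
// all of them to be released before it is destroyed.
struct ToyMutation {
  int num_auth_pins = 0;
  bool* leaked;  // set to true where the real code would hit ceph_assert

  explicit ToyMutation(bool* leaked_flag) : leaked(leaked_flag) {}
  void auth_pin() { ++num_auth_pins; }
  void drop_pins() { num_auth_pins = 0; }
  ~ToyMutation() {
    // Real code: ceph_assert(num_auth_pins == 0)
    if (num_auth_pins != 0) *leaked = true;
  }
};

// Hypothetical stand-in for a pending I/O completion context that owns
// a reference to the mutation and releases its pins when finished.
struct ToyContext {
  std::shared_ptr<ToyMutation> mut;

  explicit ToyContext(std::shared_ptr<ToyMutation> m) : mut(std::move(m)) {}
  void finish() { mut->drop_pins(); }
};

// Returns true if the mutation was destroyed while still holding auth pins.
// complete_pending == false models deleting the context without finishing it.
bool shutdown_leaks_pins(bool complete_pending) {
  bool leaked = false;
  {
    auto mut = std::make_shared<ToyMutation>(&leaked);
    mut->auth_pin();
    auto ctx = std::make_unique<ToyContext>(std::move(mut));
    if (complete_pending) ctx->finish();
    // ctx is destroyed here, dropping the last reference to the mutation.
  }
  return leaked;
}
```

In this model, `shutdown_leaks_pins(false)` reports a leak because the context is deleted without running `finish()`, while `shutdown_leaks_pins(true)` does not.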
Updated by Zheng Yan about 4 years ago
Maybe revert https://github.com/ceph/ceph/commit/73436961512bd87981244fa48212085faf7028c4 and https://github.com/ceph/ceph/commit/79f5052a1ddb61043de4e1cbec19ede2a6b4f53b, and don't assert that the io context list is empty when shutting down.
Updated by Zheng Yan about 4 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 34110
Updated by Greg Farnum about 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler about 4 years ago
- Related to Bug #44295: mds: MDCache.cc: 6400: FAILED ceph_assert(r == 0 || r == -2) added
Updated by Nathan Cutler about 4 years ago
Updated by Greg Farnum about 4 years ago
- Backport changed from octopus to mimic, nautilus, octopus
Updated by Nathan Cutler about 4 years ago
- Copied to Backport #45026: mimic: mds/Mutation.h: 128: FAILED ceph_assert(num_auth_pins == 0) added
Updated by Nathan Cutler about 4 years ago
- Copied to Backport #45027: nautilus: mds/Mutation.h: 128: FAILED ceph_assert(num_auth_pins == 0) added
Updated by Nathan Cutler about 4 years ago
- Copied to Backport #45028: octopus: mds/Mutation.h: 128: FAILED ceph_assert(num_auth_pins == 0) added
Updated by Nathan Cutler almost 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".