Bug #48517

closed

mds: "CDir.cc: 1530: FAILED ceph_assert(!is_complete())"

Added by Patrick Donnelly over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): crash, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.openfiles prefetch_inodes
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.openfiles _prefetch_inodes state 1
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.openfiles _prefetch_dirfrags
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1) fetch on [dir 0x1 / [2,head] auth v=488 cv=0/0 dir_auth=0 state=1610874880 f(v0 m2020-12-09T04:13:41.581699+0000 1=0+1) n(v33 rc2020-12-09T04:20:35.914083+0000 b51685884 8790=7766+1024) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0x556e1f0f1180]
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1) auth_pin by 0x556e1f0f1180 on [dir 0x1 / [2,head] auth v=488 cv=0/0 dir_auth=0 ap=1+0 state=1610874880 f(v0 m2020-12-09T04:13:41.581699+0000 1=0+1) n(v33 rc2020-12-09T04:20:35.914083+0000 b51685884 8790=7766+1024) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 waiter=1 authpin=1 0x556e1f0f1180] count now 1
2020-12-09T04:21:21.274+0000 7fea84b9c700  1 -- [v2:172.21.15.73:6834/1493964564,v1:172.21.15.73:6835/1493964564] --> [v2:172.21.15.73:6824/16384,v1:172.21.15.73:6825/16384] -- osd_op(unknown.0.63:50 3.7 3:ff5b34d6:::1.00000000:head [omap-get-header,omap-get-vals in=16b,getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force e36) v8 -- 0x556e236b43c0 con 0x556e1f0f8000
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1000000171d.0*) fetch on [dir 0x1000000171d.0* ~mds0/stray7/1000000171d/ [2,head] auth v=851 cv=0/0 state=1610612736 f()/f(v0 m2020-12-09T04:20:31.086183+0000) n(v5)/n(v5 rc2020-12-09T04:20:31.086183+0000) hs=0+244,ss=0+0 dirty=244 | child=1 dirty=1 0x556e241c9b00]
2020-12-09T04:21:21.274+0000 7fea84b9c700  7 mds.0.cache.dir(0x1000000171d.0*) fetch dirfrag for unlinked directory, mark complete
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1000000171d.0*) fetch on [dir 0x1000000171d.0* ~mds0/stray7/1000000171d/ [2,head] auth v=851 cv=0/0 state=1610612737|complete f()/f(v0 m2020-12-09T04:20:31.086183+0000) n(v5)/n(v5 rc2020-12-09T04:20:31.086183+0000) hs=0+244,ss=0+0 dirty=244 | child=1 dirty=1 0x556e241c9b00]
2020-12-09T04:21:21.274+0000 7fea87ba2700  1 -- [v2:172.21.15.73:6834/1493964564,v1:172.21.15.73:6835/1493964564] <== osd.3 v2:172.21.15.73:6824/16384 3 ==== osd_op_reply(50 1.00000000 [omap-get-header out=274b,omap-get-vals out=5b,getxattr] v0'0 uv1 ondisk = 0) v8 ==== 238+0+279 (crc 0 0 0) 0x556e22af7cc0 con 0x556e1f0f8000
2020-12-09T04:21:21.274+0000 7fea7eb90700 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2020-12-09T04:21:21.278+0000 7fea84b9c700 -1 /build/ceph-16.0.0-7967-ge4d2d676/src/mds/CDir.cc: In function 'void CDir::fetch(MDSContext*, std::string_view, bool)' thread 7fea84b9c700 time 2020-12-09T04:21:21.280738+0000
/build/ceph-16.0.0-7967-ge4d2d676/src/mds/CDir.cc: 1530: FAILED ceph_assert(!is_complete())

 ceph version 16.0.0-7967-ge4d2d676 (e4d2d67692ca960325f1d91d5ef9332c7cb9f37d) pacific (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7fea8cea1e2d]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7fea8cea2008]
 3: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xc29) [0x556e1ddae5b9]
 4: (CDir::fetch(MDSContext*, bool)+0x3a) [0x556e1ddae6da]
 5: (OpenFileTable::_prefetch_dirfrags()+0x4cc) [0x556e1de766dc]
 6: (OpenFileTable::_open_ino_finish(inodeno_t, int)+0x246) [0x556e1de77506]
 7: (OpenFileTable::_prefetch_inodes()+0x280) [0x556e1de75db0]
 8: (OpenFileTable::prefetch_inodes()+0xf8) [0x556e1de77728]
 9: (MDCache::process_imported_caps()+0x184) [0x556e1dcb2634]
 10: (MDCache::rejoin_start(MDSContext*)+0x82) [0x556e1dcb40b2]
 11: (MDSRank::rejoin_start()+0x162) [0x556e1db5aba2]
...

From: /ceph/teuthology-archive/pdonnell-2020-12-09_02:22:27-fs-wip-pdonnell-testing-20201207.220433-distro-basic-smithi/5694731/remote/smithi073/log/ceph-mds.a.log.gz

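It seems we're fetching the same dirfrag twice: the log above shows two "fetch on" lines for dir 0x1000000171d.0*, and the first one marks the dirfrag complete, so the second trips ceph_assert(!is_complete()). We might need to make this container a std::set: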

https://github.com/ceph/ceph/blob/1c6af5eb33f9dc172b3dd186cc0c567521fbfadd/src/mds/OpenFileTable.cc#L1056

However, I'm not yet certain whether adding the same dirfrag to the vector twice is itself a bug.
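
To illustrate the std::set idea, here is a minimal standalone sketch. It is not the code in OpenFileTable.cc: DirFragId, count_repeat_fetches and dedup are hypothetical names made up for this example, and the real container holds different types. It only shows that a vector keeps a dirfrag queued twice, which would lead to a second fetch, while a std::set collapses the duplicate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a dirfrag identity (inode number + fragment).
// The real MDS types are different; this is only for illustration.
struct DirFragId {
  std::uint64_t ino;
  std::uint32_t frag;
  bool operator<(const DirFragId& o) const {
    return std::tie(ino, frag) < std::tie(o.ino, o.frag);
  }
};

// Counts entries in the queue that repeat an earlier entry. Each repeat
// corresponds to a second fetch on a dirfrag that was already fetched.
std::size_t count_repeat_fetches(const std::vector<DirFragId>& queue) {
  std::set<DirFragId> seen;
  std::size_t repeats = 0;
  for (const auto& d : queue) {
    if (!seen.insert(d).second) {
      ++repeats;  // insert() reports that d was already present
    }
  }
  return repeats;
}

// Builds the deduplicated queue that a std::set would give us.
std::set<DirFragId> dedup(const std::vector<DirFragId>& queue) {
  std::set<DirFragId> unique(queue.begin(), queue.end());
  assert(unique.size() <= queue.size());  // a set can only drop entries
  return unique;
}
```

With a queue holding 0x1 once and 0x1000000171d twice, count_repeat_fetches reports one repeat and dedup returns two entries. Whether deduplicating is the right fix, or just hides a bug in how the vector gets populated, is the open question above.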
