Bug #48517
mds: "CDir.cc: 1530: FAILED ceph_assert(!is_complete())"
Status: Closed
Description
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.openfiles prefetch_inodes
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.openfiles _prefetch_inodes state 1
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.openfiles _prefetch_dirfrags
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1) fetch on [dir 0x1 / [2,head] auth v=488 cv=0/0 dir_auth=0 state=1610874880 f(v0 m2020-12-09T04:13:41.581699+0000 1=0+1) n(v33 rc2020-12-09T04:20:35.914083+0000 b51685884 8790=7766+1024) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0x556e1f0f1180]
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1) auth_pin by 0x556e1f0f1180 on [dir 0x1 / [2,head] auth v=488 cv=0/0 dir_auth=0 ap=1+0 state=1610874880 f(v0 m2020-12-09T04:13:41.581699+0000 1=0+1) n(v33 rc2020-12-09T04:20:35.914083+0000 b51685884 8790=7766+1024) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 waiter=1 authpin=1 0x556e1f0f1180] count now 1
2020-12-09T04:21:21.274+0000 7fea84b9c700 1 -- [v2:172.21.15.73:6834/1493964564,v1:172.21.15.73:6835/1493964564] --> [v2:172.21.15.73:6824/16384,v1:172.21.15.73:6825/16384] -- osd_op(unknown.0.63:50 3.7 3:ff5b34d6:::1.00000000:head [omap-get-header,omap-get-vals in=16b,getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force e36) v8 -- 0x556e236b43c0 con 0x556e1f0f8000
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1000000171d.0*) fetch on [dir 0x1000000171d.0* ~mds0/stray7/1000000171d/ [2,head] auth v=851 cv=0/0 state=1610612736 f()/f(v0 m2020-12-09T04:20:31.086183+0000) n(v5)/n(v5 rc2020-12-09T04:20:31.086183+0000) hs=0+244,ss=0+0 dirty=244 | child=1 dirty=1 0x556e241c9b00]
2020-12-09T04:21:21.274+0000 7fea84b9c700 7 mds.0.cache.dir(0x1000000171d.0*) fetch dirfrag for unlinked directory, mark complete
2020-12-09T04:21:21.274+0000 7fea84b9c700 10 mds.0.cache.dir(0x1000000171d.0*) fetch on [dir 0x1000000171d.0* ~mds0/stray7/1000000171d/ [2,head] auth v=851 cv=0/0 state=1610612737|complete f()/f(v0 m2020-12-09T04:20:31.086183+0000) n(v5)/n(v5 rc2020-12-09T04:20:31.086183+0000) hs=0+244,ss=0+0 dirty=244 | child=1 dirty=1 0x556e241c9b00]
2020-12-09T04:21:21.274+0000 7fea87ba2700 1 -- [v2:172.21.15.73:6834/1493964564,v1:172.21.15.73:6835/1493964564] <== osd.3 v2:172.21.15.73:6824/16384 3 ==== osd_op_reply(50 1.00000000 [omap-get-header out=274b,omap-get-vals out=5b,getxattr] v0'0 uv1 ondisk = 0) v8 ==== 238+0+279 (crc 0 0 0) 0x556e22af7cc0 con 0x556e1f0f8000
2020-12-09T04:21:21.274+0000 7fea7eb90700 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2020-12-09T04:21:21.278+0000 7fea84b9c700 -1 /build/ceph-16.0.0-7967-ge4d2d676/src/mds/CDir.cc: In function 'void CDir::fetch(MDSContext*, std::string_view, bool)' thread 7fea84b9c700 time 2020-12-09T04:21:21.280738+0000
/build/ceph-16.0.0-7967-ge4d2d676/src/mds/CDir.cc: 1530: FAILED ceph_assert(!is_complete())
 ceph version 16.0.0-7967-ge4d2d676 (e4d2d67692ca960325f1d91d5ef9332c7cb9f37d) pacific (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7fea8cea1e2d]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7fea8cea2008]
 3: (CDir::fetch(MDSContext*, std::basic_string_view<char, std::char_traits<char> >, bool)+0xc29) [0x556e1ddae5b9]
 4: (CDir::fetch(MDSContext*, bool)+0x3a) [0x556e1ddae6da]
 5: (OpenFileTable::_prefetch_dirfrags()+0x4cc) [0x556e1de766dc]
 6: (OpenFileTable::_open_ino_finish(inodeno_t, int)+0x246) [0x556e1de77506]
 7: (OpenFileTable::_prefetch_inodes()+0x280) [0x556e1de75db0]
 8: (OpenFileTable::prefetch_inodes()+0xf8) [0x556e1de77728]
 9: (MDCache::process_imported_caps()+0x184) [0x556e1dcb2634]
 10: (MDCache::rejoin_start(MDSContext*)+0x82) [0x556e1dcb40b2]
 11: (MDSRank::rejoin_start()+0x162) [0x556e1db5aba2]
 ...
From: /ceph/teuthology-archive/pdonnell-2020-12-09_02:22:27-fs-wip-pdonnell-testing-20201207.220433-distro-basic-smithi/5694731/remote/smithi073/log/ceph-mds.a.log.gz
It seems we're doing two fetches on the same dirfrag. It might be that we need to make fetch_queue a std::set;
however, I'm not yet certain it isn't a bug that we're adding the same dirfrag to the vector twice.
Updated by Patrick Donnelly over 3 years ago
- Status changed from New to In Progress
- Assignee set to Patrick Donnelly
- Pull request ID set to 38512
Updated by Xiubo Li over 3 years ago
- Assignee changed from Patrick Donnelly to Xiubo Li
As we discussed last week, I will take this and figure out why the same CDir was fetched twice.
Thanks.
Updated by Xiubo Li over 3 years ago
The MDS is doing the merge:
10100002:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.cache.dir(0x1000000171d.0*) merge [0x556e23daa000,0x556e23daa480,0x556e23daa900,0x556e23daad80]
10100003:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.cache.dir(0x1000000171d.0*) setting dir_auth=-1,-2 from -1,-2 on [dir 0x1000000171d.0* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x556e241c9b00]
10100004:2020-12-09T04:21:19.478+0000 7fea7d38d700 15 mds.0.cache.ino(0x1000000171d) maybe_export_pin update=0 [inode 0x1000000171d [...2,head] /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ auth fragtree_t(*^1 1*^2) v8892 DIRTYPARENT f(v0 m2020-12-09T04:20:35.050101+0000 24=23+1) n(v5 rc2020-12-09T04:20:35.050101+0000 b110802 45=43+2) (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=8 dirtyparent=1 dirty=1 0x556e23d2f700]
10100005:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.cache.dir(0x1000000171d.0*) subfrag 000* [dir 0x1000000171d.000* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=705 cv=0/0 state=1610612737|complete f(v0 m2020-12-09T04:20:30.058203+0000) n(v5 rc2020-12-09T04:20:30.058203+0000) hs=0+55,ss=0+0 dirty=55 | child=1 dirty=1 0x556e23daa000]
...
10100062:2020-12-09T04:21:19.478+0000 7fea7d38d700 14 mds.0.cache.ino(0x1000000171d) close_dirfrag 000*
10100063:2020-12-09T04:21:19.478+0000 7fea7d38d700 12 mds.0.cache.dir(0x1000000171d.000*) remove_null_dentries [dir 0x1000000171d.000* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=705 cv=0/0 state=1073741825|complete f(v0 m2020-12-09T04:20:30.058203+0000) n(v5 rc2020-12-09T04:20:30.058203+0000) hs=0+0,ss=0+0 dirty=55 0x556e23daa000]
10100064:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.cache.dir(0x1000000171d.0*) subfrag 001* [dir 0x1000000171d.001* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=746 cv=0/0 state=1610612737|complete f(v0 m2020-12-09T04:20:30.190201+0000) n(v5 rc2020-12-09T04:20:30.190201+0000) hs=0+58,ss=0+0 dirty=58 | child=1 dirty=1 0x556e23daa480]
...
10100124:2020-12-09T04:21:19.478+0000 7fea7d38d700 14 mds.0.cache.ino(0x1000000171d) close_dirfrag 001*
10100125:2020-12-09T04:21:19.478+0000 7fea7d38d700 12 mds.0.cache.dir(0x1000000171d.001*) remove_null_dentries [dir 0x1000000171d.001* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=746 cv=0/0 state=1073741825|complete f(v0 m2020-12-09T04:20:30.190201+0000) n(v5 rc2020-12-09T04:20:30.190201+0000) hs=0+0,ss=0+0 dirty=58 0x556e23daa480]
10100126:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.cache.dir(0x1000000171d.0*) subfrag 010* [dir 0x1000000171d.010* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=743 cv=0/0 state=1610612737|complete f(v0 m2020-12-09T04:20:30.318198+0000) n(v5 rc2020-12-09T04:20:30.318198+0000) hs=0+56,ss=0+0 dirty=56 | child=1 dirty=1 0x556e23daa900]
...
10100184:2020-12-09T04:21:19.478+0000 7fea7d38d700 14 mds.0.cache.ino(0x1000000171d) close_dirfrag 010*
10100185:2020-12-09T04:21:19.478+0000 7fea7d38d700 12 mds.0.cache.dir(0x1000000171d.010*) remove_null_dentries [dir 0x1000000171d.010* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=743 cv=0/0 state=1073741825|complete f(v0 m2020-12-09T04:20:30.318198+0000) n(v5 rc2020-12-09T04:20:30.318198+0000) hs=0+0,ss=0+0 dirty=56 0x556e23daa900]
10100186:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.cache.dir(0x1000000171d.0*) subfrag 011* [dir 0x1000000171d.011* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=851 cv=0/0 state=1610612737|complete f(v0 m2020-12-09T04:20:31.086183+0000) n(v5 rc2020-12-09T04:20:31.086183+0000) hs=0+75,ss=0+0 dirty=75 | child=1 dirty=1 0x556e23daad80]
...
10100263:2020-12-09T04:21:19.478+0000 7fea7d38d700 14 mds.0.cache.ino(0x1000000171d) close_dirfrag 011*
10100264:2020-12-09T04:21:19.478+0000 7fea7d38d700 12 mds.0.cache.dir(0x1000000171d.011*) remove_null_dentries [dir 0x1000000171d.011* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=851 cv=0/0 state=1073741825|complete f(v0 m2020-12-09T04:20:31.086183+0000) n(v5 rc2020-12-09T04:20:31.086183+0000) hs=0+0,ss=0+0 dirty=75 0x556e23daad80]
10100266:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.journal EMetaBlob.replay dir 0x1000000171d.0*
10100267:2020-12-09T04:21:19.478+0000 7fea7d38d700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x1000000171d.0* /client.0/tmp/fsstress/ltp-full-20091231/testcases/open_posix_testsuite/conformance/interfaces/sigaction/ [2,head] auth v=851 cv=0/0 state=1610612736 f()/f(v0 m2020-12-09T04:20:31.086183+0000) n(v5)/n(v5 rc2020-12-09T04:20:31.086183+0000) hs=0+244,ss=0+0 dirty=244 | child=1 dirty=1 0x556e241c9b00]
After the subfrags [0x1000000171d.000*, 0x1000000171d.001*, 0x1000000171d.010*, 0x1000000171d.011*] were successfully merged into [0x1000000171d.0*], their entries still remain in the loaded_anchor_map.
1067   for (auto& fg: anchor.frags) {
1068     CDir *dir = diri->get_dirfrag(fg);
1069     if (dir) {
1070       if (dir->is_auth() && !dir->is_complete())
1071         fetch_queue.insert(dir);
1072     } else {
1073       frag_vec_t leaves;
1074       diri->dirfragtree.get_leaves_under(fg, leaves);
1075       if (leaves.empty())
1076         leaves.push_back(diri->dirfragtree[fg.value()]);
1077       for (auto& leaf : leaves) {
1078         if (diri->is_auth()) {
1079           dir = diri->get_or_open_dirfrag(mdcache, leaf);
1080         } else {
1081           dir = diri->get_dirfrag(leaf);
1082         }
1083         if (dir && dir->is_auth() && !dir->is_complete())
1084           fetch_queue.push_back(dir);
1085       }
1086     }
1087   }
Because Line#1076 always falls back to pushing the containing parent frag "0*" into "leaves" while traversing the first 4 frags of [000*, 001*, 010*, 011*, 100*, 101*, 110*, 111*], the same dir (0x1000000171d.0*) gets pushed into the fetch_queue vector 4 times.
Updated by Xiubo Li over 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID changed from 38512 to 38579
Updated by Patrick Donnelly over 3 years ago
- Status changed from Fix Under Review to Resolved