Actions
Bug #44380
closedmds: MDCache.cc: 2335: FAILED ceph_assert(!"unmatched rstat rbytes" == g_conf()->mds_verify_scatter)
% Done:
0%
Source:
Q/A
Tags:
Backport:
octopus,nautilus
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2020-02-29T13:29:53.496+0000 7f67c0191700 10 MDSIOContextBase::complete: 13C_MDC_Recover 2020-02-29T13:29:53.496+0000 7f67c0191700 10 MDSContext::complete: 13C_MDC_Recover 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0 RecoveryQueue::_recovered _recovered r=0 size=2 mtime=2020-02-29T13:29:33.859441+0000 for [inode 0x10000000016 [head,head] /subdir/21 auth v356 ap=2 recovering s=2 n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size new_ranges {} update_size 1 on [inode 0x10000000016 [head,head] /subdir/21 auth v356 ap=2 s=2 n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800] 2020-02-29T13:29:53.496+0000 7f67c0191700 15 mds.0.cache.ino(0x10000000016) project_inode 0x10000000016 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) pre_dirty 358 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.den(0x10000000000 21) pre_dirty [dentry #0x1/subdir/21 [head,head] auth (dversion lock) pv=358 v=356 ino=0x10000000016 state=1073741824 | lock=0 fragmenting=0 inodepin=1 dirty=0 authpin=0 0x55e6b5478480] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000016) pre_dirty 358 (current v 356) 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size client_ranges {5509=0-4194304@fffffffffffffffd} -> {} 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size size 2 -> 2 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size mtime 2020-02-29T13:29:33.000000+0000 -> 2020-02-29T13:29:33.859441+0000 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache predirty_journal_parents linkunlink=0 primary_dn follows head [inode 0x10000000016 [head,head] /subdir/21 auth v356 pv358 ap=2 s=2 n()->n(v0 rc2020-02-29T13:29:33.859441+0000 b2)/n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) auth_pin by 0x55e6b54cd440 on [dir 0x10000000000 /subdir/ [2,head] auth v=357 cv=357/357 ap=1+2 state=1073741825|complete f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) hs=100+0,ss=0+0 | child=1 frozen=0 dirty=0 authpin=1 0x55e6b54b2580] count now 1 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) project_fnode 0x55e6b45e4b10 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) pre_dirty 359 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache taking wrlock on (inest lock) on [inode 0x10000000000 [...2,head] /subdir/ auth v410 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock) (iversion lock) | dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=0 0x55e6b5419800] 2020-02-29T13:29:53.496+0000 7f67c0191700 7 mds.0.locker wrlock_force on (inest lock) on [inode 0x10000000000 [...2,head] /subdir/ auth v410 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock) (iversion lock) | dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=0 0x55e6b5419800] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55e6b5404500) have_past_parents_open [1,head] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache projected_rstat_inode_to_frag first head linkunlink 0 [inode 0x10000000016 [head,head] /subdir/21 auth v356 pv358 ap=2 s=2 n()->n(v0 rc2020-02-29T13:29:33.859441+0000 b2)/n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800] 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache frag head is [2,head] 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache inode update is [head,head] 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache floor of head from parent dn [dentry #0x1/subdir/21 [head,head] auth (dversion lock) pv=358 v=356 ino=0x10000000016 state=1073741824 | lock=0 fragmenting=0 inodepin=1 dirty=0 authpin=0 0x55e6b5478480] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55e6b5404500) have_past_parents_open [1,head] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55e6b5404500) get_snaps (seq 1 cached_seq 1) 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache _project_rstat_inode_to_frag [head,head] 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache inode rstat n(v0 rc2020-02-29T13:29:33.859441+0000 b2) 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache inode accounted_rstat n() 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache delta n(v0 rc2020-02-29T13:29:33.859441+0000 b2) 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache projecting to head [2,head] n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache project to [2,head] n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) 2020-02-29T13:29:53.496+0000 7f67c91a3700 1 -- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 legacy=0x55e6b541b000 unknown :6833 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 38 2020-02-29T13:29:53.496+0000 7f67c91a3700 1 -- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 legacy=0x55e6b541b000 unknown :6833 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache result [2,head] n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0) [dir 0x10000000000 /subdir/ [2,head] auth pv=359 v=357 cv=357/357 ap=1+2 state=1073741825|complete f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0)->f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0)->n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0)/n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) hs=100+0,ss=0+0 | child=1 frozen=0 dirty=0 authpin=1 0x55e6b54b2580] 2020-02-29T13:29:53.496+0000 7f67c91a3700 1 --1- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 0x55e6b541b000 :6833 s=OPENED pgs=2 cs=1 l=1).handle_message read tag failed 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000016) clear_dirty_rstat 2020-02-29T13:29:53.496+0000 7f67c91a3700 1 --1- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 0x55e6b541b000 :6833 s=OPENED pgs=2 cs=1 l=1).fault on lossy channel, failing 2020-02-29T13:29:53.496+0000 7f67c0191700 7 mds.0.locker local_wrlock_grab on (iversion lock) on [inode 0x10000000000 [...2,head] /subdir/ auth v410 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock w=1) (iversion lock) | dirtyscattered=0 request=0 lock=1 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=0 0x55e6b5419800] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000000) auth_pin by 0x55e6b54cd440 on [inode 0x10000000000 [...2,head] /subdir/ auth v410 ap=1 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock w=1) (iversion lock w=1) | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=1 0x55e6b5419800] now 1 2020-02-29T13:29:53.496+0000 7f67c0191700 15 mds.0.cache.dir(0x1) adjust_nested_auth_pins 1 on [dir 0x1 / [2,head] auth v=411 cv=409/409 dir_auth=0 ap=0+1 state=1610612737|complete f(v0 m2020-02-29T13:29:28.065624+0000 1=0+1) n(v2 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x55e6b529f600] by 0x55e6b5419800 count now 0/1 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.snaprealm(0x3 seq 1 0x55e6b5404280) have_past_parents_open [1,head] 2020-02-29T13:29:53.496+0000 7f67c0191700 15 mds.0.cache.ino(0x10000000000) project_inode 0x10000000000 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x1) pre_dirty 412 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.den(0x1 subdir) pre_dirty [dentry #0x1/subdir [2,head] auth (dversion lock) pv=412 v=410 ino=0x10000000000 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55e6b5416fc0] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000000) pre_dirty 412 (current v 410) 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache predirty_journal_parents frag->inode on [dir 0x10000000000 /subdir/ [2,head] auth pv=359 v=357 cv=357/357 ap=1+2 state=1073741825|complete f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0)->f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0)->n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0)/n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) hs=100+0,ss=0+0 | child=1 frozen=0 dirty=0 authpin=1 0x55e6b54b2580] 2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache project_rstat_frag_to_inode [2,head] 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache frag rstat n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0) 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache frag accounted_rstat n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache delta n(v1 rc2020-02-29T13:29:33.859441+0000 b2) 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache projecting to [2,head] n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) 2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache result [2,head] n(v1 rc2020-02-29T13:29:33.859441+0000 b192 101=100+1) 2020-02-29T13:29:53.496+0000 7f67c0191700 -1 log_channel(cluster) log [ERR] : unmatched rstat rbytes on single dirfrag 0x10000000000, inode has n(v1 rc2020-02-29T13:29:33.859441+0000 b192 101=100+1), dirfrag has n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0) 2020-02-29T13:29:53.497+0000 7f67c0191700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/huge/release/15.1.0-1435-g8e904aa/rpm/el8/BUILD/ceph-15.1.0-1435-g8e904aa/src/mds/MDCache.cc: In function 'void MDCache::predirty_journal_parents(MutationRef, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)' thread 7f67c0191700 time 2020-02-29T13:29:53.497532+0000 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/huge/release/15.1.0-1435-g8e904aa/rpm/el8/BUILD/ceph-15.1.0-1435-g8e904aa/src/mds/MDCache.cc: 2335: FAILED ceph_assert(!"unmatched rstat rbytes" == g_conf()->mds_verify_scatter) ceph version 15.1.0-1435-g8e904aa (8e904aaab37a2615eec03fd9c9f166a80774d91e) octopus (rc) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f67ce2e30f0] 2: (()+0x27930a) [0x7f67ce2e330a] 3: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x232a) [0x55e6b32d923a] 4: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x414) [0x55e6b33a9ab4] 5: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x8bd) [0x55e6b33819dd] 6: (MDSContext::complete(int)+0x56) [0x55e6b34c92d6] 7: (MDSIOContextBase::complete(int)+0xa3) [0x55e6b34c94e3] 8: (Filer::C_Probe::finish(int)+0xb5) [0x55e6b3565e75] 9: (Context::complete(int)+0xd) [0x55e6b31f9f5d] 10: (Finisher::finisher_thread_entry()+0x1a5) [0x7f67ce373a05] 11: (()+0x82de) [0x7f67cd10f2de] 12: (clone()+0x43) [0x7f67cbca2133]
From: /ceph/teuthology-archive/pdonnell-2020-02-29_02:56:38-kcephfs-wip-pdonnell-testing-20200229.001503-distro-basic-smithi/4811126/remote/smithi002/log/ceph-mds.a.log.gz
Only PR in this batch related to this code path is: https://github.com/ceph/ceph/pull/33538
I don't see how #33538 would cause this.
The last run of kcephfs on Feb 16th did not have this issue: http://pulpito.ceph.com/pdonnell-2020-02-16_17:35:17-kcephfs-wip-pdonnell-testing-20200215.033325-distro-basic-smithi/
Actions