Project

General

Profile

Bug #44380

mds: MDCache.cc: 2335: FAILED ceph_assert(!"unmatched rstat rbytes" == g_conf()->mds_verify_scatter)

Added by Patrick Donnelly 28 days ago. Updated 6 days ago.

Status:
New
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature:

Description

2020-02-29T13:29:53.496+0000 7f67c0191700 10 MDSIOContextBase::complete: 13C_MDC_Recover
2020-02-29T13:29:53.496+0000 7f67c0191700 10 MDSContext::complete: 13C_MDC_Recover
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0 RecoveryQueue::_recovered _recovered r=0 size=2 mtime=2020-02-29T13:29:33.859441+0000 for [inode 0x10000000016 [head,head] /subdir/21 auth v356 ap=2 recovering s=2 n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800]
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size new_ranges {} update_size 1 on [inode 0x10000000016 [head,head] /subdir/21 auth v356 ap=2 s=2 n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800]
2020-02-29T13:29:53.496+0000 7f67c0191700 15 mds.0.cache.ino(0x10000000016) project_inode 0x10000000016
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) pre_dirty 358
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.den(0x10000000000 21) pre_dirty [dentry #0x1/subdir/21 [head,head] auth (dversion lock) pv=358 v=356 ino=0x10000000016 state=1073741824 | lock=0 fragmenting=0 inodepin=1 dirty=0 authpin=0 0x55e6b5478480]
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000016) pre_dirty 358 (current v 356)
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size client_ranges {5509=0-4194304@fffffffffffffffd} -> {}
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size size 2 -> 2
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.locker check_inode_max_size mtime 2020-02-29T13:29:33.000000+0000 -> 2020-02-29T13:29:33.859441+0000
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache predirty_journal_parents linkunlink=0 primary_dn follows head [inode 0x10000000016 [head,head] /subdir/21 auth v356 pv358 ap=2 s=2 n()->n(v0 rc2020-02-29T13:29:33.859441+0000 b2)/n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800]
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) auth_pin by 0x55e6b54cd440 on [dir 0x10000000000 /subdir/ [2,head] auth v=357 cv=357/357 ap=1+2 state=1073741825|complete f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) hs=100+0,ss=0+0 | child=1 frozen=0 dirty=0 authpin=1 0x55e6b54b2580] count now 1
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) project_fnode 0x55e6b45e4b10
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x10000000000) pre_dirty 359
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache  taking wrlock on (inest lock) on [inode 0x10000000000 [...2,head] /subdir/ auth v410 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock) (iversion lock) | dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=0 0x55e6b5419800]
2020-02-29T13:29:53.496+0000 7f67c0191700  7 mds.0.locker wrlock_force  on (inest lock) on [inode 0x10000000000 [...2,head] /subdir/ auth v410 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock) (iversion lock) | dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=0 0x55e6b5419800]
2020-02-29T13:29:53.496+0000 7f67c0191700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55e6b5404500) have_past_parents_open [1,head]
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache projected_rstat_inode_to_frag first head linkunlink 0 [inode 0x10000000016 [head,head] /subdir/21 auth v356 pv358 ap=2 s=2 n()->n(v0 rc2020-02-29T13:29:33.859441+0000 b2)/n() (ifile excl->sync) (iversion lock) cr={5509=0-4194304@fffffffffffffffd} | ptrwaiter=0 request=0 lock=0 caps=0 dirtyrstat=1 dirty=0 authpin=1 0x55e6b547a800]
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache     frag head is [2,head]
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache  inode update is [head,head]
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache  floor of head from parent dn [dentry #0x1/subdir/21 [head,head] auth (dversion lock) pv=358 v=356 ino=0x10000000016 state=1073741824 | lock=0 fragmenting=0 inodepin=1 dirty=0 authpin=0 0x55e6b5478480]
2020-02-29T13:29:53.496+0000 7f67c0191700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55e6b5404500) have_past_parents_open [1,head]
2020-02-29T13:29:53.496+0000 7f67c0191700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55e6b5404500) get_snaps  (seq 1 cached_seq 1)
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache _project_rstat_inode_to_frag [head,head]
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache   inode           rstat n(v0 rc2020-02-29T13:29:33.859441+0000 b2)
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache   inode accounted_rstat n()
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache                   delta n(v0 rc2020-02-29T13:29:33.859441+0000 b2)
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache  projecting to head [2,head] n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0)
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache   project to [2,head] n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0)
2020-02-29T13:29:53.496+0000 7f67c91a3700  1 -- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 legacy=0x55e6b541b000 unknown :6833 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 38
2020-02-29T13:29:53.496+0000 7f67c91a3700  1 -- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 legacy=0x55e6b541b000 unknown :6833 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache       result [2,head] n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0) [dir 0x10000000000 /subdir/ [2,head] auth pv=359 v=357 cv=357/357 ap=1+2 state=1073741825|complete f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0)->f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0)->n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0)/n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) hs=100+0,ss=0+0 | child=1 frozen=0 dirty=0 authpin=1 0x55e6b54b2580]
2020-02-29T13:29:53.496+0000 7f67c91a3700  1 --1- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 0x55e6b541b000 :6833 s=OPENED pgs=2 cs=1 l=1).handle_message read tag failed
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000016) clear_dirty_rstat
2020-02-29T13:29:53.496+0000 7f67c91a3700  1 --1- [v2:172.21.15.2:6832/2828792259,v1:172.21.15.2:6833/2828792259] >> v1:172.21.15.40:0/1109131888 conn(0x55e6b53bc000 0x55e6b541b000 :6833 s=OPENED pgs=2 cs=1 l=1).fault on lossy channel, failing
2020-02-29T13:29:53.496+0000 7f67c0191700  7 mds.0.locker local_wrlock_grab  on (iversion lock) on [inode 0x10000000000 [...2,head] /subdir/ auth v410 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock w=1) (iversion lock) | dirtyscattered=0 request=0 lock=1 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=0 0x55e6b5419800]
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000000) auth_pin by 0x55e6b54cd440 on [inode 0x10000000000 [...2,head] /subdir/ auth v410 ap=1 f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) (inest lock w=1) (iversion lock w=1) | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=0 dirtyrstat=0 dirty=1 waiter=0 authpin=1 0x55e6b5419800] now 1
2020-02-29T13:29:53.496+0000 7f67c0191700 15 mds.0.cache.dir(0x1) adjust_nested_auth_pins 1 on [dir 0x1 / [2,head] auth v=411 cv=409/409 dir_auth=0 ap=0+1 state=1610612737|complete f(v0 m2020-02-29T13:29:28.065624+0000 1=0+1) n(v2 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 subtreetemp=0 dirty=1 waiter=0 authpin=0 0x55e6b529f600] by 0x55e6b5419800 count now 0/1
2020-02-29T13:29:53.496+0000 7f67c0191700 10  mds.0.cache.snaprealm(0x3 seq 1 0x55e6b5404280) have_past_parents_open [1,head]
2020-02-29T13:29:53.496+0000 7f67c0191700 15 mds.0.cache.ino(0x10000000000) project_inode 0x10000000000
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.dir(0x1) pre_dirty 412
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.den(0x1 subdir) pre_dirty [dentry #0x1/subdir [2,head] auth (dversion lock) pv=412 v=410 ino=0x10000000000 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55e6b5416fc0]
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache.ino(0x10000000000) pre_dirty 412 (current v 410)
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache predirty_journal_parents frag->inode on [dir 0x10000000000 /subdir/ [2,head] auth pv=359 v=357 cv=357/357 ap=1+2 state=1073741825|complete f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0)->f(v0 100=100+0)/f(v0 m2020-02-29T13:29:28.159623+0000 100=100+0) n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0)->n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0)/n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0) hs=100+0,ss=0+0 | child=1 frozen=0 dirty=0 authpin=1 0x55e6b54b2580]
2020-02-29T13:29:53.496+0000 7f67c0191700 10 mds.0.cache project_rstat_frag_to_inode [2,head]
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache   frag           rstat n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0)
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache   frag accounted_rstat n(v1 rc2020-02-29T13:29:28.160623+0000 b188 99=99+0)
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache                  delta n(v1 rc2020-02-29T13:29:33.859441+0000 b2)
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache  projecting to [2,head] n(v1 rc2020-02-29T13:29:28.160623+0000 b190 101=100+1)
2020-02-29T13:29:53.496+0000 7f67c0191700 20 mds.0.cache         result [2,head] n(v1 rc2020-02-29T13:29:33.859441+0000 b192 101=100+1)
2020-02-29T13:29:53.496+0000 7f67c0191700 -1 log_channel(cluster) log [ERR] : unmatched rstat rbytes on single dirfrag 0x10000000000, inode has n(v1 rc2020-02-29T13:29:33.859441+0000 b192 101=100+1), dirfrag has n(v1 rc2020-02-29T13:29:33.859441+0000 b190 99=99+0)
2020-02-29T13:29:53.497+0000 7f67c0191700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/huge/release/15.1.0-1435-g8e904aa/rpm/el8/BUILD/ceph-15.1.0-1435-g8e904aa/src/mds/MDCache.cc: In function 'void MDCache::predirty_journal_parents(MutationRef, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)' thread 7f67c0191700 time 2020-02-29T13:29:53.497532+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/huge/release/15.1.0-1435-g8e904aa/rpm/el8/BUILD/ceph-15.1.0-1435-g8e904aa/src/mds/MDCache.cc: 2335: FAILED ceph_assert(!"unmatched rstat rbytes" == g_conf()->mds_verify_scatter)

 ceph version 15.1.0-1435-g8e904aa (8e904aaab37a2615eec03fd9c9f166a80774d91e) octopus (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7f67ce2e30f0]
 2: (()+0x27930a) [0x7f67ce2e330a]
 3: (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0x232a) [0x55e6b32d923a]
 4: (Locker::check_inode_max_size(CInode*, bool, unsigned long, unsigned long, utime_t)+0x414) [0x55e6b33a9ab4]
 5: (RecoveryQueue::_recovered(CInode*, int, unsigned long, utime_t)+0x8bd) [0x55e6b33819dd]
 6: (MDSContext::complete(int)+0x56) [0x55e6b34c92d6]
 7: (MDSIOContextBase::complete(int)+0xa3) [0x55e6b34c94e3]
 8: (Filer::C_Probe::finish(int)+0xb5) [0x55e6b3565e75]
 9: (Context::complete(int)+0xd) [0x55e6b31f9f5d]
 10: (Finisher::finisher_thread_entry()+0x1a5) [0x7f67ce373a05]
 11: (()+0x82de) [0x7f67cd10f2de]
 12: (clone()+0x43) [0x7f67cbca2133]

From: /ceph/teuthology-archive/pdonnell-2020-02-29_02:56:38-kcephfs-wip-pdonnell-testing-20200229.001503-distro-basic-smithi/4811126/remote/smithi002/log/ceph-mds.a.log.gz

Only PR in this batch related to this code path is: https://github.com/ceph/ceph/pull/33538

I don't see how #33538 would cause this.

The last run of kcephfs on Feb 16th did not have this issue: http://pulpito.ceph.com/pdonnell-2020-02-16_17:35:17-kcephfs-wip-pdonnell-testing-20200215.033325-distro-basic-smithi/

History

#1 Updated by Xiubo Li 6 days ago

  • Assignee set to Xiubo Li

Also available in: Atom PDF