Bug #50246

mds: failure replaying journal (EMetaBlob)

Added by Patrick Donnelly 13 days ago. Updated about 4 hours ago.

Status:
Fix Under Review
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
qa, qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay request client.4669:223057 trim_to 222902
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.log _replay 393514883~1056 / 394308203 2021-04-08T16:44:07.441097+0000: EOpen [metablob 0x10000000a35, 1 dirs], 1 open files
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EOpen.replay
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay 1 dirlumps by unknown.0
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay dir 0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700  0 mds.0.journal EMetaBlob.replay missing dir ino  0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
2021-04-08T16:44:08.043+0000 7f134a9e1700  5 mds.beacon.a set_want_state: up:standby-replay -> down:damaged

From: /ceph/teuthology-archive/pdonnell-2021-04-08_16:05:30-fs-wip-pdonnell-testing-20210408.142238-distro-basic-smithi/6028884/remote/smithi031/log/ceph-mds.a.log.gz

This might be because the standby-replay daemons now regularly trim the journal with [1].

[1] https://github.com/ceph/ceph/pull/40486

History

#1 Updated by Patrick Donnelly 9 days ago

  • Assignee set to Xiubo Li

#2 Updated by Xiubo Li 2 days ago

  • Status changed from New to In Progress

#3 Updated by Xiubo Li 1 day ago

Before replaying the journal, the inode 0x10000000a35 was removed from the inode_map by the upkeep thread:

 -7139> 2021-04-08T16:44:06.006+0000 7f134e1e8700 20 mds.0.cache upkeep thread trimming cache; last trim 0.999983788s ago
 -7138> 2021-04-08T16:44:06.006+0000 7f134e1e8700  7 mds.0.cache trim bytes_used=15MB limit=4GB reservation=0.05% count=0
 -7137> 2021-04-08T16:44:06.006+0000 7f134e1e8700  7 mds.0.cache trim_lru trimming 0 items from LRU size=625 mid=161 pintail=12 pinned=395

[...]

 -5757> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache.dir(0x10000000a35) remove_dentry [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ads7828.txt [2,head] auth NULL (dversion lock) v=92 ino=(nil) state=1073741824 0x558ad621e780]
 -5756> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache trim_dentry [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth (dversion lock) v=11722 ino=0x10000000a35 state=1073741824 0x558ad3d2af00]
 -5755> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache  in container [dir 0x100000005da.010* /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/ [2,head] auth v=12333 cv=0/0 dir_auth=0,0 state=1610612736 f(v1 21=3+18) n(v9 rc2021-04-08T16:32:40.521946+0000 b905042 607=580+27) hs=19+0,ss=0+0 | child=1 subtree=1 dirty=1 0x558ad5ed4480]
 -5754> 2021-04-08T16:44:06.011+0000 7f134e1e8700 15 mds.0.cache trim_inode [inode 0x10000000a35 [...2,head] /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) 0x558ad4cdec00]
 -5753> 2021-04-08T16:44:06.011+0000 7f134e1e8700 15 mds.0.cache trim_dirfrag [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] auth v=153 cv=0/0 state=1073741824 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad6a19680]
 -5752> 2021-04-08T16:44:06.011+0000 7f134e1e8700 14 mds.0.cache.ino(0x10000000a35) close_dirfrag *
 -5751> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache.dir(0x10000000a35) remove_null_dentries [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] auth v=153 cv=0/0 state=1073741824 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad6a19680]
 -5750> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache.dir(0x100000005da.010*) unlink_inode [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth (dversion lock) v=11722 ino=0x10000000a35 state=1073741824 0x558ad3d2af00] [inode 0x10000000a35 [...2,head] /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) 0x558ad4cdec00]
 -5749> 2021-04-08T16:44:06.011+0000 7f134e1e8700 14 mds.0.cache remove_inode [inode 0x10000000a35 [...2,head] #10000000a35/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) 0x558ad4cdec00]

Then when doing the replay, it failed:

   -16> 2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EOpen.replay
   -15> 2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay 1 dirlumps by unknown.0
   -14> 2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay dir 0x10000000a35
   -13> 2021-04-08T16:44:08.043+0000 7f134a9e1700  0 mds.0.journal EMetaBlob.replay missing dir ino  0x10000000a35
   -12> 2021-04-08T16:44:08.043+0000 7f134a9e1700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)

#4 Updated by Patrick Donnelly about 24 hours ago

Xiubo Li wrote:

Before replaying the journal, the inode 0x10000000a35 was removed from the inode_map by the upkeep thread:

[...]

Then when doing the replay, it failed:

[...]

I think this is a matter of having that inode pinned in cache, but I'm not sure whether it's missing a pin that already exists or we need to add a replay pin if necessary (and how that should be applied needs to be investigated). Do you agree, Xiubo?

#5 Updated by Xiubo Li about 16 hours ago

Patrick Donnelly wrote:

Xiubo Li wrote:

Before replaying the journal, the inode 0x10000000a35 was removed from the inode_map by the upkeep thread:

[...]

Then when doing the replay, it failed:

[...]

I think this is a matter of having that inode pinned in cache, but I'm not sure whether it's missing a pin that already exists or we need to add a replay pin if necessary (and how that should be applied needs to be investigated). Do you agree, Xiubo?

Yeah, agree. I will check it.

#6 Updated by Xiubo Li about 12 hours ago

The [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/] was trimmed from the subtree, and its dentry was unlinked and moved to the bottom_lru:

2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache trim_non_auth_subtree(0x558ad36a9200) Checking dentry 0x558ad2bde000
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache trim_non_auth_subtree(0x558ad2bd4480) [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] auth v=153 cv=0/0 state=1073741824 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad2bd4480]
2021-04-08T16:43:36.728+0000 7f134a9e1700 14 mds.0.cache.ino(0x10000000a35) close_dirfrag *
2021-04-08T16:43:36.728+0000 7f134a9e1700 12 mds.0.cache.dir(0x10000000a35) remove_null_dentries [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] rep@-2.0 state=0 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad2bd4480]
2021-04-08T16:43:36.728+0000 7f134a9e1700 20 mds.0.cache trim_non_auth_subtree(0x558ad36a9200) removing inode 0x558ad2bdc100 with dentry0x558ad2bde000
2021-04-08T16:43:36.728+0000 7f134a9e1700 12 mds.0.cache.dir(0x100000005da.010*) unlink_inode [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth (dversion lock) v=11722 ino=0x10000000a35 state=1610612736 | inodepin=1 dirty=1 0x558ad2bde000] [inode 0x10000000a35 [...2,head] /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) | dirfrag=0 dirtyparent=0 dirty=1 0x558ad2bdc100]
2021-04-08T16:43:36.728+0000 7f134a9e1700 14 mds.0.cache remove_inode [inode 0x10000000a35 [...2,head] #10000000a35/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) | dirfrag=0 dirtyparent=0 dirty=1 0x558ad2bdc100]
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache.ino(0x10000000a35) mark_clean [inode 0x10000000a35 [...2,head] #10000000a35/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) | dirfrag=0 dirtyparent=0 dirty=1 0x558ad2bdc100]
2021-04-08T16:43:36.728+0000 7f134a9e1700 12 mds.0.cache.dir(0x100000005da.010*) remove_dentry [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth NULL (dversion lock) v=11722 ino=(nil) state=1610612736 | inodepin=0 dirty=1 0x558ad2bde000]
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache.den(0x100000005da.010* hwmon) mark_clean [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth NULL (dversion lock) v=11722 ino=(nil) state=1610612736 | inodepin=0 dirty=1 0x558ad2bde000]
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache trim_non_auth_subtree(0x558ad36a9200) Checking dentry 0x558ad34a6f00

And after PR https://github.com/ceph/ceph/pull/40486, the inodes in the bottom_lru are also trimmed in upkeep_main() when in standby_replay mode.
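A rough sketch of that behavioral change (invented names and structure, not the PR's actual diff): unlinked dentries are demoted into a bottom_lru, and with the PR the standby-replay upkeep drains that list as well, where previously it did not trim at all.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Rough model (hypothetical names, not the actual PR #40486 code):
// unlinked dentries are demoted from the main LRU into a bottom_lru,
// and the standby-replay upkeep now drains bottom_lru as well.
struct Dentry {
    uint64_t ino;
};

struct MiniLRU {
    std::deque<Dentry> lru;         // regular cache LRU
    std::deque<Dentry> bottom_lru;  // unlinked dentries, trimmed first

    // models unlink_inode() during trim_non_auth_subtree(): the dentry
    // moves from the main LRU into bottom_lru
    void demote(uint64_t ino) {
        for (auto it = lru.begin(); it != lru.end(); ++it) {
            if (it->ino == ino) {
                bottom_lru.push_back(*it);
                lru.erase(it);
                return;
            }
        }
    }

    // models the upkeep trim; before the PR, standby-replay skipped
    // this path, so bottom_lru entries survived until replay
    std::size_t trim(bool standby_replay, bool post_pr_40486) {
        if (standby_replay && !post_pr_40486) {
            return 0;
        }
        std::size_t trimmed = bottom_lru.size();
        bottom_lru.clear();
        return trimmed;
    }
};
```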

#7 Updated by Xiubo Li about 10 hours ago

2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay request client.4669:223057 trim_to 222902
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.log _replay 393514883~1056 / 394308203 2021-04-08T16:44:07.441097+0000: EOpen [metablob 0x10000000a35, 1 dirs], 1 open files
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EOpen.replay
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay 1 dirlumps by unknown.0
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay dir 0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700  0 mds.0.journal EMetaBlob.replay missing dir ino  0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
2021-04-08T16:44:08.043+0000 7f134a9e1700  5 mds.beacon.a set_want_state: up:standby-replay -> down:damaged

The journal EMetaBlob above is of the "EOpen" type, which came from the master MDS's cap release operation after the dir was removed. The master MDS logs:

2021-04-08T16:44:07.684+0000 7f9378af0700  7 mds.0.locker _do_cap_release for client.4669 on [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) caps={4669=pAsLsXsFsx/pAsLsXsFsx@83},l=4669 | request=0 lock=0 importingcaps=0 dirfrag=0 caps=1 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]
2021-04-08T16:44:07.684+0000 7f9378af0700 10 mds.0.openfiles remove_inode [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) caps={4669=pAsLsXsFsx/pAsLsXsFsx@83},l=-1(4669) | request=0 lock=0 importingcaps=0 dirfrag=0 caps=1 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]
2021-04-08T16:44:07.684+0000 7f9378af0700 10 mds.0.cache.ino(0x10000000a35) remove_client_cap last cap, leaving realm snaprealm(0x100 seq 1 lc 0 cr 0 cps 1 snaps={} past_parent_snaps= 0x5635e7132600)
2021-04-08T16:44:07.684+0000 7f9378af0700 10 mds.0.locker eval 3648 [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) | request=0 lock=0 importingcaps=0 dirfrag=0 caps=0 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]
2021-04-08T16:44:07.684+0000 7f9378af0700  7 mds.0.locker file_eval wanted= loner_wanted= other_wanted=  filelock=(ifile excl) on [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) | request=0 lock=0 importingcaps=0 dirfrag=0 caps=0 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]

So after the trim() in upkeep_main(), if the standby_replay rank daemon then receives an EOpen journal event like this one, it will hit this issue.

#8 Updated by Xiubo Li about 4 hours ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 40963

In standby_replay, if some dentries have just been added/linked but the EOpen journal events that follow have not yet been replayed, a run of upkeep_main() may trim them out immediately. Replaying those EOpen journal events later will then fail.

In standby_replay, skip trimming a dentry if its linkage inode is not nullptr.
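A minimal sketch of that guard (invented names, not the actual PR #40963 diff): in standby-replay, the trim loop keeps any dentry whose linkage still points at an inode, so a later EOpen replay can still find it; only null dentries remain eligible for trimming.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Hypothetical sketch of the proposed fix (names invented, not the real
// PR #40963 change): in standby-replay, skip trimming dentries whose
// linkage inode is non-null; only "NULL dentries" may be dropped.
struct Inode {
    uint64_t ino;
};

struct Dentry {
    Inode* linkage_inode = nullptr;  // nullptr for a NULL dentry
};

// returns how many dentries were trimmed from the list
int trim_dentries(std::list<Dentry>& lru, bool standby_replay) {
    int trimmed = 0;
    for (auto it = lru.begin(); it != lru.end();) {
        // the fix: in standby-replay, keep dentries that still link an inode
        if (standby_replay && it->linkage_inode != nullptr) {
            ++it;
            continue;
        }
        it = lru.erase(it);
        ++trimmed;
    }
    return trimmed;
}
```

With one linked and one null dentry queued, a standby-replay trim drops only the null dentry, while an active-rank trim (standby_replay = false) may drop both.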
