Bug #50246
mds: failure replaying journal (EMetaBlob)
Status: Closed
Description
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay request client.4669:223057 trim_to 222902
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.log _replay 393514883~1056 / 394308203 2021-04-08T16:44:07.441097+0000: EOpen [metablob 0x10000000a35, 1 dirs], 1 open files
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EOpen.replay
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay 1 dirlumps by unknown.0
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay dir 0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700 0 mds.0.journal EMetaBlob.replay missing dir ino 0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
2021-04-08T16:44:08.043+0000 7f134a9e1700 5 mds.beacon.a set_want_state: up:standby-replay -> down:damaged
From: /ceph/teuthology-archive/pdonnell-2021-04-08_16:05:30-fs-wip-pdonnell-testing-20210408.142238-distro-basic-smithi/6028884/remote/smithi031/log/ceph-mds.a.log.gz
This might be because the standby-replay daemons are now regularly trimming the journal with [1].
Updated by Xiubo Li about 3 years ago
Before replaying the journal, the inode 0x10000000a35 was removed from the inode_map in upkeep thread:
-7139> 2021-04-08T16:44:06.006+0000 7f134e1e8700 20 mds.0.cache upkeep thread trimming cache; last trim 0.999983788s ago
-7138> 2021-04-08T16:44:06.006+0000 7f134e1e8700 7 mds.0.cache trim bytes_used=15MB limit=4GB reservation=0.05% count=0
-7137> 2021-04-08T16:44:06.006+0000 7f134e1e8700 7 mds.0.cache trim_lru trimming 0 items from LRU size=625 mid=161 pintail=12 pinned=395
[...]
-5757> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache.dir(0x10000000a35) remove_dentry [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ads7828.txt [2,head] auth NULL (dversion lock) v=92 ino=(nil) state=1073741824 0x558ad621e780]
-5756> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache trim_dentry [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth (dversion lock) v=11722 ino=0x10000000a35 state=1073741824 0x558ad3d2af00]
-5755> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache in container [dir 0x100000005da.010* /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/ [2,head] auth v=12333 cv=0/0 dir_auth=0,0 state=1610612736 f(v1 21=3+18) n(v9 rc2021-04-08T16:32:40.521946+0000 b905042 607=580+27) hs=19+0,ss=0+0 | child=1 subtree=1 dirty=1 0x558ad5ed4480]
-5754> 2021-04-08T16:44:06.011+0000 7f134e1e8700 15 mds.0.cache trim_inode [inode 0x10000000a35 [...2,head] /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) 0x558ad4cdec00]
-5753> 2021-04-08T16:44:06.011+0000 7f134e1e8700 15 mds.0.cache trim_dirfrag [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] auth v=153 cv=0/0 state=1073741824 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad6a19680]
-5752> 2021-04-08T16:44:06.011+0000 7f134e1e8700 14 mds.0.cache.ino(0x10000000a35) close_dirfrag *
-5751> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache.dir(0x10000000a35) remove_null_dentries [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] auth v=153 cv=0/0 state=1073741824 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad6a19680]
-5750> 2021-04-08T16:44:06.011+0000 7f134e1e8700 12 mds.0.cache.dir(0x100000005da.010*) unlink_inode [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth (dversion lock) v=11722 ino=0x10000000a35 state=1073741824 0x558ad3d2af00] [inode 0x10000000a35 [...2,head] /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) 0x558ad4cdec00]
-5749> 2021-04-08T16:44:06.011+0000 7f134e1e8700 14 mds.0.cache remove_inode [inode 0x10000000a35 [...2,head] #10000000a35/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) 0x558ad4cdec00]
Then when doing the replay, it failed:
-16> 2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EOpen.replay
-15> 2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay 1 dirlumps by unknown.0
-14> 2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay dir 0x10000000a35
-13> 2021-04-08T16:44:08.043+0000 7f134a9e1700 0 mds.0.journal EMetaBlob.replay missing dir ino 0x10000000a35
-12> 2021-04-08T16:44:08.043+0000 7f134a9e1700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
Updated by Patrick Donnelly about 3 years ago
Xiubo Li wrote:
Before replaying the journal, the inode 0x10000000a35 was removed from the inode_map in upkeep thread:
[...]
Then when doing the replay, it failed:
[...]
I think this is a matter of having that inode pinned in cache, but I'm not sure whether it's missing a pin that should already exist or whether we need to add a replay pin if necessary (and how that should be applied needs to be investigated). Do you agree, Xiubo?
Updated by Xiubo Li about 3 years ago
Patrick Donnelly wrote:
Xiubo Li wrote:
Before replaying the journal, the inode 0x10000000a35 was removed from the inode_map in upkeep thread:
[...]
Then when doing the replay, it failed:
[...]
I think this is a matter of having that inode pinned in cache, but I'm not sure whether it's missing a pin that should already exist or whether we need to add a replay pin if necessary (and how that should be applied needs to be investigated). Do you agree, Xiubo?
Yeah, agree. I will check it.
Updated by Xiubo Li about 3 years ago
The [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/] was trimmed from the subtree and the dentry was unlinked and moved to the bottom_lru:
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache trim_non_auth_subtree(0x558ad36a9200) Checking dentry 0x558ad2bde000
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache trim_non_auth_subtree(0x558ad2bd4480) [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] auth v=153 cv=0/0 state=1073741824 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad2bd4480]
2021-04-08T16:43:36.728+0000 7f134a9e1700 14 mds.0.cache.ino(0x10000000a35) close_dirfrag *
2021-04-08T16:43:36.728+0000 7f134a9e1700 12 mds.0.cache.dir(0x10000000a35) remove_null_dentries [dir 0x10000000a35 /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ [2,head] rep@-2.0 state=0 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 38=38+0) hs=0+0,ss=0+0 0x558ad2bd4480]
2021-04-08T16:43:36.728+0000 7f134a9e1700 20 mds.0.cache trim_non_auth_subtree(0x558ad36a9200) removing inode 0x558ad2bdc100 with dentry 0x558ad2bde000
2021-04-08T16:43:36.728+0000 7f134a9e1700 12 mds.0.cache.dir(0x100000005da.010*) unlink_inode [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth (dversion lock) v=11722 ino=0x10000000a35 state=1610612736 | inodepin=1 dirty=1 0x558ad2bde000] [inode 0x10000000a35 [...2,head] /client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) | dirfrag=0 dirtyparent=0 dirty=1 0x558ad2bdc100]
2021-04-08T16:43:36.728+0000 7f134a9e1700 14 mds.0.cache remove_inode [inode 0x10000000a35 [...2,head] #10000000a35/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) | dirfrag=0 dirtyparent=0 dirty=1 0x558ad2bdc100]
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache.ino(0x10000000a35) mark_clean [inode 0x10000000a35 [...2,head] #10000000a35/ auth v11722 f(v0 m2021-04-08T16:32:35.802030+0000 38=38+0) n(v0 rc2021-04-08T16:32:35.803030+0000 b31739 39=38+1) (iversion lock) | dirfrag=0 dirtyparent=0 dirty=1 0x558ad2bdc100]
2021-04-08T16:43:36.728+0000 7f134a9e1700 12 mds.0.cache.dir(0x100000005da.010*) remove_dentry [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth NULL (dversion lock) v=11722 ino=(nil) state=1610612736 | inodepin=0 dirty=1 0x558ad2bde000]
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache.den(0x100000005da.010* hwmon) mark_clean [dentry #0x1/client.0/tmp/t/linux-5.4/Documentation/devicetree/bindings/hwmon [2,head] auth NULL (dversion lock) v=11722 ino=(nil) state=1610612736 | inodepin=0 dirty=1 0x558ad2bde000]
2021-04-08T16:43:36.728+0000 7f134a9e1700 10 mds.0.cache trim_non_auth_subtree(0x558ad36a9200) Checking dentry 0x558ad34a6f00
And after PR https://github.com/ceph/ceph/pull/40486, in standby-replay mode upkeep_main() will also trim the inodes in bottom_lru.
Updated by Xiubo Li about 3 years ago
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay request client.4669:223057 trim_to 222902
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.log _replay 393514883~1056 / 394308203 2021-04-08T16:44:07.441097+0000: EOpen [metablob 0x10000000a35, 1 dirs], 1 open files
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EOpen.replay
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay 1 dirlumps by unknown.0
2021-04-08T16:44:08.043+0000 7f134a9e1700 10 mds.0.journal EMetaBlob.replay dir 0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700 0 mds.0.journal EMetaBlob.replay missing dir ino 0x10000000a35
2021-04-08T16:44:08.043+0000 7f134a9e1700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
2021-04-08T16:44:08.043+0000 7f134a9e1700 5 mds.beacon.a set_want_state: up:standby-replay -> down:damaged
The journal EMetaBlob above is of the "EOpen" type; it came from the master MDS's cap release operation after the dir was removed. The master MDS logs:
2021-04-08T16:44:07.684+0000 7f9378af0700 7 mds.0.locker _do_cap_release for client.4669 on [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) caps={4669=pAsLsXsFsx/pAsLsXsFsx@83},l=4669 | request=0 lock=0 importingcaps=0 dirfrag=0 caps=1 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]
2021-04-08T16:44:07.684+0000 7f9378af0700 10 mds.0.openfiles remove_inode [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) caps={4669=pAsLsXsFsx/pAsLsXsFsx@83},l=-1(4669) | request=0 lock=0 importingcaps=0 dirfrag=0 caps=1 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]
2021-04-08T16:44:07.684+0000 7f9378af0700 10 mds.0.cache.ino(0x10000000a35) remove_client_cap last cap, leaving realm snaprealm(0x100 seq 1 lc 0 cr 0 cps 1 snaps={} past_parent_snaps= 0x5635e7132600)
2021-04-08T16:44:07.684+0000 7f9378af0700 10 mds.0.locker eval 3648 [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) | request=0 lock=0 importingcaps=0 dirfrag=0 caps=0 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]
2021-04-08T16:44:07.684+0000 7f9378af0700 7 mds.0.locker file_eval wanted= loner_wanted= other_wanted= filelock=(ifile excl) on [inode 0x10000000a35 [...2,head] ~mds0/stray2/10000000a35/ auth v21948 DIRTYPARENT f(v0 m2021-04-08T16:44:07.512369+0000) n(v0 rc2021-04-08T16:44:07.513369+0000 1=0+1) (inest lock) (ifile excl) (iversion lock) | request=0 lock=0 importingcaps=0 dirfrag=0 caps=0 dirtyparent=1 replicated=0 dirty=1 waiter=0 authpin=0 0x5635ef87d700]
So after trim() has run in upkeep_main(), if the standby-replay rank daemon then receives an EOpen journal event like this, it will hit this issue.
Updated by Xiubo Li about 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 40963
In standby-replay, if some dentries have just been added/linked but the EOpen journal events that follow have not yet been replayed, a run of upkeep_main() may trim them out immediately. Replaying those EOpen events later then fails.
In standby-replay, let's skip trimming a dentry whose linkage inode is not nullptr.
Updated by Patrick Donnelly almost 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from pacific to pacific,octopus,nautilus
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50632: pacific: mds: failure replaying journal (EMetaBlob) added
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50633: octopus: mds: failure replaying journal (EMetaBlob) added
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50634: nautilus: mds: failure replaying journal (EMetaBlob) added
Updated by Loïc Dachary almost 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
Updated by Patrick Donnelly over 1 year ago
- Related to Bug #48673: High memory usage on standby replay MDS added