Bug #62663

closed

MDS: inode nlink value is -1 causing MDS to continuously crash

Added by Austin Axworthy 8 months ago. Updated 7 months ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

All MDS daemons are continuously crashing. The logs report an inode whose nlink value is set to -1. I have included details of the filesystem workflow below.

Workflow:
This filesystem has a heavy hardlink workload. Data within the filesystem can be processed by up to 10-20 processes at a time. Each process generates a hardlink, meaning there could be up to 20 hardlinks at a time. Once the processing is complete, the hardlinks are removed and cleaned up.
Leading up to the crash, the MDS performance was very degraded, which led to a restart of the active MDS. The secondary daemon was experiencing similar issues, and eventually the MDS was failed back over to the original daemon. The original MDS then entered a continuous crash loop, causing the filesystem to go offline. When investigating the logs, the following errors were found: inode 0x10005f79654 nl=-1, as well as FAILED ceph_assert(stray_in->get_inode()->nlink >= 1).

4982059 2023-07-12T13:16:33.413-0400 7f337eec5700 10 mds.0.cache.strays  inode is [inode 0x10005f79654 [...10,head] ~mds0/stray1/10005f79654 auth v22224244 snaprealm=0x55b0a8b17600 DIRTYPARENT s=13123258 nl=-1 n(v0 rc2023-07-03T14:19:43.854341-0400 b13123258 1=1+0) (iversion lock) | openingsnapparents=0 dirtyparent=1 dirty=0 0x55b0aa54ec00]
4982060 2023-07-12T13:16:33.413-0400 7f337eec5700 20 mds.0.cache.strays _eval_stray_remote [dentry #0x100/stray1/10005f79654 [10,head] auth (dversion lock) v=22224244 ino=0x10005f79654 state=1342177296 | inodepin=1 dirty=0 0x55b0a9306f00]
4982061 2023-07-12T13:16:33.414-0400 7f337eec5700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.6/rpm/el8/BUILD/ceph-17.2.6/src/mds/StrayManager.cc: In function 'void StrayManager::_eval_stray_remote(CDentry*, CDentry*)' thread 7f337eec5700 time 2023-07-12T13:16:33.414966-0400
4982062 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.6/rpm/el8/BUILD/ceph-17.2.6/src/mds/StrayManager.cc: 622: FAILED ceph_assert(stray_in->get_inode()->nlink >= 1)
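In case it helps with reproducing this, below is a minimal sketch of the hardlink-heavy workload described above (the mount point, file names, and process count are placeholders, not the actual production job):

# Reproduction sketch only: N worker processes each create a hardlink to a
# shared file on a CephFS mount, "process" it, then unlink it, mimicking the
# workflow described in this report. All paths and counts are assumptions.
import os
import time
from multiprocessing import Process

MOUNT = "/mnt/cephfs"                  # assumed CephFS mount point
SRC = os.path.join(MOUNT, "data.bin")  # shared source file
WORKERS = 20                           # "up to 10-20 processes at a time"

def worker(idx: int) -> None:
    link = os.path.join(MOUNT, f"work-{idx}.lnk")
    os.link(SRC, link)   # each process generates a hardlink
    time.sleep(1)        # stand-in for the actual processing
    os.unlink(link)      # hardlinks are removed once processing completes

if __name__ == "__main__":
    if not os.path.exists(SRC):
        with open(SRC, "wb") as f:
            f.write(os.urandom(1024))
    procs = [Process(target=worker, args=(i,)) for i in range(WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()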

A "cephfs-data-scan scan_links" was done after removing the omap key of the object reporting the issue. The output of the scan_links is as follows.


2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609ae80 expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609ae85 from 0x10005fa3bd4/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609ae85 expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609ae8f from 0x10005fa3bd4/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609ae8f expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609aec5 from 0x10005fa3bd4/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609aec5 expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609aeda from 0x1000609aed9/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609aede expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609aeed expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609aefb expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609af05 expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609af0e expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609af1b from 0x10005fa3bd4/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609af1b expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609af3a from 0x10005fa3bd4/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609af3a expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609af44 from 0x10005fa3bd4/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Bad nlink on 0x1000609af44 expected 1 has 0
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609af6c from 0x1000609af6b/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609b32f from 0x1000609b32c/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609c012 from 0x1000609c011/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609c968 from 0x1000609c966/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609cb02 from 0x1000609cb00/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609d3c2 from 0x1000609d3c1/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609d523 from 0x1000609d522/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609d7fc from 0x1000609d7fa/filename
2023-07-24T21:23:19.830-0400 7f00195a4740 -1 datascan.scan_links: Remove duplicated ino 0x0x1000609dbec from 0x1000609dbea/filename

It was determined that the specific inode reporting the error had 3 hardlinks, one of which was in the process of being deleted when the issues first presented.

The filesystem has a single rank with a standby daemon. This cluster has had ENOSPC issues in the past due to the number of stray files generated by the deletion of hardlinks.
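As an aside, the stray buildup mentioned above can be watched with something like the sketch below (the daemon id is a placeholder, and the counter names are taken from the mds_cache section of the MDS perf dump, so they may differ between releases):

# Sketch: poll stray-related MDS perf counters. Run on the host of the
# active MDS daemon; "a" is a placeholder daemon id.
import json
import subprocess

MDS_ID = "a"  # assumed active MDS daemon id

out = subprocess.check_output(
    ["ceph", "daemon", f"mds.{MDS_ID}", "perf", "dump", "mds_cache"])
cache = json.loads(out)["mds_cache"]
for counter in ("num_strays", "strays_created", "strays_enqueued"):
    print(counter, cache.get(counter))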

Actions #1

Updated by Venky Shankar 8 months ago

  • Project changed from Ceph to CephFS
  • Category set to Correctness/Safety
  • Target version set to v19.0.0
  • Component(FS) MDS added
Actions #2

Updated by Venky Shankar 8 months ago

Austin Axworthy wrote:

All MDS daemons are continuously crashing. The logs report an inode whose nlink value is set to -1. I have included details of the filesystem workflow below.

Workflow:
This filesystem has a heavy hardlink workload. Data within the filesystem can be processed by up to 10-20 processes at a time. Each process generates a hardlink, meaning there could be up to 20 hardlinks at a time. Once the processing is complete, the hardlinks are removed and cleaned up.
Leading up to the crash, the MDS performance was very degraded, which led to a restart of the active MDS. The secondary daemon was experiencing similar issues, and eventually the MDS was failed back over to the original daemon. The original MDS then entered a continuous crash loop, causing the filesystem to go offline. When investigating the logs, the following errors were found: inode 0x10005f79654 nl=-1, as well as FAILED ceph_assert(stray_in->get_inode()->nlink >= 1).

The ceph version used here (17.2.6) is pretty recent, which makes me worried about this bug lurking around. I don't have an explanation for the root cause atm.

[...]

A "cephfs-data-scan scan_links" was done after removing the omap key of the object reporting the issue. The output of the scan_links is as follows.

Did you flush the mds journal before removing the omap key?

The scan tool detected duplicate primary inodes. I think there is a corner case that could possibly make this happen - need to carefully look at that bit.

Anyhow, for the duplicate primaries, the scan tool will choose the one with the highest version, remove the others, and expect the number of hardlinks to match the nlink count in the inode. The nlink count in the inode is fixed up with the number of hardlinks found while scanning. I hope the tool recovered the file system?
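Roughly, the resolution described above amounts to the sketch below (an illustrative outline only, not the actual cephfs-data-scan / DataScan.cc implementation; the data shapes are made up):

# Illustrative outline of "keep the highest-version primary, drop the rest,
# and fix up nlink from the links actually found while scanning".

def resolve_duplicates(candidates):
    """candidates: iterable of (ino, version, parent_dir) tuples describing
    primary dentries found for each inode; keep the highest version per ino."""
    keep, dropped = {}, []
    for ino, version, parent in candidates:
        best = keep.get(ino)
        if best is None or version > best[1]:
            if best is not None:
                dropped.append(best)
            keep[ino] = (ino, version, parent)
        else:
            dropped.append((ino, version, parent))
    return keep, dropped

def fix_nlink(stored_nlink, links_found):
    """Reset each inode's nlink to the number of links (primary plus remote
    hardlinks) actually discovered by the scan."""
    return {ino: links_found.get(ino, 0) for ino in stored_nlink}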

[...]

It was determined that the specific inode reporting the error had 3 hardlinks, one of which was in the process of being deleted when the issues first presented.

Nothing unusual stands out with having a bunch of hardlinks and deleting some. The unlink code probably needs to be carefully inspected to explain this problem.

Could you help us out with anything else out of the ordinary - custom packages, destructive commands/tools used when the file system was online, etc.?

Actions #3

Updated by Venky Shankar 8 months ago

  • Priority changed from Normal to High
  • Source set to Community (user)
  • Severity changed from 2 - major to 1 - critical
Actions #4

Updated by Austin Axworthy 8 months ago

The issue originally occurred on 15.2.17. Apologies for the confusion on that; the cluster was upgraded during the troubleshooting.

Originally the omap key value was removed, but replaying the journal recreated the key. Removing the omap key and flushing the MDS journal allowed the MDS to recover.
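For reference, a rough sketch of that sequence, in the order that worked (the daemon id, pool, object, and omap key names are placeholders inferred from the redacted log, not the real values; this is not a vetted recovery procedure):

# Sketch of the recovery sequence described above. All names are placeholders.
import subprocess

MDS_ID = "a"                         # assumed active MDS daemon id
METADATA_POOL = "cephfs_metadata"    # assumed metadata pool name
DIR_OBJECT = "10005fa3bd4.00000000"  # dirfrag object holding the bad dentry
DENTRY_KEY = "filename_head"         # omap key is "<dentry name>_head"

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Flush the MDS journal first, so a later replay cannot recreate the key
#    (run on the host of the active MDS).
run("ceph", "daemon", f"mds.{MDS_ID}", "flush", "journal")

# 2. Remove the offending dentry's omap key from the directory object.
run("rados", "-p", METADATA_POOL, "rmomapkey", DIR_OBJECT, DENTRY_KEY)

# 3. Optionally re-check linkage afterwards (with the filesystem taken
#    offline), as was done earlier in this thread.
run("cephfs-data-scan", "scan_links")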

Nothing out of the ordinary is running on the cluster; it is a bare-metal deployment using ceph-ansible. The clients connect to the storage through CephFS mounts. No other software is installed on the cluster nodes.

Actions #5

Updated by Venky Shankar 7 months ago

  • Status changed from New to Can't reproduce

Hi Austin,

Austin Axworthy wrote:

The issue originally occurred on 15.2.17. Apologies for the confusion on that; the cluster was upgraded during the troubleshooting.

Originally the omap key value was removed, but replaying the journal recreated the key. Removing the omap key and flushing the MDS journal allowed the MDS to recover.

Nothing out of the ordinary is running on the cluster; it is a bare-metal deployment using ceph-ansible. The clients connect to the storage through CephFS mounts. No other software is installed on the cluster nodes.

Since we do not have enough data points to debug, and given that this issue was first experienced in an EOL'd release, I'm marking this tracker as "Can't reproduce". Please reopen if this is seen again; it would be immensely helpful if MDS debug logs could be shared.
