Bug #59163 (open)

mds: stuck in up:rejoin when it cannot "open" missing directory inode

Added by Patrick Donnelly about 1 year ago. Updated 9 days ago.

Status: New
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: reef, quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

tasks.cephfs.test_damage.TestDamage.test_object_deletion tests for damage when no clients are in the session list (for better or worse). Due to a kernel bug (#59162), the kernel client does not correctly close its session, so the MDS keeps a directory inode in its OpenFileTable. The presence of this entry causes the MDS to try to open the dirfrag during up:rejoin. The directory was deliberately removed to test the response to metadata damage. Apparently, in this scenario, up:rejoin becomes stuck.
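
For illustration, here is a minimal sketch of the kind of mutation involved; it is not the actual test code. The mutation named in the log below removes the directory's dirfrag object directly from the metadata pool with rados rm, so the MDS finds nothing to open during up:rejoin. The pool name "cephfs_metadata" and the helper name are assumptions for this sketch.

import subprocess

def delete_dirfrag_object(metadata_pool, obj):
    # Remove the dirfrag object from the metadata pool, simulating
    # on-disk metadata damage behind the MDS's back.
    subprocess.check_call(["rados", "-p", metadata_pool, "rm", obj])

# Dirfrag objects are named "<inode number in hex>.<frag>"; the mutation
# reported in the log below deleted 10000000000.00000000, i.e. frag 0
# of inode 0x10000000000. The pool name here is assumed.
delete_dirfrag_object("cephfs_metadata", "10000000000.00000000")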

Other mutations also fail, probably for the same reason.

See:

2023-03-23T19:30:35.545 INFO:teuthology.orchestra.run.smithi119.stderr:dumped fsmap epoch 94
2023-03-23T19:30:35.565 DEBUG:tasks.cephfs.filesystem:are_daemons_healthy: mds map: {'epoch': 94, 'flags': 18, 'flags_state': {'joinable': True, 'allow_snaps': True, 'allow_multimds_snaps': True, 'allow_standby_replay': False, 'refuse_client_session': False}, 'ever_allowed_features': 0, 'explicitly_allowed_features': 0, 'created': '2023-03-23T19:28:02.712518+0000', 'modified': '2023-03-23T19:30:21.752731+0000', 'tableserver': 0, 'root': 0, 'session_timeout': 60, 'session_autoclose': 300, 'required_client_features': {}, 'max_file_size': 1099511627776, 'max_xattr_size': 65536, 'last_failure': 0, 'last_failure_osd_epoch': 102, 'compat': {'compat': {}, 'ro_compat': {}, 'incompat': {'feature_1': 'base v0.20', 'feature_2': 'client writeable ranges', 'feature_3': 'default file layouts on dirs', 'feature_4': 'dir inode in separate object', 'feature_5': 'mds uses versioned encoding', 'feature_6': 'dirfrag is stored in omap', 'feature_7': 'mds uses inline data', 'feature_8': 'no anchor table', 'feature_9': 'file layout v2', 'feature_10': 'snaprealm v2'}}, 'max_mds': 1, 'in': [0], 'up': {'mds_0': 7947}, 'failed': [], 'damaged': [], 'stopped': [], 'info': {'gid_7947': {'gid': 7947, 'name': 'c', 'rank': 0, 'incarnation': 91, 'state': 'up:rejoin', 'state_seq': 17, 'addr': '172.21.15.119:6837/4165463071', 'addrs': {'addrvec': [{'type': 'v2', 'addr': '172.21.15.119:6836', 'nonce': 4165463071}, {'type': 'v1', 'addr': '172.21.15.119:6837', 'nonce': 4165463071}]}, 'join_fscid': -1, 'export_targets': [], 'features': 4540138322906710015, 'flags': 0, 'compat': {'compat': {}, 'ro_compat': {}, 'incompat': {'feature_1': 'base v0.20', 'feature_2': 'client writeable ranges', 'feature_3': 'default file layouts on dirs', 'feature_4': 'dir inode in separate object', 'feature_5': 'mds uses versioned encoding', 'feature_6': 'dirfrag is stored in omap', 'feature_7': 'mds uses inline data', 'feature_8': 'no anchor table', 'feature_9': 'file layout v2', 'feature_10': 'snaprealm v2'}}}}, 'data_pools': [15], 'metadata_pool': 14, 'enabled': True, 'fs_name': 'cephfs', 'balancer': '', 'bal_rank_mask': '-1', 'standby_count_wanted': 0}
2023-03-23T19:30:35.566 WARNING:tasks.cephfs.filesystem:Unhealthy mds state gid_7947:up:rejoin
2023-03-23T19:30:35.566 INFO:tasks.cephfs.test_damage:Result: Mutation 'Delete 10000000000.00000000' should have left us healthy, actually not.

/teuthology/pdonnell-2023-03-23_18:44:22-fs-wip-pdonnell-testing-20230323.162417-distro-default-smithi/7217902/teuthology.log
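
To confirm the stuck state from outside the test, one can poll the FSMap. A sketch, assuming the ceph CLI is reachable and that "ceph fs dump --format=json" exposes the same mdsmap structure dumped in the log above (an "info" map keyed by gid):

import json
import subprocess
import time

def mds_states():
    # "ceph fs dump" in JSON form includes per-MDS state under
    # filesystems[*].mdsmap.info, as in the map dumped above.
    out = subprocess.check_output(["ceph", "fs", "dump", "--format=json"])
    dump = json.loads(out)
    info = dump["filesystems"][0]["mdsmap"]["info"]
    return {i["name"]: i["state"] for i in info.values()}

# Poll for up to a minute (an arbitrary grace period for this sketch);
# a healthy rank should reach up:active, while this bug leaves it
# parked in up:rejoin indefinitely.
states = {}
for _ in range(12):
    states = mds_states()
    if all(s == "up:active" for s in states.values()):
        break
    time.sleep(5)
else:
    print("MDS not healthy:", states)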


Related issues 2 (1 open, 1 closed)

Related to Linux kernel client - Bug #59162: kernel completes umount without waiting for mds close (status: Fix Under Review, assignee: Xiubo Li)

Has duplicate CephFS - Bug #59230: Test failure: test_object_deletion (tasks.cephfs.test_damage.TestDamage) (status: Duplicate)
