Bug #59163
mds: stuck in up:rejoin when it cannot "open" missing directory inode
Description
tasks.cephfs.test_damage.TestDamage.test_object_deletion tests for damage while no clients are in the session list (for better or worse). Due to kernel bug #59162, the kernel client does not correctly close its session, so the MDS keeps a directory inode in the OpenFileTable. The presence of this entry causes the MDS to try to open the dirfrag during up:rejoin. The directory was deliberately removed to test the response to metadata damage. Apparently, in this scenario, the up:rejoin process becomes stuck.
Other mutations also fail, probably for the same reason.
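For context, each dirfrag the MDS opens is backed by a RADOS omap object in the metadata pool, named from the hex inode number plus the frag id; that is why the deleted object in this test is called 10000000000.00000000 (inode 0x10000000000, frag 0). A minimal sketch of that naming convention (the helper name dirfrag_object_name is ours, not a Ceph API):

```python
def dirfrag_object_name(ino: int, frag: int = 0) -> str:
    """Name of the RADOS object backing a dirfrag: the inode number in
    hex, a dot, then the frag id zero-padded to 8 hex digits."""
    return f"{ino:x}.{frag:08x}"

# The object the damage mutation deletes:
print(dirfrag_object_name(0x10000000000))  # -> 10000000000.00000000
```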
See:
2023-03-23T19:30:35.545 INFO:teuthology.orchestra.run.smithi119.stderr:dumped fsmap epoch 94
2023-03-23T19:30:35.565 DEBUG:tasks.cephfs.filesystem:are_daemons_healthy: mds map: {'epoch': 94, 'flags': 18, 'flags_state': {'joinable': True, 'allow_snaps': True, 'allow_multimds_snaps': True, 'allow_standby_replay': False, 'refuse_client_session': False}, 'ever_allowed_features': 0, 'explicitly_allowed_features': 0, 'created': '2023-03-23T19:28:02.712518+0000', 'modified': '2023-03-23T19:30:21.752731+0000', 'tableserver': 0, 'root': 0, 'session_timeout': 60, 'session_autoclose': 300, 'required_client_features': {}, 'max_file_size': 1099511627776, 'max_xattr_size': 65536, 'last_failure': 0, 'last_failure_osd_epoch': 102, 'compat': {'compat': {}, 'ro_compat': {}, 'incompat': {'feature_1': 'base v0.20', 'feature_2': 'client writeable ranges', 'feature_3': 'default file layouts on dirs', 'feature_4': 'dir inode in separate object', 'feature_5': 'mds uses versioned encoding', 'feature_6': 'dirfrag is stored in omap', 'feature_7': 'mds uses inline data', 'feature_8': 'no anchor table', 'feature_9': 'file layout v2', 'feature_10': 'snaprealm v2'}}, 'max_mds': 1, 'in': [0], 'up': {'mds_0': 7947}, 'failed': [], 'damaged': [], 'stopped': [], 'info': {'gid_7947': {'gid': 7947, 'name': 'c', 'rank': 0, 'incarnation': 91, 'state': 'up:rejoin', 'state_seq': 17, 'addr': '172.21.15.119:6837/4165463071', 'addrs': {'addrvec': [{'type': 'v2', 'addr': '172.21.15.119:6836', 'nonce': 4165463071}, {'type': 'v1', 'addr': '172.21.15.119:6837', 'nonce': 4165463071}]}, 'join_fscid': -1, 'export_targets': [], 'features': 4540138322906710015, 'flags': 0, 'compat': {'compat': {}, 'ro_compat': {}, 'incompat': {'feature_1': 'base v0.20', 'feature_2': 'client writeable ranges', 'feature_3': 'default file layouts on dirs', 'feature_4': 'dir inode in separate object', 'feature_5': 'mds uses versioned encoding', 'feature_6': 'dirfrag is stored in omap', 'feature_7': 'mds uses inline data', 'feature_8': 'no anchor table', 'feature_9': 'file layout v2', 'feature_10': 'snaprealm v2'}}}}, 'data_pools': [15], 'metadata_pool': 14, 'enabled': True, 'fs_name': 'cephfs', 'balancer': '', 'bal_rank_mask': '-1', 'standby_count_wanted': 0}
2023-03-23T19:30:35.566 WARNING:tasks.cephfs.filesystem:Unhealthy mds state gid_7947:up:rejoin
2023-03-23T19:30:35.566 INFO:tasks.cephfs.test_damage:Result: Mutation 'Delete 10000000000.00000000' should have left us healthy, actually not.
/teuthology/pdonnell-2023-03-23_18:44:22-fs-wip-pdonnell-testing-20230323.162417-distro-default-smithi/7217902/teuthology.log
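The failure above is reported by the teuthology health check, which keeps polling the mds map until every daemon settles. A simplified paraphrase of what tasks.cephfs.filesystem's are_daemons_healthy effectively asserts (this is an assumption-laden sketch, not the actual teuthology code, which also checks max_mds and rank counts):

```python
def are_daemons_healthy(mds_map: dict) -> bool:
    # Simplified paraphrase: every running daemon must have reached
    # up:active; a daemon parked in up:rejoin keeps the map unhealthy.
    return all(info["state"] == "up:active"
               for info in mds_map["info"].values())

# State taken from the fsmap dump above: rank 0 is stuck in up:rejoin.
mds_map = {"info": {"gid_7947": {"state": "up:rejoin"}}}
print(are_daemons_healthy(mds_map))  # -> False
```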
History
#1 Updated by Patrick Donnelly about 1 year ago
- Related to Bug #59162: kernel completes umount without waiting for mds close added
#2 Updated by Venky Shankar about 1 year ago
- Category set to Correctness/Safety
- Assignee set to Venky Shankar
#3 Updated by Patrick Donnelly 12 months ago
- Duplicated by Bug #59230: Test failure: test_object_deletion (tasks.cephfs.test_damage.TestDamage) added
#4 Updated by Venky Shankar 11 months ago
The MDS was marked down:damaged because it could not decode the CDir fnode:
-49> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache rejoin_start
-48> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache process_imported_caps
-47> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.openfiles prefetch_inodes
-46> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.openfiles _prefetch_inodes state 1
-45> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache open_ino_batch_start
-44> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache open_ino 0x10000000000 pool 14 want_replica 0
-43> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache _open_ino_traverse_dir: ino 0x10000000000 ret 0
-42> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache open_ino_traverse_dir ino 0x10000000000 [<0x1/subdir v0>]
-41> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 20 mds.0.cache.dir(0x1) lookup (subdir, 'head')
-40> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache fetching incomplete [dir 0x1 / [2,head] auth v=12 cv=0/0 dir_auth=0 state=1073741824 f(v0 m2023-03-23T19:28:10.297554+0000 1=0+1) n(v1 rc2023-03-23T19:28:10.481550+0000 b6291456 2=1+1) hs=0+0,ss=0+0 | subtree=1 0x5591337c8d00]
-39> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache open_ino_batch_submit
-38> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache.dir(0x1) fetch_keys 1 keys on [dir 0x1 / [2,head] auth v=12 cv=0/0 dir_auth=0 state=1073741824 f(v0 m2023-03-23T19:28:10.297554+0000 1=0+1) n(v1 rc2023-03-23T19:28:10.481550+0000 b6291456 2=1+1) hs=0+0,ss=0+0 | subtree=1 0x5591337c8d00]
-37> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache.dir(0x1) auth_pin by 0x5591337c8d00 on [dir 0x1 / [2,head] auth v=12 cv=0/0 dir_auth=0 ap=1+0 state=1073741824 f(v0 m2023-03-23T19:28:10.297554+0000 1=0+1) n(v1 rc2023-03-23T19:28:10.481550+0000 b6291456 2=1+1) hs=0+0,ss=0+0 | dnwaiter=1 subtree=1 authpin=1 0x5591337c8d00] count now 1
-36> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 1 -- [v2:172.21.15.119:6836/2518833919,v1:172.21.15.119:6837/2518833919] --> [v2:172.21.15.148:6824/33963,v1:172.21.15.148:6825/33963] -- osd_op(unknown.0.366:24 14.1f 14:ff5b34d6:::1.00000000:head [omap-get-header,omap-get-vals-by-keys in=19b,getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e145) v8 -- 0x5591345de800 con 0x5591345df000
-35> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 20 mds.0.bal hit_dir 3 pop is 1, frag * size 1 [pop IRD:[C 0.00e+00] IWR:[C 0.00e+00] RDR:[C 0.00e+00] FET:[C 1.00e+00] STR:[C 0.00e+00] *LOAD:2.0]
-34> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 1 mds.0.366 rejoin_joint_start
-33> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 mds.0.cache rejoin_send_rejoins with recovery_set
-32> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 7 mds.0.cache rejoin_send_rejoins still processing imported caps, delaying
-31> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 notify_mdsmap: mds.metrics
-30> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 notify_mdsmap: mds.metrics: rank0 is unavailable
-29> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 10 reset_seq: mds.metrics: last_updated_seq=0
-28> 2023-03-23T19:41:16.405+0000 7f9a85ad7700 20 set_next_seq: mds.metrics: current sequence number 0, setting next sequence number 0
-27> 2023-03-23T19:41:16.405+0000 7f9a88add700 1 -- [v2:172.21.15.119:6836/2518833919,v1:172.21.15.119:6837/2518833919] <== osd.7 v2:172.21.15.148:6824/33963 5 ==== osd_op_reply(24 1.00000000 [omap-get-header out=80b,omap-get-vals-by-keys out=523b,getxattr (30) out=30b] v0'0 uv543 ondisk = 0) v8 ==== 238+0+633 (crc 0 0 0) 0x5591338aafc0 con 0x5591345df000
-26> 2023-03-23T19:41:16.405+0000 7f9a7facb700 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
-25> 2023-03-23T19:41:16.405+0000 7f9a7facb700 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
-24> 2023-03-23T19:41:16.405+0000 7f9a7facb700 10 mds.0.cache.dir(0x1) _fetched header 80 bytes 1 keys for [dir 0x1 / [2,head] auth v=12 cv=0/0 dir_auth=0 ap=1+0 state=1073741824 f(v0 m2023-03-23T19:28:10.297554+0000 1=0+1) n(v1 rc2023-03-23T19:28:10.481550+0000 b6291456 2=1+1) hs=0+0,ss=0+0 | dnwaiter=1 subtree=1 authpin=1 0x5591337c8d00]
-23> 2023-03-23T19:41:16.405+0000 7f9a7facb700 -1 mds.0.cache.dir(0x1) Corrupt fnode in dirfrag 0x1: void fnode_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version 3 < 101: Malformed input [buffer:3]
-22> 2023-03-23T19:41:16.405+0000 7f9a7facb700 0 log_channel(cluster) log [WRN] : Corrupt fnode header in 0x1: void fnode_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version 3 < 101: Malformed input [buffer:3] ()
-21> 2023-03-23T19:41:16.405+0000 7f9a7facb700 10 mds.0.cache.dir(0x1) go_bad *
-20> 2023-03-23T19:41:16.405+0000 7f9a7facb700 -1 mds.0.damage notify_dirfrag Damage to fragment * of ino 0x1 is fatal because it is a system directory for this rank
-19> 2023-03-23T19:41:16.405+0000 7f9a7facb700 5 mds.beacon.c set_want_state: up:rejoin -> down:damaged
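The fatal error comes from Ceph's versioned-encoding guard: the decode path reads a struct version byte from the buffer and bails out when it is older than the oldest version the decoder still understands, which is what "no longer understand old encoding version 3 < 101" reports. A rough Python model of that guard (the constant and the single-byte layout are illustrative, not fnode_t's real wire format):

```python
import struct

OLDEST_UNDERSTOOD = 101  # illustrative threshold, mirroring "3 < 101" in the log

def decode_header(buf: bytes) -> int:
    """Model of a versioned decode: read a one-byte struct version and
    reject encodings older than the oldest still supported."""
    (struct_v,) = struct.unpack_from("B", buf, 0)
    if struct_v < OLDEST_UNDERSTOOD:
        raise ValueError(
            f"no longer understand old encoding version "
            f"{struct_v} < {OLDEST_UNDERSTOOD}")
    return struct_v
```

A stale or garbage header byte (here, 3) trips the guard, and because the damaged dirfrag belongs to a system directory (ino 0x1), the MDS escalates to down:damaged instead of isolating the fragment.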
#5 Updated by Venky Shankar 10 months ago
Sorry for the delay - back to working on this.