Bug #41426
mds: wrongly signals directory is empty when dentry is damaged?
Status: Closed
Description
In this test:
/ceph/teuthology-archive/pdonnell-2019-08-24_04:19:23-fs-wip-pdonnell-testing-20190824.014616-distro-basic-smithi/4248277/teuthology.log
This command fails:
2019-08-24T19:19:45.765 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-08-24T19:19:45.765 INFO:teuthology.orchestra.run.smithi183:Running:
2019-08-24T19:19:45.765 INFO:teuthology.orchestra.run.smithi183:> sudo chmod 1777 /home/ubuntu/cephtest/mnt.0
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:Running:
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> sudo adjust-ulimits daemon-helper kill python -c '
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import os
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import stat
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import json
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import sys
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:>
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> try:
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> s = os.stat("/home/ubuntu/cephtest/mnt.0/./subdir/sixmegs")
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> except OSError as e:
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> sys.exit(e.errno)
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:>
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> attrs = ["st_mode", "st_ino", "st_dev", "st_nlink", "st_uid", "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime"]
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> print json.dumps(
2019-08-24T19:19:45.890 INFO:teuthology.orchestra.run.smithi183:> dict([(a, getattr(s, a)) for a in attrs]),
2019-08-24T19:19:45.890 INFO:teuthology.orchestra.run.smithi183:> indent=2)
2019-08-24T19:19:45.890 INFO:teuthology.orchestra.run.smithi183:> '
2019-08-24T19:19:45.894 INFO:teuthology.orchestra.run:waiting for 20
2019-08-24T19:19:46.192 INFO:teuthology.orchestra.run.smithi183.stderr:daemon-helper: command failed with exit status 2
2019-08-24T19:19:51.996 DEBUG:teuthology.orchestra.run:got remote process result: 2
2019-08-24T19:19:51.996 INFO:tasks.cephfs.test_damage:Result: unexpected error Command failed on smithi183 with status 2: 'sudo adjust-ulimits daemon-helper kill python -c \'\nimport os\nimport stat\nimport json\nimport sys\n\ntry:\n s = os.stat("/home/ubuntu/cephtest/mnt.0/./subdir/sixmegs")\nexcept OSError as e:\n sys.exit(e.errno)\n\nattrs = ["st_mode", "st_ino", "st_dev", "st_nlink", "st_uid", "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime"]\nprint json.dumps(\n dict([(a, getattr(s, a)) for a in attrs]),\n indent=2)\n\'' on client
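For reference, the probe the test runs is Python 2 (`print json.dumps(...)`). A hypothetical Python 3 port of the same check, wrapped in a function for clarity, would look like this — the path and attribute list come straight from the log, everything else is illustrative:

```python
import json
import os
import sys

# Same attribute list as the probe in the teuthology log.
ATTRS = ["st_mode", "st_ino", "st_dev", "st_nlink", "st_uid",
         "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime"]

def stat_as_json(path):
    """stat() the path and return the chosen fields as a JSON document.

    Like the test's probe, a failed stat() exits with the OS errno, so the
    caller can distinguish ENOENT (2) from EIO (5) by exit status.
    """
    try:
        s = os.stat(path)
    except OSError as e:
        sys.exit(e.errno)
    return json.dumps({a: getattr(s, a) for a in ATTRS}, indent=2)

print(stat_as_json("."))
```

The test expects EIO here (the dentry is damaged), but because the client answers the lookup from its cache, the process exits with ENOENT (status 2) instead.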
The client wrongly returns an ENOENT response for sixmegs because it believes its cache of subdir is complete. In fact, the sixmegs dentry has been corrupted by the test, so the client should be returning EIO. Here are the relevant bits from the client log:
2019-08-24T19:19:45.849+0000 7f984effd700 10 client.9217 send_request client_request(unknown.0:3 readdir #0x10000000000 2019-08-24T19:19:45.850110+0000 caller_uid=0, caller_gid=0{0,}) v4 to mds.0
2019-08-24T19:19:45.849+0000 7f984effd700 1 -- 172.21.15.183:0/1345946766 --> [v2:172.21.15.151:6834/4028287181,v1:172.21.15.151:6835/4028287181] -- client_request(unknown.0:3 readdir #0x10000000000 2019-08-24T19:19:45.850110+0000 caller_uid=0, caller_gid=0{0,}) v4 -- 0x7f98440242d0 con 0x557417db1740
2019-08-24T19:19:45.849+0000 7f984effd700 20 client.9217 awaiting reply|forward|kick on 0x7f984effb160
2019-08-24T19:19:45.851+0000 7f9868ff9700 1 -- 172.21.15.183:0/1345946766 <== mds.0 v2:172.21.15.151:6834/4028287181 5 ==== client_reply(???:3 = 0 (0) Success) v1 ==== 408+0+0 (crc 0 0 0) 0x7f9864005a20 con 0x557417db1740
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 handle_client_reply got a reply. Safe:1 tid 3
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 insert_trace from 2019-08-24T19:19:45.850149+0000 mds.0 is_target=1 is_dentry=0
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 features 0xffffffffffffffff
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 update_snap_trace len 48
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 get_snap_realm 0x1 0x7f9854008400 2 -> 3
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 update_snap_trace snaprealm(0x1 nref=3 c=0 seq=1 parent=0x0 my_snaps=[] cached_snapc=1=[]) seq 1 <= 1 and same parent, SKIPPING
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 hrm is_target=1 is_dentry=0
2019-08-24T19:19:45.852+0000 7f9868ff9700 12 client.9217 add_update_inode had 0x10000000000.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80) caps pAsLsXsFs
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 dir hash is 2
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 add_update_cap issued pAsLsXsFs -> pAsLsXsFs from mds.0 on 0x10000000000.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 open_dir 0x7f9854009400 on 0x7f9854008c80
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.get on 0x7f9854008c80 0x10000000000.head now 5
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 insert_readdir_results 0 readdir items, end=1, hash_order=1, readdir_start , last_hash 0, next_offset 2
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 _readdir_drop_dirp_buffer 0x7f9840022f70
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 client.9217 close_dir dir 0x7f9854009400 on 0x7f9854008c80
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 put_inode on 0x10000000000.head(faked_ino=0 ref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.put on 0x7f9854008c80 0x10000000000.head now 4
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 put_snap_realm 0x1 0x7f9854008400 3 -> 2
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.get on 0x7f9854008c80 0x10000000000.head now 5
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 handle_client_reply signalling caller 0x7f984effb160
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 handle_client_reply awaiting kickback on tid 3 0x7f9868ff74f0
2019-08-24T19:19:45.852+0000 7f984effd700 20 client.9217 sendrecv kickback on tid 3 0x7f9868ff74f0
2019-08-24T19:19:45.852+0000 7f984effd700 20 client.9217 lat 0.002458
2019-08-24T19:19:45.852+0000 7f984effd700 10 client.9217 _readdir_get_frag 0x7f9840022f70 got frag * size 0
2019-08-24T19:19:45.852+0000 7f984effd700 10 client.9217 frag * buffer size 0 offset 2
2019-08-24T19:19:45.852+0000 7f984effd700 10 client.9217 marking (I_COMPLETE|I_DIR_ORDERED) on 0x10000000000.head(faked_ino=0 ref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 put_inode on 0x10000000000.head(faked_ino=0 ref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.put on 0x7f9854008c80 0x10000000000.head now 4
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 put_inode on 0x10000000000.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.put on 0x7f9854008c80 0x10000000000.head now 3
2019-08-24T19:19:45.852+0000 7f984f7fe700 3 client.9217 seekdir(0x7f9840022f70, 2)
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217 readdir_r_cb 0x10000000000.head(faked_ino=0 ref=3 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["subdir"] 0x7f9854008c80) offset 2 at_end=0 hash_order=0
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217 offset 2 snapid head (complete && ordered) 1 issued pAsLsXsFs
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217 _readdir_cache_cb 0x7f9840022f70 on 0x10000000000 last_name offset 2
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217 dir is empty
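The failure mode in the log above can be modeled in a few lines: after `insert_readdir_results` returns 0 items with end=1, the client marks the directory I_COMPLETE, and from then on any name not in the cache is answered ENOENT locally, without asking the MDS. The following is a hypothetical toy model of that logic, not actual Ceph client code:

```python
import errno

class NeedMDSLookup(Exception):
    """Raised when the cache cannot answer and the MDS must be asked."""

class CachedDir:
    """Toy model of the client's per-directory dentry cache."""
    def __init__(self):
        self.entries = {}      # name -> inode number
        self.complete = False  # I_COMPLETE: cache claims to hold every dentry

    def lookup(self, name):
        if name in self.entries:
            return self.entries[name]
        if self.complete:
            # The cache believes it is authoritative, so a missing name
            # is reported as ENOENT without a round trip to the MDS.
            raise OSError(errno.ENOENT, "no such file or directory", name)
        raise NeedMDSLookup(name)

# The MDS dropped the corrupt 'sixmegs' dentry and replied with 0 entries
# and end=1, so the client marks the (actually damaged) dir complete:
subdir = CachedDir()
subdir.complete = True
```

With the cache in that state, `subdir.lookup("sixmegs")` raises ENOENT, which is exactly the exit status 2 seen in the test, where EIO would have been the expected answer for a damaged dentry.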
From: /ceph/teuthology-archive/pdonnell-2019-08-24_04:19:23-fs-wip-pdonnell-testing-20190824.014616-distro-basic-smithi/4248277/remote/smithi183/log/ceph-client.0.102180.log.gz
And the MDS:
2019-08-24T19:19:45.837+0000 7fad38523700 1 -- [v2:172.21.15.151:6834/4028287181,v1:172.21.15.151:6835/4028287181] --> [v2:172.21.15.183:6809/13455,v1:172.21.15.183:6811/13455] -- osd_op(unknown.0.229:37 6.0 6:0cf6ad0f:::10000000000.00000000:head [omap-get-header,omap-get-vals,getxattr parent] snapc 0=[] ondisk+read+known_if_redirected+full_force e128) v8 -- 0x55b6dc006dc0 con 0x55b6dc034800
2019-08-24T19:19:45.838+0000 7fad3a527700 1 -- [v2:172.21.15.151:6834/4028287181,v1:172.21.15.151:6835/4028287181] <== osd.0 v2:172.21.15.183:6809/13455 5 ==== osd_op_reply(37 10000000000.00000000 [omap-get-header,omap-get-vals,getxattr (62)] v0'0 uv970 ondisk = 0) v8 ==== 248+0+441 (crc 0 0 0) 0x55b6dc07a500 con 0x55b6dc034800
2019-08-24T19:19:45.838+0000 7fad31d16700 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2019-08-24T19:19:45.838+0000 7fad31d16700 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.ino(0x10000000000) verify_diri_backtrace
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) _fetched header 274 bytes 1 keys for [dir 0x10000000000 /subdir/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741888|fetching f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x55b6dc069900]
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) _fetched version 9
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) get_snaps (seq 1 cached_seq 1)
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) snap_purged_thru 0 < 1, snap purge based on
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 20 mds.0.cache.dir(0x10000000000) _fetched pos 0 marker 'd' dname 'sixmegs [6665656264616564,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 20 mds.0.cache.dir(0x10000000000) lookup (head, 'sixmegs')
2019-08-24T19:19:45.838+0000 7fad31d16700 0 log_channel(cluster) log [WRN] : Corrupt dentry 'sixmegs' in dir frag 0x10000000000: buffer::malformed_input: Invalid tag char 'd' pos 0(/subdir)
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) go_bad_dentry sixmegs
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) pre_dirty 10
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) _mark_dirty (was clean) [dir 0x10000000000 /subdir/ [2,head] auth v=10 cv=9/9 ap=1+0 state=1073741825|complete f(v0 m2019-08-24T19:08:14.963491+0000 1=1+0) n(v0 rc2019-08-24T19:08:15.150073+0000 b6291456 1=1+0) hs=0+0,ss=0+0 | waiter=1 authpin=1 0x55b6dc069900] version 10
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) auth_unpin by 0x55b6dc069900 on [dir 0x10000000000 /subdir/ [2,head] auth v=10 cv=9/9 state=1610612737|complete f(v0 m2019-08-24T19:08:14.963491+0000 1=1+0) n(v0 rc2019-08-24T19:08:15.150073+0000 b6291456 1=1+0) hs=0+0,ss=0+0 | dirty=1 waiter=1 authpin=0 0x55b6dc069900] count now 0
2019-08-24T19:19:45.838+0000 7fad31d16700 11 mds.0.cache.dir(0x10000000000) finish_waiting mask 2 result 0 on [dir 0x10000000000 /subdir/ [2,head] auth v=10 cv=9/9 state=1610612737|complete f(v0 m2019-08-24T19:08:14.963491+0000 1=1+0) n(v0 rc2019-08-24T19:08:15.150073+0000 b6291456 1=1+0) hs=0+0,ss=0+0 | dirty=1 waiter=1 authpin=0 0x55b6dc069900]
2019-08-24T19:19:45.838+0000 7fad33519700 7 mds.0.229 mds has 1 queued contexts
2019-08-24T19:19:45.838+0000 7fad33519700 10 mds.0.229 finish 0x55b6dc08e280
2019-08-24T19:19:45.838+0000 7fad33519700 10 MDSContext::complete: 18C_MDS_RetryRequest
2019-08-24T19:19:45.838+0000 7fad33519700 7 mds.0.server dispatch_client_request client_request(client.9217:3 readdir #0x10000000000 2019-08-24T19:19:45.850110+0000 caller_uid=0, caller_gid=0{0,}) v4
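The MDS side of the problem is visible in `_fetched` / `go_bad_dentry` above: a dentry whose omap value fails to decode is warned about and dropped, and nothing in the resulting fragment records that it is damaged, so the readdir reply simply reports zero entries. A hypothetical sketch of that shape (the tag characters and helper names are illustrative, not the real CDir decode):

```python
class MalformedInput(Exception):
    """Stand-in for buffer::malformed_input."""

def decode_dentry(marker, blob):
    # Hypothetical stand-in for the MDS dentry decode: accept only the
    # expected tag characters. The test scribbled over the value, so the
    # first tag byte read back here is 'd', as in the log.
    if marker not in ("i", "l"):
        raise MalformedInput("Invalid tag char %r pos 0" % marker)
    return blob

def fetch_dirfrag(omap_values, cluster_log):
    """Load a dir fragment from its omap, dropping undecodable dentries."""
    entries = {}
    for name, (marker, blob) in omap_values.items():
        try:
            entries[name] = decode_dentry(marker, blob)
        except MalformedInput as e:
            # go_bad_dentry: warn and skip. The returned fragment carries
            # no "damaged" marker, so to a reader it just looks empty.
            cluster_log.append("Corrupt dentry %r: %s" % (name, e))
    return entries
```

In this model, `fetch_dirfrag({"sixmegs": ("d", b"...")}, log)` returns an empty dict: the one real entry is silently absent, which is why the client's subsequent readdir sees 0 items and concludes the directory is empty.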
From: /ceph/teuthology-archive/pdonnell-2019-08-24_04:19:23-fs-wip-pdonnell-testing-20190824.014616-distro-basic-smithi/4248277/remote/smithi151/log/ceph-mds.b.log.gz
This PR in the integration branch may be related, but I don't see how it could have caused this: https://github.com/ceph/ceph/pull/29526
Updated by Zheng Yan over 4 years ago
The readdir is unexpected, and the log does not include it. The client should just have issued a lookup.
I think this is environment noise.
Updated by Zheng Yan over 4 years ago
- Status changed from New to Can't reproduce