Project

General

Profile

Actions

Bug #41426

closed

mds: wrongly signals directory is empty when dentry is damaged?

Added by Patrick Donnelly over 4 years ago. Updated over 4 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In this test:

/ceph/teuthology-archive/pdonnell-2019-08-24_04:19:23-fs-wip-pdonnell-testing-20190824.014616-distro-basic-smithi/4248277/teuthology.log

This command fails:

2019-08-24T19:19:45.765 INFO:teuthology.orchestra.run:Running command with timeout 900
2019-08-24T19:19:45.765 INFO:teuthology.orchestra.run.smithi183:Running:
2019-08-24T19:19:45.765 INFO:teuthology.orchestra.run.smithi183:> sudo chmod 1777 /home/ubuntu/cephtest/mnt.0
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:Running:
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> sudo adjust-ulimits daemon-helper kill python -c '
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import os
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import stat
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import json
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:> import sys
2019-08-24T19:19:45.888 INFO:teuthology.orchestra.run.smithi183:>
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> try:
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:>     s = os.stat("/home/ubuntu/cephtest/mnt.0/./subdir/sixmegs")
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> except OSError as e:
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:>     sys.exit(e.errno)
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:>
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> attrs = ["st_mode", "st_ino", "st_dev", "st_nlink", "st_uid", "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime"]
2019-08-24T19:19:45.889 INFO:teuthology.orchestra.run.smithi183:> print json.dumps(
2019-08-24T19:19:45.890 INFO:teuthology.orchestra.run.smithi183:>     dict([(a, getattr(s, a)) for a in attrs]),
2019-08-24T19:19:45.890 INFO:teuthology.orchestra.run.smithi183:>     indent=2)
2019-08-24T19:19:45.890 INFO:teuthology.orchestra.run.smithi183:> '
2019-08-24T19:19:45.894 INFO:teuthology.orchestra.run:waiting for 20
2019-08-24T19:19:46.192 INFO:teuthology.orchestra.run.smithi183.stderr:daemon-helper: command failed with exit status 2
2019-08-24T19:19:51.996 DEBUG:teuthology.orchestra.run:got remote process result: 2
2019-08-24T19:19:51.996 INFO:tasks.cephfs.test_damage:Result: unexpected error Command failed on smithi183 with status 2: 'sudo adjust-ulimits daemon-helper kill python -c \'\nimport os\nimport stat\nimport json\nimport sys\n\ntry:\n    s = os.stat("/home/ubuntu/cephtest/mnt.0/./subdir/sixmegs")\nexcept OSError as e:\n    sys.exit(e.errno)\n\nattrs = ["st_mode", "st_ino", "st_dev", "st_nlink", "st_uid", "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime"]\nprint json.dumps(\n    dict([(a, getattr(s, a)) for a in attrs]),\n    indent=2)\n\'' on client

The client wrongly returns an ENOENT response for sixmegs, believing that its cache of subdir is complete. In fact, the sixmegs dentry has been corrupted by the test so the client should be returning EIO. Here's the relevant bits from the client log:

2019-08-24T19:19:45.849+0000 7f984effd700 10 client.9217 send_request client_request(unknown.0:3 readdir #0x10000000000 2019-08-24T19:19:45.850110+0000 caller_uid=0, caller_gid=0{0,}) v4 to mds.0
2019-08-24T19:19:45.849+0000 7f984effd700  1 -- 172.21.15.183:0/1345946766 --> [v2:172.21.15.151:6834/4028287181,v1:172.21.15.151:6835/4028287181] -- client_request(unknown.0:3 readdir #0x10000000000 2019-08-24T19:19:45.850110+0000 caller_uid=0, caller_gid=0{0,}) v4 -- 0x7f98440242d0 con 0x557417db1740
2019-08-24T19:19:45.849+0000 7f984effd700 20 client.9217 awaiting reply|forward|kick on 0x7f984effb160
2019-08-24T19:19:45.851+0000 7f9868ff9700  1 -- 172.21.15.183:0/1345946766 <== mds.0 v2:172.21.15.151:6834/4028287181 5 ==== client_reply(???:3 = 0 (0) Success) v1 ==== 408+0+0 (crc 0 0 0) 0x7f9864005a20 con 0x557417db1740
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 handle_client_reply got a reply. Safe:1 tid 3
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 insert_trace from 2019-08-24T19:19:45.850149+0000 mds.0 is_target=1 is_dentry=0
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217  features 0xffffffffffffffff
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 update_snap_trace len 48
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 get_snap_realm 0x1 0x7f9854008400 2 -> 3
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 update_snap_trace snaprealm(0x1 nref=3 c=0 seq=1 parent=0x0 my_snaps=[] cached_snapc=1=[]) seq 1 <= 1 and same parent, SKIPPING
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217  hrm  is_target=1 is_dentry=0
2019-08-24T19:19:45.852+0000 7f9868ff9700 12 client.9217 add_update_inode had 0x10000000000.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80) caps pAsLsXsFs
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217  dir hash is 2
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 add_update_cap issued pAsLsXsFs -> pAsLsXsFs from mds.0 on 0x10000000000.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 open_dir 0x7f9854009400 on 0x7f9854008c80
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.get on 0x7f9854008c80 0x10000000000.head now 5
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 insert_readdir_results 0 readdir items, end=1, hash_order=1, readdir_start , last_hash 0, next_offset 2
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 _readdir_drop_dirp_buffer 0x7f9840022f70
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 client.9217 close_dir dir 0x7f9854009400 on 0x7f9854008c80
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 put_inode on 0x10000000000.head(faked_ino=0 ref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.put on 0x7f9854008c80 0x10000000000.head now 4
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 put_snap_realm 0x1 0x7f9854008400 3 -> 2
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.get on 0x7f9854008c80 0x10000000000.head now 5
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 handle_client_reply signalling caller 0x7f984effb160
2019-08-24T19:19:45.852+0000 7f9868ff9700 20 client.9217 handle_client_reply awaiting kickback on tid 3 0x7f9868ff74f0
2019-08-24T19:19:45.852+0000 7f984effd700 20 client.9217 sendrecv kickback on tid 3 0x7f9868ff74f0
2019-08-24T19:19:45.852+0000 7f984effd700 20 client.9217 lat 0.002458
2019-08-24T19:19:45.852+0000 7f984effd700 10 client.9217 _readdir_get_frag 0x7f9840022f70 got frag * size 0
2019-08-24T19:19:45.852+0000 7f984effd700 10 client.9217 frag * buffer size 0 offset 2
2019-08-24T19:19:45.852+0000 7f984effd700 10 client.9217  marking (I_COMPLETE|I_DIR_ORDERED) on 0x10000000000.head(faked_ino=0 ref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 put_inode on 0x10000000000.head(faked_ino=0 ref=5 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.put on 0x7f9854008c80 0x10000000000.head now 4
2019-08-24T19:19:45.852+0000 7f9868ff9700 10 client.9217 put_inode on 0x10000000000.head(faked_ino=0 ref=4 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["subdir"] 0x7f9854008c80)
2019-08-24T19:19:45.852+0000 7f9868ff9700 15 inode.put on 0x7f9854008c80 0x10000000000.head now 3
2019-08-24T19:19:45.852+0000 7f984f7fe700  3 client.9217 seekdir(0x7f9840022f70, 2)
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217 readdir_r_cb 0x10000000000.head(faked_ino=0 ref=3 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 nlink=1 btime=2019-08-24T19:08:14.715964+0000 mtime=2019-08-24T19:08:14.963491+0000 ctime=2019-08-24T19:08:14.963491+0000 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE parents=0x1.head["subdir"] 0x7f9854008c80) offset 2 at_end=0 hash_order=0
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217 offset 2 snapid head (complete && ordered) 1 issued pAsLsXsFs
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217 _readdir_cache_cb 0x7f9840022f70 on 0x10000000000 last_name  offset 2
2019-08-24T19:19:45.852+0000 7f984f7fe700 10 client.9217  dir is empty

From: /ceph/teuthology-archive/pdonnell-2019-08-24_04:19:23-fs-wip-pdonnell-testing-20190824.014616-distro-basic-smithi/4248277/remote/smithi183/log/ceph-client.0.102180.log.gz

And the MDS:

2019-08-24T19:19:45.837+0000 7fad38523700  1 -- [v2:172.21.15.151:6834/4028287181,v1:172.21.15.151:6835/4028287181] --> [v2:172.21.15.183:6809/13455,v1:172.21.15.183:6811/13455] -- osd_op(unknown.0.229:37 6.0 6:0cf6ad0f:::10000000000.00000000:head [omap-get-header,omap-get-vals,getxattr parent] snapc 0=[] ondisk+read+known_if_redirected+full_force e128) v8 -- 0x55b6dc006dc0 con 0x55b6dc034800
2019-08-24T19:19:45.838+0000 7fad3a527700  1 -- [v2:172.21.15.151:6834/4028287181,v1:172.21.15.151:6835/4028287181] <== osd.0 v2:172.21.15.183:6809/13455 5 ==== osd_op_reply(37 10000000000.00000000 [omap-get-header,omap-get-vals,getxattr (62)] v0'0 uv970 ondisk = 0) v8 ==== 248+0+441 (crc 0 0 0) 0x55b6dc07a500 con 0x55b6dc034800
2019-08-24T19:19:45.838+0000 7fad31d16700 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2019-08-24T19:19:45.838+0000 7fad31d16700 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.ino(0x10000000000) verify_diri_backtrace
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) _fetched header 274 bytes 1 keys for [dir 0x10000000000 /subdir/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741888|fetching f() n() hs=0+0,ss=0+0 | waiter=1 authpin=1 0x55b6dc069900]
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) _fetched version 9
2019-08-24T19:19:45.838+0000 7fad31d16700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) get_snaps  (seq 1 cached_seq 1)
2019-08-24T19:19:45.838+0000 7fad31d16700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000)  snap_purged_thru 0 < 1, snap purge based on
2019-08-24T19:19:45.838+0000 7fad31d16700 10  mds.0.cache.snaprealm(0x1 seq 1 0x55b6dbffaa00) have_past_parents_open [1,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 20 mds.0.cache.dir(0x10000000000) _fetched pos 0 marker 'd' dname 'sixmegs [6665656264616564,head]
2019-08-24T19:19:45.838+0000 7fad31d16700 20 mds.0.cache.dir(0x10000000000) lookup (head, 'sixmegs')
2019-08-24T19:19:45.838+0000 7fad31d16700  0 log_channel(cluster) log [WRN] : Corrupt dentry 'sixmegs' in dir frag 0x10000000000: buffer::malformed_input: Invalid tag char 'd' pos 0(/subdir)
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) go_bad_dentry sixmegs
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) pre_dirty 10
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) _mark_dirty (was clean) [dir 0x10000000000 /subdir/ [2,head] auth v=10 cv=9/9 ap=1+0 state=1073741825|complete f(v0 m2019-08-24T19:08:14.963491+0000 1=1+0) n(v0 rc2019-08-24T19:08:15.150073+0000 b6291456 1=1+0) hs=0+0,ss=0+0 | waiter=1 authpin=1 0x55b6dc069900] version 10
2019-08-24T19:19:45.838+0000 7fad31d16700 10 mds.0.cache.dir(0x10000000000) auth_unpin by 0x55b6dc069900 on [dir 0x10000000000 /subdir/ [2,head] auth v=10 cv=9/9 state=1610612737|complete f(v0 m2019-08-24T19:08:14.963491+0000 1=1+0) n(v0 rc2019-08-24T19:08:15.150073+0000 b6291456 1=1+0) hs=0+0,ss=0+0 | dirty=1 waiter=1 authpin=0 0x55b6dc069900] count now 0
2019-08-24T19:19:45.838+0000 7fad31d16700 11 mds.0.cache.dir(0x10000000000) finish_waiting mask 2 result 0 on [dir 0x10000000000 /subdir/ [2,head] auth v=10 cv=9/9 state=1610612737|complete f(v0 m2019-08-24T19:08:14.963491+0000 1=1+0) n(v0 rc2019-08-24T19:08:15.150073+0000 b6291456 1=1+0) hs=0+0,ss=0+0 | dirty=1 waiter=1 authpin=0 0x55b6dc069900]
2019-08-24T19:19:45.838+0000 7fad33519700  7 mds.0.229 mds has 1 queued contexts
2019-08-24T19:19:45.838+0000 7fad33519700 10 mds.0.229  finish 0x55b6dc08e280
2019-08-24T19:19:45.838+0000 7fad33519700 10 MDSContext::complete: 18C_MDS_RetryRequest
2019-08-24T19:19:45.838+0000 7fad33519700  7 mds.0.server dispatch_client_request client_request(client.9217:3 readdir #0x10000000000 2019-08-24T19:19:45.850110+0000 caller_uid=0, caller_gid=0{0,}) v4

From: /ceph/teuthology-archive/pdonnell-2019-08-24_04:19:23-fs-wip-pdonnell-testing-20190824.014616-distro-basic-smithi/4248277/remote/smithi151/log/ceph-mds.b.log.gz

This PR in the integration branch may be related but I don't see how it caused this: https://github.com/ceph/ceph/pull/29526

Actions #1

Updated by Patrick Donnelly over 4 years ago

  • Assignee set to Zheng Yan
Actions #2

Updated by Zheng Yan over 4 years ago

The readdir is unexpected and log does not include it. Client should just issue an lookup.

See https://github.com/ceph/ceph-ci/blob/04411619ab7e505bb9abc4576655df8d19968bd7/qa/tasks/cephfs/test_damage.py#L313

I think environment noise.

Actions #3

Updated by Zheng Yan over 4 years ago

  • Status changed from New to Can't reproduce
Actions

Also available in: Atom PDF