<p><a href="https://tracker.ceph.com/issues/54046">Ceph CephFS - Bug #54046: unaccessible dentries after fsstress run with namespace-restricted caps</a></p>
<hr /><p><strong>Updated by Jeff Layton on 2022-01-27T17:42:04Z</strong> (journal 209338)</p>
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/209338/diff?detail_id=220327">diff</a>)</li></ul>
<hr /><p><strong>Updated by Jeff Layton on 2022-01-27T17:48:07Z</strong> (journal 209339)</p>
<p>The MDS I'm testing against has the latest fscrypt patchset based on top of this commit from ceph mainline:</p>
<pre><code>9e8b2d8282e9 Merge pull request #44510 from rzarzynski/wip-cephadm-docfix</code></pre>
<p>I doubt the fscrypt changes are the issue, but I can try testing a different cluster version if need be. I can collect debug logs, but there are 3 MDSs for this fs and fsstress generates a heavy workload. I might need some help targeting what you need.</p>
<hr /><p><strong>Updated by Greg Farnum on 2022-01-27T21:14:36Z</strong> (journal 209353)</p>
<ul><li><strong>Category</strong> set to <i>Security Model</i></li><li><strong>Component(FS)</strong> <i>MDS</i> added</li></ul><p>This issue should be pretty apparent from MDS logs where we can trace out the incoming client requests.</p>
<p>Jeff, I think you said in IRC that this is run against a multi-MDS setup? That may turn out to be important, since I bet it's an issue with the path restriction and subtrees.</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-01-27T21:53:37Z</strong> (journal 209355)</p>
<p>Yes. I have max_mds set to 3, and I have 4 MDSs per fs (with 1 standby).</p>
<p>The good news is that this seems to be pretty easy to reproduce so far.</p>
<hr /><p><strong>Updated by Dan van der Ster on 2022-01-28T13:28:52Z</strong> (journal 209362)</p>
<p>FWIW none of our users have noticed this -- all have path/namespace restricted caps. (octopus multi-mds, pacific single mds; and centos + fedora atomic clients)</p>
<hr /><p><strong>Updated by Greg Farnum on 2022-01-31T13:41:54Z</strong> (journal 209465)</p>
<p>Jeff Layton wrote:</p>
<blockquote>
<p>Yes. I have max_mds set to 3, and I have 4 per fs (with 1 standby).</p>
<p>The good news is that this seems to be pretty easy to reproduce so far.</p>
</blockquote>
<p>Can you do so with "debug mds = 20" set and then post the logs, along with whatever failures you're getting on the client side so we know what files and requests to trace? :)</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-01-31T14:16:32Z</strong> (journal 209472)</p>
<pre>
$ cat /home/jlayton/git/xfstests-dev/results//generic/070.out.bad
QA output created by 070
rm: cannot remove '/mnt/test/fsstress/p0/dba/d35e/d7bc/d8fd/ff19': Permission denied
rm: cannot remove '/mnt/test/fsstress/p0/dba/d5ad/d65b/d7bb/d242/f98b': Permission denied
$ stat /mnt/test/fsstress/p0/dba/d35e/d7bc/d8fd/ff19
stat: cannot statx '/mnt/test/fsstress/p0/dba/d35e/d7bc/d8fd/ff19': Permission denied
</pre>
<p>Unfortunately, the logs are too large to attach here. Where would you like them?</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-01-31T14:29:38Z</strong> (journal 209473)</p>
<p>Logfiles are in a tarball at:</p>
<p>ceph-post-file: 20043459-1412-4880-8844-8f575228d1fb</p>
<hr /><p><strong>Updated by Venky Shankar on 2022-02-18T08:39:43Z</strong> (journal 210774)</p>
<ul><li><strong>Assignee</strong> set to <i>Venky Shankar</i></li><li><strong>Target version</strong> set to <i>v18.0.0</i></li><li><strong>Backport</strong> set to <i>quincy, pacific</i></li></ul>
<hr /><p><strong>Updated by Venky Shankar on 2022-02-22T09:25:38Z</strong> (journal 210921)</p>
<p>Jeff,</p>
<p>For this unlink request</p>
<pre>
rm: cannot remove '/mnt/test/fsstress/p0/dba/d35e/d7bc/d8fd/ff19': Permission denied
</pre>
<p>there is no corresponding mds unlink request or a path lookup failure in the MDS. "ff19" is a hardlink which has a corresponding mds request:</p>
<pre>
Jan 31 09:07:43 cephadm1 ceph-mds[4168]: mds.0.server handle_client_request client_request(client.234387:1752 link #0x100000402a0/ff19 #0x1000004028c/fef4 2022-01-31T14:07:43.911857+0000 caller_uid=0, caller_gid=0{0,}) v4
</pre>
<p>and its reply:</p>
<pre>
Jan 31 09:07:43 cephadm1 ceph-mds[4168]: mds.0.server reply_client_request 0 ((0) Success) client_request(client.234387:1752 link #0x100000402a0/ff19 #0x1000004028c/fef4 2022-01-31T14:07:43.911857+0000 caller_uid=0, caller_gid=0{0,}) v4
</pre>
<p>But, I cannot see the unlink request coming in or a failure with -EACCES for the path in question. Failures with -EACCES are these:</p>
<pre>
mds.test.cephadm1.tsiqgp:Jan 31 09:03:26 cephadm1 ceph-mds[4117]: mds.2.server reply_client_request -13 ((13) Permission denied) client_request(client.234333:5345 lookupino #0x30000008465 2022-01-31T14:03:26.757258+0000 caller_uid=0, caller_gid=0{0,}) v4
mds.test.cephadm1.tsiqgp:Jan 31 09:03:26 cephadm1 ceph-mds[4117]: mds.2.1168 send_message_client client.234333 192.168.1.209:0/3132864089 client_reply(???:5345 = -13 (13) Permission denied) v1
mds.test.cephadm1.tsiqgp:Jan 31 09:03:35 cephadm1 ceph-mds[4117]: mds.2.server reply_client_request -13 ((13) Permission denied) client_request(client.234354:21 lookupino #0x30000008465 2022-01-31T14:03:35.239479+0000 caller_uid=0, caller_gid=0{0,}) v4
mds.test.cephadm1.tsiqgp:Jan 31 09:07:24 cephadm1 ceph-mds[4117]: mds.2.server reply_client_request -13 ((13) Permission denied) client_request(client.234387:1497 lookupino #0x30000008465 2022-01-31T14:07:24.478378+0000 caller_uid=0, caller_gid=0{0,}) v4
mds.test.cephadm1.tsiqgp:Jan 31 09:07:24 cephadm1 ceph-mds[4117]: mds.2.1168 send_message_client client.234387 192.168.1.209:0/4024835920 client_reply(???:1497 = -13 (13) Permission denied) v1
mds.test.cephadm1.xrdhbp:Jan 31 09:06:15 cephadm1 ceph-mds[4168]: mds.0.server reply_client_request -13 ((13) Permission denied) client_request(client.234387:364 lookupino #0x30000008465 2022-01-31T14:06:15.864663+0000 caller_uid=0, caller_gid=0{0,}) v4
mds.test.cephadm1.xrdhbp:Jan 31 09:06:15 cephadm1 ceph-mds[4168]: mds.0.1168 send_message_client client.234387 192.168.1.209:0/4024835920 client_reply(???:364 = -13 (13) Permission denied) v1
mds.test.cephadm3.tglhaq:Jan 31 09:05:14 cephadm3 ceph-mds[2921]: mds.1.server reply_client_request -13 ((13) Permission denied) client_request(client.234387:2 lookupino #0x30000008465 2022-01-31T14:05:14.865090+0000 caller_uid=0, caller_gid=0{0,}) v4
mds.test.cephadm3.tglhaq:Jan 31 09:05:14 cephadm3 ceph-mds[2921]: mds.1.1168 send_message_client client.234387 192.168.1.209:0/4024835920 client_reply(???:2 = -13 (13) Permission denied) v1
</pre>
<p>But they are well before the link request, and the inode number points to `/volumes/_nogroup/centos8/`.</p>
<p>Also, the below failure:</p>
<pre>
[11770.089390] ceph: do_request 000000008d72dbb7 done, result -13
</pre>
<p>is a synchronous dirop request, and one of the places where -EACCES is returned in the kclient is (mds_client.c):</p>
<pre>
if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
list_add(&req->r_wait, &mdsc->waiting_for_map);
else
err = -EACCES;
goto out_session;
}
</pre>
<p>But I don't think this client session was blocklisted since I cannot find the corresponding log entry in the mds log. So, I wonder where -EACCES is coming from.</p>
<p>Also, I could not reproduce it running generic/070. Would it be possible to share kernel buffer logs and the mds logs if you could reproduce it again? AFAICT, this requires dir metadata to be replicated. I'll try reducing `mds_bal_replicate_threshold` and see if I run into it.</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-02-22T13:06:18Z</strong> (journal 210934)</p>
<p>Venky Shankar wrote:</p>
<blockquote>
<p>But, I cannot see the unlink request coming in or a failure with -EACCES for the path in question. Failures with -EACCES are these:</p>
<p>[...]</p>
<p>But they are well before the link request and the inode number points to `/volumes/_nogroup/centos8/`</p>
</blockquote>
<p>It's interesting that they are all "lookupino" requests. We only really do that in the client from NFS export handling code (and also maybe from quotarealm handling). I wonder if fsstress might be doing some name_to_handle_at/open_by_handle_at calls that tickle this bug?</p>
<p>In any case, do path-restricted caps prevent the client from doing a lookupino? The client isn't providing a path in that case, so to properly enforce the caps you'd need to do subtree checking, which basically means ensuring that you can reach one of the inode's dentries from a path to which you have access. That's extremely expensive in an NFS server, and I imagine it'd be similar here.</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-02-22T13:16:24Z</strong> (journal 210936)</p>
<p>Venky Shankar wrote:</p>
<blockquote>
<p>Also, I could not reproduce it running generic/070. Would it be possible to share kernel buffer logs and the mds logs if you could reproduce it again? AFAICT, this requires dir metadata to be replicated. I'll try reducing `mds_bal_replicate_threshold` and see if I run into it.</p>
</blockquote>
<p>generic/013 reproduces it pretty much every time for me. I didn't do anything special to set up dir replication. I have max_mds set to 3 for both test and scratch filesystems. This is the script I use to blow away and recreate the subvols that I use for testing:</p>
<pre>
#!/bin/bash
umount -t ceph -a
ceph fs subvolume rm test ${HOSTNAME}
ceph fs subvolume rm scratch ${HOSTNAME}
ceph fs subvolume create test ${HOSTNAME} --namespace-isolated
ceph fs subvolume create scratch ${HOSTNAME} --namespace-isolated
ceph fs subvolume authorize test ${HOSTNAME} ${HOSTNAME} --allow-existing-id
ceph fs subvolume authorize scratch ${HOSTNAME} ${HOSTNAME} --allow-existing-id
ceph fs subvolume info test ${HOSTNAME} | jq -r '.path'
ceph fs subvolume info scratch ${HOSTNAME} | jq -r '.path'
</pre> CephFS - Bug #54046: unaccessible dentries after fsstress run with namespace-restricted capshttps://tracker.ceph.com/issues/54046?journal_id=2109372022-02-22T13:18:05ZVenky Shankarvshankar@redhat.com
<p>Thx! I'll try that.</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-02-22T14:27:18Z</strong> (journal 210942)</p>
<ul></ul><p>I reproduced this morning again and gathered the mds logs:</p>
<p>ceph-post-file: bf0318cc-3e34-4d61-8895-03dfdae86c25</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-02-22T15:59:41Z</strong> (journal 210949)</p>
<p>FWIW, I also turned up kernel debug logs and did this:</p>
<pre>
# stat /mnt/test/fsstress.1621.1/p0/d0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/d15XXXXXXX/f5dXXXXXXX
stat: cannot statx '/mnt/test/fsstress.1621.1/p0/d0XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/d15XXXXXXX/f5dXXXXXXX': Permission denied
</pre>
<p>...that produced:</p>
<pre>
[ 651.823615] ceph: prepare_send_request 000000003409d9cc tid 6 lookup (attempt 1)
[ 651.823618] ceph: dentry 000000004096f2c2 10000037709/f5dXXXXXXX
[ 651.823622] ceph: r_parent = 000000004b91e0cd
[ 651.823627] ceph: do_request waiting
[ 651.824206] ceph: handle_reply 000000003409d9cc
[ 651.824212] ceph: __unregister_request 000000003409d9cc tid 6
[ 651.824214] ceph: handle_reply tid 6 result -13
[ 651.824217] ceph: fill_trace 000000003409d9cc is_dentry 0 is_target 0
[ 651.824220] ceph: fill_trace reply is empty!
[ 651.824221] ceph: unreserve caps ctx=00000000f49b2520 count=2
[ 651.824224] ceph: __ceph_unreserve_caps: caps 7 = 5 used + 0 resv + 2 avail
[ 651.824249] ceph: do_request waited, got 0
[ 651.824253] ceph: do_request 000000003409d9cc done, result -13
</pre>
<p>...so we sent a LOOKUP to the MDS (I think for the last component, since I can stat the parent) and got back -EACCES.</p>
<hr /><p><strong>Updated by Jeff Layton on 2022-02-22T17:03:39Z</strong> (journal 210953)</p>
<p>I see this in the current batch of logs (mds.test.cephadm1.xlapqu.log):</p>
<pre>
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker rdlock_start on (isnap sync) on [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=0 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker got rdlock on (isnap sync r=1) [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=1 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.cache path_traverse finish on snapid head
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.server ref is [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iversion lock) | ptrwaiter=0 request=0 lock=1 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker acquire_locks request(client.284338:41 nref=3 cr=0x562a0b6d62c0)
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker must rdlock (iauth sync) [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iversion lock) | ptrwaiter=0 request=1 lock=1 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker must rdlock (ixattr sync) [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iversion lock) | ptrwaiter=0 request=1 lock=1 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker rdlock_start on (iauth sync) on [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iversion lock) | ptrwaiter=0 request=1 lock=1 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker got rdlock on (iauth sync r=1) [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iauth sync r=1) (iversion lock) | ptrwaiter=0 request=1 lock=2 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker rdlock_start on (ixattr sync) on [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iauth sync r=1) (iversion lock) | ptrwaiter=0 request=1 lock=2 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.locker got rdlock on (ixattr sync r=1) [inode 0x10000037707 [2,head] ~mds2/stray0/10000037707 rep@2.1 v465047 s=4194304 n(v0 rc2022-02-22T13:28:49.871445+0000 b4194304 1=1+0) (iauth sync r=1) (ixattr sync r=1) (iversion lock) | ptrwaiter=0 request=1 lock=3 caps=0 remoteparent=1 exportingcaps=0 dirtyparent=0 dirty=0 authpin=0 tempexporting=0 0x562a094fbb80]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: Session check_access stray_prior_path
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: MDSAuthCap is_capable inode(path / owner 0:0 mode 0100666) by caller 4447:4447 mask 1 new 0:0 cap: MDSAuthCaps[allow rw path="/volumes/_nogroup/client1/ef0b8bdf-0e06-4c74-b523-e7e1cd6d61d8", allow rw path="/volumes/_nogroup/client1/8d9eebe8-c5bd-4d53-968b-25d1705f0d9a", allow rw path="/volumes/_nogroup/client1/81fd4273-03d1-43a9-9b9c-a3a1cd71626e", allow rw path="/volumes/_nogroup/client1/1a98f09f-c1fc-4b7d-bed9-c673b431bba1", allow rw path="/volumes/_nogroup/client1/f6589acf-b134-4b75-a222-bbc7197457f8", allow rw path="/volumes/_nogroup/client1/f4391bf8-865b-42c9-8541-103a6be949b0", allow rw path="/volumes/_nogroup/client1/45865a6d-7258-4184-9796-c922e787ebf2", allow rw path="/volumes/_nogroup/client1/5fe7c032-0b15-4b18-b634-2abab6b6ffc1", allow rw path="/volumes/_nogroup/client1/e994684e-a2bc-4270-8de1-7e461a77952a", allow rw path="/volumes/_nogroup/client1/7d04a246-bd3e-432d-b4ad-324fa31b4bfd", allow rw path="/volumes/_nogroup/client1/b071380c-d603-4008-b529-c132e53739ea", allow rw path="/volumes/_nogroup/client1/0650926a-4d4d-4332-9931-469a18c1b41b", allow rw path="/volumes/_nogroup/client1/670a813d-fcba-4fdf-b94f-ad154aba6833", allow rw path="/volumes/_nogroup/client1/38030a6e-b7d7-4613-9572-c919e7788641", allow rw path="/volumes/_nogroup/client1/79beb7ea-0713-42c4-909a-a55b7a4daf45", allow rw path="/volumes/_nogroup/client1/37e55b93-b98c-4ae6-a0f2-824b9c076a48", allow rw path="/volumes/_nogroup/client1/6658fab0-f31b-401e-9f8a-911a6e1f08af", allow rw path="/volumes/_nogroup/client1/6f1acc15-566e-4a0b-b494-c7e9b0d0b8a5", allow rw path="/volumes/_nogroup/client1/fe9abf2c-a7e2-4cf7-96e7-6ef207cc2f23", allow rw path="/volumes/_nogroup/client1/3b115ab7-4cc9-4bce-b3b9-9d98f9c3ca67", allow rw path="/volumes/_nogroup/client1/e5349eca-a30b-4345-a028-c2f8031b8dfe", allow rw path="/volumes/_nogroup/client1/44aaceca-faf7-4dd1-ad1c-c4caa874e0c0"]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.server respond_to_request batch head request(client.284338:41 nref=3 cr=0x562a0b6d62c0)
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: respond: responding to batch ops with result=-13: [batch front=request(client.284338:41 nref=3 cr=0x562a0b6d62c0)]
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.server reply_client_request -13 ((13) Permission denied) client_request(client.284338:41 lookup #0x10000037709/f1fX 2022-02-22T13:50:38.539147+0000 caller_uid=4447, caller_gid=4447{10,4447,}) v4
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.server apply_allocated_inos 0x0 / [] / 0x0
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.server lat 0.000357
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.774 send_message_client client.284338 192.168.1.77:0/2833116500 client_reply(???:41 = -13 (13) Permission denied) v1
Feb 22 08:50:38 cephadm1 ceph-mds[4043]: mds.0.cache request_finish request(client.284338:41 nref=3 cr=0x562a0b6d62c0)
</pre>
<p>...so one interesting thing is that when you remove a subvolume, it apparently doesn't clean up caps associated with that path. That's possibly ugly after a long uptime. I'll plan to open a new bug for that.</p>
<p>Beyond that though, I suspect that MDSAuthCaps::is_capable is returning false, and that makes Session::check_access return -EACCES. The path in this case is "/" though, so that return seems appropriate. Note too the "stray_prior_path" message in there, which comes from here. So the diri in this case is_stray:</p>
<pre>
if (!in->is_base())
diri = in->get_projected_parent_dn()->get_dir()->get_inode();
if (diri && diri->is_stray()){
path = in->get_projected_inode()->stray_prior_path;
dout(20) << __func__ << " stray_prior_path " << path << dendl;
} else {
in->make_path_string(path, true);
dout(20) << __func__ << " path " << path << dendl;
}
</pre>
<p>One thing I've come to understand recently is that the client often has to access inodes that are in the stray dirs. Perhaps path-restricted caps are preventing that?</p>
<hr /><p><strong>Updated by Venky Shankar on 2022-02-28T10:24:00Z</strong> (journal 211231)</p>
<p>Jeff Layton wrote:</p>
<blockquote>
<p>I see this in the current batch of logs (mds.test.cephadm1.xlapqu.log):</p>
<p>[...]</p>
<p>...so one interesting thing is that when you remove a subvolume, it apparently doesn't clean up caps associated with that path. That's possibly ugly after a long uptime. I'll plan to open a new bug for that.</p>
<p>Beyond that though, I suspect that MDSAuthCaps::is_capable is returning false and that makes Session::check_access return -EACCES. The path in this case is "/"</p>
</blockquote>
<p>`stray_prior_path` looks empty, which is causing this bug. It should hold the path prior to the unlink.</p>
<p>I'll post a fix.</p>
<blockquote>
<p>though, so that return seems appropriate. Note too the "stray_prior_path" message in there, which comes from here. So the diri in this case is_stray:</p>
<p>[...]</p>
<p>One thing I've come to understand recently is that the client often has to access inodes that are in the stray dirs. Perhaps path-restricted caps are preventing that?</p>
</blockquote>
<hr /><p><strong>Updated by Venky Shankar on 2022-03-07T13:19:42Z</strong> (journal 211547)</p>
<p>Hitting this bug involves having hardlinks to inodes which are authoritative on another active mds. When a non-primary hardlink inode is unlinked, the mds for this (non-primary) inode sends a UNLINKPREP message to the authoritative mds of the primary inode. Unlike a local unlink (where the auth mds would not be a different mds), this does not set the ->stray_prior_path field in the inode, leaving it empty.</p>
<p>There is something else going on in the mds too which is a bit puzzling and for which I have no explanation at the moment - I'm figuring that out currently.</p>
<p>Will update when I get to the root of it.</p>
<hr /><p><strong>Updated by Venky Shankar on 2022-03-08T09:45:38Z</strong> (journal 211590)</p>
<p>OK - super easy to reproduce with some directory pinning. Steps (requires a ceph user with path-restricted caps):</p>
<pre>
- mkdir d0 d1
- touch d0/foo
- ln d0/foo d1/bar
- ln d0/foo d1/baz
- setfattr -n ceph.dir.pin -v 0 d0
- setfattr -n ceph.dir.pin -v 1 d1
</pre>
<p>First, remove the (first) remote link:<br /><pre>
- rm -f d1/bar
</pre></p>
<p>at this point the state of the inode in rank1 is (`dump inode'):<br /><pre>
"path": "/volumes/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d1/bar"
...
...
"stray_prior_path": "",
</pre></p>
<p>Now, remove the primary link and the (other) remote link:<br /><pre>
rm -f /mnt/cephfs/d0/foo && rm -f /mnt/cephfs/d1/baz
rm: cannot remove '/mnt/cephfs/d1/baz': Permission denied
rm -f /mnt/cephfs/d1/baz
rm: cannot remove '/mnt/cephfs/d1/baz': Permission denied
stat /mnt/cephfs/d1/baz
stat: cannot stat '/mnt/cephfs/d1/baz': Permission denied
</pre></p>
<p>State of the inode in rank1:<br /><pre>
"path": "~mds0/stray0/1000000001c",
...
...
"stray_prior_path": "",
</pre></p>
<p>A stray inode with an empty `stray_prior_path' - SessionMap::check_access() does not like that.</p>
<p>Unlinking the remote link (first time) does not store `stray_prior_path' (that's fine). Then unlinking the primary link results in the mds (auth of the primary link) sending a `dentry_unlink' message to its replicas. This kicks the inode (in replica MDSs) to update the inode path with the stray directory path (of the sender mds); however, `stray_prior_path' still remains empty.</p>
<p>As far as the fix is concerned, once an inode becomes a stray, `stray_prior_path' should point to its path before the unlink.</p>
<hr /><p><strong>Updated by Venky Shankar on 2022-03-14T12:28:34Z</strong> (journal 211860)</p>
<p>The issue happens when the auth mds for the primary hardlink sends a `dentry_unlink' message to replica MDSs and a replica MDS uses this inode to resolve a lookup. Here is the sequence of events/messages - primary dentry 0x10000000018/f0000 is auth on mds.c and dentry 0x10000000019/ba2222 is auth on mds.b.</p>
<p>(mds.c) `_unlink_local' called for 0x10000000018/f0000:<br /><pre>
2022-03-09T01:16:23.459-0500 7f6df3df2700 10 mds.0.server _unlink_local [dentry #0x1/volumes/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d0/f0000 [2,head] auth{1=1} (dn xlock x=1 by 0x55626f4e4000) (dversion lock w=1 last_client=214131) v=147 ap=2 ino=0x10000000022 state=1610612736 | request=1 lock=2 inodepin=1 replicated=1 dirty=1 waiter=0 authpin=1 0x55626f340000]
</pre></p>
<p>(mds.c) `send_dentry_unlink' send `dentry_unlink' message to replica MDS (mds.b) and replies back to client:<br /><pre>
2022-03-09T01:16:27.219-0500 7f6dedde6700 10 mds.0.cache send_dentry_unlink [dentry #0x1/volumes/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d0/f0000 [2,head] auth{1=1} NULL (dn xlockdone l x=1) (dversion lock w=1 last_client=214131) v=149 ap=2 ino=(nil) state=1610612736 | request=1 lock=2 inodepin=0 rep\
licated=1 dirty=1 waiter=0 authpin=1 clientlease=1 0x55626f340000]
2022-03-09T01:16:27.219-0500 7f6dedde6700 20 mds.0.cache.ino(0x100) encode_snap_blob snaprealm(0x100 seq 1 lc 0 cr 0 cps 1 snaps={} past_parent_snaps= 0x55626a5a1400)
2022-03-09T01:16:27.219-0500 7f6dedde6700 1 -- [v2:192.168.0.5:6830/2802097374,v1:192.168.0.5:6831/2802097374] send_to--> mds [v2:192.168.0.5:6828/1587915794,v1:192.168.0.5:6829/1587915794] -- dentry_unlink(0x10000000018 f0000) v1 -- ?+0 0x55626f442e00
2022-03-09T01:16:27.219-0500 7f6dedde6700 1 -- [v2:192.168.0.5:6830/2802097374,v1:192.168.0.5:6831/2802097374] --> [v2:192.168.0.5:6828/1587915794,v1:192.168.0.5:6829/1587915794] -- dentry_unlink(0x10000000018 f0000) v1 -- 0x55626f442e00 con 0x55626f2e6800
2022-03-09T01:16:27.219-0500 7f6dedde6700 20 mds.0.bal hit_dir 1 pop is 1.0741, frag * size 0 [pop IRD:[C 5.93e-01] IWR:[C 1.07e+00] RDR:[C 0.00e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:2.7]
2022-03-09T01:16:27.219-0500 7f6dedde6700 7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.214131:11 unlink #0x10000000018/f0000 2022-03-09T01:16:23.452163-0500 caller_uid=0, caller_gid=0{0,}) v4
</pre></p>
<p>(mds.b) lookup request for 0x10000000019/ba2222:<br /><pre>
2022-03-09T01:16:23.471-0500 7fd915a11700 1 -- [v2:192.168.0.5:6828/1587915794,v1:192.168.0.5:6829/1587915794] <== client.214131 192.168.0.3:0/2094212419 19 ==== client_request(client.214131:12 lookup #0x10000000019/ba2222 2022-03-09T01:16:23.464163-0500 caller_uid=0, caller_gid=0{0,}) v4 ==== 150+0+0 (crc 0 0 0)\
0x56210015d600 con 0x562100010400
</pre></p>
<p>(mds.b) MDCache::path_traverse() finds the inode in its cache, looked up during the first unlink operation. This inode is not a stray inode, so this does not result in stray reintegration.<br /><pre>
2022-03-09T01:16:23.471-0500 7fd915a11700 12 mds.1.cache traverse: path seg depth 0 'ba2222' snapid head
2022-03-09T01:16:23.471-0500 7fd915a11700 20 mds.1.cache.dir(0x10000000019) lookup (ba2222, 'head')
2022-03-09T01:16:23.471-0500 7fd915a11700 20 mds.1.cache.dir(0x10000000019) hit -> (ba2222,head)
2022-03-09T01:16:23.471-0500 7fd915a11700 7 mds.1.cache linking in remote in [inode 0x10000000022 [...2,head] /volumes/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d0/f0000 rep@0.1 v145 s=0 nl=2 n(v0 rc2022-03-09T01:13:41.938796-0500 1=1+0) (ilink lock) (iversion lock) 0x562100014c00]
2022-03-09T01:16:23.471-0500 7fd915a11700 10 mds.1.cache.strays eval_remote [dentry #0x1/volumes/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d1/ba2222 [2,head] auth REMOTE(reg) (dversion lock) v=37237 ino=0x10000000022 state=1610612736 | inodepin=0 dirty=1 0x5620ffff5680]
2022-03-09T01:16:23.471-0500 7fd915a11700 20 mds.1.cache.strays eval_remote: inode's primary dn not stray
2022-03-09T01:16:23.471-0500 7fd915a11700 10 mds.1.cache path_traverse finish on snapid head
</pre></p>
<p>mds.b sends various messages to mds.c, causing handle_client_getattr to get called multiple times.</p>
<p>(mds.b) `dentry_unlink' message is received which links the stray inode in the dentry:<br /><pre>
2022-03-09T01:16:27.219-0500 7fd915a11700 7 mds.1.cache handle_dentry_unlink on [dentry #0x1/volumes/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d0/f0000 [2,head] rep@0.1 (dn lock) (dversion lock) v=0 ino=0x10000000022 state=0 | inodepin=1 0x56210009ef00]
2022-03-09T01:16:27.219-0500 7fd915a11700 12 mds.1.cache.dir(0x10000000018) unlink_inode [dentry #0x1/volumes/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d0/f0000 [2,head] rep@0.1 (dn lock) (dversion lock) v=0 ino=0x10000000022 state=0 | inodepin=1 0x56210009ef00] [inode 0x10000000022 [...2,head] /volume\
s/_nogroup/subzero/e3dad093-e653-4750-b6c0-4ea09a38093b/d0/f0000 rep@0.1 v145 s=0 nl=2 n(v0 rc2022-03-09T01:13:41.938796-0500 1=1+0) (ilink lock) (iversion lock) | request=1 lock=0 caps=0 remoteparent=1 waiter=1 0x562100014c00]
2022-03-09T01:16:27.219-0500 7fd915a11700 12 mds.1.cache.dir(0x600) link_primary_inode [dentry #0x100/stray0/10000000022 [2,head] rep@0.1 NULL (dn lock) (dversion lock) v=0 ino=(nil) state=64|bottomlru 0x56210009fb80] [inode 0x10000000022 [...2,head] #10000000022 rep@-2.1 v145 s=0 nl=2 n(v0 rc2022-03-09T01:13:41.9\
38796-0500 1=1+0) (ilink lock) (iversion lock) | request=1 lock=0 caps=0 remoteparent=1 waiter=1 0x562100014c00]
</pre></p>
<p>The inode gets linked with an empty `stray_prior_path'. The auth mds should have encoded this (inode_t::encode), so there is a bug somewhere...</p>
<hr /><p><strong>Updated by Venky Shankar on 2022-03-16T09:31:51Z</strong> (journal 211966)</p>
<ul><li><strong>Pull request ID</strong> set to <i>45407</i></li></ul>
<hr /><p><strong>Updated by Venky Shankar on 2022-04-25T14:19:26Z</strong> (journal 215139)</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Pending Backport</i></li></ul>
<hr /><p><strong>Updated by Backport Bot on 2022-04-25T14:20:49Z</strong> (journal 215140)</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/55427">Backport #55427</a>: pacific: unaccessible dentries after fsstress run with namespace-restricted caps</i> added</li></ul>
<hr /><p><strong>Updated by Backport Bot on 2022-04-25T14:20:56Z</strong> (journal 215142)</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-4 priority-default closed" href="/issues/55428">Backport #55428</a>: quincy: unaccessible dentries after fsstress run with namespace-restricted caps</i> added</li></ul>
<hr /><p><strong>Updated by Backport Bot on 2022-08-08T16:29:29Z</strong> (journal 222580)</p>
<ul><li><strong>Tags</strong> set to <i>backport_processed</i></li></ul>
<hr /><p><strong>Updated by Konstantin Shalygin on 2022-09-30T08:38:05Z</strong> (journal 226682)</p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li><li><strong>Tags</strong> deleted (<del><i>backport_processed</i></del>)</li></ul>