Bug #59515
kclient: ln: failed to create hard link 'file name': Read-only file system
Status: Closed
Description
The ceph-users mailing list thread: https://www.spinics.net/lists/ceph-users/msg76322.html
Frank's comments are copied here:
Hi all,

on an NFS re-export of a ceph-fs (kernel client) I observe a very strange error. I'm un-taring a larger package (1.2G) and after some time I get these errors:

ln: failed to create hard link 'file name': Read-only file system

The strange thing is that this seems to be only temporary. When I used "ln src dst" for manual testing, the command failed as above. However, after that I tried "ln -v src dst" and this command created the hard link with exactly the same path arguments. During the period when the error occurs, I can't see any FS in read-only mode, neither on the NFS client nor on the NFS server. Funny thing is that file creation and write still work; it's only the hard-link creation that fails.

For details, the set-up is:

file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to other server
other server: mount /shares/path as NFS

More precisely, on the file-server:

fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
exports: /shares/nfs/folder -no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP

On the host at DEST-IP:

fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0

Both the file server and the client server are virtual machines. The file server is on CentOS 8 Stream (4.18.0-338.el8.x86_64) and the client machine is on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).

When I change the NFS export from "async" to "sync" everything works. However, that's a rather bad workaround and not a solution. Although this looks like an NFS issue, I'm afraid it is a problem with hard links and ceph-fs. It looks like a race with scheduling and executing operations on the ceph-fs kernel mount.

Has anyone seen something like that?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Updated by Xiubo Li about 1 year ago
The error:
executing tar command, stopping after 5 occurrences of error
tar: nfs-test/pkgs/libxml2-2.9.10-h72842e0_3/share/aclocal/libxml.m4: Cannot hard link to ‘nfs-test/share/aclocal/libxml.m4’: Read-only file system
tar: nfs-test/pkgs/libxml2-2.9.10-h72842e0_3/share/gtk-doc/html/libxml2/libxml2-list.html: Cannot hard link to ‘nfs-test/share/gtk-doc/html/libxml2/libxml2-list.html’: Read-only file system
tar: nfs-test/pkgs/libxml2-2.9.10-h72842e0_3/share/gtk-doc/html/libxml2/libxml2-dict.html: Cannot hard link to ‘nfs-test/share/gtk-doc/html/libxml2/libxml2-dict.html’: Read-only file system
tar: nfs-test/pkgs/libxml2-2.9.10-h72842e0_3/share/gtk-doc/html/libxml2/libxml2-valid.html: Cannot hard link to ‘nfs-test/share/gtk-doc/html/libxml2/libxml2-valid.html’: Read-only file system
tar: nfs-test/pkgs/libxml2-2.9.10-h72842e0_3/share/gtk-doc/html/libxml2/libxml2-parser.html: Cannot hard link to ‘nfs-test/share/gtk-doc/html/libxml2/libxml2-parser.html’: Read-only file system
Added a debug log in ceph_link():
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index c28de23e12a1..4c9e84c6ba35 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1151,8 +1151,10 @@ static int ceph_link(struct dentry *old_dentry, struct inode *dir,
 	if (err)
 		return err;
-	dout("link in dir %p old_dentry %p dentry %p\n", dir,
-	     old_dentry, dentry);
+	dout("link in dir %p old_dentry %p:%pd disconnect=%d old_dentry's parent %p:%pd parent ino %llx.%llx, dentry %p:%pd disconnect=%d dentry's parent %p:%pd, parent ino %llx.%llx\n", dir,
+	     old_dentry, old_dentry, (old_dentry->d_flags & DCACHE_DISCONNECTED), old_dentry->d_parent, old_dentry->d_parent, ceph_vinop(d_inode(old_dentry->d_parent)),
+	     dentry, dentry, dentry->d_flags & DCACHE_DISCONNECTED, dentry->d_parent, dentry->d_parent, ceph_vinop(d_inode(dentry->d_parent)));
+
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_LINK, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
 		d_drop(dentry);
And tried to reproduce it again, the logs:
<7>[331620.275135] ceph: link in dir 000000002d467779 old_dentry 000000001a6a54e7:/ disconnect=32 old_dentry's parent 000000001a6a54e7:/ parent ino 100007bb6d8.fffffffffffffffe, dentry 00000000041642b2:libxml.m4 disconnect=0 dentry's parent 00000000567fb787:aclocal, parent ino 100007cf5c6.fffffffffffffffe
<7>[331620.275140] ceph: do_request on 000000009aca5048
<7>[331620.275141] ceph: submit_request on 000000009aca5048 for inode 000000002d467779
<7>[331620.275143] ceph: __register_request 000000009aca5048 tid 347271
<7>[331620.275145] ceph: __choose_mds 000000002d467779 is_hash=1 (0xff728fe9) mode 2
<7>[331620.275147] ceph: __choose_mds 000000002d467779 100007cf5c6.fffffffffffffffe mds0 (auth cap 00000000f9161ca8)
<7>[331620.275150] ceph: do_request mds0 session 0000000000fba58e state open
<7>[331620.275151] ceph: __prepare_send_request 000000009aca5048 tid 347271 link (attempt 1)
<7>[331620.275153] ceph: dentry 00000000041642b2 100007cf5c6/libxml.m4
<7>[331620.275154] ceph: dentry 000000001a6a54e7 100007bb6d8//
<7>[331620.275157] ceph: r_parent = 000000002d467779
<7>[331620.275159] ceph: do_request waiting
<7>[331620.275774] ceph: handle_reply 000000009aca5048
<7>[331620.275777] ceph: __unregister_request 000000009aca5048 tid 347271
<7>[331620.275778] ceph: handle_reply tid 347271 result -30
<7>[331620.275785] ceph: do_request waited, got 0
<7>[331620.275786] ceph: do_request 000000009aca5048 done, result -30
The upper layer just passed an invalid dentry: old_dentry 000000001a6a54e7:/ disconnect=32, old_dentry's parent 000000001a6a54e7:/, parent ino 100007bb6d8.fffffffffffffffe. The dentry is DCACHE_DISCONNECTED, its name is "/", and the ino# is 100007bb6d8. The kclient then encoded it as the snapdir path 100007bb6d8// and passed it to the MDS.
From the MDS side logs, modifying a snapdir is not permitted, so the MDS just returned -EROFS:
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 7 mds.0.server dispatch_client_request client_request(client.194038:347271 link #0x100007cf5c6/libxml.m4 #0x100007bb6d8// 2023-04-24T06:55:31.337532+0200 caller_uid=2000, caller_gid=2000{10,2000,}) v6
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 7 mds.0.server handle_client_link #0x100007cf5c6/libxml.m4 to #0x100007bb6d8//
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 10 mds.0.server rdlock_two_paths_xlock_destdn request(client.194038:347271 nref=2 cr=0x55fe5a5ac280) #0x100007cf5c6/libxml.m4 #0x100007bb6d8//
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 7 mds.0.server reply_client_request -30 ((30) Read-only file system) client_request(client.194038:347271 link #0x100007cf5c6/libxml.m4 #0x100007bb6d8// 2023-04-24T06:55:31.337532+0200 caller_uid=2000, caller_gid=2000{10,2000,}) v6
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 20 mds.0.server lat 0.000268
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 10 mds.0.1473 send_message_client client.194038 v1:10.41.24.225:0/723549921 client_reply(???:347271 = -30 (30) Read-only file system safe) v1
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 1 -- [v2:10.41.24.14:6806/2297598453,v1:10.41.24.14:6813/2297598453] --> v1:10.41.24.225:0/723549921 -- client_reply(???:347271 = -30 (30) Read-only file system safe) v1 -- 0x55fde61b5c00 con 0x5601bdfdc800
But the old dentry's parent inode is 100007bb6d8, which is actually /data/nfs/nfs-test/share/aclocal/libxml.m4 itself:
2023-04-24T06:55:31.337+0200 7f0c1a0d1700 7 mds.0.locker rdlock_finish on (ixattr sync) on [inode 0x100007bb6d8 [...7b,head] /data/nfs/nfs-test/share/aclocal/libxml.m4 auth v6 ap=1 snaprealm=0x56014fb6a780 s=7881 nl=2 n(v0 rc2023-04-24T06:43:58.786177+0200 b7881 1=1+0) (iversion lock) caps={194038=pAsLsXsFscr/-@2} | ptrwaiter=0 request=0 lock=0 caps=1 remoteparent=1 dirtyparent=0 dirty=0 authpin=1 0x5601c9706800]
This is incorrect: it means the kclient passed the path libxml.m4//, which treats libxml.m4 as if it were a directory (with a snapdir under it), while it is actually a regular file.
Updated by Xiubo Li about 1 year ago
I added some more debug logs in fs/ceph/export.c:
diff --git a/fs/ceph/export.c b/fs/ceph/export.c
index 8559990a59a5..04f721cfcebf 100644
--- a/fs/ceph/export.c
+++ b/fs/ceph/export.c
@@ -128,6 +128,7 @@ static struct inode *__lookup_inode(struct super_block *sb, u64 ino)
 	struct ceph_vino vino;
 	int err;
 
+	dout("__lookup_inode %llx\n", ino);
 	vino.ino = ino;
 	vino.snap = CEPH_NOSNAP;
 
@@ -183,6 +184,7 @@ static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
 {
 	struct inode *inode = __lookup_inode(sb, ino);
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct dentry *dn;
 	int err;
 
 	if (IS_ERR(inode))
@@ -198,7 +200,9 @@ static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
 		iput(inode);
 		return ERR_PTR(-ESTALE);
 	}
-	return d_obtain_alias(inode);
+	dn = d_obtain_alias(inode);
+	dout("__fh_to_dentry %llx, dentry %p:%pd\n", ino, dn, dn);
+	return dn;
 }
 
 static struct dentry *__snapfh_to_dentry(struct super_block *sb,
The upper layer just got a disconnected dentry and passed it back down to do the link:
<7>[348818.857649] ceph: fh_to_dentry 100007e535d
<7>[348818.857651] ceph: __lookup_inode 100007e535d
<7>[348818.857652] ceph: do_request on 0000000065ce9170
<7>[348818.857653] ceph: submit_request on 0000000065ce9170 for inode 0000000000000000
<7>[348818.857654] ceph: __register_request 0000000065ce9170 tid 350708
<7>[348818.857656] ceph: __choose_mds 0000000000000000 is_hash=0 (0x0) mode 0
<7>[348818.857657] ceph: __choose_mds chose random mds0
<7>[348818.857658] ceph: do_request mds0 session 00000000af7af898 state open
<7>[348818.857659] ceph: __prepare_send_request 0000000065ce9170 tid 350708 lookupino (attempt 1)
<7>[348818.857660] ceph: path
<7>[348818.857661] ceph: r_parent = 0000000000000000
<7>[348818.857663] ceph: do_request waiting
<7>[348818.858374] ceph: handle_reply 0000000065ce9170
<7>[348818.858375] ceph: __unregister_request 0000000065ce9170 tid 350708
<7>[348818.858377] ceph: handle_reply tid 350708 result 0
<7>[348818.858388] ceph: do_request waited, got 0
<7>[348818.858389] ceph: do_request 0000000065ce9170 done, result 0
<7>[348818.858394] ceph: __fh_to_dentry 100007e535d, dentry 00000000490ab95b:/
<7>[348818.858398] ceph: fh_to_dentry 100007f99ee
<7>[348818.858398] ceph: __lookup_inode 100007f99ee
<7>[348818.858400] ceph: __fh_to_dentry 100007f99ee, dentry 00000000a8c69314:nghttp2
<7>[348818.858404] ceph: lookup 00000000f450adaf dentry 000000001434d69d 'README.rst'
<7>[348818.858405] ceph: dir 00000000f450adaf flags are 0x40
<7>[348818.858406] ceph: dir 00000000f450adaf complete, -ENOENT
<7>[348818.858409] ceph: link in dir 00000000f450adaf old_dentry 00000000490ab95b:/ disconnect=32 old_dentry's parent 00000000490ab95b:/ parent ino 100007e535d.fffffffffffffffe, dentry 000000001434d69d:README.rst disconnect=0 dentry's parent 00000000a8c69314:nghttp2, parent ino 100007f99ee.fffffffffffffffe
<7>[348818.858414] ceph: do_request on 0000000065ce9170
<7>[348818.858415] ceph: submit_request on 0000000065ce9170 for inode 00000000f450adaf
<7>[348818.858416] ceph: __register_request 0000000065ce9170 tid 350709
<7>[348818.858417] ceph: __choose_mds 00000000f450adaf is_hash=1 (0xa91c567a) mode 2
<7>[348818.858418] ceph: __choose_mds 00000000f450adaf 100007f99ee.fffffffffffffffe mds0 (auth cap 00000000074dffe3)
<7>[348818.858420] ceph: do_request mds0 session 00000000af7af898 state open
<7>[348818.858421] ceph: __prepare_send_request 0000000065ce9170 tid 350709 link (attempt 1)
<7>[348818.858423] ceph: dentry 000000001434d69d 100007f99ee/README.rst
<7>[348818.858424] ceph: dentry 00000000490ab95b 100007e535d//
<7>[348818.858426] ceph: r_parent = 00000000f450adaf
<7>[348818.858427] ceph: do_request waiting
<7>[348818.859055] ceph: handle_reply 0000000065ce9170
<7>[348818.859056] ceph: __unregister_request 0000000065ce9170 tid 350709
<7>[348818.859058] ceph: handle_reply tid 350709 result -30
For a disconnected dentry, its parent dentry points to itself; this is why we are getting the dentry name as "/": __fh_to_dentry 100007e535d, dentry 00000000490ab95b:/
To fix this, I am thinking: could we just pass an ino# to the MDS for the link request, and then on the MDS side use this ino# to find the primary dentry and use that primary dentry to do the link?
Updated by Xiubo Li about 1 year ago
There are some "open_by_handle" test cases in xfstests-dev, the source code is [1].
It converts a file name to a handle via name_to_handle_at(), and the handle only includes the ino# of the file; for more detail please see [2][3]. Later it opens this handle via open_by_handle_at(), which uses the ino# to get a dentry, but if there is no dentry in the dcache, the kernel just returns a disconnected dentry, which has no name and whose parent points to itself, just like we hit here.
I do not have much knowledge of NFS and don't know exactly what it is doing, but from the logs I guess NFS is doing something similar.
[1] https://kernel.googlesource.com/pub/scm/fs/xfs/xfstests-dev/+/refs/heads/master/src/open_by_handle.c
[2] https://man.archlinux.org/man/name_to_handle_at.2.en
[3] https://github.com/ceph/ceph-client/blob/for-linus/fs/ceph/export.c
Updated by Xiubo Li 12 months ago
- Status changed from In Progress to Fix Under Review
The patchwork link: https://patchwork.kernel.org/project/ceph-devel/list/?series=743519