Bug #12189
closed
Editing / Creating files fails for NFS-over-CephFS on EC pool with cache tier
Description
Ubuntu 14.04, Kernel 3.13.0-55-generic
Standard kernel-based NFS server
Ceph Hammer release
~# ceph version
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
Mount point for cephfs via fstab:
client_mountpoint=/volumes /srv/ceph fuse.ceph defaults,_netdev 0 0
~# mount | grep ceph
ceph-fuse on /srv/ceph type fuse.ceph-fuse (rw,nosuid,nodev,allow_other,default_permissions)
~# cat /etc/exports | grep ceph
# export ceph to desktop machines only
/srv/ceph XX.XX.XX.XX/YY
On the client machine, the exported CephFS is mounted as /ceph:
~$ mount | grep ceph
nfs-homes:/srv/ceph on /ceph type nfs (rw,noatime,fsc,nfsvers=4,sec=krb5p,intr,ac,sloppy,addr=XX.XX.XX.XX,clientaddr=XX.XX.XX.XX)
CephFS uses three data pools:
~# ceph fs ls
name: cephfs, metadata pool: cephfs_test_metadata, data pools: [cephfs_test_data cephfs_two_rep_data ec_ssd_cache ]
~# getfattr -n ceph.dir.layout /srv/ceph/
getfattr: Removing leading '/' from absolute path names
# file: srv/ceph/
ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=ec_ssd_cache"
# getfattr -n ceph.dir.layout /srv/ceph/adm/temp/test/
/srv/ceph/adm/temp/test/: ceph.dir.layout: No such attribute
# getfattr -n ceph.dir.layout /srv/ceph/adm/temp/
/srv/ceph/adm/temp/: ceph.dir.layout: No such attribute
~# getfattr -n ceph.dir.layout /srv/ceph/adm/
/srv/ceph/adm/: ceph.dir.layout: No such attribute
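The "No such attribute" replies above are expected for directories without an explicit layout: CephFS applies the layout of the nearest ancestor that has one, so files under /srv/ceph/adm/temp/test end up in pool ec_ssd_cache. A minimal sketch of that lookup, assuming Python 3 on Linux; the function names `parse_layout` and `effective_layout` are made up for illustration and are not part of any Ceph tooling:

```python
import os


def parse_layout(value):
    """Turn 'stripe_unit=4194304 ... pool=ec_ssd_cache' into a dict."""
    layout = {}
    for field in value.split():
        key, _, val = field.partition("=")
        layout[key] = int(val) if val.isdigit() else val
    return layout


def effective_layout(path, read_xattr=None):
    """Walk upwards until a directory carries its own ceph.dir.layout.

    read_xattr is injectable so the lookup can be exercised without a
    CephFS mount; by default os.getxattr (Linux only) is used.
    """
    if read_xattr is None:
        read_xattr = os.getxattr
    current = os.path.abspath(path)
    while True:
        try:
            raw = read_xattr(current, "ceph.dir.layout")
            return current, parse_layout(raw.decode())
        except OSError:
            # No explicit layout here: try the parent directory.
            parent = os.path.dirname(current)
            if parent == current:
                return None, {}
            current = parent
```

Calling `effective_layout("/srv/ceph/adm/temp/test")` on the NFS server should report /srv/ceph as the directory that supplies the layout.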
Creating files and/or editing them on the server works:
:/srv/ceph/adm/temp/test$ dd if=/dev/zero of=test bs=1 count=10
10+0 records in
10+0 records out
10 bytes (10 B) copied, 0.0046936 s, 2.1 kB/s
Creating a file in the same way on an NFS client also works:
:/ceph/adm/temp/test$ dd if=/dev/zero of=test2 bs=1 count=10
10+0 records in
10+0 records out
10 bytes (10 B) copied, 0.0332668 s, 0.3 kB/s
Editing the file on the server is also ok:
:/srv/ceph/adm/temp/test$ vi test
blinke@waas:/srv/ceph/adm/temp/test$ ls -al
total 2
drwxr-xr-x 1 blinke cb 20 Jun 30 16:25 .
drwxrwxrwt 1 blinke support 20733315 Jun 30 16:11 ..
-rw-r--r-- 1 blinke cb 10 Jun 30 16:23 test
-rw-r--r-- 1 blinke cb 10 Jun 30 16:24 test2
Editing the file on the NFS client FAILS:
:/ceph/adm/temp/test$ vi test2
E325: ATTENTION
Found a swap file by the name ".test2.swp"
owned by: blinke dated: Mon Dec 11 03:21:47 1972
[cannot be opened]
While opening file "test2"
dated: Tue Jun 30 16:24:29 2015
NEWER than swap file!
(1) Another program may be editing the same file. If this is the case,
be careful not to end up with two different instances of the same
file when making changes. Quit, or continue with caution.
(2) An edit session for this file crashed.
If this is the case, use ":recover" or "vim -r test2"
to recover the changes (see ":help recovery").
If you did this already, delete the swap file ".test2.swp"
to avoid this message.
Swap file ".test2.swp" already exists!
[O]pen Read-Only, (E)dit anyway, (R)ecover, (D)elete it, (Q)uit, (A)bort: -> q
:/ceph/adm/temp/test$ vi test2
blinke@fb08-bcf-pc01:/ceph/adm/temp/test$ ls -al
total 2
drwxr-xr-x 1 blinke cb 20 Jun 30 16:26 .
drwxrwxrwt 1 blinke support 20733315 Jun 30 16:11 ..
-rw-r--r-- 1 blinke cb 10 Jun 30 16:23 test
-rw-r--r-- 1 blinke cb 10 Jun 30 16:24 test2
---------- 1 blinke cb 0 Dec 11 1972 .test2.swo
---------- 1 blinke cb 0 Dec 11 1972 .test2.swp
Editing a file stored on a replicated pool is OK.
Copying a file to the directory on the server is OK:
:/srv/ceph/adm/temp/test$ cp ~/wf1.out .
:/srv/ceph/adm/temp/test$ rm wf1.out
Copying a file to the directory on the client FAILS:
:/ceph/adm/temp/test$ cp ~/wf1.out .
cp: cannot create regular file './wf1.out': Permission denied
:/ceph/adm/temp/test$ ls -al
total 2
drwxr-xr-x 1 blinke cb 20 Jun 30 16:30 .
drwxrwxrwt 1 blinke support 20728068 Jun 30 16:11 ..
-rw-r--r-- 1 blinke cb 10 Jun 30 16:23 test
-rw-r--r-- 1 blinke cb 10 Jun 30 16:24 test2
---------- 1 blinke cb 0 Dec 11 1972 .test2.swo
---------- 1 blinke cb 0 Dec 11 1972 .test2.swp
---------- 1 blinke cb 0 Dec 11 1972 wf1.out
The file is created, but empty, without any permission bits and with a bogus 1972 timestamp.
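The leftover entries in the listings above share three traits: zero length, mode 000 and a timestamp from the early 1970s. A small sketch for spotting such entries in a directory, assuming Python 3; the function name `find_bogus_entries` and the 1980 cutoff are arbitrary choices for illustration, not something taken from Ceph or NFS tooling:

```python
import os
import stat
from datetime import datetime


def find_bogus_entries(directory, cutoff_year=1980):
    """Return names of empty regular files with mode 000 and an implausibly old mtime."""
    cutoff = datetime(cutoff_year, 1, 1).timestamp()
    hits = []
    for entry in os.scandir(directory):
        st = entry.stat(follow_symlinks=False)
        if not stat.S_ISREG(st.st_mode):
            continue  # skip directories, symlinks, devices
        no_perms = stat.S_IMODE(st.st_mode) == 0
        if no_perms and st.st_size == 0 and st.st_mtime < cutoff:
            hits.append(entry.name)
    return sorted(hits)
```

Run against /ceph/adm/temp/test on the NFS client, this would be expected to list .test2.swo, .test2.swp and wf1.out while leaving test and test2 alone.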
Updated by Burkhard Linke almost 9 years ago
~# ceph df
GLOBAL:
    SIZE     AVAIL      RAW USED     %RAW USED
    126T     53823G     75664G       58.38
POOLS:
    NAME                     ID     USED       %USED     MAX AVAIL     OBJECTS
    cephfs_test_data         7      918M       0         8764G         5614112
    cephfs_test_metadata     8      36797k     0         833G          648204
    cephfs_two_rep_data      12     28242G     21.79     13147G        9231148
    ec_ssd_cache             18     388G       0.30      1250G         364092
    cephfs_ec_data           19     10600G     8.18      17529G        5106582
:~# ceph osd dump | grep pool
pool 7 'cephfs_test_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 68813 flags hashpspool min_read_recency_for_promote 1 stripe_width 0
pool 8 'cephfs_test_metadata' replicated size 3 min_size 2 crush_ruleset 2 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 80938 flags hashpspool min_read_recency_for_promote 1 stripe_width 0
pool 12 'cephfs_two_rep_data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 73815 flags hashpspool min_read_recency_for_promote 1 stripe_width 0
pool 18 'ec_ssd_cache' replicated size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 162775 flags hashpspool,incomplete_clones tier_of 19 cache_mode writeback target_bytes 500000000000 target_objects 1000000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x1 stripe_width 0
pool 19 'cephfs_ec_data' erasure size 6 min_size 4 crush_ruleset 3 object_hash rjenkins pg_num 256 pgp_num 256 last_change 149589 lfor 149589 flags hashpspool tiers 18 read_tier 18 write_tier 18 stripe_width 4096
(removed other pools from output)
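In the dump above, the tiering relationship is carried by the `tier_of` and `cache_mode` fields: pool 18 (ec_ssd_cache, replicated) is a writeback cache in front of pool 19 (cephfs_ec_data, erasure coded). A rough sketch that extracts this from the text of `ceph osd dump | grep pool`, assuming Python 3 and the Hammer-era line format shown here; `parse_pools` and `describe_tiers` are hypothetical helper names, and other Ceph releases may format these lines differently:

```python
import re

# Matches the leading "pool <id> '<name>' <type> " part of each line.
POOL_RE = re.compile(r"^pool (\d+) '([^']+)' (replicated|erasure) ")


def parse_pools(dump_text):
    """Map pool id -> name, type, tier_of and cache_mode (None if absent)."""
    pools = {}
    for line in dump_text.splitlines():
        m = POOL_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a pool line
        tier_of = re.search(r"\btier_of (\d+)", line)
        cache_mode = re.search(r"\bcache_mode (\w+)", line)
        pools[int(m.group(1))] = {
            "name": m.group(2),
            "kind": m.group(3),
            "tier_of": int(tier_of.group(1)) if tier_of else None,
            "cache_mode": cache_mode.group(1) if cache_mode else None,
        }
    return pools


def describe_tiers(pools):
    """One human-readable line per cache pool."""
    out = []
    for pool_id, pool in sorted(pools.items()):
        if pool["tier_of"] is None:
            continue
        base = pools.get(pool["tier_of"])
        base_desc = f"{base['name']} ({base['kind']})" if base else f"pool {pool['tier_of']}"
        out.append(f"{pool['name']} ({pool['kind']}, {pool['cache_mode']}) caches {base_desc}")
    return out
```

Feeding it the five pool lines from this comment should yield a single description for ec_ssd_cache; the three plain replicated pools have no `tier_of` field and are skipped.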
Updated by Greg Farnum almost 9 years ago
- Tracker changed from Tasks to Bug
- Project changed from Stable releases to CephFS
- Regression set to No
Updated by Burkhard Linke almost 9 years ago
Replicated pools also seem to be affected:
On client:
:/ceph/test$ ls
:/ceph/test$ touch foo
:/ceph/test$ cp ~/wf1.out .
cp: cannot create regular file './wf1.out': Permission denied
blinke@fb08-bcf-pc01:/ceph/test$ ls -al
total 1
drwxr-xr-x 1 blinke root 0 Jun 30 16:55 .
drwxr-xr-x 1 root root 30592876135254 Jun 30 16:53 ..
-rw-r--r-- 1 blinke cb 0 Jun 30 16:55 foo
... some seconds later...
blinke@fb08-bcf-pc01:/ceph/test$ ls -al
total 1
drwxr-xr-x 1 blinke root 0 Jun 30 16:55 .
drwxr-xr-x 1 root root 30592876135254 Jun 30 16:53 ..
-rw-r--r-- 1 blinke cb 0 Jun 30 16:55 foo
---------- 1 blinke cb 0 Dec 16 1972 wf1.out
~# getfattr -n ceph.dir.layout /srv/ceph/test/
getfattr: Removing leading '/' from absolute path names
# file: srv/ceph/test/
ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=cephfs_two_rep_data"
Updated by Zheng Yan almost 9 years ago
it's likely your client does not have RW permission to the pools
Updated by Burkhard Linke almost 9 years ago
Zheng Yan wrote:
it's likely your client does not have RW permission to the pools
I don't think the problem is permission related:
- cephfs is mounted without specifying the user, so client.admin with full permissions is used
- accessing, writing and modifying files works fine on the NFS server and on other cephfs clients with the same mount point
The problem only occurs on the desktop machines with re-exported cephfs over NFS.
Updated by Zheng Yan almost 9 years ago
it's likely the client on your desktop machine does not have RW permission to the pools. please try doing a direct write on the ceph-fuse mount point. (dd if=/dev/zero bs=4k count=1 of=test oflag=direct)
Updated by Burkhard Linke almost 9 years ago
The desktop machines do not have access to the ceph network at all. That's why I have to use an NFS gateway.
dd on NFS server (ceph-fuse mount point):
/srv/ceph/adm/temp/test$ dd if=/dev/zero bs=4k count=1 of=test oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000483719 s, 8.5 MB/s
Editing the file with vi is OK on the server
dd on NFS client (NFS mount point of re-exported ceph-fuse mount):
:/ceph/adm/temp/test$ dd if=/dev/zero bs=4k count=1 of=test oflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.151032 s, 27.1 kB/s
vi fails again and creates two swap files:
:/ceph/adm/temp/test$ ls -al
total 5
drwxr-xr-x 1 blinke cb 4096 Jul 2 09:55 .
drwxrwxrwt 1 blinke support 20732144 Jun 30 16:45 ..
-rw-r--r-- 1 blinke cb 4096 Jul 2 09:54 test
---------- 1 blinke cb 0 Feb 16 1974 .test.swo
---------- 1 blinke cb 0 Feb 16 1974 .test.swp
The dates of the swap files have changed compared to the test I did yesterday.
Updated by Zheng Yan almost 9 years ago
no idea what happened, please try using a newer kernel (such as the 4.0 kernel) on both the NFS server and the NFS client
Updated by Burkhard Linke over 8 years ago
I was able to update the kernel on the NFS server to version 4.1.4 today, which also allows me to use the kernel client instead of ceph-fuse.
The issue is resolved by the update; accessing the files now works without problems or spurious files.