Bug #19894
openWrite fails on larger files on nfs-ganesha-rgw
0%
Description
Writing files on nfs mounted rgw fail on all files larger than 1MB(?)
2017-05-09 15:45:14.600162 7fe82c144700 1 ====== process_request starting new request req=0x7fe82c142930 ======
2017-05-09 15:45:14.600225 7fe82c144700 2 req 0:0.000012:: :stat_obj:initializing for trans_id = tx000000000000000000000-005911c7ea-11eb9cc-default
2017-05-09 15:45:14.600251 7fe82c144700 2 req 0:0.000038:: :stat_obj:authorizing
2017-05-09 15:45:14.600256 7fe82c144700 2 req 0:0.000044:: :stat_obj:reading op permissions
2017-05-09 15:45:14.600303 7fe82c144700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2017-05-09 15:45:14.601844 7fe82c144700 10 manifest: total_size = 0
2017-05-09 15:45:14.601895 7fe82c144700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2017-05-09 15:45:14.601906 7fe82c144700 2 req 0:0.001693:: :stat_obj:init op
2017-05-09 15:45:14.601913 7fe82c144700 2 req 0:0.001701:: :stat_obj:verifying op mask
2017-05-09 15:45:14.601917 7fe82c144700 2 req 0:0.001705:: :stat_obj:verifying op permissions
2017-05-09 15:45:14.601928 7fe82c144700 5 Searching permissions for uid=nfstest mask=49
2017-05-09 15:45:14.601932 7fe82c144700 5 Found permission: 15
2017-05-09 15:45:14.601935 7fe82c144700 5 Searching permissions for group=1 mask=49
2017-05-09 15:45:14.601937 7fe82c144700 5 Permissions for group not found
2017-05-09 15:45:14.601940 7fe82c144700 5 Searching permissions for group=2 mask=49
2017-05-09 15:45:14.601942 7fe82c144700 5 Permissions for group not found
2017-05-09 15:45:14.601943 7fe82c144700 5 Getting permissions id=nfstest owner=nfstest perm=1
2017-05-09 15:45:14.601945 7fe82c144700 10 uid=nfstest requested perm (type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2017-05-09 15:45:14.601948 7fe82c144700 2 req 0:0.001735:: :stat_obj:verifying op params
2017-05-09 15:45:14.601951 7fe82c144700 2 req 0:0.001739:: :stat_obj:executing
2017-05-09 15:45:14.602118 7fe82c144700 2 req 0:0.001905:: :stat_obj:http status=200
2017-05-09 15:45:14.602125 7fe82c144700 1 ====== process_request req done req=0x7fe82c142930 http_status=200 ======
2017-05-09 15:45:14.602147 7fe82c144700 10 lookup_fh lookup called on testing for /testing/firefox-52.0.2esr.tar.bz2 (firefox-52.0.2esr.tar.bz2)
2017-05-09 15:45:14.667319 7fe82b142700 2 req 0:0.000015:: ::initializing for trans_id = tx000000000000000000000-005911c7ea-11eb9cc-default
2017-05-09 15:45:14.667391 7fe82b142700 1 ====== start_request starting new continued request req=0x7fe87c02d850 ======
2017-05-09 15:45:14.667396 7fe82b142700 2 req 0:0.000093:: :put_obj:authorizing
2017-05-09 15:45:14.667400 7fe82b142700 2 req 0:0.000097:: :put_obj:reading op permissions
2017-05-09 15:45:14.667434 7fe82b142700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2017-05-09 15:45:14.667454 7fe82b142700 2 req 0:0.000151:: :put_obj:init op
2017-05-09 15:45:14.667458 7fe82b142700 2 req 0:0.000155:: :put_obj:verifying op mask
2017-05-09 15:45:14.667460 7fe82b142700 2 req 0:0.000158:: :put_obj:verifying op permissions
2017-05-09 15:45:14.667464 7fe82b142700 5 Searching permissions for uid=nfstest mask=50
2017-05-09 15:45:14.667466 7fe82b142700 5 Found permission: 15
2017-05-09 15:45:14.667467 7fe82b142700 5 Searching permissions for group=1 mask=50
2017-05-09 15:45:14.667469 7fe82b142700 5 Permissions for group not found
2017-05-09 15:45:14.667470 7fe82b142700 5 Searching permissions for group=2 mask=50
2017-05-09 15:45:14.667471 7fe82b142700 5 Permissions for group not found
2017-05-09 15:45:14.667472 7fe82b142700 5 Getting permissions id=nfstest owner=nfstest perm=2
2017-05-09 15:45:14.667473 7fe82b142700 10 uid=nfstest requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2017-05-09 15:45:14.667474 7fe82b142700 2 req 0:0.000172:: :put_obj:verifying op params
2017-05-09 15:45:19.878759 7fe7f58d7700 5 writefirefox-52.0.2esr.tar.bz2 failed write at position 57536512 (fails write transaction)
2017-05-09 15:45:19.927644 7fe7f50d6700 5 write firefox-52.0.2esr.tar.bz2 non-0 initial write position 56623104
2017-05-09 15:45:19.933975 7fe7f48d5700 5 write firefox-52.0.2esr.tar.bz2 non-0 initial write position 57671680
On clientside:
[stein@brumbar testing]$ cp /home/stein/Downloads/firefox-52.0.2esr.tar.bz2 /mnt/ceph/testing/
cp: error writing '/mnt/ceph/testing/firefox-52.0.2esr.tar.bz2': Input/output error
cp: failed to close '/mnt/ceph/testing/firefox-52.0.2esr.tar.bz2': Input/output error
Mount:
ceph-gateway-01:/ /mnt/ceph nfs4 rw,sync,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.80.103.51,local_lock=none,addr=10.84.200.233 0 0
Updated by Matt Benjamin almost 7 years ago
- Status changed from New to In Progress
- Assignee set to Matt Benjamin
Updated by Matt Benjamin almost 7 years ago
Hi Stein,
I did a by-hand verify of current v10.2.7 stable and upstream nfs-ganesha v2.4-stable, and got correct results. Then I looked at your logs. The error you're getting is due to us seeing an out-of-order write near the end of the write cycle, which is basically not expected when the mount is -osync.
It's up to the program to do the right thing, and I have not seen cp do any overlapped or out-of-order write that could result in this, though it's possible, I guess. What happens if you do the by-hand version of this:
cat /home/stein/Downloads/firefox-52.0.2esr.tar.bz2 > /mnt/ceph/testing/
For extra data, could you share your os runtime (userland and kernel) version?
Regards,
Matt
Matt
Updated by Stein Rosseland almost 7 years ago
Matt Benjamin wrote:
Hi Stein,
I did a by-hand verify of current v10.2.7 stable and upstream nfs-ganesha v2.4-stable, and got correct results. Then I looked at your logs. The error you're getting is due to us seeing an out-of-order write near the end of the write cycle, which is basically not expected when the mount is -osync.
It's up to the program to do the right thing, and I have not seen cp do any overlapped or out-of-order write that could result in this, though it's possible, I guess. What happens if you do the by-hand version of this:
cat /home/stein/Downloads/firefox-52.0.2esr.tar.bz2 > /mnt/ceph/testing/
Same error
For extra data, could you share your os runtime (userland and kernel) version?
Clientside: fedora 25, 4.10.12-200
Serverside(nfsgw): centos7.3, 3.10.0-327.13.1, nfs-ganesha-2.4.5-2 rebuilt from fedora26 src rpm.
Stein
Updated by Charles Alva over 5 years ago
I'm also having the same issue on Luminous 12.2.10 with latest nfs-ganesha for Ubuntu 16.04 from Ceph repository. nfs-ganesha always throws error on files bigger than 1MB.