Project

General

Profile

Actions

Bug #19894

open

Write fails on larger files on nfs-ganesha-rgw

Added by Stein Rosseland almost 7 years ago. Updated over 5 years ago.

Status:
In Progress
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Writing files on nfs mounted rgw fail on all files larger than 1MB(?)

2017-05-09 15:45:14.600162 7fe82c144700 1 ====== process_request starting new request req=0x7fe82c142930 ======
2017-05-09 15:45:14.600225 7fe82c144700 2 req 0:0.000012:: :stat_obj:initializing for trans_id = tx000000000000000000000-005911c7ea-11eb9cc-default
2017-05-09 15:45:14.600251 7fe82c144700 2 req 0:0.000038:: :stat_obj:authorizing
2017-05-09 15:45:14.600256 7fe82c144700 2 req 0:0.000044:: :stat_obj:reading op permissions
2017-05-09 15:45:14.600303 7fe82c144700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/&quot;&gt;&lt;Owner&gt;&lt;ID&gt;nfstest&lt;/ID&gt;&lt;DisplayName&gt;nfs test</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance&quot; xsi:type="CanonicalUser"><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2017-05-09 15:45:14.601844 7fe82c144700 10 manifest: total_size = 0
2017-05-09 15:45:14.601895 7fe82c144700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/&quot;&gt;&lt;Owner&gt;&lt;ID&gt;nfstest&lt;/ID&gt;&lt;DisplayName&gt;nfs test</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance&quot; xsi:type="CanonicalUser"><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2017-05-09 15:45:14.601906 7fe82c144700 2 req 0:0.001693:: :stat_obj:init op
2017-05-09 15:45:14.601913 7fe82c144700 2 req 0:0.001701:: :stat_obj:verifying op mask
2017-05-09 15:45:14.601917 7fe82c144700 2 req 0:0.001705:: :stat_obj:verifying op permissions
2017-05-09 15:45:14.601928 7fe82c144700 5 Searching permissions for uid=nfstest mask=49
2017-05-09 15:45:14.601932 7fe82c144700 5 Found permission: 15
2017-05-09 15:45:14.601935 7fe82c144700 5 Searching permissions for group=1 mask=49
2017-05-09 15:45:14.601937 7fe82c144700 5 Permissions for group not found
2017-05-09 15:45:14.601940 7fe82c144700 5 Searching permissions for group=2 mask=49
2017-05-09 15:45:14.601942 7fe82c144700 5 Permissions for group not found
2017-05-09 15:45:14.601943 7fe82c144700 5 Getting permissions id=nfstest owner=nfstest perm=1
2017-05-09 15:45:14.601945 7fe82c144700 10 uid=nfstest requested perm (type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2017-05-09 15:45:14.601948 7fe82c144700 2 req 0:0.001735:: :stat_obj:verifying op params
2017-05-09 15:45:14.601951 7fe82c144700 2 req 0:0.001739:: :stat_obj:executing
2017-05-09 15:45:14.602118 7fe82c144700 2 req 0:0.001905:: :stat_obj:http status=200
2017-05-09 15:45:14.602125 7fe82c144700 1 ====== process_request req done req=0x7fe82c142930 http_status=200 ======
2017-05-09 15:45:14.602147 7fe82c144700 10 lookup_fh lookup called on testing for /testing/firefox-52.0.2esr.tar.bz2 (firefox-52.0.2esr.tar.bz2)
2017-05-09 15:45:14.667319 7fe82b142700 2 req 0:0.000015:: ::initializing for trans_id = tx000000000000000000000-005911c7ea-11eb9cc-default
2017-05-09 15:45:14.667391 7fe82b142700 1 ====== start_request starting new continued request req=0x7fe87c02d850 ======
2017-05-09 15:45:14.667396 7fe82b142700 2 req 0:0.000093:: :put_obj:authorizing
2017-05-09 15:45:14.667400 7fe82b142700 2 req 0:0.000097:: :put_obj:reading op permissions
2017-05-09 15:45:14.667434 7fe82b142700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/&quot;&gt;&lt;Owner&gt;&lt;ID&gt;nfstest&lt;/ID&gt;&lt;DisplayName&gt;nfs test</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance&quot; xsi:type="CanonicalUser"><ID>nfstest</ID><DisplayName>nfs test</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2017-05-09 15:45:14.667454 7fe82b142700 2 req 0:0.000151:: :put_obj:init op
2017-05-09 15:45:14.667458 7fe82b142700 2 req 0:0.000155:: :put_obj:verifying op mask
2017-05-09 15:45:14.667460 7fe82b142700 2 req 0:0.000158:: :put_obj:verifying op permissions
2017-05-09 15:45:14.667464 7fe82b142700 5 Searching permissions for uid=nfstest mask=50
2017-05-09 15:45:14.667466 7fe82b142700 5 Found permission: 15
2017-05-09 15:45:14.667467 7fe82b142700 5 Searching permissions for group=1 mask=50
2017-05-09 15:45:14.667469 7fe82b142700 5 Permissions for group not found
2017-05-09 15:45:14.667470 7fe82b142700 5 Searching permissions for group=2 mask=50
2017-05-09 15:45:14.667471 7fe82b142700 5 Permissions for group not found
2017-05-09 15:45:14.667472 7fe82b142700 5 Getting permissions id=nfstest owner=nfstest perm=2
2017-05-09 15:45:14.667473 7fe82b142700 10 uid=nfstest requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2017-05-09 15:45:14.667474 7fe82b142700 2 req 0:0.000172:: :put_obj:verifying op params
2017-05-09 15:45:19.878759 7fe7f58d7700 5 writefirefox-52.0.2esr.tar.bz2 failed write at position 57536512 (fails write transaction)
2017-05-09 15:45:19.927644 7fe7f50d6700 5 write firefox-52.0.2esr.tar.bz2 non-0 initial write position 56623104
2017-05-09 15:45:19.933975 7fe7f48d5700 5 write firefox-52.0.2esr.tar.bz2 non-0 initial write position 57671680

On clientside:
[stein@brumbar testing]$ cp /home/stein/Downloads/firefox-52.0.2esr.tar.bz2 /mnt/ceph/testing/
cp: error writing '/mnt/ceph/testing/firefox-52.0.2esr.tar.bz2': Input/output error
cp: failed to close '/mnt/ceph/testing/firefox-52.0.2esr.tar.bz2': Input/output error

Mount:
ceph-gateway-01:/ /mnt/ceph nfs4 rw,sync,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.80.103.51,local_lock=none,addr=10.84.200.233 0 0

Actions #1

Updated by Matt Benjamin almost 7 years ago

  • Status changed from New to In Progress
  • Assignee set to Matt Benjamin
Actions #2

Updated by Matt Benjamin almost 7 years ago

Hi Stein,

I did a by-hand verify of current v10.2.7 stable and upstream nfs-ganesha v2.4-stable, and got correct results. Then I looked at your logs. The error you're getting is due to us seeing an out-of-order write near the end of the write cycle, which is basically not expected when the mount is -osync.

It's up to the program to do the right thing, and I have not seen cp do any overlapped or out-of-order write that could result in this, though it's possible, I guess. What happens if you do the by-hand version of this:

cat /home/stein/Downloads/firefox-52.0.2esr.tar.bz2 > /mnt/ceph/testing/

For extra data, could you share your os runtime (userland and kernel) version?

Regards,

Matt

Matt

Actions #3

Updated by Stein Rosseland almost 7 years ago

Matt Benjamin wrote:

Hi Stein,

I did a by-hand verify of current v10.2.7 stable and upstream nfs-ganesha v2.4-stable, and got correct results. Then I looked at your logs. The error you're getting is due to us seeing an out-of-order write near the end of the write cycle, which is basically not expected when the mount is -osync.

It's up to the program to do the right thing, and I have not seen cp do any overlapped or out-of-order write that could result in this, though it's possible, I guess. What happens if you do the by-hand version of this:

cat /home/stein/Downloads/firefox-52.0.2esr.tar.bz2 > /mnt/ceph/testing/

Same error

For extra data, could you share your os runtime (userland and kernel) version?

Clientside: fedora 25, 4.10.12-200
Serverside(nfsgw): centos7.3, 3.10.0-327.13.1, nfs-ganesha-2.4.5-2 rebuilt from fedora26 src rpm.

Stein

Actions #4

Updated by Charles Alva over 5 years ago

I'm also having the same issue on Luminous 12.2.10 with latest nfs-ganesha for Ubuntu 16.04 from Ceph repository. nfs-ganesha always throws error on files bigger than 1MB.

Actions

Also available in: Atom PDF