Bug #9869
Client: not handling cap_flush_ack messages properly (closed)
% Done: 0%
Source: Support
Severity: 3 - minor
Component(FS): Client
Description
We saw a log segment that contained this:
2014-10-22 17:27:55.722670 7f57870bb700 20 client.812098 reflushing caps on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) to mds.0
2014-10-22 17:27:55.722680 7f57870bb700 10 client.812098 flush_caps 0x2fbc380 mds.0
2014-10-22 17:27:55.722682 7f57870bb700 10 client.812098 send_cap 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) mds.0 seq 0 used Fc want Fc flush AxFw retain pFc held pAsLsXsFscr revoking - dropping AsLsXsFsr
2014-10-22 17:27:55.722701 7f57870bb700 15 client.812098 auth cap, setting max_size = 0
2014-10-22 17:27:55.722702 7f57870bb700 1 -- 10.2.0.251:0/3996 --> 10.2.0.243:6800/2031 -- client_caps(update ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=Fc follows 1 size 30573527040/0 ts 1 mtime 2014-10-14 07:46:27.734079) v2 -- ?+0 0x24a50680 con 0x152e69e0
2014-10-22 17:27:55.722718 7f57870bb700 20 client.812098 reflushing caps on 1000003ede9.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=1558511616 mtime=2014-10-21 12:57:06.215061 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) flushing_caps=AxFw objectset[1000003ede9 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1660c670 0x2fbd100) to mds.0
... ... ...
2014-10-22 17:27:55.893457 7f57870bb700 1 -- 10.2.0.251:0/3996 <== mds.0 10.2.0.243:6800/2031 30 ==== client_caps(flush_ack ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=- follows 0 size 0/0 mtime 0.000000) v2 ==== 180+0+0 (2860243907 0 0) 0x9ad6e80 con 0x152e69e0
2014-10-22 17:27:55.893476 7f57870bb700 10 client.812098 mds.0 seq now 26
2014-10-22 17:27:55.893481 7f57870bb700 5 client.812098 handle_cap_flush_ack mds.0 cleaned - on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pFc(0=pFc) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) with AxFw
2014-10-22 17:27:55.893499 7f57870bb700 10 client.812098 tid 203603 != any cap bit tids
...and there are no intervening references to the inode.
This was manifesting to users as a client mount that refused to accept setattr updates (in particular, chmod commands). It turns out to be because Client::handle_cap_flush_ack compares a 16-bit value with a 64-bit value when determining whether the flush_ack we're seeing matches the latest flush we sent. Once the tid passes 65535 (as with tid 203603 above), the 16-bit side wraps and can never equal the 64-bit tid, so the ack is never matched. Fix!
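The failure mode can be illustrated with a minimal, standalone C++ sketch. It is not Ceph's actual Client or Inode code: the record types and the record_flush/ack_matches helpers are hypothetical, and which side of the real comparison was 16 bits wide is assumed here for illustration. It only shows why keeping a 64-bit tid in a 16-bit field breaks the match.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-inode record of the tid of the last cap flush sent to
// the MDS. Names are illustrative, not Ceph's.
struct NarrowFlushRecord {
  uint16_t last_flush_tid = 0;  // too narrow: silently truncates tids > 65535
};

struct WideFlushRecord {
  uint64_t last_flush_tid = 0;  // same width as the tid carried in messages
};

// Remember the tid of an outgoing flush (the cast truncates modulo 2^16
// when the field is uint16_t).
template <typename Record>
void record_flush(Record& r, uint64_t tid) {
  r.last_flush_tid = static_cast<decltype(r.last_flush_tid)>(tid);
}

// Decide whether an incoming flush_ack matches the latest flush we sent.
template <typename Record>
bool ack_matches(const Record& r, uint64_t ack_tid) {
  // A uint16_t field is widened again for the comparison, but its high
  // bits were already lost in record_flush, so large tids never compare equal.
  return r.last_flush_tid == ack_tid;
}
```

With the narrow record, record_flush(n, 203603) stores 6995 (203603 mod 65536), so ack_matches(n, 203603) returns false; the wide record matches. Tids below 65536 match either way, which is why the problem only surfaces after enough flushes have been sent.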
Updated by Greg Farnum over 9 years ago
- Status changed from New to 7
Waiting for this to build so it can be tested.
Updated by Greg Farnum over 9 years ago
- Status changed from 7 to Pending Backport
I tested this manually with a patch that sets the starting tid value to 65535, then checked the logs. On master, that causes immediate failures to recognize cap flush acks, but with this patch applied everything went just fine.
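Seeding the tid counter at 65535 makes the very first flush cross the 16-bit boundary, so the bug shows up immediately instead of after tens of thousands of flushes. The sketch below models that test under stated assumptions: FakeClient and recognized_acks are hypothetical names, not Ceph code, and allocating tids by pre-increment is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the manual test: a client whose tid counter starts
// at a chosen value and records each flush tid in a field of type TidField.
template <typename TidField>
struct FakeClient {
  uint64_t next_tid;        // 64-bit counter, like the tids in client_caps messages
  TidField last_flush_tid;  // per-inode record whose width is under test

  explicit FakeClient(uint64_t start_tid) : next_tid(start_tid), last_flush_tid(0) {}

  // Send a flush: allocate a tid (assumed pre-increment) and remember it.
  uint64_t send_flush() {
    uint64_t tid = ++next_tid;
    last_flush_tid = static_cast<TidField>(tid);
    return tid;
  }

  // Handle the MDS's flush_ack, which echoes the tid we sent.
  bool handle_flush_ack(uint64_t tid) const { return last_flush_tid == tid; }
};

// Send `count` flushes starting from `start_tid` and count recognized acks.
template <typename TidField>
int recognized_acks(uint64_t start_tid, int count) {
  FakeClient<TidField> c(start_tid);
  int ok = 0;
  for (int i = 0; i < count; ++i) {
    uint64_t tid = c.send_flush();
    if (c.handle_flush_ack(tid)) ++ok;
  }
  return ok;
}
```

Starting at 65535, recognized_acks<uint16_t>(65535, 10) recognizes none of the acks, while recognized_acks<uint64_t>(65535, 10) recognizes all ten; starting from 0, even the narrow field works, which matches the bug staying hidden until the tid grows large.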
Updated by Greg Farnum over 9 years ago
- Status changed from Pending Backport to Resolved