
Bug #9869

closed

Client: not handling cap_flush_ack messages properly

Added by Greg Farnum over 9 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: Support
Severity: 3 - minor
Component(FS): Client

Description

We saw a log segment that contained this:

2014-10-22 17:27:55.722670 7f57870bb700 20 client.812098  reflushing caps on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) to mds.0
2014-10-22 17:27:55.722680 7f57870bb700 10 client.812098 flush_caps 0x2fbc380 mds.0
2014-10-22 17:27:55.722682 7f57870bb700 10 client.812098 send_cap 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) mds.0 seq 0 used Fc want Fc flush AxFw retain pFc held pAsLsXsFscr revoking - dropping AsLsXsFsr
2014-10-22 17:27:55.722701 7f57870bb700 15 client.812098 auth cap, setting max_size = 0
2014-10-22 17:27:55.722702 7f57870bb700  1 -- 10.2.0.251:0/3996 --> 10.2.0.243:6800/2031 -- client_caps(update ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=Fc follows 1 size 30573527040/0 ts 1 mtime 2014-10-14 07:46:27.734079) v2 -- ?+0 0x24a50680 con 0x152e69e0
2014-10-22 17:27:55.722718 7f57870bb700 20 client.812098  reflushing caps on 1000003ede9.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=1558511616 mtime=2014-10-21 12:57:06.215061 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) flushing_caps=AxFw objectset[1000003ede9 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1660c670 0x2fbd100) to mds.0
...
...
...
2014-10-22 17:27:55.893457 7f57870bb700  1 -- 10.2.0.251:0/3996 <== mds.0 10.2.0.243:6800/2031 30 ==== client_caps(flush_ack ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=- follows 0 size 0/0 mtime 0.000000) v2 ==== 180+0+0 (2860243907 0 0) 0x9ad6e80 con 0x152e69e0
2014-10-22 17:27:55.893476 7f57870bb700 10 client.812098  mds.0 seq now 26
2014-10-22 17:27:55.893481 7f57870bb700  5 client.812098 handle_cap_flush_ack mds.0 cleaned - on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pFc(0=pFc) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) with AxFw
2014-10-22 17:27:55.893499 7f57870bb700 10 client.812098  tid 203603 != any cap bit tids

...and there are no intervening references to the inode.
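In the log, the client sends a flush of AxFw with tid 203603, and the MDS acks that same tid. The client still reports that the tid matches none of its per-cap-bit tids, so it cleans nothing and flushing_caps stays AxFw. The sketch below shows how such per-bit matching is meant to behave. It assumes a simplified model in which each cap bit records the tid of the last flush that covered it. The `FlushState`, `send_flush`, and `handle_flush_ack` names are hypothetical and are not Ceph's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a client inode's cap-flush bookkeeping.
// This is not Ceph's Client/Inode code; names and sizes are illustrative.
constexpr int kCapBits = 4;  // assumption: only four cap bits are modeled

struct FlushState {
  unsigned flushing = 0;                          // bitmask of caps with a flush in flight
  std::array<std::uint64_t, kCapBits> bit_tid{};  // tid of the last flush covering each bit
};

// Start (or restart) a flush of the bits in `mask`, tagged with `tid`.
void send_flush(FlushState& s, unsigned mask, std::uint64_t tid) {
  s.flushing |= mask;
  for (int i = 0; i < kCapBits; ++i)
    if (mask & (1u << i)) s.bit_tid[i] = tid;
}

// Handle a flush_ack covering `acked_mask` with transaction id `tid`.
// A bit is cleaned only if it is still flushing and its latest flush used this tid.
// Returns the mask of cleaned bits.
unsigned handle_flush_ack(FlushState& s, unsigned acked_mask, std::uint64_t tid) {
  unsigned cleaned = 0;
  for (int i = 0; i < kCapBits; ++i) {
    const unsigned bit = 1u << i;
    if ((acked_mask & bit) && (s.flushing & bit) && s.bit_tid[i] == tid)
      cleaned |= bit;
  }
  s.flushing &= ~cleaned;
  return cleaned;
}

// Usage: an ack for an older flush must not clean bits that were reflushed since.
void demo() {
  FlushState s;
  send_flush(s, 0x1u, 10);
  send_flush(s, 0x2u, 11);
  assert(handle_flush_ack(s, 0x3u, 10) == 0x1u);  // only bit 0 was flushed with tid 10
  assert(s.flushing == 0x2u);                     // bit 1 still waits for the tid-11 ack
  assert(handle_flush_ack(s, 0x3u, 11) == 0x2u);
  assert(s.flushing == 0u);
}
```

Tracking a tid per bit lets an older ack clean only the bits it actually covered. The trade-off is that the stored per-bit tid must compare exactly equal to the tid carried in the ack, or the bit is never cleaned.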

This manifested to users as a client mount that refused to accept setattr updates (in particular, chmod commands). The cause is that Client::handle_cap_flush_ack compares a 16-bit value against a 64-bit value when determining whether the flush_ack it received matches the latest flush we sent. As a result, the ack above is never matched and the AxFw bits stay in flushing_caps. Fix!
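Some arithmetic shows why the ack in the log can never match. 203603 = 3 * 65536 + 6995, so a 16-bit field assigned that tid actually holds 6995, and 6995 never equals the 64-bit value 203603. The reduction below assumes the per-bit tid is stored in a `uint16_t` field. The `BuggyInode`/`FixedInode` types and the `record_flush`/`ack_matches` helpers are hypothetical illustrations, not Ceph's declarations. Widening the stored tid is shown as one straightforward fix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduction of the bug: the per-cap-bit flush tid is stored in a
// 16-bit field, while the tid carried in client_caps messages is 64 bits.
// Type and function names are illustrative, not Ceph's declarations.
struct BuggyInode {
  std::uint16_t flushing_cap_tid = 0;  // too narrow: keeps only tid mod 65536
};

struct FixedInode {
  std::uint64_t flushing_cap_tid = 0;  // same width as the message tid
};

// Remember the tid of the flush just sent. For BuggyInode the conversion keeps
// only the low 16 bits, exactly as a plain assignment to a uint16_t would.
template <typename Inode>
void record_flush(Inode& in, std::uint64_t tid) {
  in.flushing_cap_tid = static_cast<decltype(in.flushing_cap_tid)>(tid);
}

// Compare the stored tid with the tid in an incoming flush_ack. The stored
// value is widened before comparing, so a truncated tid never matches.
template <typename Inode>
bool ack_matches(const Inode& in, std::uint64_t ack_tid) {
  return in.flushing_cap_tid == ack_tid;
}

void demo() {
  BuggyInode buggy;
  record_flush(buggy, 203603);             // the tid from the log above
  assert(buggy.flushing_cap_tid == 6995);  // 203603 - 3 * 65536
  assert(!ack_matches(buggy, 203603));     // the ack is never matched

  FixedInode fixed;
  record_flush(fixed, 203603);
  assert(ack_matches(fixed, 203603));
}
```

Tids below 65536 still compare equal even with the narrow field. The mismatch therefore only appears once the client's flush tid grows past 65535, as it has in this log.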
