Bug #9869 (closed)

Client: not handling cap_flush_ack messages properly

Added by Greg Farnum over 9 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Support
Tags: -
Backport: -
Regression: -
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(FS): Client
Labels (FS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

We saw a log segment that contained this:

2014-10-22 17:27:55.722670 7f57870bb700 20 client.812098  reflushing caps on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) to mds.0
2014-10-22 17:27:55.722680 7f57870bb700 10 client.812098 flush_caps 0x2fbc380 mds.0
2014-10-22 17:27:55.722682 7f57870bb700 10 client.812098 send_cap 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) mds.0 seq 0 used Fc want Fc flush AxFw retain pFc held pAsLsXsFscr revoking - dropping AsLsXsFsr
2014-10-22 17:27:55.722701 7f57870bb700 15 client.812098 auth cap, setting max_size = 0
2014-10-22 17:27:55.722702 7f57870bb700  1 -- 10.2.0.251:0/3996 --> 10.2.0.243:6800/2031 -- client_caps(update ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=Fc follows 1 size 30573527040/0 ts 1 mtime 2014-10-14 07:46:27.734079) v2 -- ?+0 0x24a50680 con 0x152e69e0
2014-10-22 17:27:55.722718 7f57870bb700 20 client.812098  reflushing caps on 1000003ede9.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=1558511616 mtime=2014-10-21 12:57:06.215061 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) flushing_caps=AxFw objectset[1000003ede9 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1660c670 0x2fbd100) to mds.0
...
...
...
2014-10-22 17:27:55.893457 7f57870bb700  1 -- 10.2.0.251:0/3996 <== mds.0 10.2.0.243:6800/2031 30 ==== client_caps(flush_ack ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=- follows 0 size 0/0 mtime 0.000000) v2 ==== 180+0+0 (2860243907 0 0) 0x9ad6e80 con 0x152e69e0
2014-10-22 17:27:55.893476 7f57870bb700 10 client.812098  mds.0 seq now 26
2014-10-22 17:27:55.893481 7f57870bb700  5 client.812098 handle_cap_flush_ack mds.0 cleaned - on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pFc(0=pFc) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) with AxFw
2014-10-22 17:27:55.893499 7f57870bb700 10 client.812098  tid 203603 != any cap bit tids

...and there are no intervening references to the inode.

This was manifesting to users as a client mount that refused to accept setattr updates (in particular, chmod commands). It turns out we were comparing a 16-bit value against a 64-bit value in Client::handle_cap_flush_ack when deciding whether an incoming flush_ack matches the latest flush we sent; once the tid outgrows 16 bits the truncated side can never match again, which is why the log ends with "tid 203603 != any cap bit tids" (203603 mod 65536 is 6995). Fix!
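
To make the failure mode concrete, here is a minimal standalone sketch of this class of bug. It is not the actual Ceph client code; the struct, field, and function names are illustrative:

#include <cstdint>
#include <iostream>

// Hypothetical record of an in-flight cap flush. The bug class: the tid is
// stored in a 16-bit field while the tid carried by the flush_ack is 64-bit.
struct InFlightFlush {
    uint16_t flush_tid;    // too narrow: silently drops the high bits
    int      flushing_caps;
};

bool ack_matches(const InFlightFlush& f, uint64_t ack_tid) {
    // The uint16_t is promoted for the comparison, but the damage was done
    // at assignment time: the stored tid already lost its high bits.
    return f.flush_tid == ack_tid;
}

int main() {
    InFlightFlush f;
    f.flush_tid = static_cast<uint16_t>(203603); // stores 203603 % 65536 == 6995
    f.flushing_caps = 0xAF;                      // stand-in for the dirty AxFw caps

    // The MDS acks with the full 64-bit tid, so the match fails and the dirty
    // caps are never cleaned: "tid 203603 != any cap bit tids".
    std::cout << std::boolalpha << ack_matches(f, 203603) << "\n"; // false
    return 0;
}

The fix is simply to hold the tid at full width on both sides (Ceph's ceph_tid_t is a 64-bit type), so the equality check compares like-sized values.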

#1

Updated by Greg Farnum over 9 years ago

  • Status changed from New to 7

Waiting for this to build so it can be tested.

#2

Updated by Greg Farnum over 9 years ago

  • Status changed from 7 to Pending Backport

I tested this manually with a patch that sets the starting tid value to 65535, then checked the logs. That causes immediate failures to recognize cap flush acks on master, but with this patch applied everything went just fine.

https://github.com/ceph/ceph/pull/2786
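
For reference, a tiny sketch of why seeding the counter at 65535 triggers the bug immediately (hypothetical standalone code, not the actual test patch):

#include <cassert>
#include <cstdint>

int main() {
    uint64_t last_flush_tid = 65535;       // patched starting value
    uint64_t sent_tid = ++last_flush_tid;  // first flush gets tid 65536

    // 65536 has all-zero low 16 bits, so a 16-bit field truncates it to 0
    // and the very first flush_ack fails to match on unpatched master.
    uint16_t narrow_tid = static_cast<uint16_t>(sent_tid);
    assert(narrow_tid != sent_tid);

    // With the fix the tid is kept at full width and matches immediately.
    uint64_t wide_tid = sent_tid;
    assert(wide_tid == sent_tid);
    return 0;
}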

#3

Updated by Greg Farnum over 9 years ago

  • Status changed from Pending Backport to Resolved

#4

Updated by Greg Farnum almost 8 years ago

  • Component(FS) Client added
