Bug #9869

Client: not handling cap_flush_ack messages properly

Added by Greg Farnum almost 5 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
Start date: 10/22/2014
Due date:
% Done: 0%
Source: Support
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:

Description

We saw a log segment that contained this:

2014-10-22 17:27:55.722670 7f57870bb700 20 client.812098  reflushing caps on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) to mds.0
2014-10-22 17:27:55.722680 7f57870bb700 10 client.812098 flush_caps 0x2fbc380 mds.0
2014-10-22 17:27:55.722682 7f57870bb700 10 client.812098 send_cap 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pAsLsXsFscr(0=pAsLsXsFscr) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) mds.0 seq 0 used Fc want Fc flush AxFw retain pFc held pAsLsXsFscr revoking - dropping AsLsXsFsr
2014-10-22 17:27:55.722701 7f57870bb700 15 client.812098 auth cap, setting max_size = 0
2014-10-22 17:27:55.722702 7f57870bb700  1 -- 10.2.0.251:0/3996 --> 10.2.0.243:6800/2031 -- client_caps(update ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=Fc follows 1 size 30573527040/0 ts 1 mtime 2014-10-14 07:46:27.734079) v2 -- ?+0 0x24a50680 con 0x152e69e0
2014-10-22 17:27:55.722718 7f57870bb700 20 client.812098  reflushing caps on 1000003ede9.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=1558511616 mtime=2014-10-21 12:57:06.215061 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) flushing_caps=AxFw objectset[1000003ede9 ts 0/0 objects 0 dirty_or_tx 0] parents=0x1660c670 0x2fbd100) to mds.0
...
...
...
2014-10-22 17:27:55.893457 7f57870bb700  1 -- 10.2.0.251:0/3996 <== mds.0 10.2.0.243:6800/2031 30 ==== client_caps(flush_ack ino 1000005371d 233 seq 0 tid 203603 caps=pFc dirty=AxFw wanted=- follows 0 size 0/0 mtime 0.000000) v2 ==== 180+0+0 (2860243907 0 0) 0x9ad6e80 con 0x152e69e0
2014-10-22 17:27:55.893476 7f57870bb700 10 client.812098  mds.0 seq now 26
2014-10-22 17:27:55.893481 7f57870bb700  5 client.812098 handle_cap_flush_ack mds.0 cleaned - on 1000005371d.head(ref=2 cap_refs={1024=1,2048=0,4096=0,8192=0} open={1=0,3=0} mode=100644 size=30573527040 mtime=2014-10-14 07:46:27.734079 caps=pFc(0=pFc) flushing_caps=AxFw objectset[1000005371d ts 0/0 objects 1 dirty_or_tx 0] parents=0x2fd1500 0x2fbc380) with AxFw
2014-10-22 17:27:55.893499 7f57870bb700 10 client.812098  tid 203603 != any cap bit tids

...and there are no intervening references to the inode.

This manifested to users as a client mount that refused to accept setattr updates (in particular, chmod commands). The cause is that Client::handle_cap_flush_ack compares a 16-bit value with a 64-bit value when deciding whether an incoming flush_ack matches the latest flush we sent, so once tids grow past 65535 the acks are never matched. Fix!
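A minimal sketch of the mismatch, assuming only what the report and commit message state: the stored tid is 16 bits wide and the message tid is 64 bits. The names `InodeSketch`, `record_flush`, and `ack_matches_buggy` are hypothetical and are not Ceph's code; the real Inode tracks tids per cap bit, while this sketch keeps a single value. In the comparison, the 16-bit field is promoted and widened, so the high bits of the incoming tid are compared against zeros.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for per-inode flush state. Per the commit message,
// Inode::flushing_cap_tid is 16 bits; the per-cap-bit array is omitted here.
struct InodeSketch {
  uint16_t flushing_cap_tid = 0;
};

// Recording a flush silently truncates the 64-bit tid to its low 16 bits.
inline void record_flush(InodeSketch& in, uint64_t tid) {
  in.flushing_cap_tid = static_cast<uint16_t>(tid);
}

// Pre-fix style check: the 16-bit field is widened to 64 bits, so any tid
// above 65535 can never equal the truncated stored value.
inline bool ack_matches_buggy(const InodeSketch& in, uint64_t msg_tid) {
  return in.flushing_cap_tid == msg_tid;
}
```

With the tid from the log, `record_flush(in, 203603)` stores 6995 (203603 mod 65536), and `ack_matches_buggy(in, 203603)` returns false, which mirrors the "tid 203603 != any cap bit tids" line. Small tids such as 1000 still match, which is why the problem only appears on long-running clients.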

Associated revisions

Revision fabd4b57
Added by Greg Farnum almost 5 years ago

client: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid

m->get_client_tid() is 64 bits (as it should be), but Inode::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bit one
to 16 bits.

Fixes: #9869

Signed-off-by: Greg Farnum <>
(cherry picked from commit 7cda0e52924787f4be6f80cf7c3edcef1c995728)

Revision a5184cf4
Added by Greg Farnum almost 5 years ago

client: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid

m->get_client_tid() is 64 bits (as it should be), but Inode::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bit one
to 16 bits.

Fixes: #9869
Backport: giant, firefly, dumpling

Signed-off-by: Greg Farnum <>

Revision c20a2421
Added by Greg Farnum almost 5 years ago

client: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid

m->get_client_tid() is 64 bits (as it should be), but Inode::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bit one
to 16 bits.

Fixes: #9869
Backport: giant, firefly, dumpling

Signed-off-by: Greg Farnum <>
(cherry picked from commit a5184cf46a6e867287e24aeb731634828467cd98)

Revision 905aba2f
Added by Greg Farnum almost 5 years ago

client: cast m->get_client_tid() to compare to 16-bit Inode::flushing_cap_tid

m->get_client_tid() is 64 bits (as it should be), but Inode::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bit one
to 16 bits.

Fixes: #9869
Backport: giant, firefly, dumpling

Signed-off-by: Greg Farnum <>
(cherry picked from commit a5184cf46a6e867287e24aeb731634828467cd98)
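The fix described in these commits narrows the 64-bit message tid to 16 bits before comparing. A minimal sketch of that direction of cast follows; `ack_matches_fixed` is a hypothetical name, not the function in Ceph. It also shows the trade-off implied by "16 bits should be plenty": tids that differ by a multiple of 65536 alias, so the check is only sound while far fewer than 65536 cap flushes are outstanding on an inode.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the fixed comparison: truncate the 64-bit tid from the MDS
// message so both operands agree on width before comparing.
inline bool ack_matches_fixed(uint16_t stored_tid, uint64_t msg_tid) {
  // Downcasting the wide value (rather than widening the narrow one)
  // compares only the low 16 bits on both sides.
  return stored_tid == static_cast<uint16_t>(msg_tid);
}
```

For example, a stored value of `static_cast<uint16_t>(203603)` now matches an ack for tid 203603 and rejects 203604, but would also accept 203603 + 65536, which is harmless as long as the flush pipeline never gets that deep.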

History

#1 Updated by Greg Farnum almost 5 years ago

  • Status changed from New to Testing

Waiting for this to build so it can be tested.

#2 Updated by Greg Farnum almost 5 years ago

  • Status changed from Testing to Pending Backport

I tested this manually by applying a patch that sets the starting tid value to 65535 and then examining the logs. On master, that causes immediate failures to recognize cap flush acks, but with this patch applied everything worked fine.
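The manual test above can be approximated in a self-contained simulation, assuming a 64-bit tid counter that is copied into a 16-bit field on every flush and an MDS that echoes the tid back unchanged. `FlushSim` and `count_unmatched` are hypothetical helpers, not part of Ceph, and the post-increment starting behaviour is an assumption about how the starting tid is consumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one inode's flush bookkeeping.
struct FlushSim {
  uint64_t next_tid;          // 64-bit counter, like the message tid
  uint16_t flushing_cap_tid;  // 16-bit per-inode copy

  explicit FlushSim(uint64_t start) : next_tid(start), flushing_cap_tid(0) {}

  // Send a flush: take the next tid and remember only its low 16 bits.
  uint64_t send_flush() {
    uint64_t tid = next_tid++;
    flushing_cap_tid = static_cast<uint16_t>(tid);
    return tid;
  }
};

// Runs n flush/ack round trips starting at start_tid and returns how many
// acks would NOT be recognised, with or without the downcast fix.
inline int count_unmatched(uint64_t start_tid, int n, bool apply_fix) {
  FlushSim sim(start_tid);
  int unmatched = 0;
  for (int i = 0; i < n; ++i) {
    uint64_t tid = sim.send_flush();  // the ack carries this same tid
    bool ok = apply_fix
        ? sim.flushing_cap_tid == static_cast<uint16_t>(tid)
        : sim.flushing_cap_tid == tid;
    if (!ok) ++unmatched;
  }
  return unmatched;
}
```

Starting at 65535, ten round trips without the fix leave nine acks unmatched (every tid from 65536 onward), while the fixed comparison matches all ten; starting from a small tid, both versions behave identically, which is why the bug stayed hidden until a client had issued many flushes.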

https://github.com/ceph/ceph/pull/2786

#3 Updated by Greg Farnum almost 5 years ago

  • Status changed from Pending Backport to Resolved

#4 Updated by Greg Farnum about 3 years ago

  • Component(FS) Client added
