Tasks #862

Ceph - Bug #859: Ceph does not pass fsstress

cap_refs[CEPH_CAP_FILE_BUFFER] isn't cleared if truncation zaps changes

Added by Greg Farnum almost 9 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

2011-03-03 22:13:50.562815 7f57530a5720 client4102 ino 10000000049 has 1 uncommitted, waiting
2011-03-03 22:13:50.634764 7f574f7ee710 -- 10.0.1.205:0/26734 <== osd0 10.0.1.205:6800/26662 81 ==== osd_op_reply(81 10000000087.00000000 [write 1036288~5604] ondisk = 0) v1 ==== 106+0+0 (3755548449 0 0) 0x1f5a380 con 0x1f5d500
2011-03-03 22:13:50.634796 7f574f7ee710 client4102 _flushed 10000000087.head( cap_refs={1024=0,4096=0,8192=1} open={2=0} ref=4 caps=pAsxLsXsxFsxcrwb mode=100666 mtime=2011-03-03 22:13:50.517223 dirty_caps=Fw parent=0x20087e0)
2011-03-03 22:13:50.634815 7f574f7ee710 client4102 check_caps on 10000000087.head( cap_refs={1024=0,4096=0,8192=0} open={2=0} ref=4 caps=pAsxLsXsxFsxcrwb mode=100666 mtime=2011-03-03 22:13:50.517223 dirty_caps=Fw parent=0x20087e0) wanted - used - is_delayed=0
2011-03-03 22:13:50.634831 7f574f7ee710 client4102 cap_delay_requeue on 10000000087.head( cap_refs={1024=0,4096=0,8192=0} open={2=0} ref=4 caps=pAsxLsXsxFsxcrwb mode=100666 mtime=2011-03-03 22:13:50.517223 dirty_caps=Fw parent=0x20087e0)
2011-03-03 22:13:50.634840 7f574f7ee710 client4102  cap mds0 issued pAsxLsXsxFsxcrwb implemented pAsxLsXsxFsxcrwb revoking -
2011-03-03 22:13:50.634844 7f574f7ee710 client4102 delaying cap release
2011-03-03 22:13:50.634853 7f574f7ee710 client4102 put_cap_ref dropped last FILE_BUFFER ref on 10000000087.head( cap_refs={1024=0,4096=0,8192=0} open={2=0} ref=4 caps=pAsxLsXsxFsxcrwb mode=100666 mtime=2011-03-03 22:13:50.517223 dirty_caps=Fw parent=0x20087e0) 
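
For anyone decoding the log above: the numeric keys in cap_refs={...} appear to be CEPH_CAP_FILE_* bit values. The sketch below rebuilds them from the generic capability bits and the file-section shift. The constant values are recalled from Ceph's ceph_fs.h and should be treated as an assumption to verify against the tree being debugged; `cap_ref_name` is a hypothetical helper, not a Ceph function. The 8192 entry going from 1 to 0 next to the "dropped last FILE_BUFFER ref" line is consistent with this mapping.

```cpp
#include <cstdint>
#include <string>

// Generic per-section capability bits and the shift for the "file" section.
// ASSUMPTION: values recalled from ceph_fs.h; verify against your source tree.
constexpr uint32_t CAP_GCACHE  = 4;
constexpr uint32_t CAP_GWR     = 16;
constexpr uint32_t CAP_GBUFFER = 32;
constexpr uint32_t CAP_SFILE   = 8;

constexpr uint32_t CAP_FILE_CACHE  = CAP_GCACHE  << CAP_SFILE;
constexpr uint32_t CAP_FILE_WR     = CAP_GWR     << CAP_SFILE;
constexpr uint32_t CAP_FILE_BUFFER = CAP_GBUFFER << CAP_SFILE;

static_assert(CAP_FILE_CACHE == 1024, "matches the first cap_refs key in the log");
static_assert(CAP_FILE_WR == 4096, "matches the second cap_refs key in the log");
static_assert(CAP_FILE_BUFFER == 8192, "matches the third cap_refs key in the log");

// Hypothetical helper: name a cap_refs key as printed in the client log.
std::string cap_ref_name(uint32_t key) {
  switch (key) {
    case CAP_FILE_CACHE:  return "FILE_CACHE";
    case CAP_FILE_WR:     return "FILE_WR";
    case CAP_FILE_BUFFER: return "FILE_BUFFER";
    default:              return "unknown";
  }
}
```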

There's no further reference in the log to that inode, or to the thread waiting on the waitfor_commit list, even though the "put_cap_ref dropped last FILE_BUFFER" line comes immediately after a signal to all the conds on that list.

Logs in kai:~gregf/logs/client_request_hang

Associated revisions

Revision cf6b1de4 (diff)
Added by Greg Farnum almost 9 years ago

uclient: Clear the CEPH_CAP_FILE_BUFFER ref on _flush, if safe.

Previously we just returned if safe, but leaving the CEPH_CAP_FILE_BUFFER
ref around breaks _fsync horribly. The root cause of this is
update_inode_file_bits calling objectcacher->truncate_set without
clearing the BUFFER ref, but the mechanics of clearing it there are
complicated, and I don't believe there are any issues with keeping
around the extra reference, as long as it's cleared when necessary.

This should fix #862.

Signed-off-by: Greg Farnum <>
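
A rough illustration of the change the commit message describes, as a toy model rather than the actual patch. `ToyInode`, `flush_old`, and `flush_fixed` are hypothetical names, and how the real code drops the ref is not shown in this ticket; the sketch only assumes that "safe" means nothing is left to write out.

```cpp
#include <cassert>

// Toy stand-in for the client's inode state; not the real Ceph types.
struct ToyInode {
  int buffer_refs = 0;    // stands in for cap_refs[CEPH_CAP_FILE_BUFFER]
  int dirty_extents = 0;  // buffered data not yet written out
};

// Old behaviour as described in the commit message: if there is nothing to
// flush, return immediately and leave any stale BUFFER ref in place.
bool flush_old(ToyInode& in) {
  assert(in.buffer_refs >= 0 && in.dirty_extents >= 0);
  if (in.dirty_extents == 0)
    return true;   // "safe", but a stale ref survives
  return false;    // writes still outstanding; the caller has to wait
}

// New behaviour: when nothing is dirty, also drop the leftover BUFFER ref
// so that an fsync-style waiter is not left stuck on it.
bool flush_fixed(ToyInode& in) {
  assert(in.buffer_refs >= 0 && in.dirty_extents >= 0);
  if (in.dirty_extents == 0) {
    in.buffer_refs = 0;
    return true;
  }
  return false;
}
```

With one stale ref and no dirty extents, `flush_old` reports safe but leaves `buffer_refs` at 1, while `flush_fixed` reports safe and leaves it at 0. When dirty extents remain, neither variant touches the ref.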

History

#1 Updated by Greg Farnum almost 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Greg Farnum

#2 Updated by Greg Farnum almost 9 years ago

  • Subject changed from client request hang during fsstress to client wait_on_list() & signal_cond_list() is broken somehow

#3 Updated by Greg Farnum almost 9 years ago

  • Category changed from 1 to 11

#4 Updated by Greg Farnum almost 9 years ago

Oh well, duh, those aren't the same inode: the waiter is on ino 10000000049, while the flush and the dropped ref are for 10000000087. So for some reason the cap_refs[CEPH_CAP_FILE_BUFFER] count is off, or not resolving, or something.

Going through the logs, the client isn't sending a message to the OSD to write out the data in ino 10000000049 for some reason. I guess I'll try and reproduce with objectcacher logging on to see what's happening.

#5 Updated by Greg Farnum almost 9 years ago

Okay, looks like the problem has to do with update_inode_file_bits calling objectcacher->truncate_set(). This:
1) calls delete on the Inode::oset if empty! (inappropriate since Inode::oset is a data member, not a pointer),
2) removes data that might have a pending write, without dropping the in->cap_refs[CEPH_CAP_FILE_BUFFER] ref.
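
The second point can be modelled in a few lines. This is a toy model of the accounting problem, not the Client or ObjectCacher code: `ToyInode`, `buffer_write`, `write_acked`, `truncate_discard`, and `fsync_would_block` are all hypothetical names, and the model assumes a single BUFFER ref held while any dirty data exists.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of the ref accounting; not the real Ceph client types.
struct ToyInode {
  int buffer_refs = 0;                 // stands in for cap_refs[CEPH_CAP_FILE_BUFFER]
  std::map<uint64_t, uint64_t> dirty;  // offset -> length of buffered, unwritten data
};

// A buffered write takes the BUFFER ref when the inode first becomes dirty.
void buffer_write(ToyInode& in, uint64_t off, uint64_t len) {
  if (in.dirty.empty())
    ++in.buffer_refs;
  in.dirty[off] = len;
}

// Normal path: the write is acked, and the ref is dropped once nothing is dirty.
void write_acked(ToyInode& in, uint64_t off) {
  if (in.dirty.erase(off) == 0)
    return;  // no such extent was pending
  if (in.dirty.empty()) {
    assert(in.buffer_refs > 0);
    --in.buffer_refs;
  }
}

// Problem path: truncation discards dirty extents that start at or beyond
// new_size, but never touches buffer_refs.
void truncate_discard(ToyInode& in, uint64_t new_size) {
  in.dirty.erase(in.dirty.lower_bound(new_size), in.dirty.end());
}

// An fsync-style waiter blocks for as long as the BUFFER ref is held.
bool fsync_would_block(const ToyInode& in) {
  return in.buffer_refs > 0;
}
```

In this model, a write followed by its ack leaves nothing to wait on. A write followed by `truncate_discard` leaves no dirty data, so no write is ever sent and no ack ever arrives, yet `fsync_would_block` stays true.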

#6 Updated by Greg Farnum almost 9 years ago

Ah, so it doesn't call delete on the oset. That was just me misreading the code.

However, I still believe it does clear data that might have a pending write, without clearing the appropriate flags.

#7 Updated by Greg Farnum almost 9 years ago

  • Tracker changed from Tasks to Bug
  • Subject changed from client wait_on_list() & signal_cond_list() is broken somehow to cap_refs[CEPH_CAP_FILE_BUFFER] isn't cleared if truncation zaps changes
  • Status changed from In Progress to Resolved

Pushed to stable in commit:cf6b1de4a692ca0f3e86a600bcf4642723ccade7, and merged stable into master.

#8 Updated by John Spray over 3 years ago

  • Project changed from Ceph to fs
  • Category deleted (11)
  • Target version deleted (v0.25.2)

Bulk updating project=ceph category=ceph-fuse issues to move to fs project so that we can remove the ceph-fuse category from the ceph project