Feature #10498

ObjectCacher: order wakeups when write calls block on throttling

Added by Greg Farnum over 9 years ago. Updated almost 8 years ago.

Status:
New
Priority:
High
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
Reviewed:
Affected Versions:
Component(FS):
osdc
Labels (FS):
Pull request ID:

Description

The ObjectCacher can block write calls by sleeping on a cond when the dirty data limits are exceeded. Unfortunately, we have seen cases where a single write call blocks for hours while other writes continue to go through.

Josh thinks the way to fix this is to introduce ordered wakeups when we hit this case, which makes sense to me. Strict ordering isn't ideal, but it should be sufficient.
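
For illustration, here is a minimal sketch of the pattern in question (hypothetical DirtyThrottle type and names; std::mutex/std::condition_variable stand in for Ceph's Mutex/Cond, and this is not the actual ObjectCacher code). All throttled writers sleep on one shared cond, and whichever thread the scheduler happens to run first after a wakeup re-checks the limit and claims the freed budget, so an unlucky writer can lose that race over and over:

```cpp
// Minimal sketch (hypothetical names) of the problematic pattern:
// every throttled writer sleeps on one shared cond.
#include <condition_variable>
#include <cstdint>
#include <mutex>

struct DirtyThrottle {
  std::mutex lock;
  std::condition_variable cond;    // single cond shared by all writers
  uint64_t dirty = 0;
  uint64_t max_dirty = 100 << 20;  // cf. client_oc_max_dirty (100MB)

  void write(uint64_t len) {
    std::unique_lock<std::mutex> l(lock);
    // Every waiter re-checks the limit on wakeup; whichever thread runs
    // first claims the freed budget, so one unlucky writer can keep
    // losing this race while other writes continue to go through.
    while (dirty + len > max_dirty)
      cond.wait(l);
    dirty += len;                  // buffer the dirty data
  }

  void on_writeback(uint64_t len) {
    std::unique_lock<std::mutex> l(lock);
    dirty -= len;
    cond.notify_all();             // unordered wakeup: no fairness
  }
};
```
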

Previously:
http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-04_23:04:01-fs-next-testing-basic-multi/686605/

There's nothing apparent there: it starts up, does some writes, and then the client log ends; eventually teuthology times out the run. I'm not sure if this is a real bug or some kind of unrelated hangup, but I'm archiving it for posterity.

Actions #1

Updated by Greg Farnum about 9 years ago

  • Priority changed from Normal to High

http://qa-proxy.ceph.com/teuthology/teuthology-2015-01-14_23:04:01-fs-master-testing-basic-multi/704465/

We've seen this repeat a couple times now. This run includes a client log that will hopefully let us diagnose the cause of what I presume is a hang or deadlock of some kind (but it has a clean shutdown at the end!).

Actions #2

Updated by Zheng Yan about 9 years ago

  • Status changed from New to Fix Under Review
Actions #3

Updated by Zheng Yan about 9 years ago

ObjectCacher::file_write hung for about 3 hours:

2015-01-15 20:26:02.964206 7f008ffff700  3 client.4109 ll_write 0x7f009505da60 10000000005 786432~131072
2015-01-15 20:26:02.964306 7f008ffff700 10 client.4109 get_caps 10000000005.head(ref=4 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 20:26:02.963849 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000005 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0) have pAsxLsXsxFsxcrwb need Fw want Fb but not Fb revoking -
2015-01-15 20:26:02.964331 7f008ffff700  5 client.4109 get_cap_ref got first FILE_BUFFER ref on 10000000005.head(ref=4 ll_ref=1 cap_refs={1024=0,4096=1,8192=0} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 20:26:02.963849 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000005 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0)
2015-01-15 20:26:02.964341 7f008ffff700  5 client.4109 get_cap_ref got first FILE_CACHE ref on 10000000005.head(ref=5 ll_ref=1 cap_refs={1024=0,4096=1,8192=0} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 20:26:02.963849 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000005 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0)
2015-01-15 20:26:07.291129 7f00eeffd700  1 -- 10.214.135.28:0/20743 --> 10.214.132.13:6800/7046 -- osd_op(client.4109.0:74618 10000000005.00000000 [write 786432~131072] 2.be6d1a49 snapc 1=[] ondisk+write+known_if_redirected e6) v4 -- ?+0 0x7f00ae8b2280 con 0x7f00ac01e3b0
2015-01-15 20:26:07.535951 7f00ec2f3700  1 -- 10.214.135.28:0/20743 <== osd.2 10.214.132.13:6800/7046 10593 ==== osd_op_reply(74618 10000000005.00000000 [write 786432~131072] v6'10593 uv10593 ondisk = 0) v6 ==== 187+0+0 (2371123278 0 0) 0x7f00a400fa10 con 0x7f00ac01e3b0
2015-01-15 20:26:07.536013 7f00f491b700 10 client.4109 _flushed 10000000005.head(ref=6 ll_ref=1 cap_refs={1024=1,4096=1,8192=2} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 20:26:02.963849 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000005 ts 0/0 objects 1 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0)

...

2015-01-15 23:24:48.994871 7f008ffff700  5 client.4109 put_cap_ref dropped last FILE_BUFFER ref on 10000000005.head(ref=4 ll_ref=1 cap_refs={1024=0,4096=1,8192=0} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 20:26:02.963849 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[10000000005 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0)
2015-01-15 23:24:48.994939 7f008ffff700 10 client.4109 put_inode on 10000000005.head(ref=4 ll_ref=1 cap_refs={1024=0,4096=1,8192=0} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 20:26:02.963849 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[10000000005 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0)
2015-01-15 23:24:48.994951 7f008ffff700  7 client.4109 wrote to 917504, leaving file size at 1048576
2015-01-15 23:24:48.994952 7f008ffff700 10 client.4109 mark_caps_dirty 10000000005.head(ref=3 ll_ref=1 cap_refs={1024=0,4096=1,8192=0} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 23:24:48.994952 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) objectset[10000000005 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0) - -> Fw
2015-01-15 23:24:48.995208 7f008ffff700  3 client.4109 ll_write 0x7f009505da60 786432~131072 = 131072

I also noticed that another ObjectSet's dirty_or_tx stayed above 100MB (the default value of client_oc_max_dirty) while "ll_write 0x7f009505da60 10000000005" was hung:

2015-01-15 23:24:48.821040 7f00e5ffb700 10 client.4109 get_caps 10000000004.head(ref=6 ll_ref=1 cap_refs={1024=1,4096=0,8192=1} open={2=1} mode=100400 size=268435456/1073741824 mtime=2015-01-15 23:24:48.820915 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000004 ts 0/0 objects 64 dirty_or_tx 104980480] parents=0x7f00c40365b0 0x7f00d801cdb0) have pAsxLsXsxFsxcrwb need Fw want Fb but not Fb revoking -
2015-01-15 23:24:48.821083 7f00e5ffb700 10 client.4109 mark_caps_dirty 10000000004.head(ref=6 ll_ref=1 cap_refs={1024=1,4096=1,8192=1} open={2=1} mode=100400 size=268435456/1073741824 mtime=2015-01-15 23:24:48.821083 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000004 ts 0/0 objects 64 dirty_or_tx 104984576] parents=0x7f00c40365b0 0x7f00d801cdb0) Fw -> Fw
2015-01-15 23:24:48.821161 7f00e57fa700 10 client.4109 get_caps 10000000004.head(ref=6 ll_ref=1 cap_refs={1024=1,4096=0,8192=1} open={2=1} mode=100400 size=268435456/1073741824 mtime=2015-01-15 23:24:48.821083 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000004 ts 0/0 objects 64 dirty_or_tx 104984576] parents=0x7f00c40365b0 0x7f00d801cdb0) have pAsxLsXsxFsxcrwb need Fw want Fb but not Fb revoking -
2015-01-15 23:24:48.825833 7f00e57fa700 10 client.4109 mark_caps_dirty 10000000004.head(ref=6 ll_ref=1 cap_refs={1024=1,4096=1,8192=1} open={2=1} mode=100400 size=268435456/1073741824 mtime=2015-01-15 23:24:48.825833 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000004 ts 0/0 objects 64 dirty_or_tx 104984576] parents=0x7f00c40365b0 0x7f00d801cdb0) Fw -> Fw

"ll_write 0x7f009505da60 10000000005" resumed executing immediaetely after objectset10000000004's dirty_or_rx drops below 100M

2015-01-15 23:24:48.995208 7f008ffff700  3 client.4109 ll_write 0x7f009505da60 786432~131072 = 131072
2015-01-15 23:24:48.995307 7f00e57fa700  3 client.4109 ll_flush 0x7f009505da60 10000000005
2015-01-15 23:24:48.995317 7f00e57fa700 10 client.4109 _flush: 0x7f009505da60 on inode 10000000005.head(ref=4 ll_ref=1 cap_refs={1024=0,4096=0,8192=0} open={2=1} mode=100740 size=1048576/4194304 mtime=2015-01-15 23:24:48.994952 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000005 ts 0/0 objects 0 dirty_or_tx 0] parents=0x7f00be392210 0x7f00d8014bb0) no async_err state
2015-01-15 23:24:48.995365 7f00e5ffb700  3 client.4109 ll_release (fh)0x7f00c4036990 10000000004
2015-01-15 23:24:48.995371 7f00e5ffb700  5 client.4109 _release_fh 0x7f00c4036990 mode 2 on 10000000004.head(ref=6 ll_ref=1 cap_refs={1024=1,4096=0,8192=1} open={2=1} mode=100400 size=268435456/1073741824 mtime=2015-01-15 23:24:48.896668 caps=pAsxLsXsxFsxcrwb(0=pAsxLsXsxFsxcrwb) dirty_caps=Fw objectset[10000000004 ts 0/0 objects 64 dirty_or_tx 104853504] parents=0x7f00c40365b0 0x7f00d801cdb0)

Actions #4

Updated by Greg Farnum about 9 years ago

  • Status changed from Fix Under Review to 12
Actions #5

Updated by Greg Farnum about 9 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from ceph-fuse: hung fsync-tester run to ObjectCacher: order wakeups when write calls block on throttling
  • Description updated (diff)

Let's do ordered wakeups!

Actions #6

Updated by Loïc Dachary about 9 years ago

I vote that this be restored to "Bug" instead of "Feature". http://pulpito.ceph.com/loic-2015-01-29_15:41:06-rados-dumpling-backports---basic-multi/730305/ may be the same issue.

Actions #7

Updated by Greg Farnum about 9 years ago

It requires some dev work and testing, which might need to be scheduled. It goes in the feature tracker even if it's needed for a bug.

(I'm not very familiar with the ObjectCacher, but I think this might be a pretty short bit of work...just switch to a list of Conds instead of a single Cond for waking up blocked writers.)
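
A hedged sketch of that idea (hypothetical OrderedDirtyThrottle type and names, with std primitives standing in for Ceph's Mutex/Cond): each blocked writer gets its own cond on a FIFO list, and completion only wakes the front waiter, so writers are released strictly in arrival order:

```cpp
// Sketch of the suggested fix (hypothetical names): one cond per blocked
// writer, kept in FIFO order, so waiters are woken in arrival order.
#include <condition_variable>
#include <cstdint>
#include <list>
#include <mutex>

struct OrderedDirtyThrottle {
  std::mutex lock;
  std::list<std::condition_variable*> waiters;  // FIFO of blocked writers
  uint64_t dirty = 0;
  uint64_t max_dirty = 100 << 20;               // cf. client_oc_max_dirty

  void write(uint64_t len) {
    std::unique_lock<std::mutex> l(lock);
    if (dirty + len > max_dirty || !waiters.empty()) {
      std::condition_variable my_cond;
      waiters.push_back(&my_cond);
      // Proceed only when we are at the head of the queue AND there is
      // room; later arrivals cannot jump ahead of us.
      while (waiters.front() != &my_cond || dirty + len > max_dirty)
        my_cond.wait(l);
      waiters.pop_front();
      if (!waiters.empty())
        waiters.front()->notify_one();  // budget may allow the next too
    }
    dirty += len;
  }

  void on_writeback(uint64_t len) {
    std::unique_lock<std::mutex> l(lock);
    dirty -= len;
    if (!waiters.empty())
      waiters.front()->notify_one();    // wake only the oldest waiter
  }
};
```

Waking only the front waiter trades a little wakeup throughput for fairness, which matches the "strict ordering isn't ideal, but sufficient" tradeoff from the description.
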

Actions #8

Updated by Greg Farnum almost 8 years ago

  • Category changed from 45 to Correctness/Safety
  • Component(FS) osdc added
Actions #9

Updated by Greg Farnum almost 8 years ago

  • Status changed from 12 to New