Bug #17275
MDS long-time blocked ops. ceph-fuse locks up with getattr of file
Status: Closed
Description
On a 10.2.1 cluster we are seeing some blocked ceph-fuse metadata accesses to files (also reproduced with a 10.2.2 client).
I cannot reproduce the issue on demand, but it happens from time to time.
"mds dump_blocked_ops" shows a number of long-running ops, and the logs show slow requests, e.g. " 6 slow requests, 1 included below; oldest blocked for > 83360.920878 secs". Attaching some log files:
ceph-mds_log with debug_mds set to 20/20, grepped for "10000328fca\|client.7397637"
dump_blocked_ops
mds_cache.gz
The problem is not Jewel-specific; we had similar problems on a hammer cluster/client too.
Files
Updated by Henrik Korkuc over 7 years ago
- File ceph-mds_log ceph-mds_log added
- File dump_blocked_ops dump_blocked_ops added
- File mds_cache.xz mds_cache.xz added
Forgot to attach the files earlier; attaching now. Also recompressed mds_cache with xz to get it below 1 MB (a 50% smaller file compared to gz).
Updated by Zheng Yan over 7 years ago
please run the "dump_cache", "mds_requests" and "objecter_requests" admin-socket commands for client.7397637 and upload the outputs.
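For reference, these dumps come from the ceph-fuse client's admin socket. A sketch of the invocations, assuming a default socket path under /var/run/ceph/ (the exact .asok filename is an assumption; check your client's admin_socket setting):

```shell
# Query the ceph-fuse admin socket for the state requested above.
# Adjust SOCK to the actual socket of the hung client; the name below
# is illustrative, not taken from this cluster.
SOCK=/var/run/ceph/ceph-client.7397637.asok

ceph --admin-daemon "$SOCK" dump_cache        > dump_cache
ceph --admin-daemon "$SOCK" mds_requests      > mds_requests
ceph --admin-daemon "$SOCK" objecter_requests > objecter_requests
```

The same pattern works against an MDS socket for "dump_blocked_ops".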
Updated by Henrik Korkuc over 7 years ago
- File dump_cache dump_cache added
- File mds_requests mds_requests added
- File objecter_requests objecter_requests added
I had already killed that ceph-fuse, so these are from another ceph-fuse that hung. Attaching the outputs from that one.
Updated by Zheng Yan over 7 years ago
the client causing the problem is client.2816210; please dump its cache
Updated by Henrik Korkuc over 7 years ago
- File dump_cache_2816210 dump_cache_2816210 added
Attaching.
By the way, as you'll probably notice, the 2016-09-11-170046 directory is in a different path in this dump. It was moved to "moved" after it started causing problems. For directories in this "broken" state, a move takes minutes.
Updated by Zheng Yan over 7 years ago
It seems there is a blocked write in client.2816210. Could you use gdb to attach to that ceph-fuse, run "thread apply all bt" in gdb, and upload the output?
Updated by Zheng Yan over 7 years ago
Thread 835 (Thread 0x7f4c937fe700 (LWP 18496)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f4cbcfeb81d in Wait (mutex=..., this=0x7f4cc7e04a68) at ./common/Cond.h:56
#2  ObjectCacher::maybe_wait_for_writeback (this=this@entry=0x7f4cc7e046b0, len=len@entry=8192) at osdc/ObjectCacher.cc:1660
#3  0x00007f4cbcffafd3 in ObjectCacher::_wait_for_write (this=this@entry=0x7f4cc7e046b0, wr=wr@entry=0x7f49901e0c50, len=len@entry=8192, oset=oset@entry=0x7f4ca009cf48, onfreespace=onfreespace@entry=0x0) at osdc/ObjectCacher.cc:1682
#4  0x00007f4cbd000c29 in ObjectCacher::writex (this=this@entry=0x7f4cc7e046b0, wr=wr@entry=0x7f49901e0c50, oset=oset@entry=0x7f4ca009cf48, onfreespace=onfreespace@entry=0x0) at osdc/ObjectCacher.cc:1626
#5  0x00007f4cbcf8fb01 in file_write (flags=0, mtime=..., bl=..., len=<optimized out>, offset=671285764096, snapc=..., layout=0x7f4ca009cc28, oset=0x7f4ca009cf48, this=0x7f4cc7e046b0) at ./osdc/ObjectCacher.h:766
#6  Client::_write (this=this@entry=0x7f4cc7e01180, f=f@entry=0x7f4c7c024ea0, offset=offset@entry=671285764096, size=size@entry=8192, buf=buf@entry=0x7f4cc7e117c0 "\267\n\207\272\207\037\257\350\367\274\203\064\373|E\324\063V\244G\031\211MRa\035\215\327+\036<\373\204\370h\"\005\002!]\276\206Yݿ\252\351[\364\211\262\036\232\354\022}", iov=iov@entry=0x0, iovcnt=0) at client/Client.cc:8469
#7  0x00007f4cbcf90c67 in Client::ll_write (this=0x7f4cc7e01180, fh=0x7f4c7c024ea0, off=off@entry=671285764096, len=len@entry=8192, data=data@entry=0x7f4cc7e117c0 "\267\n\207\272\207\037\257\350\367\274\203\064\373|E\324\063V\244G\031\211MRa\035\215\327+\036<\373\204\370h\"\005\002!]\276\206Yݿ\252\351[\364\211\262\036\232\354\022}") at client/Client.cc:11456
#8  0x00007f4cbcf41596 in fuse_ll_write (req=0x7f4c8c0b7280, ino=<optimized out>, buf=0x7f4cc7e117c0 "\267\n\207\272\207\037\257\350\367\274\203\064\373|E\324\063V\244G\031\211MRa\035\215\327+\036<\373\204\370h\"\005\002!]\276\206Yݿ\252\351[\364\211\262\036\232\354\022}", size=8192, off=671285764096, fi=0x7f4c937fd5c0) at client/fuse_ll.cc:503
#9  0x00007f4cbc968fe4 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#10 0x00007f4cbc969895 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#11 0x00007f4cbc966394 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#12 0x00007f4cbc31f0ca in start_thread (arg=0x7f4c937fe700) at pthread_create.c:309
#13 0x00007f4cbac868ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
This thread is blocked, which prevents the MDS from serving other clients' requests. Please run the objecter_requests command for client.2816210 and check whether there are hung OSD requests.
Updated by Henrik Korkuc over 7 years ago
{
"ops": [],
"linger_ops": [],
"pool_ops": [],
"pool_stat_ops": [],
"statfs_ops": [],
"command_ops": []
}
Updated by Zheng Yan over 7 years ago
This is strange. Could you do some interactive debugging (through xchat)?
Updated by Zheng Yan over 7 years ago
- Status changed from New to 7
It seems ObjectCacher::maybe_wait_for_writeback() doesn't get woken up properly. Maybe the following patch fixes it:
diff --git a/src/osdc/ObjectCacher.cc b/src/osdc/ObjectCacher.cc
index e18de8c..ca58eb2 100644
--- a/src/osdc/ObjectCacher.cc
+++ b/src/osdc/ObjectCacher.cc
@@ -2522,6 +2522,8 @@ void ObjectCacher::bh_stat_sub(BufferHead *bh)
   default:
     assert(0 == "bh_stat_sub: invalid bufferhead state");
   }
+  if (get_stat_dirty_waiting() > 0)
+    stat_cond.Signal();
 }
 
 void ObjectCacher::bh_set_state(BufferHead *bh, int s)
Updated by Henrik Korkuc over 7 years ago
hey Zheng,
can you give me instructions on how to trace which client is blocking the ops? client.2816210 in this case.
I have some blocked ops, but I am not sure whether the clients involved are running the new or the old version.
Updated by Zheng Yan over 7 years ago
Check the cluster log for lines like "client.xxx failing to respond to capability release". That client.xxx is the one causing the problem.
Updated by Henrik Korkuc over 7 years ago
- File client.27841966_gdb.txt client.27841966_gdb.txt added
- File client.27841966_objecter_requests client.27841966_objecter_requests added
- File mds-dump_blocked_ops mds-dump_blocked_ops added
It looks like this time I got a read request to an OSD that has been stuck for 2 days. The client is 0.94.9.
client cache: ceph-post-file: cd539449-cca1-4b36-bf32-54f0353c0815
mds cache: ceph-post-file: 477d14b0-dcc9-4291-a7fd-fe654483b6b8
The cluster does not have stale/down/missing PGs, but PG splitting/rebalancing was done some time ago (I think the splitting happened before that read request).
ceph pg 1.1ab4db54 query says "Error ENOENT: i don't have pgid 1.1ab4db54"
Updated by Zheng Yan over 7 years ago
Henrik Korkuc wrote:
it looks like this time I got a read request stuck for 2 days from OSD. Client is 0.94.9
client cache: ceph-post-file: cd539449-cca1-4b36-bf32-54f0353c0815
mds cache: ceph-post-file: 477d14b0-dcc9-4291-a7fd-fe654483b6b8
cluster does not have stale/down/missing pgs, but there was pg splitting/rebalancing done some time ago (I think splitting was before that read request).
ceph pg 1.1ab4db54 query says "Error ENOENT: i don't have pgid 1.1ab4db54"
This is an objecter issue; please open a new ticket (Category: librados).
Updated by Henrik Korkuc over 7 years ago
I created #17413 for that. Still waiting for this ticket's issue to pop up again.
Updated by Henrik Korkuc over 7 years ago
Got a long-blocked getattr, but this time the client has long-running objecter requests; I will be filing a ticket for that soon. I think we can assume the patch helped with these problems.
Updated by Henrik Korkuc over 7 years ago
Updated by Zheng Yan over 7 years ago
- Status changed from 7 to Fix Under Review
Updated by John Spray over 7 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to jewel
Updated by Loïc Dachary over 7 years ago
- Copied to Backport #17697: jewel: MDS long-time blocked ops. ceph-fuse locks up with getattr of file added
Updated by Nathan Cutler about 7 years ago
- Status changed from Pending Backport to Resolved