Bug #22119

closed

Possible deadlock in librbd

Added by Ivan Mironov over 6 years ago. Updated over 6 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

We are using qemu-kvm with ceph/rbd as the storage backend for our VMs, and the VMs sometimes hang without leaving anything interesting in the log files (either inside or outside the VM). After examining backtraces, I began to suspect librbd. See the attached backtraces, obtained from the same qemu-kvm process two hours apart. These two threads seemed the most interesting to me:

Thread 61 (Thread 0x7f5928282700 (LWP 11809)):
#0 pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
#1 0x00007f593c7882b9 in get_read (this=0x7f5942122200) at common/RWLock.h:82
#2 librbd::ImageCtx::user_flushed (this=this@entry=0x7f5942122000) at librbd/ImageCtx.cc:721
#3 0x00007f593c7b6f8b in librbd::aio_flush (ictx=0x7f5942122000, c=0x7f5948c41480) at librbd/internal.cc:3471
#4 0x00007f593c7698b9 in Context::complete (this=0x7f59460e7e40, r=<optimized out>) at include/Context.h:64
#5 0x00007f593c791874 in ContextWQ::process (this=0x7f59420694d0, ctx=0x7f59460e7e40) at common/WorkQueue.h:603
#6 0x00007f593c89bb26 in ThreadPool::worker (this=0x7f5941ff8340, wt=0x7f594218c330) at common/WorkQueue.cc:128
#7 0x00007f593c89c9e0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:441
#8 0x00007f593559adc5 in start_thread (arg=0x7f5928282700) at pthread_create.c:308
#9 0x00007f59352c976d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Thread 57 (Thread 0x7f5925c78700 (LWP 11819)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f593c78941a in Wait (mutex=..., this=0x7f5925c77410) at common/Cond.h:55
#2 wait (this=0x7f5925c773b0) at common/Cond.h:186
#3 librbd::ImageCtx::flush_async_operations (this=this@entry=0x7f5942122000) at librbd/ImageCtx.cc:840
#4 0x00007f593c789bfe in librbd::ImageCtx::flush (this=0x7f5942122000) at librbd/ImageCtx.cc:851
#5 0x00007f593c79ca32 in librbd::ImageWatcher::release_lock (this=0x7f594208b3c0) at librbd/ImageWatcher.cc:392
#6 0x00007f593c79e748 in librbd::ImageWatcher::reregister_watch (this=0x7f594208b3c0) at librbd/ImageWatcher.cc:1076
#7 0x00007f593c771f1a in operator() (a0=<optimized out>, this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#8 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at include/Context.h:460
#9 0x00007f593c7698b9 in Context::complete (this=0x7f5945738d80, r=<optimized out>) at include/Context.h:64
#10 0x00007f593c7698b9 in Context::complete (this=0x7f59421b2c00, r=<optimized out>) at include/Context.h:64
#11 0x00007f593c8015e8 in Finisher::finisher_thread_entry (this=0x7f59421c2500) at common/Finisher.cc:65
#12 0x00007f593559adc5 in start_thread (arg=0x7f5925c78700) at pthread_create.c:308
#13 0x00007f59352c976d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Both threads are stuck according to /proc/.../task/.../sched (no context switches happening).
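
For illustration, here is a minimal, self-contained C++ sketch of the interaction I suspect from these two stacks. This is not librbd code; the names and the locking scheme are only an approximation, and it assumes the release path holds the image rwlock for write while waiting for async operations to drain. One thread takes the rwlock for write and then waits for in-flight operations to complete, while the worker that would complete the last operation first needs a read lock on the same rwlock, so neither side can make progress.

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>

std::shared_mutex owner_lock;   // stands in for the image rwlock
std::mutex op_mutex;
std::condition_variable op_cv;
int pending_ops = 1;            // one async flush still in flight

// Analogue of the lock-release path: take the rwlock for write, then
// block until all pending async operations have completed.
void release_lock_path() {
    std::unique_lock<std::shared_mutex> wl(owner_lock);
    std::unique_lock<std::mutex> ml(op_mutex);
    op_cv.wait(ml, [] { return pending_ops == 0; });   // never satisfied
    std::cout << "release done\n";
}

// Analogue of the work-queue thread completing an aio_flush: it needs a
// read lock on the same rwlock before it can finish and signal the waiter.
void aio_flush_path() {
    std::shared_lock<std::shared_mutex> rl(owner_lock); // blocks: writer held
    {
        std::lock_guard<std::mutex> ml(op_mutex);
        --pending_ops;
    }
    op_cv.notify_all();
}

int main() {
    std::thread t1(release_lock_path);
    // Give t1 time to grab the write lock before the worker runs.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::thread t2(aio_flush_path);
    // Neither thread can make progress; this program intentionally hangs
    // here to demonstrate the wait-while-holding-the-lock pattern.
    t1.join();
    t2.join();
    return 0;
}

If the release path really does hold the rwlock while waiting in flush_async_operations(), that would also explain why both threads show zero context switches.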

Versions:
ceph 9.2.1-0.el7.x86_64
qemu-kvm 2.9.0-16.el7_4.5.1 (qemu-kvm-ev from CentOS Virtualization SIG)

# virsh qemu-monitor-command instance-00012954 --hmp "info block" 
drive-scsi0-0-0-0 (#block162): json:{"driver": "raw", "file": {"password-secret": "scsi0-0-0-0-secret0", "pool": "nova_vms", "image": "04b0562f-74ae-49b4-bef2-eadde0b53bf0_disk", "driver": "rbd", "user": "nova", "=keyvalue-pairs": "[\"auth_supported\", \"cephx;none\", \"mon_host\", \"10.208.37.1:6789;10.208.37.4:6789;10.208.37.7:6789;10.208.37.17:6789;10.208.37.20:6789;10.208.37.23:6789;10.208.37.26:6789\"]"}} (raw)
    Cache mode:       writeback

 # virsh qemu-monitor-command instance-00012954 --hmp "info blockstats" 
drive-scsi0-0-0-0: rd_bytes=8271305728 wr_bytes=2679325184 rd_operations=188599 wr_operations=44683 flush_operations=7336 wr_total_time_ns=470870876562 rd_total_time_ns=2289034152949 flush_total_time_ns=1215754101321 rd_merged=0 wr_merged=0 idle_time_ns=40785815566708

So the questions are:
1) Is it possible that there is a deadlock in librbd, or was the VM hang caused by something else?
2) Is it possible that this was caused by a known bug in librbd that is fixed in newer versions of Ceph?


Files

qemu-bt.12_00.txt (76.7 KB) qemu-bt.12_00.txt Ivan Mironov, 11/13/2017 03:16 PM
qemu-bt.13_52.txt (76.7 KB) qemu-bt.13_52.txt Ivan Mironov, 11/13/2017 03:16 PM
Actions #1

Updated by Jason Dillaman over 6 years ago

  • Status changed from New to Need More Info

@Ivan: Can you reproduce this issue on a non-EOL version of Ceph? This code no longer exists in Jewel and later releases.

Actions #2

Updated by Ivan Mironov over 6 years ago

@Jason Dillaman: We currently have plans to upgrade Ceph to the latest stable release, but the migration is not easy and will not happen soon. Do you think that a Ceph upgrade could fix such hangs?

Also, there seems to be no reliable way to reproduce this. We use our system for running CI workloads, with many VMs being spawned and running identical tests every time. Occasionally one of these VMs freezes, but this happens rarely and without any obvious pattern.

Actions #3

Updated by Jason Dillaman over 6 years ago

@Ivan: Yes, I think the issue no longer exists, since the problematic code that performed a synchronous flush after the watch was lost due to heartbeat timeouts has been removed. Because the original issue would have involved a race with non-responsive or down OSDs, it would definitely be hard to hit.

Actions #4

Updated by Jason Dillaman over 6 years ago

  • Status changed from Need More Info to Can't reproduce