Bug #17536

closed

Extremely rare Qemu hang with suspicion that RBD might be the issue

Added by Christian Theune over 7 years ago. Updated over 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

we are experiencing stalled I/O on a VM every few weeks (sigh) and I don't have enough disk space to log everything at a high debug level the whole time.
The VM's disk is pretty large (around 8 TiB) and I've seen that our backup was running "rbd diff" while it stalled. The diff took longer than usual when this last happened: normally about 13 hours, this time almost 24 hours.

I'd love to provide useful logs but would need a little bit of help to adjust the logging properly: turning everything up to 20 wouldn't be feasible if we need to let this run for a couple of weeks ... :/
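
One option I'm considering (a sketch only; the socket path is assumed) is to keep normal logging low and raise the librbd/objecter debug levels at runtime via the client's admin socket, only while a stall is actually in progress:

# raise debug levels on the running client only while investigating (socket path assumed)
ceph --admin-daemon /run/ceph/<client>.asok config set debug_rbd 20
ceph --admin-daemon /run/ceph/<client>.asok config set debug_objecter 20
# ... reproduce / observe the stall ...
ceph --admin-daemon /run/ceph/<client>.asok config set debug_rbd 0
ceph --admin-daemon /run/ceph/<client>.asok config set debug_objecter 0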

Actions #1

Updated by Jason Dillaman over 7 years ago

  • Status changed from New to Need More Info

@Christian: I'd say the first step is to ensure that your VMs are creating admin sockets so that when you see a hang you can run "ceph --admin-daemon /path/to/vm/asok objecter_requests" to see if you have a hung OSD op. Alternatively, you could create a core dump via "gcore" and upload the file using "ceph-post-file" (along with your current deb/rpm versions).
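
For reference, enabling per-client admin sockets is a small ceph.conf change (the path template below uses the stock Ceph metavariables; adjust to your layout):

[client]
    admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

Once a VM is running, point "ceph --admin-daemon" at its socket, e.g. "ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.<cctid>.asok objecter_requests" (socket name illustrative).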

Actions #2

Updated by Christian Theune over 7 years ago

Thanks for the admin socket hint. We experimented with this and are currently suspecting some interaction with XFS: it never happened on our ext4 machines, and it always seems to happen when many files are deleted. However, the OS version also differs between those sets of machines, so it might not be directly XFS-related but could instead depend on the guest kernel version.

Here's the stall we see inside the VM:

[ 2040.315197] INFO: task mount:129 blocked for more than 120 seconds.
[ 2040.317144]       Not tainted 4.3.6 #1-NixOS
[ 2040.318462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2040.320039] mount           D 0000000000000001     0   129      1 0x00000000
[ 2040.321238]  ffff8800bb63f988 0000000000000086 ffff88042d031b00 ffff88042b46d100
[ 2040.322588]  ffff8800bb640000 7fffffffffffffff ffff88042b46d100 ffff88042b2821a0
[ 2040.323887]  ffff8800bb63fb08 ffff8800bb63f9a0 ffffffff814cab53 ffff88042b282198
[ 2040.325242] Call Trace:
[ 2040.325662]  [<ffffffff814cab53>] schedule+0x33/0x80
[ 2040.326496]  [<ffffffff814cd74a>] schedule_timeout+0x1fa/0x290
[ 2040.327494]  [<ffffffffa01ec017>] ? xfs_iunlock+0x97/0xf0 [xfs]
[ 2040.328493]  [<ffffffff812b82ad>] ? radix_tree_lookup+0xd/0x10
[ 2040.329407]  [<ffffffffa01d10ca>] ? xfs_perag_get+0x2a/0x90 [xfs]
[ 2040.330272]  [<ffffffff814cbd7d>] wait_for_completion+0x9d/0x100
[ 2040.331123]  [<ffffffff810938d0>] ? wake_up_q+0x60/0x60
[ 2040.331866]  [<ffffffffa020ba74>] xfs_qm_flush_one+0x64/0xa0 [xfs]
[ 2040.332758]  [<ffffffffa020ba10>] ? xfs_qm_dqattach_one+0xd0/0xd0 [xfs]
[ 2040.333706]  [<ffffffffa020bc2a>] xfs_qm_dquot_walk.isra.9+0xda/0x150 [xfs]
[ 2040.334702]  [<ffffffffa020da75>] xfs_qm_quotacheck+0x235/0x2e0 [xfs]
[ 2040.335639]  [<ffffffffa020dbfa>] xfs_qm_mount_quotas+0xda/0x160 [xfs]
[ 2040.336573]  [<ffffffffa01f2531>] xfs_mountfs+0x791/0x7b0 [xfs]
[ 2040.337426]  [<ffffffffa01f4b3c>] xfs_fs_fill_super+0x37c/0x490 [xfs]
[ 2040.338334]  [<ffffffff811b89e6>] mount_bdev+0x1a6/0x1e0
[ 2040.339106]  [<ffffffffa01f47c0>] ? xfs_parseargs+0xa50/0xa50 [xfs]
[ 2040.340009]  [<ffffffffa01f3295>] xfs_fs_mount+0x15/0x20 [xfs]
[ 2040.340877]  [<ffffffff811b9528>] mount_fs+0x38/0x170
[ 2040.341596]  [<ffffffff811d3d97>] vfs_kern_mount+0x67/0x110
[ 2040.342384]  [<ffffffff811d6408>] do_mount+0x218/0xd40
[ 2040.343114]  [<ffffffff811d6092>] ? copy_mount_options+0xd2/0x210
[ 2040.343958]  [<ffffffff811d722b>] SyS_mount+0x8b/0xd0
[ 2040.344677]  [<ffffffff814ce8ee>] entry_SYSCALL_64_fastpath+0x12/0x71

And as you suspected there is a hung OSD op:

kyle04 ~ # ceph --admin-daemon /run/ceph/rbd-24240-94369395899456.asok objecter_requests
{
    "ops": [],
    "linger_ops": [
        {
            "linger_id": 1,
            "pg": "16.6424fdda",
            "osd": 23,
            "object_id": "rbd_header.12d1c3d515f007c",
            "object_locator": "@16",
            "target_object_id": "rbd_header.12d1c3d515f007c",
            "target_object_locator": "@16",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "snapid": "head",
            "registered": "1" 
        }
    ],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": [],
    "command_ops": []
}

This happens reliably when this specific machine boots and performs its XFS quota check (100% for n=2 ;)). I'm going to run this again later and get you a full debug output file for this client (I guess the current info isn't much help yet).

Actions #3

Updated by Jason Dillaman over 7 years ago

@Christian: the "linger_ops" against "rbd_header.12d1c3d515f007c" is actually just librbd watching for update notifications on the rbd image header object. That is to be expected.

Actions #4

Updated by Christian Theune over 7 years ago

Ah, my bad. I guess your suggestion then would be to look at the guest kernel instead of RBD, am I right? Or should I still bring the logs? (I'd prefer an Ockham's razor approach where I suspect only one of the parties rather than a mysterious abstraction-leaking interaction between XFS and RBD ...)

Actions #5

Updated by Jason Dillaman over 7 years ago

@Christian: if you can get librbd debug logs or a gcore dump from the stuck process, I'd be happy to look into it.
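
For the gcore route, something along these lines works (process name and paths are assumed; gcore appends the PID to the output name):

# dump core of the stuck QEMU process without killing it
gcore -o /tmp/qemu-hang $(pidof qemu-system-x86_64)
# upload the dump for analysis
ceph-post-file -d "bug 17536 qemu hang" /tmp/qemu-hang.<pid>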

Actions #6

Updated by Christian Theune over 7 years ago

Hi,

I managed to crash the VM during the XFS quota check. The RBD log doesn't look suspicious to me, so I'd love it if you could confirm that nothing is wrong on the Ceph side. :)

<<< NixOS Stage 1 >>>

loading module virtio_balloon...
loading module virtio_console...
loading module virtio_rng...
loading module dm_mod...
[    0.527288] device-mapper: ioctl: 4.33.0-ioctl (2015-8-18) initialised: dm-devel@redhat.com
running udev...
starting version 217
[    0.532950] random: systemd-udevd urandom read with 3 bits of entropy available
[    0.548795] SCSI subsystem initialized
[    0.560614] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.560852] rtc_cmos 00:00: RTC can wake from S4
[    0.562794] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    0.563663] rtc_cmos 00:00: alarms up to one day, 114 bytes nvram, hpet irqs
[    0.565898] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.566670] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.568794] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    0.576155] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[    0.577481] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
[    0.604353] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
[    0.605621] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
[    0.663945] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[    0.664875] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
[    0.689024] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[    0.689911] virtio-pci 0000:00:06.0: virtio_pci: leaving for legacy driver
[    0.714324] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver
[    0.718028] scsi host0: ata_piix
[    0.718795] scsi host1: ata_piix
[    0.719312] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc100 irq 14
[    0.720301] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc108 irq 15
[    0.753954]  vda: vda1 vda2
[    0.830666]  vdc: vdc1
starting device mapper and LVM...
checking /dev/disk/by-label/root...
fsck (busybox 1.23.2, )
[fsck.xfs (1) -- /mnt-root/] fsck.xfs -a /dev/disk/by-label/root
/nix/store/g0b8gj3rgz1zc5nlrc424b4mbb950gzr-extra-utils/bin/fsck.xfs: XFS file system.
mounting /dev/disk/by-label/root on /...
[    0.966974] SGI XFS with ACLs, security attributes, realtime, no debug enabled
[    0.971511] XFS (vda1): Mounting V5 Filesystem
[    1.079583] XFS (vda1): Ending clean mount
[    1.085791] XFS (vda1): Quotacheck needed: Please wait.
[    1.444187] tsc: Refined TSC clocksource calibration: 2600.018 MHz
[    1.446101] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x257a4cffb2a, max_idle_ns: 440795258827 ns
[    3.462797] random: nonblocking pool is initialized
[ 1680.098174] INFO: task mount:129 blocked for more than 120 seconds.
[ 1680.100105]       Not tainted 4.3.6 #1-NixOS
[ 1680.101347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1680.103562] mount           D ffff88043fc955c0     0   129      1 0x00000000
[ 1680.104698]  ffff8800bb633988 0000000000000086 ffff88042d649b00 ffff88042c09b600
[ 1680.105905]  ffff8800bb634000 7fffffffffffffff ffff88042c09b600 ffff88042b7201a0
[ 1680.107121]  ffff8800bb633b08 ffff8800bb6339a0 ffffffff814cab53 ffff88042b720198
[ 1680.108326] Call Trace:
[ 1680.108742]  [<ffffffff814cab53>] schedule+0x33/0x80
[ 1680.109524]  [<ffffffff814cd74a>] schedule_timeout+0x1fa/0x290
[ 1680.110481]  [<ffffffffa01ec017>] ? xfs_iunlock+0x97/0xf0 [xfs]
[ 1680.111423]  [<ffffffff812b82ad>] ? radix_tree_lookup+0xd/0x10
[ 1680.112334]  [<ffffffffa01d10ca>] ? xfs_perag_get+0x2a/0x90 [xfs]
[ 1680.113287]  [<ffffffff814cbd7d>] wait_for_completion+0x9d/0x100
[ 1680.114184]  [<ffffffff810938d0>] ? wake_up_q+0x60/0x60
[ 1680.114944]  [<ffffffffa020ba74>] xfs_qm_flush_one+0x64/0xa0 [xfs]
[ 1680.115845]  [<ffffffffa020ba10>] ? xfs_qm_dqattach_one+0xd0/0xd0 [xfs]
[ 1680.116809]  [<ffffffffa020bc2a>] xfs_qm_dquot_walk.isra.9+0xda/0x150 [xfs]
[ 1680.117823]  [<ffffffffa020da75>] xfs_qm_quotacheck+0x235/0x2e0 [xfs]
[ 1680.118754]  [<ffffffffa020dbfa>] xfs_qm_mount_quotas+0xda/0x160 [xfs]
[ 1680.119662]  [<ffffffffa01f2531>] xfs_mountfs+0x791/0x7b0 [xfs]
[ 1680.120483]  [<ffffffffa01f4b3c>] xfs_fs_fill_super+0x37c/0x490 [xfs]
[ 1680.121365]  [<ffffffff811b89e6>] mount_bdev+0x1a6/0x1e0
[ 1680.122102]  [<ffffffffa01f47c0>] ? xfs_parseargs+0xa50/0xa50 [xfs]
[ 1680.122967]  [<ffffffffa01f3295>] xfs_fs_mount+0x15/0x20 [xfs]
[ 1680.123772]  [<ffffffff811b9528>] mount_fs+0x38/0x170
[ 1680.124475]  [<ffffffff811d3d97>] vfs_kern_mount+0x67/0x110
[ 1680.125230]  [<ffffffff811d6408>] do_mount+0x218/0xd40
[ 1680.125933]  [<ffffffff811d6092>] ? copy_mount_options+0xd2/0x210
[ 1680.126765]  [<ffffffff811d722b>] SyS_mount+0x8b/0xd0
[ 1680.127467]  [<ffffffff814ce8ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1800.128125] INFO: task mount:129 blocked for more than 120 seconds.
[ 1800.129157]       Not tainted 4.3.6 #1-NixOS
[ 1800.129930] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1800.131127] mount           D ffff88043fc955c0     0   129      1 0x00000000
[ 1800.132244]  ffff8800bb633988 0000000000000086 ffff88042d649b00 ffff88042c09b600
[ 1800.133432]  ffff8800bb634000 7fffffffffffffff ffff88042c09b600 ffff88042b7201a0
[ 1800.134621]  ffff8800bb633b08 ffff8800bb6339a0 ffffffff814cab53 ffff88042b720198
[ 1800.135802] Call Trace:
[ 1800.136168]  [<ffffffff814cab53>] schedule+0x33/0x80
[ 1800.136961]  [<ffffffff814cd74a>] schedule_timeout+0x1fa/0x290
[ 1800.137954]  [<ffffffffa01ec017>] ? xfs_iunlock+0x97/0xf0 [xfs]
[ 1800.138857]  [<ffffffff812b82ad>] ? radix_tree_lookup+0xd/0x10
[ 1800.139777]  [<ffffffffa01d10ca>] ? xfs_perag_get+0x2a/0x90 [xfs]
[ 1800.140741]  [<ffffffff814cbd7d>] wait_for_completion+0x9d/0x100
[ 1800.141684]  [<ffffffff810938d0>] ? wake_up_q+0x60/0x60
[ 1800.142448]  [<ffffffffa020ba74>] xfs_qm_flush_one+0x64/0xa0 [xfs]
[ 1800.143483]  [<ffffffffa020ba10>] ? xfs_qm_dqattach_one+0xd0/0xd0 [xfs]
[ 1800.144524]  [<ffffffffa020bc2a>] xfs_qm_dquot_walk.isra.9+0xda/0x150 [xfs]
[ 1800.145620]  [<ffffffffa020da75>] xfs_qm_quotacheck+0x235/0x2e0 [xfs]
[ 1800.146703]  [<ffffffffa020dbfa>] xfs_qm_mount_quotas+0xda/0x160 [xfs]
[ 1800.147655]  [<ffffffffa01f2531>] xfs_mountfs+0x791/0x7b0 [xfs]
[ 1800.148609]  [<ffffffffa01f4b3c>] xfs_fs_fill_super+0x37c/0x490 [xfs]
[ 1800.149615]  [<ffffffff811b89e6>] mount_bdev+0x1a6/0x1e0
[ 1800.150463]  [<ffffffffa01f47c0>] ? xfs_parseargs+0xa50/0xa50 [xfs]
[ 1800.151470]  [<ffffffffa01f3295>] xfs_fs_mount+0x15/0x20 [xfs]
[ 1800.152304]  [<ffffffff811b9528>] mount_fs+0x38/0x170
[ 1800.153115]  [<ffffffff811d3d97>] vfs_kern_mount+0x67/0x110
[ 1800.153990]  [<ffffffff811d6408>] do_mount+0x218/0xd40
[ 1800.154788]  [<ffffffff811d6092>] ? copy_mount_options+0xd2/0x210
[ 1800.155758]  [<ffffffff811d722b>] SyS_mount+0x8b/0xd0
[ 1800.156565]  [<ffffffff814ce8ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1920.157164] INFO: task mount:129 blocked for more than 120 seconds.
[ 1920.158538]       Not tainted 4.3.6 #1-NixOS
[ 1920.159384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1920.161020] mount           D ffff88043fc955c0     0   129      1 0x00000000
[ 1920.162540]  ffff8800bb633988 0000000000000086 ffff88042d649b00 ffff88042c09b600
[ 1920.164407]  ffff8800bb634000 7fffffffffffffff ffff88042c09b600 ffff88042b7201a0
[ 1920.166140]  ffff8800bb633b08 ffff8800bb6339a0 ffffffff814cab53 ffff88042b720198
[ 1920.167508] Call Trace:
[ 1920.167960]  [<ffffffff814cab53>] schedule+0x33/0x80
[ 1920.168844]  [<ffffffff814cd74a>] schedule_timeout+0x1fa/0x290
[ 1920.169924]  [<ffffffffa01ec017>] ? xfs_iunlock+0x97/0xf0 [xfs]
[ 1920.170963]  [<ffffffff812b82ad>] ? radix_tree_lookup+0xd/0x10
[ 1920.171958]  [<ffffffffa01d10ca>] ? xfs_perag_get+0x2a/0x90 [xfs]
[ 1920.172980]  [<ffffffff814cbd7d>] wait_for_completion+0x9d/0x100
[ 1920.173985]  [<ffffffff810938d0>] ? wake_up_q+0x60/0x60
[ 1920.174882]  [<ffffffffa020ba74>] xfs_qm_flush_one+0x64/0xa0 [xfs]
[ 1920.175945]  [<ffffffffa020ba10>] ? xfs_qm_dqattach_one+0xd0/0xd0 [xfs]
[ 1920.176877]  [<ffffffffa020bc2a>] xfs_qm_dquot_walk.isra.9+0xda/0x150 [xfs]
[ 1920.177836]  [<ffffffffa020da75>] xfs_qm_quotacheck+0x235/0x2e0 [xfs]
[ 1920.178725]  [<ffffffffa020dbfa>] xfs_qm_mount_quotas+0xda/0x160 [xfs]
[ 1920.179614]  [<ffffffffa01f2531>] xfs_mountfs+0x791/0x7b0 [xfs]
[ 1920.180436]  [<ffffffffa01f4b3c>] xfs_fs_fill_super+0x37c/0x490 [xfs]
[ 1920.181321]  [<ffffffff811b89e6>] mount_bdev+0x1a6/0x1e0
[ 1920.182069]  [<ffffffffa01f47c0>] ? xfs_parseargs+0xa50/0xa50 [xfs]
[ 1920.182925]  [<ffffffffa01f3295>] xfs_fs_mount+0x15/0x20 [xfs]
[ 1920.183716]  [<ffffffff811b9528>] mount_fs+0x38/0x170
[ 1920.184404]  [<ffffffff811d3d97>] vfs_kern_mount+0x67/0x110
[ 1920.185162]  [<ffffffff811d6408>] do_mount+0x218/0xd40
[ 1920.185862]  [<ffffffff811d6092>] ? copy_mount_options+0xd2/0x210
[ 1920.186694]  [<ffffffff811d722b>] SyS_mount+0x8b/0xd0
[ 1920.187381]  [<ffffffff814ce8ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2040.188110] INFO: task mount:129 blocked for more than 120 seconds.
[ 2040.189379]       Not tainted 4.3.6 #1-NixOS
[ 2040.190086] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2040.191574] mount           D ffff88043fc955c0     0   129      1 0x00000000
[ 2040.192934]  ffff8800bb633988 0000000000000086 ffff88042d649b00 ffff88042c09b600
[ 2040.194405]  ffff8800bb634000 7fffffffffffffff ffff88042c09b600 ffff88042b7201a0
[ 2040.195816]  ffff8800bb633b08 ffff8800bb6339a0 ffffffff814cab53 ffff88042b720198
[ 2040.197205] Call Trace:
[ 2040.197809]  [<ffffffff814cab53>] schedule+0x33/0x80
[ 2040.198627]  [<ffffffff814cd74a>] schedule_timeout+0x1fa/0x290
[ 2040.199745]  [<ffffffffa01ec017>] ? xfs_iunlock+0x97/0xf0 [xfs]
[ 2040.200904]  [<ffffffff812b82ad>] ? radix_tree_lookup+0xd/0x10
[ 2040.202027]  [<ffffffffa01d10ca>] ? xfs_perag_get+0x2a/0x90 [xfs]
[ 2040.203172]  [<ffffffff814cbd7d>] wait_for_completion+0x9d/0x100
[ 2040.204334]  [<ffffffff810938d0>] ? wake_up_q+0x60/0x60
[ 2040.205369]  [<ffffffffa020ba74>] xfs_qm_flush_one+0x64/0xa0 [xfs]
[ 2040.206533]  [<ffffffffa020ba10>] ? xfs_qm_dqattach_one+0xd0/0xd0 [xfs]
[ 2040.207787]  [<ffffffffa020bc2a>] xfs_qm_dquot_walk.isra.9+0xda/0x150 [xfs]
[ 2040.209075]  [<ffffffffa020da75>] xfs_qm_quotacheck+0x235/0x2e0 [xfs]
[ 2040.210284]  [<ffffffffa020dbfa>] xfs_qm_mount_quotas+0xda/0x160 [xfs]
[ 2040.211474]  [<ffffffffa01f2531>] xfs_mountfs+0x791/0x7b0 [xfs]
[ 2040.212559]  [<ffffffffa01f4b3c>] xfs_fs_fill_super+0x37c/0x490 [xfs]
[ 2040.213713]  [<ffffffff811b89e6>] mount_bdev+0x1a6/0x1e0
[ 2040.214655]  [<ffffffffa01f47c0>] ? xfs_parseargs+0xa50/0xa50 [xfs]
[ 2040.215829]  [<ffffffffa01f3295>] xfs_fs_mount+0x15/0x20 [xfs]
[ 2040.216947]  [<ffffffff811b9528>] mount_fs+0x38/0x170
[ 2040.217709]  [<ffffffff811d3d97>] vfs_kern_mount+0x67/0x110
[ 2040.218488]  [<ffffffff811d6408>] do_mount+0x218/0xd40
[ 2040.219217]  [<ffffffff811d6092>] ? copy_mount_options+0xd2/0x210
[ 2040.220090]  [<ffffffff811d722b>] SyS_mount+0x8b/0xd0
[ 2040.220856]  [<ffffffff814ce8ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2160.221133] INFO: task mount:129 blocked for more than 120 seconds.
[ 2160.222912]       Not tainted 4.3.6 #1-NixOS
[ 2160.224039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2160.226091] mount           D ffff88043fc955c0     0   129      1 0x00000000
[ 2160.227901]  ffff8800bb633988 0000000000000086 ffff88042d649b00 ffff88042c09b600
[ 2160.229282]  ffff8800bb634000 7fffffffffffffff ffff88042c09b600 ffff88042b7201a0
[ 2160.230646]  ffff8800bb633b08 ffff8800bb6339a0 ffffffff814cab53 ffff88042b720198
[ 2160.232021] Call Trace:
[ 2160.232462]  [<ffffffff814cab53>] schedule+0x33/0x80
[ 2160.233331]  [<ffffffff814cd74a>] schedule_timeout+0x1fa/0x290
[ 2160.234397]  [<ffffffffa01ec017>] ? xfs_iunlock+0x97/0xf0 [xfs]
[ 2160.235445]  [<ffffffff812b82ad>] ? radix_tree_lookup+0xd/0x10
[ 2160.236488]  [<ffffffffa01d10ca>] ? xfs_perag_get+0x2a/0x90 [xfs]
[ 2160.237565]  [<ffffffff814cbd7d>] wait_for_completion+0x9d/0x100
[ 2160.238461]  [<ffffffff810938d0>] ? wake_up_q+0x60/0x60
[ 2160.239188]  [<ffffffffa020ba74>] xfs_qm_flush_one+0x64/0xa0 [xfs]
[ 2160.240040]  [<ffffffffa020ba10>] ? xfs_qm_dqattach_one+0xd0/0xd0 [xfs]
[ 2160.240942]  [<ffffffffa020bc2a>] xfs_qm_dquot_walk.isra.9+0xda/0x150 [xfs]
[ 2160.241901]  [<ffffffffa020da75>] xfs_qm_quotacheck+0x235/0x2e0 [xfs]
[ 2160.242789]  [<ffffffffa020dbfa>] xfs_qm_mount_quotas+0xda/0x160 [xfs]
[ 2160.243683]  [<ffffffffa01f2531>] xfs_mountfs+0x791/0x7b0 [xfs]
[ 2160.244491]  [<ffffffffa01f4b3c>] xfs_fs_fill_super+0x37c/0x490 [xfs]
[ 2160.245371]  [<ffffffff811b89e6>] mount_bdev+0x1a6/0x1e0
[ 2160.246102]  [<ffffffffa01f47c0>] ? xfs_parseargs+0xa50/0xa50 [xfs]
[ 2160.246964]  [<ffffffffa01f3295>] xfs_fs_mount+0x15/0x20 [xfs]
[ 2160.247763]  [<ffffffff811b9528>] mount_fs+0x38/0x170
[ 2160.248451]  [<ffffffff811d3d97>] vfs_kern_mount+0x67/0x110
[ 2160.249208]  [<ffffffff811d6408>] do_mount+0x218/0xd40
[ 2160.249899]  [<ffffffff811d6092>] ? copy_mount_options+0xd2/0x210
[ 2160.250718]  [<ffffffff811d722b>] SyS_mount+0x8b/0xd0
[ 2160.251397]  [<ffffffff814ce8ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2280.252171] INFO: task mount:129 blocked for more than 120 seconds.
[ 2280.254176]       Not tainted 4.3.6 #1-NixOS
[ 2280.255516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2280.257909] mount           D ffff88043fc955c0     0   129      1 0x00000000
[ 2280.259190]  ffff8800bb633988 0000000000000086 ffff88042d649b00 ffff88042c09b600
[ 2280.260384]  ffff8800bb634000 7fffffffffffffff ffff88042c09b600 ffff88042b7201a0
[ 2280.261584]  ffff8800bb633b08 ffff8800bb6339a0 ffffffff814cab53 ffff88042b720198
[ 2280.262780] Call Trace:
[ 2280.263205]  [<ffffffff814cab53>] schedule+0x33/0x80
[ 2280.263963]  [<ffffffff814cd74a>] schedule_timeout+0x1fa/0x290
[ 2280.264913]  [<ffffffffa01ec017>] ? xfs_iunlock+0x97/0xf0 [xfs]
[ 2280.265833]  [<ffffffff812b82ad>] ? radix_tree_lookup+0xd/0x10
[ 2280.266746]  [<ffffffffa01d10ca>] ? xfs_perag_get+0x2a/0x90 [xfs]
[ 2280.267686]  [<ffffffff814cbd7d>] wait_for_completion+0x9d/0x100
[ 2280.268594]  [<ffffffff810938d0>] ? wake_up_q+0x60/0x60
[ 2280.269367]  [<ffffffffa020ba74>] xfs_qm_flush_one+0x64/0xa0 [xfs]
[ 2280.270258]  [<ffffffffa020ba10>] ? xfs_qm_dqattach_one+0xd0/0xd0 [xfs]
[ 2280.271212]  [<ffffffffa020bc2a>] xfs_qm_dquot_walk.isra.9+0xda/0x150 [xfs]
[ 2280.272222]  [<ffffffffa020da75>] xfs_qm_quotacheck+0x235/0x2e0 [xfs]
[ 2280.273141]  [<ffffffffa020dbfa>] xfs_qm_mount_quotas+0xda/0x160 [xfs]
[ 2280.274101]  [<ffffffffa01f2531>] xfs_mountfs+0x791/0x7b0 [xfs]
[ 2280.274955]  [<ffffffffa01f4b3c>] xfs_fs_fill_super+0x37c/0x490 [xfs]
[ 2280.275868]  [<ffffffff811b89e6>] mount_bdev+0x1a6/0x1e0
[ 2280.276636]  [<ffffffffa01f47c0>] ? xfs_parseargs+0xa50/0xa50 [xfs]
[ 2280.277535]  [<ffffffffa01f3295>] xfs_fs_mount+0x15/0x20 [xfs]
[ 2280.278375]  [<ffffffff811b9528>] mount_fs+0x38/0x170
[ 2280.279086]  [<ffffffff811d3d97>] vfs_kern_mount+0x67/0x110
[ 2280.279874]  [<ffffffff811d6408>] do_mount+0x218/0xd40
[ 2280.280629]  [<ffffffff811d6092>] ? copy_mount_options+0xd2/0x210
[ 2280.281494]  [<ffffffff811d722b>] SyS_mount+0x8b/0xd0
[ 2280.282246]  [<ffffffff814ce8ee>] entry_SYSCALL_64_fastpath+0x12/0x71

Here's the RBD log:

2016-11-10 10:11:25.253521 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c160f70 rbd_data.10f46d32eb141f2.00000000001f3f64 2846720~32768 r = 32768
2016-11-10 10:11:25.253527 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c160f70 READ_FLAT
2016-11-10 10:11:25.253528 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c160f70
2016-11-10 10:11:25.253530 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c145510 r = 32768
2016-11-10 10:11:25.253531 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.253535 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c0e0250 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.253537 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c0e0250 rval 32768 read_buf 0 read_bl 0x7f95d01b4f00
2016-11-10 10:11:25.253539 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d01b4f00
2016-11-10 10:11:25.253541 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c0e0380 finish_op
2016-11-10 10:11:25.253543 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d01b4e50 rbd_data.12d1c3d515f007c.00000000001f3f64 2846720~32768 r = 32768
2016-11-10 10:11:25.253544 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d01b4e50 READ_CHECK_GUARD
2016-11-10 10:11:25.253545 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d01b4e50
2016-11-10 10:11:25.253546 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d016a900 r = 32768
2016-11-10 10:11:25.253546 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.253549 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f9737c80 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.253550 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f9737c80 rval 32768 read_buf 0x55b7f8a33000 read_bl 0
2016-11-10 10:11:25.253557 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f8a33000
2016-11-10 10:11:25.253558 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f9737db0 finish_op
2016-11-10 10:11:25.255221 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.255252 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.255258 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c0d8110 rbd_data.10f46d32eb141f2.00000000001f3fc5 520192~32768 r = 32768
2016-11-10 10:11:25.255265 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c0d8110 READ_FLAT
2016-11-10 10:11:25.255267 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c0d8110
2016-11-10 10:11:25.255269 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c178cb0 r = 32768
2016-11-10 10:11:25.255271 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.255277 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c1e32a0 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.255280 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c1e32a0 rval 32768 read_buf 0 read_bl 0x7f95d008f000
2016-11-10 10:11:25.255283 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d008f000
2016-11-10 10:11:25.255287 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c1e33d0 finish_op
2016-11-10 10:11:25.255289 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d008ef50 rbd_data.12d1c3d515f007c.00000000001f3fc5 520192~32768 r = 32768
2016-11-10 10:11:25.255293 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d008ef50 READ_CHECK_GUARD
2016-11-10 10:11:25.255295 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d008ef50
2016-11-10 10:11:25.255296 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d005ba20 r = 32768
2016-11-10 10:11:25.255298 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.255302 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f82e8570 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.255305 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f82e8570 rval 32768 read_buf 0x55b7f9866000 read_bl 0
2016-11-10 10:11:25.255314 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f9866000
2016-11-10 10:11:25.255318 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f82e86a0 finish_op
2016-11-10 10:11:25.257044 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.257064 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.257070 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c080360 rbd_data.10f46d32eb141f2.00000000001f3ffc 4075520~32768 r = 32768
2016-11-10 10:11:25.257078 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c080360 READ_FLAT
2016-11-10 10:11:25.257080 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c080360
2016-11-10 10:11:25.257082 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c0b19c0 r = 32768
2016-11-10 10:11:25.257084 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.257089 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c0b88d0 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.257093 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c0b88d0 rval 32768 read_buf 0 read_bl 0x7f95d01a9d70
2016-11-10 10:11:25.257096 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d01a9d70
2016-11-10 10:11:25.257100 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c0b8a00 finish_op
2016-11-10 10:11:25.257103 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d01a9cc0 rbd_data.12d1c3d515f007c.00000000001f3ffc 4075520~32768 r = 32768
2016-11-10 10:11:25.257107 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d01a9cc0 READ_CHECK_GUARD
2016-11-10 10:11:25.257108 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d01a9cc0
2016-11-10 10:11:25.257111 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d009b020 r = 32768
2016-11-10 10:11:25.257112 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.257117 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f90a8a10 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.257121 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f90a8a10 rval 32768 read_buf 0x55b7f999e000 read_bl 0
2016-11-10 10:11:25.257131 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f999e000
2016-11-10 10:11:25.257135 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f90a8b40 finish_op
2016-11-10 10:11:25.260339 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.260374 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.260377 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c1775d0 rbd_data.10f46d32eb141f2.00000000001f3f92 45056~32768 r = 32768
2016-11-10 10:11:25.260382 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c1775d0 READ_FLAT
2016-11-10 10:11:25.260383 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c1775d0
2016-11-10 10:11:25.260384 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c1c75f0 r = 32768
2016-11-10 10:11:25.260385 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.260388 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c067430 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.260390 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c067430 rval 32768 read_buf 0 read_bl 0x7f95d00d2860
2016-11-10 10:11:25.260392 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d00d2860
2016-11-10 10:11:25.260394 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c067560 finish_op
2016-11-10 10:11:25.260395 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d00d27b0 rbd_data.12d1c3d515f007c.00000000001f3f92 45056~32768 r = 32768
2016-11-10 10:11:25.260396 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d00d27b0 READ_CHECK_GUARD
2016-11-10 10:11:25.260397 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d00d27b0
2016-11-10 10:11:25.260398 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d00f2a70 r = 32768
2016-11-10 10:11:25.260398 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.260400 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f999c570 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.260401 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f999c570 rval 32768 read_buf 0x55b7f9082000 read_bl 0
2016-11-10 10:11:25.260408 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f9082000
2016-11-10 10:11:25.260410 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f999c6a0 finish_op
2016-11-10 10:11:25.260764 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.260789 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.260791 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c1922e0 rbd_data.10f46d32eb141f2.00000000001f3fe2 2813952~32768 r = 32768
2016-11-10 10:11:25.260796 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c1922e0 READ_FLAT
2016-11-10 10:11:25.260797 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c1922e0
2016-11-10 10:11:25.260798 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c0c90d0 r = 32768
2016-11-10 10:11:25.260799 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.260802 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c04eac0 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.260803 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c04eac0 rval 32768 read_buf 0 read_bl 0x7f95d00c3440
2016-11-10 10:11:25.260805 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d00c3440
2016-11-10 10:11:25.260807 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c04ebf0 finish_op
2016-11-10 10:11:25.260809 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d00c3390 rbd_data.12d1c3d515f007c.00000000001f3fe2 2813952~32768 r = 32768
2016-11-10 10:11:25.260810 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d00c3390 READ_CHECK_GUARD
2016-11-10 10:11:25.260811 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d00c3390
2016-11-10 10:11:25.260812 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d0126f10 r = 32768
2016-11-10 10:11:25.260812 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.260814 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f831ddc0 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.260816 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f831ddc0 rval 32768 read_buf 0x55b7f9446000 read_bl 0
2016-11-10 10:11:25.260822 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f9446000
2016-11-10 10:11:25.260824 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f831def0 finish_op
2016-11-10 10:11:25.266177 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.266205 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.266209 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c105790 rbd_data.10f46d32eb141f2.00000000001f3ffa 3682304~32768 r = 32768
2016-11-10 10:11:25.266214 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c105790 READ_FLAT
2016-11-10 10:11:25.266215 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c105790
2016-11-10 10:11:25.266217 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c18ad20 r = 32768
2016-11-10 10:11:25.266217 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.266221 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c1b8230 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.266222 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c1b8230 rval 32768 read_buf 0 read_bl 0x7f95d0123180
2016-11-10 10:11:25.266224 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d0123180
2016-11-10 10:11:25.266226 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c1b8360 finish_op
2016-11-10 10:11:25.266227 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d01230d0 rbd_data.12d1c3d515f007c.00000000001f3ffa 3682304~32768 r = 32768
2016-11-10 10:11:25.266229 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d01230d0 READ_CHECK_GUARD
2016-11-10 10:11:25.266229 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d01230d0
2016-11-10 10:11:25.266230 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d015cec0 r = 32768
2016-11-10 10:11:25.266231 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.266233 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f8515570 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.266234 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f8515570 rval 32768 read_buf 0x55b7f990d000 read_bl 0
2016-11-10 10:11:25.266240 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f990d000
2016-11-10 10:11:25.266242 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f85156a0 finish_op
2016-11-10 10:11:25.267842 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.267871 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.267874 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c1330a0 rbd_data.10f46d32eb141f2.00000000001f3f6f 2994176~32768 r = 32768
2016-11-10 10:11:25.267880 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c1330a0 READ_FLAT
2016-11-10 10:11:25.267881 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c1330a0
2016-11-10 10:11:25.267882 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c142a90 r = 32768
2016-11-10 10:11:25.267883 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.267886 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c15ec60 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.267888 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c15ec60 rval 32768 read_buf 0 read_bl 0x7f95d00a27c0
2016-11-10 10:11:25.267890 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d00a27c0
2016-11-10 10:11:25.267892 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c15ed90 finish_op
2016-11-10 10:11:25.267893 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d00a2710 rbd_data.12d1c3d515f007c.00000000001f3f6f 2994176~32768 r = 32768
2016-11-10 10:11:25.267894 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d00a2710 READ_CHECK_GUARD
2016-11-10 10:11:25.267895 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d00a2710
2016-11-10 10:11:25.267896 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d01cfc40 r = 32768
2016-11-10 10:11:25.267897 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.267899 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f834fd20 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.267900 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f834fd20 rval 32768 read_buf 0x55b7f995a000 read_bl 0
2016-11-10 10:11:25.267906 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f995a000
2016-11-10 10:11:25.267908 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f834fe50 finish_op
2016-11-10 10:11:25.272127 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.272157 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.272162 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c0fb410 rbd_data.10f46d32eb141f2.00000000001f3feb 3928064~32768 r = 32768
2016-11-10 10:11:25.272168 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c0fb410 READ_FLAT
2016-11-10 10:11:25.272169 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c0fb410
2016-11-10 10:11:25.272170 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c0b8af0 r = 32768
2016-11-10 10:11:25.272170 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.272174 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c1df030 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.272184 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c1df030 rval 32768 read_buf 0 read_bl 0x7f95d0195d50
2016-11-10 10:11:25.272186 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d0195d50
2016-11-10 10:11:25.272188 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c1df160 finish_op
2016-11-10 10:11:25.272190 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d0195ca0 rbd_data.12d1c3d515f007c.00000000001f3feb 3928064~32768 r = 32768
2016-11-10 10:11:25.272191 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d0195ca0 READ_CHECK_GUARD
2016-11-10 10:11:25.272192 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d0195ca0
2016-11-10 10:11:25.272193 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d014f050 r = 32768
2016-11-10 10:11:25.272193 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.272195 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f828fe10 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.272196 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f828fe10 rval 32768 read_buf 0x55b7f9461000 read_bl 0
2016-11-10 10:11:25.272203 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f9461000
2016-11-10 10:11:25.272205 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f828ff40 finish_op
2016-11-10 10:11:25.290547 7f95e905c700 20 librbdwriteback: aio_cb completing 
2016-11-10 10:11:25.290579 7f95e905c700 20 librbdwriteback: aio_cb finished
2016-11-10 10:11:25.290599 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c0c3780 rbd_data.10f46d32eb141f2.00000000001f3ff4 2355200~32768 r = 32768
2016-11-10 10:11:25.290610 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f910c0c3780 READ_FLAT
2016-11-10 10:11:25.290612 7f95d91f4700 20 librbd::AioRequest: complete 0x7f910c0c3780
2016-11-10 10:11:25.290621 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f910c01f8a0 r = 32768
2016-11-10 10:11:25.290622 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.290628 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x7f910c116330 complete_cb=0x7f95f920207e pending 1
2016-11-10 10:11:25.290630 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x7f910c116330 rval 32768 read_buf 0 read_bl 0x7f95d0169d40
2016-11-10 10:11:25.290634 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 32768 bytes to bl 0x7f95d0169d40
2016-11-10 10:11:25.290636 7f95d91f4700 20 librbd::AsyncOperation: 0x7f910c116460 finish_op
2016-11-10 10:11:25.290638 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d0169c90 rbd_data.12d1c3d515f007c.00000000001f3ff4 2355200~32768 r = 32768
2016-11-10 10:11:25.290640 7f95d91f4700 20 librbd::AioRequest: should_complete 0x7f95d0169c90 READ_CHECK_GUARD
2016-11-10 10:11:25.290642 7f95d91f4700 20 librbd::AioRequest: complete 0x7f95d0169c90
2016-11-10 10:11:25.290643 7f95d91f4700 10 librbd::AioCompletion: C_AioRead::finish() 0x7f95d0142410 r = 32768
2016-11-10 10:11:25.290643 7f95d91f4700 10 librbd::AioCompletion:  got {} for [0,32768] bl 32768
2016-11-10 10:11:25.290646 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x55b7f9a69e80 complete_cb=0x55b7f5bf6525 pending 1
2016-11-10 10:11:25.290647 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() 0x55b7f9a69e80 rval 32768 read_buf 0x55b7f9a6a000 read_bl 0
2016-11-10 10:11:25.290654 7f95d91f4700 20 librbd::AioCompletion: AioCompletion::finalize() copied resulting 32768 bytes to 0x55b7f9a6a000
2016-11-10 10:11:25.290657 7f95d91f4700 20 librbd::AsyncOperation: 0x55b7f9a69fb0 finish_op

Log config was:

debug rbd = 20
debug client = 20

Actions #7

Updated by Jason Dillaman over 7 years ago

@Christian: any chance you can upload that full log? If it's too large for the ticket, you can use ceph-post-file.

Actions #8

Updated by Christian Theune over 7 years ago

On its way. It's about 2.2 GB. The post tag is e027b01b-a2d6-4963-a911-6e28c584d210.

Actions #9

Updated by Christian Theune over 7 years ago

It's done. That was a good connection, almost faster than some on-disk stuff. :)

Actions #10

Updated by Jason Dillaman over 7 years ago

@Christian: Thanks, I'm downloading the log file now.

Actions #11

Updated by Jason Dillaman over 7 years ago

@Christian: The good news is that according to your log, there are zero stuck or dropped requests within librbd:

# grep -o -E "librbd: aio_(discard|read|write|flush)" client.log | wc -l
916470
# grep -o -E 'librbd::AsyncOperation: 0x[a-f0-9]* start_op' client.log | wc -l
916470
# grep -o -E 'librbd::AsyncOperation: 0x[a-f0-9]* finish_op' client.log | wc -l
916470
# grep -o -E 'librbd::AioCompletion: AioCompletion::finalize\(\) 0x' client.log  | wc -l
916470

The log shows 916470 requests (the vast majority reads, plus a single write and three flushes from closing the three images contained within the log), each being submitted, started, finished, and finalized with a completion callback. The bad news is that this means the issue most likely isn't within librbd.
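
(Had those counts disagreed, the stuck operations could be isolated by diffing started against finished operation addresses; a sketch against the same log, keeping in mind that addresses get reused, so the result is only approximate:)

# addresses that started an op but never finished it (hypothetical follow-up)
grep -o -E 'librbd::AsyncOperation: 0x[a-f0-9]* start_op' client.log | awk '{print $2}' | sort > started
grep -o -E 'librbd::AsyncOperation: 0x[a-f0-9]* finish_op' client.log | awk '{print $2}' | sort > finished
comm -23 started finished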

Actions #12

Updated by Christian Theune over 7 years ago

Thanks for that additional pair of eyes. We're about to update the guest system to kernel 4.4, which has XFS changes that may be involved here. Feel free to close this issue; I don't think there's anything left to do here from a Ceph perspective.

Actions #13

Updated by Jason Dillaman over 7 years ago

  • Status changed from Need More Info to Can't reproduce