Bug #19187
Delete/discard operations initiated by a qemu/kvm guest get stuck
Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
We are frequently seeing delete/discard operations get stuck on rbd devices attached to qemu/kvm VMs. In the guest, the issue presents itself as a stuck i/o, with any additional i/o issued to the device also getting stuck:
guest# cat /sys/block/sda/inflight
0 2
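To watch for this from inside the guest, the counters can be sampled repeatedly. The following is a minimal sketch. It assumes a Linux guest where /sys/block/<dev>/inflight holds two fields: requests in flight for reads and for writes. The sketch flags a device whose counters stay non-zero and unchanged across several samples. The function names, the 10-second interval and the three-sample threshold are hypothetical choices for illustration. An unchanged non-zero count is only a hint of a hang, not proof.

```python
import time
from pathlib import Path


def parse_inflight(text):
    """Parse the contents of /sys/block/<dev>/inflight into (reads, writes)."""
    reads, writes = (int(field) for field in text.split())
    return reads, writes


def looks_stuck(samples, min_samples=3):
    """Hypothetical heuristic: True if the last `min_samples` readings are
    identical and at least one of the two counters is non-zero."""
    if len(samples) < min_samples:
        return False
    tail = samples[-min_samples:]
    return all(s == tail[0] for s in tail) and sum(tail[0]) > 0


def watch(device="sda", interval_s=10.0, min_samples=3):
    """Sample the device's inflight counters until they look stuck,
    then return the last (reads, writes) reading."""
    path = Path("/sys/block") / device / "inflight"
    samples = []
    while True:
        samples.append(parse_inflight(path.read_text()))
        if looks_stuck(samples, min_samples):
            return samples[-1]
        time.sleep(interval_s)


if __name__ == "__main__":
    print("possibly stuck, in flight (reads, writes):", watch())
```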
On the host, we can see the stuck delete operation in the objecter_requests output:
host# ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok objecter_requests
{
    "ops": [
        {
            "tid": 2827,
            "pg": "4.b7604fd9",
            "osd": 708,
            "object_id": "rbd_data.3d31d74bb5ef91.0000000000000341",
            "object_locator": "@4",
            "target_object_id": "rbd_data.3d31d74bb5ef91.0000000000000341",
            "target_object_locator": "@4",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "last_sent": "713765s",
            "attempts": 1,
            "snapid": "head",
            "snap_context": "0=[]",
            "mtime": "2017-03-03 18:46:08.0.612098s",
            "osd_ops": [
                "delete"
            ]
        }
    ],
    "linger_ops": [
        {
            "linger_id": 1,
            "pg": "4.ccf574ff",
            "osd": 205,
            "object_id": "rbd_header.3d31d74bb5ef91",
            "object_locator": "@4",
            "target_object_id": "rbd_header.3d31d74bb5ef91",
            "target_object_locator": "@4",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "snapid": "head",
            "registered": "1"
        }
    ],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": [],
    "command_ops": []
}
host# date
Fri Mar 3 20:02:25 UTC 2017
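When several ops are queued, a short script can show which OSD each pending op was sent to. In the dump above, that is osd.708 for the stuck delete (tid 2827). The following is a minimal sketch, assuming the objecter_requests output has the JSON shape shown above. The helper names are hypothetical. SAMPLE is a trimmed copy of the dump above, keeping only the fields the script reads.

```python
import json

# Trimmed copy of the objecter_requests output above (only the fields used here).
SAMPLE = """
{"ops": [{"tid": 2827, "osd": 708,
          "object_id": "rbd_data.3d31d74bb5ef91.0000000000000341",
          "attempts": 1, "osd_ops": ["delete"]}],
 "linger_ops": [{"linger_id": 1, "osd": 205,
                 "object_id": "rbd_header.3d31d74bb5ef91"}]}
"""


def pending_ops(dump_text):
    """Return (tid, osd, object_id, osd_ops) for each entry in "ops".

    Entries in "linger_ops" (such as the watch on rbd_header) are skipped:
    they are expected to stay registered while the image is open.
    """
    data = json.loads(dump_text)
    return [
        (op["tid"], op["osd"], op["object_id"], tuple(op["osd_ops"]))
        for op in data.get("ops", [])
    ]


def ops_by_osd(dump_text):
    """Group pending op tids by the OSD they were sent to."""
    grouped = {}
    for tid, osd, _object_id, _osd_ops in pending_ops(dump_text):
        grouped.setdefault(osd, []).append(tid)
    return grouped


if __name__ == "__main__":
    for tid, osd, object_id, osd_ops in pending_ops(SAMPLE):
        print(f"tid {tid} -> osd.{osd} {object_id} {list(osd_ops)}")
    print(ops_by_osd(SAMPLE))
```

To use it on a live host, the text passed in would be the output of the same `ceph --admin-daemon <asok> objecter_requests` command shown above.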
We have been able to reproduce this both with mkfs.ext4 (using its default discard setting) and by attaching an rbd device to the VM and then running:
mkfs.ext4 -E nodiscard -F /dev/sda
mount -o nodiscard /dev/sda /mnt
dd if=/dev/urandom of=/mnt/big-file bs=1M count=200 oflag=sync
dd if=/dev/zero of=/mnt/big-file bs=1M count=200 oflag=sync
fstrim /mnt
The discard doesn't get stuck 100% of the time, but often enough that we can reproduce the issue at will.
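Because the hang is intermittent, the sequence above can be repeated until one step fails to finish. The following is a minimal sketch. It assumes it runs as root inside the guest and that /dev/sda is a scratch rbd-backed disk; mkfs.ext4 destroys its contents. The umount step, the 50-attempt limit and the 300-second timeout are hypothetical additions, not part of the original reproduction. A step that times out is deliberately not killed and waited on. A process blocked on a stuck request may be in uninterruptible sleep, and waiting for it would hang the script as well.

```python
import subprocess
import sys


def repro_commands(device="/dev/sda", mountpoint="/mnt", size_mb=200):
    """Argument lists for one pass of the reproduction from the report.

    The trailing umount is a hypothetical addition so the pass can repeat.
    """
    big_file = f"{mountpoint}/big-file"
    return [
        ["mkfs.ext4", "-E", "nodiscard", "-F", device],
        ["mount", "-o", "nodiscard", device, mountpoint],
        ["dd", "if=/dev/urandom", f"of={big_file}", "bs=1M",
         f"count={size_mb}", "oflag=sync"],
        ["dd", "if=/dev/zero", f"of={big_file}", "bs=1M",
         f"count={size_mb}", "oflag=sync"],
        ["fstrim", mountpoint],
        ["umount", mountpoint],
    ]


def run_once(commands, timeout_s=300.0):
    """Run each command in order; return the first one that does not finish
    within timeout_s, or None if all of them complete."""
    for argv in commands:
        proc = subprocess.Popen(argv)
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Leave the process alone: it may be blocked in the kernel.
            return argv
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
    return None


if __name__ == "__main__":
    for attempt in range(1, 51):
        stuck = run_once(repro_commands())
        if stuck is not None:
            print(f"attempt {attempt}: did not finish: {' '.join(stuck)}")
            sys.exit(1)
    print("no hang in 50 attempts")
```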
Version info:
host# sudo ceph --version
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
host# uname -a
Linux nbg1node863 3.13.0-110-generic #157-Ubuntu SMP Mon Feb 20 11:54:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
host# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
I've attached the client logs; debug rbd and debug rados were set to 10.