Bug #19187


Delete/discard operations initiated by a qemu/kvm guest get stuck

Added by Adam Wolfe Gordon about 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are frequently seeing delete/discard operations get stuck on rbd devices attached to qemu/kvm VMs. In the guest, the issue presents itself as a stuck I/O, with any additional I/O issued to the device also getting stuck:

guest# cat /sys/block/sda/inflight
       0        2
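
A simple way to confirm the I/O never drains is to watch the inflight counters (a minimal check; the two columns are reads and writes currently in flight):

# A write count that never drops back to zero indicates the stuck discard.
watch -n 5 cat /sys/block/sda/inflight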

On the host we can see the stuck delete operation in the objecter_requests output:

host# ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok objecter_requests
{
    "ops": [
        {
            "tid": 2827,
            "pg": "4.b7604fd9",
            "osd": 708,
            "object_id": "rbd_data.3d31d74bb5ef91.0000000000000341",
            "object_locator": "@4",
            "target_object_id": "rbd_data.3d31d74bb5ef91.0000000000000341",
            "target_object_locator": "@4",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "last_sent": "713765s",
            "attempts": 1,
            "snapid": "head",
            "snap_context": "0=[]",
            "mtime": "2017-03-03 18:46:08.0.612098s",
            "osd_ops": [
                "delete" 
            ]
        }
    ],
    "linger_ops": [
        {
            "linger_id": 1,
            "pg": "4.ccf574ff",
            "osd": 205,
            "object_id": "rbd_header.3d31d74bb5ef91",
            "object_locator": "@4",
            "target_object_id": "rbd_header.3d31d74bb5ef91",
            "target_object_locator": "@4",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "snapid": "head",
            "registered": "1" 
        }
    ],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": [],
    "command_ops": []
}

host# date
Fri Mar  3 20:02:25 UTC 2017
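
A quick way to pull just the interesting fields out of that output (assuming jq is available on the hypervisor; the asok path is the same client socket as above):

# List pending objecter ops with only the fields relevant for debugging.
ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok \
    objecter_requests | jq '.ops[] | {tid, osd, object_id, last_sent, attempts}'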

We have been able to reproduce this both with mkfs.ext4 (using its default discard setting) and by attaching an rbd device to the VM and then running:

mkfs.ext4 -E nodiscard -F /dev/sda
mount -o nodiscard /dev/sda /mnt
dd if=/dev/urandom of=/mnt/big-file bs=1M count=200 oflag=sync
dd if=/dev/zero of=/mnt/big-file bs=1M count=200 oflag=sync
fstrim /mnt

The discard doesn't get stuck every time, but it happens often enough that we can reproduce the issue at will.
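
For anyone trying to reproduce, the steps above can be looped along these lines (a rough sketch, not our exact script; device and mount point are assumptions):

#!/bin/sh
# Format, write, overwrite, then fstrim. When the bug triggers, fstrim itself
# hangs and the device's inflight count stays non-zero.
DEV=/dev/sda   # assumed test device
MNT=/mnt       # assumed mount point
for i in $(seq 1 20); do
    umount "$MNT" 2>/dev/null
    mkfs.ext4 -E nodiscard -F "$DEV"
    mount -o nodiscard "$DEV" "$MNT"
    dd if=/dev/urandom of="$MNT/big-file" bs=1M count=200 oflag=sync
    dd if=/dev/zero of="$MNT/big-file" bs=1M count=200 oflag=sync
    fstrim "$MNT"
    echo "iteration $i inflight: $(cat /sys/block/${DEV##*/}/inflight)"
done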

Version info:

host# sudo ceph --version
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
host# uname -a
Linux nbg1node863 3.13.0-110-generic #157-Ubuntu SMP Mon Feb 20 11:54:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
host# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

I've attached client logs - we have rbd and rados debug set to 10.
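
For reference, one way to raise that debug level on a running client without restarting qemu is via the admin socket (a sketch; the asok path is the one shown above, and the equivalent ceph.conf settings are debug rbd = 10 and debug rados = 10):

# Raise rbd and rados debug to 10 on the live client.
ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok \
    config set debug_rbd 10
ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok \
    config set debug_rados 10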


Files

logs.txt (515 KB) logs.txt Adam Wolfe Gordon, 03/03/2017 08:37 PM
rados-log.txt (91.4 KB) rados-log.txt rados delete with stuck client Adam Wolfe Gordon, 03/07/2017 09:35 PM
rados-log-after-kill.txt (61.1 KB) rados-log-after-kill.txt rados delete after killing stuck client Adam Wolfe Gordon, 03/07/2017 09:36 PM
ops.txt (59.2 KB) ops.txt Adam Wolfe Gordon, 03/09/2017 09:34 PM
Actions #1

Updated by Adam Wolfe Gordon about 7 years ago

rbd info for the image, in case it is helpful:

rbd image '1856566/9756d69c-0041-11e7-a842-000f533176b1':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.3d31d74bb5ef91
        format: 2
        features: layering, striping, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
        stripe unit: 4096 kB
        stripe count: 1
Actions #2

Updated by Jason Dillaman about 7 years ago

  • Status changed from New to Need More Info

@Adam DC949: What version of Ceph are you running on your OSDs and clients? Since it appears that the PG is stuck and not responding to the client, can you verify the PG mapping for the stuck objects? i.e. run "ceph osd map <pool> <object>" and compare it to the pg listed in the "objecter_requests" output. Is your Ceph cluster health OK?
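
For example (the pool name is a placeholder; the object is the stuck one from the description):

# Compare the computed mapping against the objecter_requests entry.
ceph osd map <pool> rbd_data.3d31d74bb5ef91.0000000000000341
# The output should look roughly like:
#   osdmap eNNNN pool '<pool>' (4) object '...' -> pg 4.b7604fd9 (4.xxx)
#     -> up ([708, ...], p708) acting ([708, ...], p708)
# and the pg/primary OSD should match the "pg" and "osd" fields above.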

Actions #3

Updated by Adam Wolfe Gordon about 7 years ago

Hi Jason,

Our clients are running 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b). The OSDs are running a mix of 10.2.3 and 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367):

OSDs 1-631 are on 10.2.3
OSDs 632-791 are on 10.2.5

Looks like all our current stuck ops are on OSDs 708 and 751, which are both on 10.2.5.

The PG mapping for the stuck objects looks fine. The stuck OSD is the primary in all cases. Cluster is relatively healthy - current status is {{HEALTH_WARN}} due to a near-full OSD.

Actions #4

Updated by Jason Dillaman about 7 years ago

@fengguang gong: Any chance you have logs from those two OSDs? Are your file descriptor limits set high enough on both the clients and the OSDs? What happens if you run "rados --debug-ms=20 --debug-objecter=20 --pool <pool> rm <rbd object to be deleted>"?
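
For the file descriptor question, something along these lines should show both the limit and current usage (a sketch; substitute the PID of the qemu process or the ceph-osd in question):

# Soft/hard open-file limits and current fd count for a process.
grep 'open files' /proc/<PID>/limits
ls /proc/<PID>/fd | wc -l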

Actions #5

Updated by Adam Wolfe Gordon about 7 years ago

I do have OSD logs, but the debug level is low so there isn't much there (I only turned up debug on the client side).

FD limits are 8192 on the clients and 327680 on the OSDs. From {{/proc/<pid>/fd}} it doesn't look like we're close to the limits.

I tried {{rados --debug-ms=20 --debug-objecter=20 --pool <pool> rm <rbd object to be deleted>}} on one of the objects whose delete is stuck, and again after killing the stuck client. Both times it seems to hang - output from both tries is attached.

Actions #6

Updated by Jason Dillaman about 7 years ago

  • Project changed from rbd to Ceph
  • Priority changed from Normal to High

@Adam DC949: It's fortunate that you are able to reproduce it from the rados CLI as well. Assuming you have an asok for OSD 751, can you attach to it and collect its state (i.e. "dump_blocked_ops", "dump_historic_ops", "dump_ops_in_flight", and "ops")?

Alternatively, if you can temporarily increase the logging (debug_ms=1, debug_osd=20, debug_os=20) on OSD 751, I would recommend retrying the removal of rbd_data.3d31f5756d622c.0000000000005b87 and attaching the logs. You can revert the logging changes on that OSD immediately after re-running the object removal.
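
For concreteness, roughly what I have in mind (a sketch; run on the node hosting OSD 751, and treat the revert values as examples to be replaced with whatever levels that OSD normally runs):

# Dump the op state from the OSD's admin socket.
ceph daemon osd.751 dump_blocked_ops
ceph daemon osd.751 dump_historic_ops
ceph daemon osd.751 dump_ops_in_flight
ceph daemon osd.751 ops

# Temporarily bump logging, re-run the rados rm, then revert.
ceph tell osd.751 injectargs '--debug-ms 1 --debug-osd 20 --debug-os 20'
# ... re-run the rados rm for the stuck object here ...
ceph tell osd.751 injectargs '--debug-ms 0/5 --debug-osd 1/5 --debug-os 1/5'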

Actions #7

Updated by Adam Wolfe Gordon about 7 years ago

I have attached the admin socket output as requested. Please let me know if there's any other information that would be helpful.

Actions #8

Updated by Adam Wolfe Gordon almost 7 years ago

Update: We have upgraded some of our clients to 10.2.6, and on these clients we can no longer reproduce the issue. I believe this can be closed as fixed or unreproducible in 10.2.6, even though we don't have a root cause.

Actions #9

Updated by Sage Weil almost 7 years ago

  • Status changed from Need More Info to Closed