Bug #19187


Delete/discard operations initiated by a qemu/kvm guest get stuck

Added by Adam Wolfe Gordon about 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are frequently seeing delete/discard operations get stuck on rbd devices attached to qemu/kvm VMs. In the guest, the issue presents itself as a stuck I/O, with any additional I/O issued to the device also getting stuck:

guest# cat /sys/block/sda/inflight
       0        2
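
A simple way to confirm the I/O never drains is to watch the inflight counters (a minimal check; the two columns are reads and writes currently in flight):

# A write count that never drops back to zero indicates the stuck discard.
watch -n 5 cat /sys/block/sda/inflight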

On the host we can see the stuck delete operation in the objecter_requests output:

host# ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok objecter_requests
{
    "ops": [
        {
            "tid": 2827,
            "pg": "4.b7604fd9",
            "osd": 708,
            "object_id": "rbd_data.3d31d74bb5ef91.0000000000000341",
            "object_locator": "@4",
            "target_object_id": "rbd_data.3d31d74bb5ef91.0000000000000341",
            "target_object_locator": "@4",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "last_sent": "713765s",
            "attempts": 1,
            "snapid": "head",
            "snap_context": "0=[]",
            "mtime": "2017-03-03 18:46:08.0.612098s",
            "osd_ops": [
                "delete" 
            ]
        }
    ],
    "linger_ops": [
        {
            "linger_id": 1,
            "pg": "4.ccf574ff",
            "osd": 205,
            "object_id": "rbd_header.3d31d74bb5ef91",
            "object_locator": "@4",
            "target_object_id": "rbd_header.3d31d74bb5ef91",
            "target_object_locator": "@4",
            "paused": 0,
            "used_replica": 0,
            "precalc_pgid": 0,
            "snapid": "head",
            "registered": "1" 
        }
    ],
    "pool_ops": [],
    "pool_stat_ops": [],
    "statfs_ops": [],
    "command_ops": []
}

host# date
Fri Mar  3 20:02:25 UTC 2017
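
A quick way to pull just the interesting fields out of that output (assuming jq is available on the hypervisor; the asok path is the same client socket as above):

# List pending objecter ops with only the fields relevant for debugging.
ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok \
    objecter_requests | jq '.ops[] | {tid, osd, object_id, last_sent, attempts}'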

We have been able to reproduce this both with mkfs.ext4 (using its default discard setting) and by attaching an rbd device to the VM and then running:

mkfs.ext4 -E nodiscard -F /dev/sda
mount -o nodiscard /dev/sda /mnt
dd if=/dev/urandom of=/mnt/big-file bs=1M count=200 oflag=sync
dd if=/dev/zero of=/mnt/big-file bs=1M count=200 oflag=sync
fstrim /mnt

The discard doesn't get stuck every time, but it happens often enough that we can reproduce the issue at will.
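
For anyone trying to reproduce, the steps above can be looped along these lines (a rough sketch, not our exact script; device and mount point are assumptions):

#!/bin/sh
# Format, write, overwrite, then fstrim. When the bug triggers, fstrim itself
# hangs and the device's inflight count stays non-zero.
DEV=/dev/sda   # assumed test device
MNT=/mnt       # assumed mount point
for i in $(seq 1 20); do
    umount "$MNT" 2>/dev/null
    mkfs.ext4 -E nodiscard -F "$DEV"
    mount -o nodiscard "$DEV" "$MNT"
    dd if=/dev/urandom of="$MNT/big-file" bs=1M count=200 oflag=sync
    dd if=/dev/zero of="$MNT/big-file" bs=1M count=200 oflag=sync
    fstrim "$MNT"
    echo "iteration $i inflight: $(cat /sys/block/${DEV##*/}/inflight)"
done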

Version info:

host# sudo ceph --version
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
host# uname -a
Linux nbg1node863 3.13.0-110-generic #157-Ubuntu SMP Mon Feb 20 11:54:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
host# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

I've attached client logs - we have rbd and rados debug set to 10.
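
For reference, one way to raise that debug level on a running client without restarting qemu is via the admin socket (a sketch; the asok path is the one shown above, and the equivalent ceph.conf settings are debug rbd = 10 and debug rados = 10):

# Raise rbd and rados debug to 10 on the live client.
ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok \
    config set debug_rbd 10
ceph --admin-daemon /var/run/ceph/rbd-ceph-client.hypervisor01.104686.140551808376528.asok \
    config set debug_rados 10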


Files

logs.txt (515 KB) logs.txt Adam Wolfe Gordon, 03/03/2017 08:37 PM
rados-log.txt (91.4 KB) rados-log.txt rados delete with stuck client Adam Wolfe Gordon, 03/07/2017 09:35 PM
rados-log-after-kill.txt (61.1 KB) rados-log-after-kill.txt rados delete after killing stuck client Adam Wolfe Gordon, 03/07/2017 09:36 PM
ops.txt (59.2 KB) ops.txt Adam Wolfe Gordon, 03/09/2017 09:34 PM
Actions #1

Updated by Adam Wolfe Gordon about 7 years ago

rbd info for the image, in case it is helpful:

rbd image '1856566/9756d69c-0041-11e7-a842-000f533176b1':
        size 102400 MB in 25600 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.3d31d74bb5ef91
        format: 2
        features: layering, striping, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
        stripe unit: 4096 kB
        stripe count: 1
Actions #2

Updated by Jason Dillaman about 7 years ago

  • Status changed from New to Need More Info

@Adam DC949: What version of Ceph are you running on your OSDs and clients? Since it appears that the PG is stuck and not responding to the client, can you verify the PG mapping for the stuck objects? i.e. run "ceph osd map <pool> <object>" and compare it to the pg listed in the "objecter_requests" output. Is your Ceph cluster health OK?
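
For example (the pool name is a placeholder; the object is the stuck one from the description):

# Compare the computed mapping against the objecter_requests entry.
ceph osd map <pool> rbd_data.3d31d74bb5ef91.0000000000000341
# The output should look roughly like:
#   osdmap eNNNN pool '<pool>' (4) object '...' -> pg 4.b7604fd9 (4.xxx)
#     -> up ([708, ...], p708) acting ([708, ...], p708)
# and the pg/primary OSD should match the "pg" and "osd" fields above.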

Actions #3

Updated by Adam Wolfe Gordon about 7 years ago

Hi Jason,

Our clients are running 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b). The OSDs are running a mix of 10.2.3 and 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367):

OSDs 1-631 are on 10.2.3
OSDs 632-791 are on 10.2.5

Looks like all our current stuck ops are on OSDs 708 and 751, which are both on 10.2.5.

The PG mapping for the stuck objects looks fine. The stuck OSD is the primary in all cases. Cluster is relatively healthy - current status is {{HEALTH_WARN}} due to a near-full OSD.

Actions #4

Updated by Jason Dillaman about 7 years ago

@fengguang gong: Any chance you have logs from those two OSDs? Are your file descriptor limits set high enough on both the clients and the OSDs? What happens if you run "rados --debug-ms=20 --debug-objecter=20 --pool <pool> rm <rbd object to be deleted>"?
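
For the file descriptor question, something along these lines should show both the limit and current usage (a sketch; substitute the PID of the qemu process or the ceph-osd in question):

# Soft/hard open-file limits and current fd count for a process.
grep 'open files' /proc/<PID>/limits
ls /proc/<PID>/fd | wc -l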

Actions #5

Updated by Adam Wolfe Gordon about 7 years ago

I do have OSD logs, but the debug level is low so there isn't much there (I only turned up debug on the client side).

FD limits are 8192 on the clients and 327680 on the OSDs. From {{/proc/<pid>/fd}} it doesn't look like we're close to the limits.

I tried {{rados --debug-ms=20 --debug-objecter=20 --pool <pool> rm <rbd object to be deleted>}} on one of the objects whose delete is stuck, and again after killing the stuck client. Both times it seems to hang - output from both tries is attached.

Actions #6

Updated by Jason Dillaman about 7 years ago

  • Project changed from rbd to Ceph
  • Priority changed from Normal to High

@Adam DC949: It's fortunate that you are able to reproduce it from the rados CLI as well. Assuming you have an asok for OSD 751, can you attach to it and collect its state (i.e. "dump_blocked_ops", "dump_historic_ops", "dump_ops_in_flight", and "ops")?

Alternatively, if you can temporarily increase the logging (debug_ms=1, debug_osd=20, debug_os=20) on OSD 751, I would recommend retrying the removal of rbd_data.3d31f5756d622c.0000000000005b87 and attaching the logs. You can revert the logging changes on that OSD immediately after re-running the object removal.
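
For concreteness, roughly what I have in mind (a sketch; run on the node hosting OSD 751, and treat the revert values as examples to be replaced with whatever levels that OSD normally runs):

# Dump the op state from the OSD's admin socket.
ceph daemon osd.751 dump_blocked_ops
ceph daemon osd.751 dump_historic_ops
ceph daemon osd.751 dump_ops_in_flight
ceph daemon osd.751 ops

# Temporarily bump logging, re-run the rados rm, then revert.
ceph tell osd.751 injectargs '--debug-ms 1 --debug-osd 20 --debug-os 20'
# ... re-run the rados rm for the stuck object here ...
ceph tell osd.751 injectargs '--debug-ms 0/5 --debug-osd 1/5 --debug-os 1/5'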

Actions #7

Updated by Adam Wolfe Gordon about 7 years ago

I have attached the admin socket output as requested. Please let me know if there's any other information that would be helpful.

Actions #8

Updated by Adam Wolfe Gordon almost 7 years ago

Update: We have upgraded some of our clients to 10.2.6, and on these clients we can no longer reproduce the issue. I believe this can be closed as fixed or unreproducible in 10.2.6, even though we don't have a root cause.

Actions #9

Updated by Sage Weil almost 7 years ago

  • Status changed from Need More Info to Closed