Bug #8275 (open)

krbd: 'rbd unmap' gets stuck

Added by Ilya Dryomov almost 10 years ago. Updated about 9 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

This could be a libceph issue, but both Hannes and I saw it on 'rbd unmap'.

From: Hannes Landeholm <>

Hi, I just had an rbd unmap operation deadlock on my development
machine. The file system was in heavy use before I did it, but I have a
sync barrier before the umount and unmap, so that shouldn't matter. The
rbd unmap hung in "State: D (disk sleep)". I have so far waited
over 10 minutes; this normally takes < 1 sec.
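
A minimal sketch of the sequence being described, assuming a mount point and device name that are not taken from the report:

#!/bin/bash
# Flush dirty data, unmount, then unmap; the unmap step is where the hang occurs.
sync                    # sync barrier before the umount and unmap
umount /mnt/rbd         # assumed mount point
rbd unmap /dev/rbd0     # normally returns in < 1 sec, here it stays in D (disk sleep)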

Here is the /proc/pid/stack output:

[<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
[<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
[<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
[<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
[<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
[<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
[<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
[<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
[<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
[<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
[<ffffffff8136ea67>] bus_attr_store+0x27/0x30
[<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
[<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
[<ffffffff811a67fa>] vfs_write+0xba/0x1e0
[<ffffffff811a7206>] SyS_write+0x46/0xc0
[<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

This machine runs both the ceph cluster and the clients.

"rbd unmap deadlock" thread from May 2 on ceph-devel.

Actions #1

Updated by Ilya Dryomov over 9 years ago

  • Project changed from rbd to Linux kernel client
  • Subject changed from 'rbd unmap' gets stuck to krbd: 'rbd unmap' gets stuck
Actions #2

Updated by Nils Meyer about 9 years ago

This affects me as well; it seems to happen when I unmap two devices one after another, e.g.:

root@hv-production-host1:~# rbd --version
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)
root@hv-production-host1:/var/log# rbd create --size $(expr 20 \* 1024) test1
root@hv-production-host1:/var/log# rbd create --size $(expr 20 \* 1024) test2
root@hv-production-host1:/var/log# rbd map test1
/dev/rbd0
root@hv-production-host1:/var/log# rbd map test2
/dev/rbd1
root@hv-production-host1:/var/log# rbd unmap /dev/rbd0 && rbd unmap /dev/rbd1

Unmapping rbd1 hangs here; this is the stack output:

[<ffffffff8108896a>] flush_workqueue+0x11a/0x5a0
[<ffffffffc0608415>] ceph_msgr_flush+0x15/0x20 [libceph]
[<ffffffffc060fc76>] ceph_monc_stop+0x46/0x120 [libceph]
[<ffffffffc06077e8>] ceph_destroy_client+0x38/0xa0 [libceph]
[<ffffffffc0656658>] rbd_client_release+0x68/0xa0 [rbd]
[<ffffffffc06578b5>] rbd_dev_destroy+0x65/0x70 [rbd]
[<ffffffffc0657b67>] rbd_dev_image_release+0x57/0x60 [rbd]
[<ffffffffc065b12b>] do_rbd_remove.isra.27+0x15b/0x200 [rbd]
[<ffffffffc065b1e4>] rbd_remove_single_major+0x14/0x20 [rbd]
[<ffffffff814b6297>] bus_attr_store+0x27/0x30
[<ffffffff81248cfd>] sysfs_kf_write+0x3d/0x50
[<ffffffff81248230>] kernfs_fop_write+0xe0/0x160
[<ffffffff811d3bd7>] vfs_write+0xb7/0x1f0
[<ffffffff811d4776>] SyS_write+0x46/0xb0
[<ffffffff8176aced>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff

Actions #3

Updated by Ilya Dryomov about 9 years ago

Hi Nils,

Which kernel is the client box running?
Was there anything else involved between map and unmap? I.e., does the following script reproduce it?

#!/bin/bash

rbd create --size $((20 * 1024)) test1
rbd create --size $((20 * 1024)) test2
rbd map test1
rbd map test2
rbd unmap /dev/rbd0 && rbd unmap /dev/rbd1
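
A possible way to drive the reproducer in a loop and stop at the first hang; the repro.sh filename, the 60-second timeout, and the iteration count are assumptions:

#!/bin/bash
# Run the reproducer until an unmap appears stuck, then dump blocked tasks.
for i in $(seq 1 50); do
    echo "iteration $i"
    if ! timeout 60 ./repro.sh; then      # repro.sh is the script above (assumed filename)
        # The stuck 'rbd unmap' itself stays in D state and cannot be killed;
        # the timeout only stops us from waiting on it forever.
        echo "possible hang on iteration $i"
        echo w > /proc/sysrq-trigger      # log D-state task stacks to dmesg
        dmesg | tail -n 200 > hang-$i.txt
        break
    fi
    rbd rm test1                          # clean up so the next iteration can recreate the images
    rbd rm test2
done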
