Bug #8275
krbd: 'rbd unmap' gets stuck
Description
This could be a libceph issue, but both Hannes and I saw it on 'rbd unmap'.
From: Hannes Landeholm <hannes@jumpstarter.io>
Hi, I just had an rbd unmap operation deadlock on my development machine. The file system was in heavy use before I did it, but I have a sync barrier before the umount and unmap, so it shouldn't matter. The rbd unmap hung in "State: D (disk sleep)". I have so far waited over 10 minutes; this normally takes < 1 sec. Here is the /proc/pid/stack output:
[<ffffffff8107e23a>] flush_workqueue+0x11a/0x5a0
[<ffffffffa031b415>] ceph_msgr_flush+0x15/0x20 [libceph]
[<ffffffffa03219c6>] ceph_monc_stop+0x46/0x120 [libceph]
[<ffffffffa031af28>] ceph_destroy_client+0x38/0xa0 [libceph]
[<ffffffffa0359b88>] rbd_client_release+0x68/0xa0 [rbd]
[<ffffffffa0359bec>] rbd_put_client+0x2c/0x30 [rbd]
[<ffffffffa0359c06>] rbd_dev_destroy+0x16/0x30 [rbd]
[<ffffffffa0359c77>] rbd_dev_image_release+0x57/0x60 [rbd]
[<ffffffffa035adc7>] do_rbd_remove.isra.25+0x167/0x1b0 [rbd]
[<ffffffffa035ae54>] rbd_remove+0x24/0x30 [rbd]
[<ffffffff8136ea67>] bus_attr_store+0x27/0x30
[<ffffffff81218d4d>] sysfs_kf_write+0x3d/0x50
[<ffffffff8121c982>] kernfs_fop_write+0xd2/0x140
[<ffffffff811a67fa>] vfs_write+0xba/0x1e0
[<ffffffff811a7206>] SyS_write+0x46/0xc0
[<ffffffff814e66e9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
This machine runs both the ceph cluster and the clients.
"rbd unmap deadlock" thread from May 2 on ceph-devel.
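For reference, the state and stack quoted above can be collected with a short helper along these lines (a sketch, not part of the report; `pid_diag` is a hypothetical name, and reading /proc/<pid>/stack typically requires root):

```shell
#!/bin/sh
# Sketch of the diagnosis above: print a process's scheduler state
# ("D (disk sleep)" for an uninterruptible hang) and, when readable,
# its in-kernel stack. pid_diag is a hypothetical helper name.
pid_diag() {
    pid="$1"
    grep '^State:' "/proc/$pid/status"
    # /proc/<pid>/stack usually needs root; skip it quietly otherwise.
    if [ -r "/proc/$pid/stack" ]; then
        cat "/proc/$pid/stack"
    fi
}
```

For a hung unmap, something like `pid_diag "$(pgrep -f 'rbd unmap')"` would print the same kind of trace quoted above.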
Updated by Ilya Dryomov over 9 years ago
- Project changed from rbd to Linux kernel client
- Subject changed from 'rbd unmap' gets stuck to krbd: 'rbd unmap' gets stuck
Updated by Nils Meyer about 9 years ago
This affects me as well; it seems to happen when I unmap two devices one after another, e.g.:
root@hv-production-host1:~# rbd --version
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)
root@hv-production-host1:/var/log# rbd create --size $(expr 20 \* 1024) test1
root@hv-production-host1:/var/log# rbd create --size $(expr 20 \* 1024) test2
root@hv-production-host1:/var/log# rbd map test1
/dev/rbd0
root@hv-production-host1:/var/log# rbd map test2
/dev/rbd1
root@hv-production-host1:/var/log# rbd unmap /dev/rbd0 && rbd unmap /dev/rbd1
Unmapping rbd1 hangs here; this is the stack output:
[<ffffffff8108896a>] flush_workqueue+0x11a/0x5a0
[<ffffffffc0608415>] ceph_msgr_flush+0x15/0x20 [libceph]
[<ffffffffc060fc76>] ceph_monc_stop+0x46/0x120 [libceph]
[<ffffffffc06077e8>] ceph_destroy_client+0x38/0xa0 [libceph]
[<ffffffffc0656658>] rbd_client_release+0x68/0xa0 [rbd]
[<ffffffffc06578b5>] rbd_dev_destroy+0x65/0x70 [rbd]
[<ffffffffc0657b67>] rbd_dev_image_release+0x57/0x60 [rbd]
[<ffffffffc065b12b>] do_rbd_remove.isra.27+0x15b/0x200 [rbd]
[<ffffffffc065b1e4>] rbd_remove_single_major+0x14/0x20 [rbd]
[<ffffffff814b6297>] bus_attr_store+0x27/0x30
[<ffffffff81248cfd>] sysfs_kf_write+0x3d/0x50
[<ffffffff81248230>] kernfs_fop_write+0xe0/0x160
[<ffffffff811d3bd7>] vfs_write+0xb7/0x1f0
[<ffffffff811d4776>] SyS_write+0x46/0xb0
[<ffffffff8176aced>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
Updated by Ilya Dryomov about 9 years ago
Hi Nils,
Which kernel on the client box?
Was there anything else involved between map and unmap? I.e. does the following script reproduce it?
#!/bin/bash

rbd create --size $((20 * 1024)) test1
rbd create --size $((20 * 1024)) test2

rbd map test1
rbd map test2

rbd unmap /dev/rbd0 && rbd unmap /dev/rbd1
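Not part of the thread, but when scripting a repro like this it can help to bound each step with coreutils `timeout`, so a deadlocked unmap fails fast instead of wedging the script (a workaround sketch; `run_with_deadline` is a hypothetical name):

```shell
#!/bin/sh
# run_with_deadline SECS CMD...: run CMD, killing it after SECS.
# coreutils timeout exits with status 124 when the deadline expires,
# so a deadlocked 'rbd unmap' shows up as a failure instead of a hang.
run_with_deadline() {
    secs="$1"; shift
    timeout "$secs" "$@"
}

# e.g.: run_with_deadline 30 rbd unmap /dev/rbd0 \
#         || echo "unmap of /dev/rbd0 timed out or failed" >&2
```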