Bug #21246
closed
bluestore: hang while replaying deferred ios from journal
Description
Running ceph-osd-12.2.0-0.el7.x86_64 from ceph-stable's CentOS repository, I hit the following problem. The cluster (and that OSD) were created with luminous rc4 12.1.4, but upgraded to 12.2.0 when the packages appeared in the repository.
The cluster is running fine, but one OSD cannot join. It logs
bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc loaded 2047 G in 2 extents
and does not continue. I attached the debug log.
Running
ceph-bluestore-tool --dev /dev/vg_centos/ceph-osd-1 --path /var/lib/ceph/osd/ceph-0 fsck --debug
also stalls after that log line.
I can keep that OSD around for a few days, but will have to deprovision that cluster eventually.
Updated by Sage Weil over 6 years ago
- Subject changed from bluestore: previously running OSD does not join cluster to bluestore: hang while replaying deferred ios from journal
- Status changed from New to 12
- Assignee set to Sage Weil
- Priority changed from Normal to Urgent
This looks like it might be the same as #21171, or one of the related bugs I am currently working on. As soon as I have a fix I will link to a build so we can hopefully confirm that it fixes this. Thanks!
Updated by Sage Weil over 6 years ago
- Related to Bug #21171: bluestore: aio submission deadlock added
Updated by Sage Weil over 6 years ago
- Status changed from 12 to Resolved
Pretty sure this was #21171. Fix is merged to master and luminous branch, will be in v12.2.1.