bluestore: hang while replaying deferred ios from journal
Running ceph-osd-12.2.0-0.el7.x86_64 from ceph-stable's CentOS repository, I hit the following problem. The cluster (and that OSD) were created with luminous rc4 (12.1.4) and upgraded to 12.2.0 when the packages appeared in the repository.
The cluster is running fine, but one OSD cannot join. It logs
bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc loaded 2047 G in 2 extents
and then gets no further. The debug log is attached.
ceph-bluestore-tool --dev /dev/vg_centos/ceph-osd-1 --path /var/lib/ceph/osd/ceph-0 fsck --debug
also stalls after the same log line.
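To confirm the stall reproducibly without watching the terminal, the fsck run can be wrapped in a timeout. This is a minimal Python sketch, assuming Python 3 on the OSD host. The helper name `run_with_stall_timeout` and the 600-second limit are hypothetical choices, not part of ceph-bluestore-tool, and the command list is copied from the invocation above. Hitting the timeout only shows that the process did not exit in time, not why it hung.

```python
import subprocess
import sys

# Command copied verbatim from the report above.
FSCK_CMD = ["ceph-bluestore-tool", "--dev", "/dev/vg_centos/ceph-osd-1",
            "--path", "/var/lib/ceph/osd/ceph-0", "fsck", "--debug"]


def run_with_stall_timeout(cmd, timeout_s):
    """Run cmd, letting its output pass through to the terminal.

    Returns ("exited", returncode) if the process finishes within
    timeout_s seconds, or ("stalled", None) if it does not.
    (Hypothetical helper, not a Ceph tool.)
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        print(f"no exit after {timeout_s}s; treating as stalled",
              file=sys.stderr)
        return ("stalled", None)
    return ("exited", result.returncode)
```

Usage: `run_with_stall_timeout(FSCK_CMD, 600)` returns `("stalled", None)` if fsck is still hung after ten minutes. A process stuck in uninterruptible disk I/O may not die promptly on kill, so the call can take longer than the timeout to return.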
I can keep that OSD around for a few days, but will have to deprovision that cluster eventually.
#1 Updated by Sage Weil over 3 years ago
- Subject changed from bluestore: previously running OSD does not join cluster to bluestore: hang while replaying deferred ios from journal
- Status changed from New to 12
- Assignee set to Sage Weil
- Priority changed from Normal to Urgent
This looks like it might be the same as #21171, or one of the related bugs I am currently working on. As soon as I have a fix, I will link to a build so we can confirm whether it fixes this. Thanks!