Bug #21246

closed

bluestore: hang while replaying deferred ios from journal

Added by Tobias Florek over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): BlueStore
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Running ceph-osd-12.2.0-0.el7.x86_64 from ceph-stable's CentOS repository, I hit the following problem. The cluster (and that OSD) were created with luminous rc4 12.1.4 and upgraded to 12.2.0 when the packages appeared in the repository.

The cluster is running fine, but one OSD cannot join. It logs

bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc loaded 2047 G in 2 extents

and does not continue. I attached the debug log.

Running

ceph-bluestore-tool --dev /dev/vg_centos/ceph-osd-1 --path /var/lib/ceph/osd/ceph-0 fsck --debug

also stalls after that log line.
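To check for the stall without watching a terminal, one option is to run the same fsck invocation under a timeout. The sketch below is a minimal illustration, assuming Python 3.7+; `check_for_stall` and `FSCK_CMD` are hypothetical names, not part of Ceph, and the device and OSD paths are the reporter's and will differ on other hosts. It relies only on `subprocess.run`, which kills the child process before raising `TimeoutExpired`.

```python
import subprocess

# Command copied from the report above; the device and OSD data paths are
# specific to the reporter's host (hypothetical for any other setup).
FSCK_CMD = [
    "ceph-bluestore-tool",
    "--dev", "/dev/vg_centos/ceph-osd-1",
    "--path", "/var/lib/ceph/osd/ceph-0",
    "fsck", "--debug",
]


def check_for_stall(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns "stalled" if cmd is still running after timeout_s seconds
    (subprocess.run kills the child before re-raising TimeoutExpired),
    otherwise "exit <returncode>".
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # discard tool output; only timing matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "stalled"
    return f"exit {result.returncode}"
```

For example, `check_for_stall(FSCK_CMD, timeout_s=300)` returns "stalled" if the tool has not exited after five minutes. The 300-second limit is an arbitrary choice: a healthy fsck on a large device can legitimately take longer, so a "stalled" result only means "still running", not proof of a hang.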

I can keep that OSD around for a few days, but will have to deprovision that cluster eventually.


Files

ceph-osd.non-start.log (647 KB): /var/log/ceph/ceph-osd.0.log, Tobias Florek, 09/05/2017 12:51 PM

Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #21171: bluestore: aio submission deadlock (Resolved, Sage Weil, 08/29/2017)

Actions #1

Updated by Sage Weil over 6 years ago

  • Subject changed from "bluestore: previously running OSD does not join cluster" to "bluestore: hang while replaying deferred ios from journal"
  • Status changed from New to 12
  • Assignee set to Sage Weil
  • Priority changed from Normal to Urgent

This looks like it might be the same as #21171, or one of the related bugs I am currently working on. As soon as I have a fix, I will link to a build so we can hopefully confirm that it fixes this. Thanks!

Actions #2

Updated by Sage Weil over 6 years ago

  • Related to Bug #21171: bluestore: aio submission deadlock added
Actions #3

Updated by Sage Weil over 6 years ago

  • Status changed from 12 to Resolved

Pretty sure this was #21171. The fix is merged to master and the luminous branch and will be in v12.2.1.
