Bug #40100

closed

Missing block.wal and block.db symlinks on restart

Added by Corey Bryant almost 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

We are tracking a bug in Ubuntu (https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1828617) where a race on system restart leaves OSDs without their block.wal and block.db symlinks.

For each OSD there is a loop that calls 'ceph-volume lvm trigger' up to 30 times, until the OSD is activated. For example:
[2019-05-31 01:27:29,235][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,435][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,530][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:35,531][systemd][WARNING] failed activating OSD, retries left: 30
[2019-05-31 01:27:44,122][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:44,174][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:44,175][systemd][WARNING] failed activating OSD, retries left: 29
...
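
For readers unfamiliar with the retry behaviour, here is a minimal sketch of a bounded retry loop of the kind the log above suggests. It is not the actual code in main.py: the function name, the 5 second interval, and the injectable `run` and `sleep` parameters are assumptions made for illustration. Only the retry count of 30 and the warning text come from the log.

```python
import subprocess
import time


def trigger_with_retries(command, tries=30, interval=5,
                         run=subprocess.call, sleep=time.sleep):
    """Run `command` until it exits 0 or the retries are used up.

    Hypothetical helper: `run` and `sleep` are injectable so the loop
    can be exercised without starting real processes.
    Returns True on success, False once all tries have failed.
    """
    while tries > 0:
        if run(command) == 0:
            return True
        # Mirrors the warning in the log: the counter is reported
        # before it is decremented, so the first failure shows 30.
        print("failed activating OSD, retries left: %d" % tries)
        tries -= 1
        sleep(interval)
    return False
```

A call such as `trigger_with_retries(["ceph-volume", "lvm", "trigger", "4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6"])` would retry the command from the log. Note that a loop like this only reports whether the trigger command itself succeeded; it says nothing about whether the WAL and DB devices were present at that moment.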

The race appears to occur when 'ceph-volume lvm trigger' succeeds even though the WAL and DB devices are not yet ready:
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/systemd/main.py#L93

As a result, the symlinks do not get set up here:
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L154
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L177

I wonder if we could add similar 'ceph-volume lvm trigger'-style calls/loops for the WAL and DB devices of each OSD in src/ceph-volume/ceph_volume/systemd/main.py. Whether an OSD has a DB or WAL device can be determined from the LVM tags.
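
A rough sketch of that idea, under stated assumptions: the OSD's LVM tags are already available as a plain dict, and the tag names `ceph.db_device` and `ceph.wal_device` hold the device paths (both the names and the helper functions here are illustrative, not taken from the ceph-volume source). The loop polls until every expected device path exists or the retries run out.

```python
import os
import time

# Assumed tag names for illustration; check the real tags on the OSD's LV.
DEVICE_TAGS = ("ceph.db_device", "ceph.wal_device")


def expected_devices(lv_tags):
    """Return the DB/WAL device paths named in an OSD's LVM tags, if any."""
    return [lv_tags[tag] for tag in DEVICE_TAGS if lv_tags.get(tag)]


def wait_for_devices(paths, tries=30, interval=5,
                     exists=os.path.exists, sleep=time.sleep):
    """Poll until every path exists; return the paths still missing.

    An empty list means all devices showed up within `tries` checks.
    `exists` and `sleep` are injectable so the loop can be tested.
    """
    missing = list(paths)
    for attempt in range(tries):
        missing = [path for path in missing if not exists(path)]
        if not missing:
            return []
        if attempt < tries - 1:
            sleep(interval)
    return missing
```

Activation would then proceed only if `wait_for_devices(expected_devices(tags))` returns an empty list, so the block.wal and block.db symlinks are created against devices that are actually present.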
