Bug #40100
closed
Missing block.wal and block.db symlinks on restart
Added by Corey Bryant almost 5 years ago.
Updated almost 5 years ago.
Description
We are tracking a bug in Ubuntu (https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1828617) where a race on system restart causes missing block.wal and block.db symlinks.
There is a loop for each OSD that calls 'ceph-volume lvm trigger' 30 times until the OSD is activated, for example:
[2019-05-31 01:27:29,235][ceph_volume.process][INFO ] Running command: ceph-volume lvm trigger 4-7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,435][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:35,530][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:35,531][systemd][WARNING] failed activating OSD, retries left: 30
[2019-05-31 01:27:44,122][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.4 with fsid 7478edfc-f321-40a2-a105-8e8a2c8ca3f6
[2019-05-31 01:27:44,174][systemd][WARNING] command returned non-zero exit status: 1
[2019-05-31 01:27:44,175][systemd][WARNING] failed activating OSD, retries left: 29
...
The race appears to occur when 'ceph-volume lvm trigger' succeeds but the WAL and DB devices are not yet ready:
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/systemd/main.py#L93
Then the symlinks don't get set up here:
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L154
https://github.com/ceph/ceph/blob/luminous/src/ceph-volume/ceph_volume/devices/lvm/activate.py#L177
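The retry behaviour described above can be sketched roughly as follows. This is an illustration, not the actual ceph-volume source; the `retry` helper and its defaults are approximations based on the log output and the environment variables mentioned later in this thread.

```python
# Rough sketch of the activation retry loop in the style of
# ceph_volume/systemd/main.py (illustrative, not the real source).
import os
import time

def retry(activate, tries=None, interval=None):
    """Call `activate` (e.g. a 'ceph-volume lvm trigger' wrapper) until it
    returns 0, or until the retry budget is exhausted."""
    if tries is None:
        tries = int(os.environ.get('CEPH_VOLUME_SYSTEMD_TRIES', 30))
    if interval is None:
        interval = int(os.environ.get('CEPH_VOLUME_SYSTEMD_INTERVAL', 5))
    while tries > 0:
        if activate() == 0:
            return True
        tries -= 1
        time.sleep(interval)
    return False
```

The log excerpt above shows exactly this shape: each failed 'ceph-volume lvm trigger' decrements the remaining retries until the OSD activates or the budget runs out.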
I wonder if we can have similar 'ceph-volume lvm trigger'-ish calls/loops for the WAL and DB devices of each OSD in src/ceph-volume/ceph_volume/systemd/main.py. We can determine whether an OSD has a DB or WAL device from the lvm tags.
Can we do something along these lines in ceph_volume/systemd/main.py, after the existing while loop?
- Using extra_data in ceph_volume/systemd/main.py, get ceph.wal_device and ceph.db_device from the lvs tags with matching ceph.osd_id, ceph.osd_fsid, and type=block, e.g. where extra_data=ceph.osd_id=0-e20dbce0-34f4-46b3-8efc-f41edbcae3d7:

  sudo lvs -o lv_tags | grep type=block | grep ceph.osd_id=0 | grep ceph\.osd_fsid=e20dbce0-34f4-46b3-8efc-f41edbcae3d7 | grep ceph\.wal_device
  sudo lvs -o lv_tags | grep type=block | grep ceph.osd_id=0 | grep ceph\.osd_fsid=e20dbce0-34f4-46b3-8efc-f41edbcae3d7 | grep ceph\.db_device

- Loop until the following is found, or up to CEPH_VOLUME_SYSTEMD_TRIES times, where ceph.wal_device=/dev/ceph-wal-8a073a5b-6e42-43bf-a99d-e30c649362ea/osd-wal-e20dbce0-34f4-46b3-8efc-f41edbcae3d7:

  sudo lvs -o lv_tags | grep type=wal | grep ceph.wal_device=/dev/ceph-wal-8a073a5b-6e42-43bf-a99d-e30c649362ea/osd-wal-e20dbce0-34f4-46b3-8efc-f41edbcae3d7

- Loop until the following is found, or up to CEPH_VOLUME_SYSTEMD_TRIES times, where ceph.db_device=/dev/ceph-db-c37da146-b9a3-4339-bb2f-819f223982d3/osd-db-e20dbce0-34f4-46b3-8efc-f41edbcae3d7:

  sudo lvs -o lv_tags | grep type=db | grep ceph.db_device=/dev/ceph-db-c37da146-b9a3-4339-bb2f-819f223982d3/osd-db-e20dbce0-34f4-46b3-8efc-f41edbcae3d7
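The tag lookups sketched in the steps above could be implemented along these lines. This is a hypothetical sketch: `parse_extra_data`, `block_device_tags`, and `extract_devices` are illustrative names, and the lv_tags layout is assumed from the example commands above.

```python
import subprocess

def parse_extra_data(extra_data):
    """Split extra_data like 'ceph.osd_id=0-e20dbce0-...' into
    (osd_id, osd_fsid)."""
    value = extra_data.split('=', 1)[1]          # e.g. '0-e20dbce0-...'
    osd_id, osd_fsid = value.split('-', 1)
    return osd_id, osd_fsid

def block_device_tags(lvs_output, osd_id, osd_fsid):
    """Return the lv_tags line of the matching type=block LV, or None.
    `lvs_output` is the text produced by 'lvs -o lv_tags'."""
    for line in lvs_output.splitlines():
        if ('type=block' in line
                and 'ceph.osd_id=%s' % osd_id in line
                and 'ceph.osd_fsid=%s' % osd_fsid in line):
            return line
    return None

def extract_devices(tags_line):
    """Pull ceph.wal_device/ceph.db_device values out of one lv_tags line."""
    devices = {}
    for tag in tags_line.strip().split(','):
        key, _, value = tag.partition('=')
        if key in ('ceph.wal_device', 'ceph.db_device'):
            devices[key] = value
    return devices

def wal_db_devices(extra_data):
    """Look up the WAL/DB devices for one OSD. A polling loop in
    systemd/main.py could call this up to CEPH_VOLUME_SYSTEMD_TRIES times,
    waiting for the tagged devices to appear."""
    osd_id, osd_fsid = parse_extra_data(extra_data)
    out = subprocess.check_output(['lvs', '-o', 'lv_tags'], text=True)
    tags_line = block_device_tags(out, osd_id, osd_fsid)
    return extract_devices(tags_line) if tags_line else {}
```

An OSD with no ceph.wal_device/ceph.db_device tags on its block LV would return an empty dict, so the loop could skip waiting entirely in that case.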
- Project changed from Ceph to ceph-volume
- Status changed from New to Fix Under Review
Proposed pull request (based on the work of coreycb): https://github.com/ceph/ceph/pull/28520
In the coreycb proposal, the command was evaluated before waiting for the WAL/DB devices to arrive. In order to keep an equivalent timeout, I propose putting both checks in the same loop.
I do not know the consequences in case "ceph-volume simple" is used, so I added a guard: "if sub_command == 'lvm':".
I commented in the PR, but want to reiterate here: we knew there was a chance that on certain systems the 30 tries at a 5 second interval wouldn't be enough, which is why we made them configurable rather than hard coded.
In this case, the problem can be addressed by changing the environment variables (as opposed to adding extra intervals or tries).
The environment variables are:
CEPH_VOLUME_SYSTEMD_TRIES
CEPH_VOLUME_SYSTEMD_INTERVAL
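For example, the retry budget could be raised with a systemd drop-in. This is a hypothetical fragment: the override path and unit name are assumptions, and the values (60 tries at a 10 second interval) are arbitrary examples; only the variable names come from the comment above.

```ini
# /etc/systemd/system/ceph-volume@.service.d/override.conf
# Hypothetical drop-in raising the retry budget; run
# 'systemctl daemon-reload' after creating it.
[Service]
Environment=CEPH_VOLUME_SYSTEMD_TRIES=60
Environment=CEPH_VOLUME_SYSTEMD_INTERVAL=10
```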
- Pull request ID set to 28791
- Status changed from Fix Under Review to Resolved
Looks like the fix was merged? Feel free to re-open if it's still an issue.