Project

General

Profile

Actions

Support #49847

closed

OSD Fails to init after upgrading to octopus: _deferred_replay failed to decode deferred txn

Added by Eetu Lampsijärvi about 3 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

An OSD fails to start after upgrading from mimic 13.2.2 to octopus 15.2.9.

It seems like first bluestore fails at something:
-1 bluestore(/var/lib/ceph/osd/ceph-5) _deferred_replay failed to decode deferred txn 0x0000000000000002

Which causes rocksdb to shut down, bluefs to unmount, and then to fail the init of the OSD:
-1 osd.5 0 OSD:init: unable to mount object store
-1 ** ERROR: osd init failed: (5) Input/output error

Other five nodes & OSDs upgraded without issues. The OSD/node in question got stuck when upgrading the OS with do-release-upgrade from Ubuntu 16 -> 18, which caused some dependency problems, which were solved. Initially I thought this had somehow messed up the ceph packages or configuration as ceph-osd didn't automatically run after a reboot, but reinstallation of the ceph packages and scanning & activating the osd with ceph-volume seemed to remedy that, but we then arrive at the present situation/problem of the OSD failing to init.

I was convinced on IRC to try posting here instead of zapping the osd/disk and letting ceph recover, just in case it's an actual bug and not me missing something very obvious.


Files

ceph-osd.5.log (168 KB) ceph-osd.5.log OSD Log file with multiple restart attempts Eetu Lampsijärvi, 03/16/2021 07:46 PM
Actions

Also available in: Atom PDF