Support #48630 (closed)

non-LVM OSDs do not start after upgrade from 15.2.4 -> 15.2.7

Added by ronnie laptop over 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: cephadm
Target version:
% Done: 0%
Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

During the upgrade from 15.2.4 to 15.2.7 (docker_hub image), some of our OSDs do not start after their systemd unit.run file is replaced by the upgrade.
The new unit.run script roughly does the following (sketched below):

  • start a docker container for the OSD, assuming an LVM block device
  • if step 1 fails, fall back to starting a docker container for the other device type
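
For illustration only, that fallback pattern amounts to something like the sketch below; this is not the literal file cephadm generates, and IMAGE, OSD_ID and FSID are placeholders:

    #!/bin/sh
    # Illustrative sketch of the fallback logic described above, not the
    # literal unit.run that cephadm writes; IMAGE, OSD_ID, FSID are placeholders.
    if ! docker run --rm --privileged --net=host \
            -v /var/lib/ceph:/var/lib/ceph -v /dev:/dev \
            "$IMAGE" ceph-volume lvm activate --no-systemd "$OSD_ID" "$FSID"
    then
        # fall back to the non-LVM (two-partition, ceph-disk style) activation
        docker run --rm --privileged --net=host \
            -v /var/lib/ceph:/var/lib/ceph -v /dev:/dev \
            "$IMAGE" ceph-volume simple activate "$OSD_ID" "$FSID"
    fi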

Some of our OSDs are not LVM-based, but use the older two-partition layout (#1 XFS, #2 BlueStore).

We can work around this bug by commenting out the first docker run statement in the unit.run file of each affected OSD, but this is holding up the upgrade considerably and requires a lot of manual edits.
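
For reference, the per-OSD workaround boils down to roughly the following; the fsid and OSD id are placeholders and the paths follow the usual cephadm layout:

    # placeholders: replace with the real cluster fsid and OSD id
    fsid="<cluster-fsid>"
    osd_id="<osd-id>"
    unit_run="/var/lib/ceph/${fsid}/osd.${osd_id}/unit.run"

    # keep a backup, then comment out the first "docker run" line by hand
    cp "$unit_run" "${unit_run}.bak"
    "${EDITOR:-vi}" "$unit_run"

    # restart the OSD container through its cephadm-managed systemd unit
    systemctl restart "ceph-${fsid}@osd.${osd_id}.service"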

Could you confirm that there is testing in place for the older block device types, and fix this issue in future releases?

Actions #1

Updated by Sebastian Wagner about 3 years ago

  • Description updated (diff)
Actions #2

Updated by Sebastian Wagner about 3 years ago

I think you probably want to migrate to ceph-volume for now.
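
For a single OSD, that redeployment would roughly look like the following sketch (illustrative only; all values are placeholders, and in a cephadm setup the ceph-volume calls would go through cephadm ceph-volume or cephadm shell):

    # Illustrative only: rebuild one non-LVM OSD as an LVM-backed ceph-volume
    # OSD, reusing its id; all values below are placeholders. This assumes the
    # data has already been drained off, or that recovery from replicas is acceptable.
    fsid="<cluster-fsid>"
    osd_id="<osd-id>"
    dev="/dev/sdX"

    systemctl stop "ceph-${fsid}@osd.${osd_id}.service"   # cephadm unit name
    ceph osd destroy "$osd_id" --yes-i-really-mean-it
    ceph-volume lvm zap "$dev" --destroy
    ceph-volume lvm create --osd-id "$osd_id" --data "$dev"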

Actions #3

Updated by Sebastian Wagner about 3 years ago

  • Tracker changed from Bug to Support
  • Status changed from New to Resolved
Actions #4

Updated by ronnie laptop almost 3 years ago

Sebastian Wagner wrote:

I think you probably want to migrate to ceph-volume for now.

Hi Sebastian,

Thanks for the response, but this raises some questions:
- if we should not use the old filesystem layout anymore, should that be called out in the release notes? I guess other people have the same issue. Should the OSDs then not start at all, perhaps with an additional parameter or message to make end users aware?
- is there a good procedure for migrating the OSDs? With ~400 OSDs (12/14 TB each) and roughly half of them on the old volume layout, what is a good approach for migrating? I can think of two ways (rough sketch of the first one below):
-- mark each OSD down, leave it for some weeks to drain/rebalance, then zap it, reuse it, and rebalance all the data back
-- the rough way: zap the disk (as if it had failed), reuse it, and pray that CRUSH works correctly and rebalances all the data back
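
For the first approach, per OSD I would expect something roughly like the following (using ceph osd out to trigger the draining; the id is a placeholder):

    # rough sketch of the first approach for a single OSD
    osd_id="<osd-id>"                 # placeholder
    ceph osd out "$osd_id"            # start draining the OSD
    # wait until the data has moved off and the OSD can safely be removed
    while ! ceph osd safe-to-destroy "$osd_id"; do sleep 600; done
    # then stop, destroy, zap and re-create it as an LVM OSD, as sketched in #2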

Any advice is more than welcome, as these scenarios are not clearly documented!
