Bug #15874: Upon hammer->jewel upgrade, OSD cannot access journal device until after reboot - devops - Ceph

Added by Nathan Cutler almost 8 years ago. Updated over 7 years ago.

Status: New
Priority: Normal
Source: Community (dev)
Backport: jewel
Severity: 3 - minor

Description

Scenario: hammer cluster being upgraded to jewel. Before upgrade, the permissions/ownership of the journal device are "660/root:disk". On each node, the packages are upgraded, the daemons are stopped, /var/lib/ceph is chowned, and the daemons are started again.

Problem: when the jewel OSDs start, they are running as ceph:ceph and fail to open their journal devices (permission denied).

The problem only lasts until the node is rebooted.
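
To confirm that a node is affected, the ownership of the device behind an OSD's journal symlink can be compared with the user the daemon runs as. The following is a minimal sketch, not part of any Ceph package: the function names are hypothetical, and it assumes GNU coreutils (`readlink -f`, `stat -c`).

```shell
#!/bin/sh
# Sketch only: journal_owner and journal_owner_ok are hypothetical helpers.

# journal_owner PATH
# Resolve PATH (typically a symlink to the journal device) and print
# the owner of the target as "user:group".
journal_owner() {
    target=$(readlink -f "$1") || return 1
    stat -c '%U:%G' "$target"
}

# journal_owner_ok PATH EXPECTED
# Exit 0 if the resolved target is owned by EXPECTED (e.g. "ceph:ceph"),
# non-zero if it is owned by someone else or cannot be inspected.
journal_owner_ok() {
    owner=$(journal_owner "$1") || return 1
    [ "$owner" = "$2" ]
}
```

For example, `journal_owner_ok /var/lib/ceph/osd/ceph-0/journal ceph:ceph || echo "journal not owned by ceph:ceph"` would flag the state described above, assuming an OSD with id 0 and the default data path.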

Solution: the only fix that occurs to me is to run "udevadm trigger" in the package postinst scripts. I tested this manually and it works like a charm. In this example, node2 was just upgraded from hammer to jewel and /dev/sdb2 is a journal device:

node2:~ # ls -l /dev/sdb2
brw-rw---- 1 root disk 8, 18 May 12 15:49 /dev/sdb2
node2:~ # udevadm trigger
node2:~ # ls -l /dev/sdb2
brw-rw---- 1 ceph ceph 8, 18 May 12 22:26 /dev/sdb2

Updated by Nathan Cutler almost 8 years ago

Backport set to jewel

Updated by Nathan Cutler almost 8 years ago

Related to Feature #15733: ceph-osd should chown OSD data when --setuser is specified added

Updated by Nathan Cutler almost 8 years ago

Related to deleted (Feature #15733: ceph-osd should chown OSD data when --setuser is specified)

Updated by Nathan Cutler almost 8 years ago

Description updated (diff)

Updated by Nathan Cutler almost 8 years ago

Description updated (diff)

Updated by Nathan Cutler almost 8 years ago

Description updated (diff)

Updated by Nathan Cutler almost 8 years ago

There is a concern that running udevadm trigger unconditionally could cause "churn" on a node with many devices.
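
One way to limit that churn might be to narrow the trigger to block devices, or to a single device, using the `--subsystem-match` and `--sysname-match` options of `udevadm trigger`. The sketch below only builds and prints the command line; the helper name is hypothetical, and whether the narrower trigger re-applies the ceph ownership as reliably as the unrestricted one was not tested in this ticket.

```shell
#!/bin/sh
# Sketch only: udev_trigger_cmd is a hypothetical helper.

# udev_trigger_cmd [SYSNAME]
# Print a "udevadm trigger" command limited to the block subsystem.
# If SYSNAME is given (e.g. sdb2), limit it further to that device name.
udev_trigger_cmd() {
    cmd="udevadm trigger --subsystem-match=block"
    if [ -n "${1:-}" ]; then
        cmd="$cmd --sysname-match=$1"
    fi
    printf '%s\n' "$cmd"
}
```

Calling `udev_trigger_cmd sdb2` prints a command that touches only /dev/sdb2 from the example above; the printed command would then be run as root.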

Updated by Daniel Kraft almost 8 years ago

Workaround:

# sudo systemctl edit ceph-osd@.service
[Service]
ExecStartPre=/bin/chown ceph:ceph /var/lib/ceph/osd/ceph-%i/journal

This adds an extra pre-start step whenever an OSD is started, without touching the original unit file.
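
For unattended rollout, the same drop-in can be written without the interactive editor. This is a sketch under the assumption that the override lives in the standard systemd drop-in location, /etc/systemd/system/ceph-osd@.service.d/override.conf, which is where `systemctl edit` puts it; the function name and the optional root-directory argument (for staging or testing) are hypothetical.

```shell
#!/bin/sh
# Sketch only: write_osd_dropin is a hypothetical helper.

# write_osd_dropin [ROOT]
# Write the ExecStartPre override from the workaround above into the
# drop-in directory for ceph-osd@.service. ROOT is an optional prefix
# (empty means the live system).
write_osd_dropin() {
    dir="${1:-}/etc/systemd/system/ceph-osd@.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
ExecStartPre=/bin/chown ceph:ceph /var/lib/ceph/osd/ceph-%i/journal
EOF
}
```

On a live node this would be followed by `systemctl daemon-reload` so that systemd picks up the new drop-in before the OSDs are restarted.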

Updated by alexander walker over 7 years ago

I also have an issue with the permissions of the journal partition. I use an M.2 SSD for the journals, with the partitions /dev/nvme0n1p4 and /dev/nvme0n1p5.

Workaround:

Set the ownership with "chown -R ceph:ceph /dev/nvme0n1p4" and "chown -R ceph:ceph /dev/nvme0n1p5".
