Bug #15874

open

Upon hammer->jewel upgrade, OSD cannot access journal device until after reboot

Added by Nathan Cutler almost 8 years ago. Updated over 7 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: jewel
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Scenario: hammer cluster being upgraded to jewel. Before upgrade, the permissions/ownership of the journal device are "660/root:disk". On each node, the packages are upgraded, the daemons are stopped, /var/lib/ceph is chowned, and the daemons are started again.

Problem: when the jewel OSDs start, they are running as ceph:ceph and fail to open their journal devices (permission denied).

The problem only lasts until the node is rebooted.

Solution: the only solution that occurs to me is to do a "udevadm trigger" in the postinst scripts. I tested this manually and it works like a charm. In this example, node2 was just upgraded from hammer to jewel and /dev/sdb2 is a journal device:

node2:~ # ls -l /dev/sdb2
brw-rw---- 1 root disk 8, 18 May 12 15:49 /dev/sdb2
node2:~ # udevadm trigger
node2:~ # ls -l /dev/sdb2
brw-rw---- 1 ceph ceph 8, 18 May 12 22:26 /dev/sdb2
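
The manual "ls -l" check above could be scripted so that every OSD journal on a node is inspected before the daemons are restarted. This is only a sketch: the function names are hypothetical, it assumes GNU coreutils "stat", and it assumes the journals are reachable through the /var/lib/ceph/osd/ceph-*/journal symlinks.

```shell
# Print "owner:group" of the file or device a path resolves to (symlinks followed).
# Assumes GNU coreutils stat, as found on typical Linux nodes.
journal_owner() {
    stat -L -c '%U:%G' "$1"
}

# Succeed if the path is owned by the wanted "owner:group" (default ceph:ceph).
journal_owned_by() {
    [ "$(journal_owner "$1")" = "${2:-ceph:ceph}" ]
}

# Walk the OSD journal symlinks and print only the ones with unexpected ownership.
report_bad_journals() {
    for j in /var/lib/ceph/osd/ceph-*/journal; do
        [ -e "$j" ] || continue
        journal_owned_by "$j" || echo "$j -> $(readlink -f "$j"): $(journal_owner "$j")"
    done
}
```

Running report_bad_journals after the package upgrade would list the journals that still need their ownership changed; empty output means nothing was flagged.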
Actions #1

Updated by Nathan Cutler almost 8 years ago

  • Backport set to jewel
Actions #2

Updated by Nathan Cutler almost 8 years ago

  • Related to Feature #15733: ceph-osd should chown OSD data when --setuser is specified added
Actions #3

Updated by Nathan Cutler almost 8 years ago

  • Related to deleted (Feature #15733: ceph-osd should chown OSD data when --setuser is specified)
Actions #4

Updated by Nathan Cutler almost 8 years ago

  • Description updated (diff)
Actions #7

Updated by Nathan Cutler almost 8 years ago

There is a concern that running udevadm trigger unconditionally could cause "churn" on a node with many devices.
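
One way to limit that churn might be to restrict the trigger to block devices rather than replaying events for every device on the node. The sketch below is untested against the jewel udev rules; whether a block-only trigger is enough to reset the journal ownership is an assumption, and the function name and DRY_RUN switch are hypothetical.

```shell
# Re-run udev rules for block devices only, then wait for the event queue to drain.
# Set DRY_RUN=1 to print the commands instead of executing them.
retrigger_block_devices() {
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
    $run udevadm trigger --subsystem-match=block
    $run udevadm settle
}
```

A postinst script could call retrigger_block_devices once after the packages are installed, instead of the unconditional "udevadm trigger".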

Actions #8

Updated by Daniel Kraft almost 8 years ago

Workaround:

# sudo systemctl edit ceph-osd@.service
[Service]
ExecStartPre=/bin/chown ceph:ceph /var/lib/ceph/osd/ceph-%i/journal

This adds a pre-start step that runs each time an OSD is started, without modifying the original unit file.
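
For nodes where an interactive editor is inconvenient, the same override could probably be written as a drop-in file directly. This is a sketch only: the function name and the file name journal-chown.conf are hypothetical, and the optional root prefix exists just so the function can be tried in a scratch directory first.

```shell
# Write the drop-in under an optional root prefix (empty prefix means the live system).
# The file name journal-chown.conf is an arbitrary, hypothetical choice.
install_journal_chown_dropin() {
    dir="${1:-}/etc/systemd/system/ceph-osd@.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/journal-chown.conf" <<'EOF'
[Service]
ExecStartPre=/bin/chown ceph:ceph /var/lib/ceph/osd/ceph-%i/journal
EOF
}
```

On a live system, "systemctl daemon-reload" has to follow so that systemd picks up the new file.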

Actions #9

Updated by alexander walker over 7 years ago

I also have an issue with the permissions of the journal partitions. I use an M.2 SSD for the journals, on partitions /dev/nvme0n1p4 and /dev/nvme0n1p5.

Workaround:

Change the ownership with "chown -R ceph:ceph /dev/nvme0n1p4" and "chown -R ceph:ceph /dev/nvme0n1p5".
