Cleanup #15990

ceph-disk: expected systemd unit failures are confusing

Added by Loïc Dachary almost 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: High
Category: -
Target version: -
% Done: 0%
Backport: jewel

Description

When activating a journal fails and later succeeds (or the other way around), one of the two systemd units is permanently marked as failed. This is confusing to the sysadmin, who thinks there is a problem. This should be fixed.


Related issues: 1 (0 open, 1 closed)

Copied to Ceph - Backport #17149: jewel: ceph-disk: expected systemd unit failures are confusing (Resolved, Loïc Dachary)
Actions #1

Updated by Nathan Cutler almost 8 years ago

@Loïc Dachary: Which two systemd units are you referring to?

Actions #2

Updated by Boris Ranto almost 8 years ago

@Nathan Cutler: He is talking about the two systemd units triggered for the data and journal devices.

Actions #3

Updated by Loïc Dachary almost 8 years ago

Boris Ranto writes:

This could be fixed if we added 1 as an acceptable exit code for the ceph-disk command, e.g. by adding this line to the ceph-disk service file:

SuccessExitStatus=1

We'd be just ignoring the error, though.
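For context, a minimal sketch of what such a whitelist would look like in a systemd unit file. This is illustrative only, assuming a templated oneshot unit; the unit name, description, and ExecStart path are hypothetical, not the actual ceph-disk@.service shipped with Ceph:

```ini
# Illustrative sketch, NOT the actual ceph-disk unit shipped by Ceph.
[Unit]
Description=Ceph disk activation: %f

[Service]
Type=oneshot
# Hypothetical command line; the real ExecStart differs.
ExecStart=/usr/sbin/ceph-disk activate /dev/%f
# Treat exit status 1 as success, so a transient "journal not ready"
# failure does not leave the unit permanently marked as failed.
SuccessExitStatus=1
```

With SuccessExitStatus=1, systemd records the run as successful whenever the process exits with status 0 or 1, which is exactly the trade-off discussed below: the confusing failed state goes away, but a genuine error that happens to exit with 1 is hidden too.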

Actions #4

Updated by Loïc Dachary almost 8 years ago

That would be fine with me (forcing the exit code to always be a success). The problem is that it will also hide real errors. Some would say that there are so many levels of indirection when/if ceph-disk prepare fails that it won't really make a difference: ceph-disk prepare creates a partition -> fires a udev event -> that triggers a systemd unit -> it runs ceph-disk activate, which fails because the journal is not there yet -> ceph-disk prepare creates the journal partition -> fires a udev event -> that triggers the systemd unit -> it runs ceph-disk activate, which does the job.

It's a judgement call really and I'm not sure what's best.

Actions #5

Updated by Boris Ranto almost 8 years ago

Yeah, I was not too thrilled about it either. We might want to reserve a special exit code for saying 'not all devices ready' and ignore just that one?
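That idea could be sketched roughly as follows. This is a hedged illustration, not the actual ceph-disk source: the reserved status value, the function name, and the path arguments are all hypothetical, and the real activation logic is far more involved.

```python
# Sketch of reserving a dedicated exit status for "not all devices
# ready", distinct from a genuine failure. Hypothetical values and
# names; not the actual ceph-disk implementation.
import os

# Hypothetical reserved code; a real patch would have to pick and
# document an agreed-upon value, and the systemd unit would then
# whitelist it with SuccessExitStatus=10 instead of 1.
STATUS_NOT_READY = 10


def activate(data_path, journal_path):
    """Return 0 on success, STATUS_NOT_READY if the journal device has
    not appeared yet (an expected transient condition), and 1 for any
    other, genuine error."""
    if not os.path.exists(journal_path):
        # Expected race: udev has not created the journal device yet.
        # Exit with the reserved status so the unit is not marked
        # failed, while real errors still surface as status 1.
        return STATUS_NOT_READY
    if not os.path.exists(data_path):
        return 1
    return 0
```

The point of the split is that systemd would ignore only the reserved "retry later" status, so a real activation error would still mark the unit as failed.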

Actions #6

Updated by Boris Ranto almost 8 years ago

btw: If the journal and data are on the same drive, this will be OK after reboot, as both devices show up at pretty much the same time. However, it is still reproducible when installing the cluster: you first create and activate the journal partition, and only then, a few seconds later, create the data partition. It would help a bit to create both partitions at the same time (or tell the kernel to activate them at the same time, not one by one).

Actions #7

Updated by Loïc Dachary almost 8 years ago

  • Priority changed from Normal to High
Actions #8

Updated by Boris Ranto almost 8 years ago

  • Assignee set to Boris Ranto
Actions #9

Updated by Kefu Chai over 7 years ago

  • Status changed from New to Resolved
Actions #10

Updated by Loïc Dachary over 7 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to jewel
Actions #11

Updated by Loïc Dachary over 7 years ago

  • Copied to Backport #17149: jewel: ceph-disk: expected systemd unit failures are confusing added
Actions #12

Updated by Loïc Dachary over 7 years ago

  • Status changed from Pending Backport to Resolved