Cleanup #8590: better error reporting from ceph-disk - Ceph - Ceph

Cleanup #8590

open

better error reporting from ceph-disk

Added by Alfredo Deza almost 10 years ago. Updated about 9 years ago.

Status: New
Priority: Normal
Assignee:
Category: ceph cli
Target version:
% Done:
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

When ceph-disk fails, it is very difficult to tell what the actual error was.

This is an example of the output you may get when it encounters an error:

[node1][INFO  ] Running command: sudo ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb
[node1][DEBUG ] Information: Moved requested sector from 10485794 to 10487808 in
[node1][DEBUG ] order to align on 2048-sector boundaries.
[node1][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdb
[node1][WARNIN] Could not create partition 2 from 10485794 to 20971553
[node1][WARNIN] Unable to set partition 2's name to 'ceph journal'!
[node1][WARNIN] Could not change partition 2's type code to 45b0969e-9b03-4f30-b4c6-5ec00ceff106!
[node1][WARNIN] Error encountered; not saving changes.
[node1][WARNIN] ceph-disk: Error: Command '['/sbin/sgdisk', '--new=2:+0:+5120M', '--change-name=2:ceph journal', '--partition-guid=2:51727383-c22a-499c-8ddb-69df0abb49bd', '--typecode=2:45b0969e-9b03-4f30-b4c6-5ec00ceff106', '--mbrtogpt', '--', '/dev/sdb']' returned non-zero exit status 4

Specifically, the 'Error: Command...' part is just the representation of a Python list, which cannot be copied and pasted into a shell.

Ideally the report would show something like:

ceph-disk encountered an error when calling /sbin/sgdisk
Full command was: sudo /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f2a6b82f-20f0-400f-bd2e-5d313ca271d5 --typecode=1:89c57f98-2fe5-4dc0-89c1-5ec00ceff2be -- /dev/sdb
Exit code from command was: 4
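
A minimal sketch of how such a report could be built from the argument list, assuming Python 3 and the standard library only. The function name `format_command_error` is hypothetical and is not part of ceph-disk; the message wording follows the proposal above.

```python
import shlex


def format_command_error(cmd, returncode):
    """Build a readable report for a failed command (hypothetical helper).

    cmd is the argument list that was handed to subprocess,
    returncode is the exit status the command returned.
    """
    # shlex.quote leaves safe arguments untouched and quotes the ones that
    # contain spaces or shell metacharacters, so the line can be pasted
    # into a shell as-is.
    quoted = " ".join(shlex.quote(arg) for arg in cmd)
    return "\n".join([
        "ceph-disk encountered an error when calling {}".format(cmd[0]),
        "Full command was: {}".format(quoted),
        "Exit code from command was: {}".format(returncode),
    ])


if __name__ == "__main__":
    print(format_command_error(
        ["/sbin/sgdisk", "--new=2:+0:+5120M",
         "--change-name=2:ceph journal", "--", "/dev/sdb"],
        4,
    ))
```

Unlike the hand-written example above, this quotes arguments that contain a space (such as the `--change-name` value), which is what makes the command line safe to paste.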

And although it is nice to get the non-zero exit status, the number alone says nothing about the cause of the error, because most (if not all) ceph-disk subprocess calls swallow stderr.

If I run the command that failed on that node, this is the full output:

vagrant@node1:~$ sudo /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f2a6b82f-20f0-400f-bd2e-5d313ca271d5 --typecode=1:89c57f98-2fe5-4dc0-89c1-5ec00ceff2be -- /dev/sdb
Problem opening data for reading! Error is 2.
The specified file does not exist!

That is very useful information that ceph-disk just doesn't relay.
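
One way stderr could be relayed is sketched below, assuming Python 3.5+ and `subprocess.run`. The names `CommandError` and `run_checked` are hypothetical and do not come from ceph-disk; the sketch only illustrates capturing stderr and attaching it to the failure.

```python
import shlex
import subprocess


class CommandError(Exception):
    """Hypothetical exception that carries the child's stderr with the failure."""

    def __init__(self, cmd, returncode, stderr):
        self.cmd = cmd
        self.returncode = returncode
        self.stderr = stderr
        quoted = " ".join(shlex.quote(arg) for arg in cmd)
        message = "{} failed with exit code {}\nFull command was: {}".format(
            cmd[0], returncode, quoted)
        # Only add the stderr section when the command actually wrote something.
        if stderr.strip():
            message += "\nstderr was:\n" + stderr.rstrip()
        super().__init__(message)


def run_checked(cmd):
    """Run cmd, return its stdout, and raise CommandError on a non-zero exit."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # capture stderr instead of discarding it
        universal_newlines=True,  # decode output to str
    )
    if proc.returncode != 0:
        raise CommandError(cmd, proc.returncode, proc.stderr)
    return proc.stdout
```

With a wrapper like this, a caller that catches `CommandError` and prints it would show lines such as "The specified file does not exist!" next to the exit code, instead of the exit code alone.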

Updated by Loïc Dachary about 9 years ago

Tracker changed from Bug to Cleanup
