Cleanup #8590
better error reporting from ceph-disk
Status: New
Priority: Normal
Assignee: -
Category: ceph cli
Target version: -
% Done: 0%
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:
Description
It is very difficult to understand what the actual error may be when ceph-disk fails.
This is an example of the output you may get when it encounters an error:
[node1][INFO ] Running command: sudo ceph-disk-prepare --fs-type xfs --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys --cluster ceph -- /dev/sdb
[node1][DEBUG ] Information: Moved requested sector from 10485794 to 10487808 in
[node1][DEBUG ] order to align on 2048-sector boundaries.
[node1][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdb
[node1][WARNIN] Could not create partition 2 from 10485794 to 20971553
[node1][WARNIN] Unable to set partition 2's name to 'ceph journal'!
[node1][WARNIN] Could not change partition 2's type code to 45b0969e-9b03-4f30-b4c6-5ec00ceff106!
[node1][WARNIN] Error encountered; not saving changes.
[node1][WARNIN] ceph-disk: Error: Command '['/sbin/sgdisk', '--new=2:+0:+5120M', '--change-name=2:ceph journal', '--partition-guid=2:51727383-c22a-499c-8ddb-69df0abb49bd', '--typecode=2:45b0969e-9b03-4f30-b4c6-5ec00ceff106', '--mbrtogpt', '--', '/dev/sdb']' returned non-zero exit status 4
Specifically, the 'Error: Command...' part is just the printed representation of a Python list, which cannot even be copied and pasted into a shell.
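The difference is easy to see with Python's standard `shlex` module. This is a minimal sketch, assuming Python 3.8+ for `shlex.join` (ceph-disk itself may run on an older interpreter). It prints the argument list from the log once as a list, as the current message does, and once as a shell-quoted command line. Quoting matters here: `--change-name=2:ceph journal` contains a space, so a shell would split it into two arguments unless it is quoted.

```python
import shlex

# Argument list taken from the failing sgdisk call in the log above.
args = [
    "/sbin/sgdisk",
    "--new=2:+0:+5120M",
    "--change-name=2:ceph journal",
    "--partition-guid=2:51727383-c22a-499c-8ddb-69df0abb49bd",
    "--typecode=2:45b0969e-9b03-4f30-b4c6-5ec00ceff106",
    "--mbrtogpt",
    "--",
    "/dev/sdb",
]

# What the current error message embeds: the list's string form.
as_list = str(args)

# A copy/paste-able command line: each argument is quoted only if it needs it.
as_command = shlex.join(args)

print(as_list)
print(as_command)
```

Only the argument containing a space gets quoted (`'--change-name=2:ceph journal'`), and `shlex.split(as_command)` gives back the original list.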
Ideally the report would show something like:
ceph-disk encountered an error when calling /sbin/sgdisk
Full command was: sudo /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f2a6b82f-20f0-400f-bd2e-5d313ca271d5 --typecode=1:89c57f98-2fe5-4dc0-89c1-5ec00ceff2be -- /dev/sdb
Exit code from command was: 4
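The "Command '[...]' returned non-zero exit status 4" text in the log looks like the string form of Python's `subprocess.CalledProcessError`. Given such an exception, a report like the one above could be built by a small helper. This is a sketch only. `format_command_error` is a hypothetical name, not an existing ceph-disk function. It leaves out the `sudo` prefix because that is not part of the argument list. Appending captured stderr is an assumption that goes beyond the wording above, and it requires Python 3.8+ for `shlex.join`.

```python
import shlex
import subprocess


def format_command_error(err):
    """Turn a CalledProcessError into a readable, pasteable report.

    Hypothetical helper; the wording mirrors the desired report.
    """
    if isinstance(err.cmd, str):
        # A command given as a single string is already shell syntax.
        command = err.cmd
        parts = command.split()
        program = parts[0] if parts else ""
    else:
        # The usual case (and the one in the log): a list of arguments.
        argv = [str(arg) for arg in err.cmd]
        command = shlex.join(argv)  # quotes arguments such as "1:ceph data"
        program = argv[0] if argv else ""

    lines = [
        f"ceph-disk encountered an error when calling {program}",
        f"Full command was: {command}",
        f"Exit code from command was: {err.returncode}",
    ]

    # Append the tool's own error output, if it was captured.
    stderr = err.stderr
    if isinstance(stderr, bytes):
        stderr = stderr.decode(errors="replace")
    if stderr and stderr.strip():
        lines.append("Error output from command was:")
        lines.append(stderr.rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: an error shaped like the sgdisk failure described here.
    demo = subprocess.CalledProcessError(
        4,
        ["/sbin/sgdisk", "--largest-new=1", "--change-name=1:ceph data", "--", "/dev/sdb"],
        stderr=b"Problem opening data for reading! Error is 2.\n",
    )
    print(format_command_error(demo))
```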
And although it is nice to get the non-zero exit status, the number alone says nothing about the cause of the error, because most (if not all) of ceph-disk's subprocess calls swallow stderr.
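One way to stop discarding stderr is to capture it and let it travel with the raised exception. This is a minimal sketch, assuming Python 3.7+ (`capture_output` and `text` were added to `subprocess.run` in 3.7). `run_command` is a hypothetical name, not an existing ceph-disk function. With `check=True`, `subprocess.run` raises `CalledProcessError` with `stdout` and `stderr` already attached, so whatever reports the failure can show the tool's own message.

```python
import subprocess
import sys


def run_command(args):
    """Run an external command and return its stdout.

    Hypothetical helper. On a non-zero exit, CalledProcessError is raised
    with .stdout and .stderr populated instead of the output being lost.
    """
    result = subprocess.run(
        args,
        capture_output=True,  # keep stdout and stderr instead of discarding them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a failing tool: writes a message to stderr, exits with 4.
    failing = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('Problem opening data for reading!\\n'); sys.exit(4)",
    ]
    try:
        run_command(failing)
    except subprocess.CalledProcessError as err:
        print(f"exit status {err.returncode}", file=sys.stderr)
        print(err.stderr.rstrip(), file=sys.stderr)
```

Because the output is captured rather than inherited, the caller decides how to show it. It could also be logged at debug level on success.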
If I run the command that failed on that node, this is the full output:
vagrant@node1:~$ sudo /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f2a6b82f-20f0-400f-bd2e-5d313ca271d5 --typecode=1:89c57f98-2fe5-4dc0-89c1-5ec00ceff2be -- /dev/sdb
Problem opening data for reading! Error is 2. The specified file does not exist!
That is very useful information that ceph-disk just doesn't relay.