Bug #12373
sgdisk hangs on zap command (status: Closed)
Description
The Launchpad bug that contains all the info:
https://bugs.launchpad.net/charms/+source/ceph/+bug/1475247
What we're seeing on several machines now is that /sbin/sgdisk --zap-all --clear --mbrtogpt -- /dev/sdb takes upwards of 40-50 minutes to complete. According to the sgdisk website, the --clear option will fail if the partition table on the disk is corrupted in any way. It's possible that this is causing sgdisk to hang in D state (uninterruptible sleep), where it cannot be killed. I'm going to try patching the Python code to run two commands instead of one, 1) sgdisk --zap-all -- /dev/sdb, then 2) sgdisk --clear --mbrtogpt -- /dev/sdb, and see if that helps.
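The two-step plan above could look roughly like this. The charm's actual Python code is not shown in this report, so this is only a minimal sketch: zap_commands and zap_disk are hypothetical names, and the only sgdisk options used are the ones already in the original command.

```python
import subprocess
from typing import Callable, List

SGDISK = "/sbin/sgdisk"  # path taken from the command in the report


def zap_commands(device: str) -> List[List[str]]:
    """Hypothetical helper: the single zap call split into two sgdisk runs."""
    return [
        # Step 1: wipe the GPT and MBR data structures on their own.
        [SGDISK, "--zap-all", "--", device],
        # Step 2: the remaining options from the original one-line call.
        [SGDISK, "--clear", "--mbrtogpt", "--", device],
    ]


def zap_disk(device: str, run: Callable[..., object] = subprocess.run) -> None:
    """Hypothetical helper: run both steps in order.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit, so step 2 is never attempted if step 1 fails.
    """
    for argv in zap_commands(device):
        run(argv, check=True)
```

Calling zap_disk("/dev/sdb") is destructive, so the run parameter lets the command sequence be checked with a stand-in first. Splitting the call also makes it visible in the hook log which of the two steps is the slow one. Note that a process stuck in D state does not act on SIGKILL until it leaves that state, so wrapping these calls in a timeout would likely not free the hook either.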
Updated by Chris Holcombe almost 9 years ago
Log from a stuck disk:
2015-07-16 15:11:33 INFO mon-relation-changed ^MReading state information... 0%^M^MReading state information... 0%^M^MReading state information... Done
2015-07-16 15:11:33 INFO mon-relation-changed ^GCaution: invalid backup GPT header, but valid main header; regenerating
2015-07-16 15:11:33 INFO mon-relation-changed backup header from main header.
2015-07-16 15:11:33 INFO mon-relation-changed
2015-07-16 15:11:33 INFO mon-relation-changed Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
2015-07-16 15:11:33 INFO mon-relation-changed on the recovery & transformation menu to examine the two tables.
2015-07-16 15:11:33 INFO mon-relation-changed
2015-07-16 15:11:33 INFO mon-relation-changed Warning! One or more CRCs don't match. You should repair the disk!
2015-07-16 15:11:33 INFO mon-relation-changed
2015-07-16 15:55:33 INFO mon-relation-changed ^G^G****************************************************************************
2015-07-16 15:55:33 INFO mon-relation-changed Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-07-16 15:55:33 INFO mon-relation-changed verification and recovery are STRONGLY recommended.
2015-07-16 15:55:33 INFO mon-relation-changed ********************************************************************
2015-07-16 15:55:33 INFO mon-relation-changed GPT data structures destroyed! You may now partition the disk using fdisk or
2015-07-16 15:55:33 INFO mon-relation-changed other utilities.
2015-07-16 15:55:33 INFO mon-relation-changed The operation has completed successfully.
Note the 44-minute gap between the "You should repair the disk!" warning at 15:11:33 and the next output at 15:55:33. That gap would account for sgdisk sitting in D state doing something to the disk; what exactly, I don't know.
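Spotting a stall like this in a longer hook log can be automated by looking for the largest jump between consecutive timestamps. A minimal sketch, assuming every line starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the log above; largest_gap is a hypothetical helper, and the excerpt reuses three lines from that log.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Assumes each line begins with "YYYY-MM-DD HH:MM:SS", as in the hook log above.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

LOG_EXCERPT = [
    "2015-07-16 15:11:33 INFO mon-relation-changed Warning! One or more CRCs don't match. You should repair the disk!",
    "2015-07-16 15:11:33 INFO mon-relation-changed",
    "2015-07-16 15:55:33 INFO mon-relation-changed The operation has completed successfully.",
]


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[timedelta, str, str]]:
    """Return (gap, line_before, line_after) for the longest pause between consecutive lines."""
    best: Optional[Tuple[timedelta, str, str]] = None
    prev_ts: Optional[datetime] = None
    prev_line = ""
    for line in lines:
        ts = datetime.strptime(line[:19], TS_FORMAT)  # first 19 chars are the timestamp
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


gap, before, after = largest_gap(LOG_EXCERPT)
print(gap)  # 0:44:00
```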
This particular disk, /dev/sdb, is a 3 TB drive:
[ 3.913915] sd 0:0:1:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
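The log above reports an invalid backup GPT header. In the standard GPT layout that header occupies the disk's last LBA, with the backup partition-entry array just before it, so a GPT wipe has to write at both ends of the device. A minimal sketch of that arithmetic for this disk, using the block count from the dmesg line; gpt_layout is a hypothetical helper, and the 128-entry, 128-byte partition array is the usual default, assumed here rather than read from this disk.

```python
SECTOR_SIZE = 512              # "512-byte logical blocks" in the dmesg line
TOTAL_SECTORS = 5_860_533_168  # logical block count in the dmesg line


def gpt_layout(total_sectors: int, sector_size: int = 512,
               entries: int = 128, entry_size: int = 128) -> dict:
    """Typical location of each GPT structure, as LBAs (sector numbers).

    entries/entry_size are the common defaults (128 x 128 bytes),
    assumed here rather than read from the disk.
    """
    # Sectors needed for the partition-entry array (ceiling division).
    array_sectors = (entries * entry_size + sector_size - 1) // sector_size
    last_lba = total_sectors - 1
    return {
        "protective_mbr_lba": 0,
        "primary_header_lba": 1,
        "primary_array_lba": 2,
        "backup_array_lba": last_lba - array_sectors,
        "backup_header_lba": last_lba,
    }


layout = gpt_layout(TOTAL_SECTORS, SECTOR_SIZE)
capacity = TOTAL_SECTORS * SECTOR_SIZE
print(capacity)                                   # 3000592982016
print(layout["backup_header_lba"] * SECTOR_SIZE)  # 3000592981504
```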
Updated by Andreas Hasenack almost 9 years ago
Turns out this is a duplicate of #11143
Updated by Alfredo Deza over 8 years ago
- Status changed from New to Resolved
Closing this, as the fix has already gone into master (and was backported to hammer and firefly):
https://github.com/osynge/ceph/commit/fdd7f8d83afa25c4e09aaedd90ab93f3b64a677b
Will comment on the Launchpad issue as well.