Bug #12373

closed

sgdisk hangs on zap command

Added by Chris Holcombe almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The launchpad bug that contains all the info:
https://bugs.launchpad.net/charms/+source/ceph/+bug/1475247

What we're seeing on several machines now is that the command /sbin/sgdisk --zap-all --clear --mbrtogpt -- /dev/sdb takes upwards of 40-50 minutes to complete. According to the sgdisk website, the "--clear" option will fail if the partition data on the disk is corrupted in any way. It's possible that this is causing sgdisk to hang and wait in D state (uninterruptible sleep), where it cannot be killed. I'm going to try patching the Python code to do two steps instead of one: 1) sgdisk --zap-all 2) sgdisk --clear --mbrtogpt -- /dev/sdb and see if that helps.
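A minimal sketch of the two-step idea described above, not the actual patch. The helper names zap_commands and zap are hypothetical, and passing the device to the first step as well is an assumption; the sgdisk flags are the ones quoted in this report. The runner is injectable so the sequence can be checked without touching a disk.

```python
import subprocess


def zap_commands(dev):
    """Build the two sgdisk invocations for `dev`, in the order they should run."""
    return [
        # Step 1: destroy the GPT and MBR data structures.
        ["/sbin/sgdisk", "--zap-all", "--", dev],
        # Step 2: write a fresh, empty GPT to the now-wiped disk.
        ["/sbin/sgdisk", "--clear", "--mbrtogpt", "--", dev],
    ]


def zap(dev, run=subprocess.check_call):
    """Run both steps in order; with the default runner a failing step raises."""
    for cmd in zap_commands(dev):
        run(cmd)


# Dry run: print the commands instead of executing them.
zap("/dev/sdb", run=lambda cmd: print(" ".join(cmd)))
```

With the default runner, subprocess.check_call raises CalledProcessError if the first step fails, so the second step is never attempted on a disk that could not be wiped.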

Actions #1

Updated by Chris Holcombe almost 9 years ago

Log from a stuck disk:
2015-07-16 15:11:33 INFO mon-relation-changed ^MReading state information... 0%^M^MReading state information... 0%^M^MReading state information... Done
2015-07-16 15:11:33 INFO mon-relation-changed ^GCaution: invalid backup GPT header, but valid main header; regenerating
2015-07-16 15:11:33 INFO mon-relation-changed backup header from main header.
2015-07-16 15:11:33 INFO mon-relation-changed
2015-07-16 15:11:33 INFO mon-relation-changed Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
2015-07-16 15:11:33 INFO mon-relation-changed on the recovery & transformation menu to examine the two tables.
2015-07-16 15:11:33 INFO mon-relation-changed
2015-07-16 15:11:33 INFO mon-relation-changed Warning! One or more CRCs don't match. You should repair the disk!
2015-07-16 15:11:33 INFO mon-relation-changed
2015-07-16 15:55:33 INFO mon-relation-changed ^G^G****************************************************************************
2015-07-16 15:55:33 INFO mon-relation-changed Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-07-16 15:55:33 INFO mon-relation-changed verification and recovery are STRONGLY recommended.
2015-07-16 15:55:33 INFO mon-relation-changed ********************************************************************
2015-07-16 15:55:33 INFO mon-relation-changed GPT data structures destroyed! You may now partition the disk using fdisk or
2015-07-16 15:55:33 INFO mon-relation-changed other utilities.
2015-07-16 15:55:33 INFO mon-relation-changed The operation has completed successfully.

Look at the delta between the "repair the disk" warning (15:11:33) and the last line (15:55:33): 44 minutes. That gap would be consistent with sgdisk sitting in D state doing I/O against the disk. What it is doing, I don't know.
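For longer logs, the stall can be located mechanically. This is a rough sketch assuming every relevant line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in the log above; largest_gap is a hypothetical helper, and the sample is a three-line subset of that log.

```python
from datetime import datetime


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the biggest jump in time."""
    stamped = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        stamped.append((ts, line))
    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


sample = [
    "2015-07-16 15:11:33 INFO mon-relation-changed Warning! One or more CRCs don't match. You should repair the disk!",
    "2015-07-16 15:11:33 INFO mon-relation-changed",
    "2015-07-16 15:55:33 INFO mon-relation-changed The operation has completed successfully.",
]
gap, before, after = largest_gap(sample)
print(gap)  # 0:44:00
```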

This disk in particular, /dev/sdb, is a:
[ 3.913915] sd 0:0:1:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)

Actions #2

Updated by Andreas Hasenack almost 9 years ago

Turns out this is a duplicate of #11143

Actions #3

Updated by Alfredo Deza over 8 years ago

  • Status changed from New to Resolved

Closing this, as the fix has already gone into master (and has been backported to hammer and firefly):

https://github.com/osynge/ceph/commit/fdd7f8d83afa25c4e09aaedd90ab93f3b64a677b

Will comment on the launchpad issue as well.
