
Bug #14619

closed

mira039 missing drive 5

Added by Yuri Weinstein about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Category: Test Node
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

mira039 was listed as 'could not nuke' on the stale nodes list:

2016-02-02 16:41:22,836.836 ERROR:teuthology.nuke:Could not nuke {u'mira039.front.sepia.ceph.com': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUlZ2hLNhfSTqsWtuDLJ7Zk+kGcUy93lP6jgfcAscmyLiHhtX86VLWMoANh1Hp/A2ZY35wX6WvFGly5qjQY/3sKAIg2IeGfarz1t7bz4HM67cQZwnCrk9BeChFw8ICwwCCF6/V4NLHphSF+6pxpzipiv2tj+w1ZTv4xHIlX/TA/ThtTRp0vuSX+FTDTY6HqFduEUnTRuV1IDB02Qt92CnlfWAWAZHIZ6FUjrBLnaFVp37E+0dmLijV1nRqUN5ldy6Cl0RZ0P/ksyiMcZ69SY9sAFRFaulJTdXCX+Ki+XHfN2XQGcWBBooRCt2+f3ToPuVuSA5vTvcpty1pTt/QS/nv'}
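When triaging a stale nodes list, it can help to pull the affected hostnames out of lines like the one above. The payload after "Could not nuke" appears to be a Python 2 dict repr mapping each node's FQDN to its SSH host key. The helper below is a minimal sketch under that assumption; `hosts_that_failed_nuke` is a hypothetical name and is not part of teuthology.

```python
from __future__ import annotations

import ast
import re

# Assumed format, based on the log line in this ticket:
# "<timestamp> ERROR:teuthology.nuke:Could not nuke {<fqdn>: <ssh host key>, ...}"
NUKE_FAILURE = re.compile(r"ERROR:teuthology\.nuke:Could not nuke (\{.*\})")


def hosts_that_failed_nuke(log_line: str) -> list[str]:
    """Hypothetical helper: return short hostnames from a 'Could not nuke' line."""
    match = NUKE_FAILURE.search(log_line)
    if match is None:
        return []
    # The payload is a Python 2 style dict repr (u'' strings); Python 3's
    # ast.literal_eval accepts the u'' prefix, so it parses as a plain dict.
    targets = ast.literal_eval(match.group(1))
    # Keep only the short name, e.g. "mira039" from "mira039.front.sepia.ceph.com".
    return sorted(host.split(".", 1)[0] for host in targets)


if __name__ == "__main__":
    line = (
        "2016-02-02 16:41:22,836.836 ERROR:teuthology.nuke:Could not nuke "
        "{u'mira039.front.sepia.ceph.com': u'ssh-rsa KEY'}"
    )
    print(hosts_that_failed_nuke(line))  # prints ['mira039']
```

Using `ast.literal_eval` rather than `eval` keeps the parse restricted to literals, so a malformed or unexpected log payload raises an error instead of executing anything.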

Could not activate an IPMI SOL session:

yuriw@Yuris-MacBook-Air:~$ ipmitool XXXXXXX sol activate
[SOL Session operational.  Use ~? for help]
ERROR: Received message with invalid authcode!
Assertion failed: (0), function ipmi_lan_poll_recv, file lanplus.c, line 659.
Abort trap: 6

It was locked by VPSHOST; I think I also saw it coming up as stale-locked by Sage?!
teuthology-lock --unlock --owner VPSHOST@VPSHOST mira036

It's marked down now.

Actions #1

Updated by Yuri Weinstein about 8 years ago

  • Description updated (diff)
Actions #2

Updated by Yuri Weinstein about 8 years ago

  • Subject changed from "can't ssh nuke or access via ipmi sol mira036" to "can't ssh nuke or access via ipmi sol mira039"
Actions #3

Updated by Yuri Weinstein about 8 years ago

  • Description updated (diff)
Actions #4

Updated by Dan Mick about 8 years ago

  • Subject changed from "can't ssh nuke or access via ipmi sol mira039" to "mira039 missing drive 5"
  • Description updated (diff)
  • Assignee set to David Galloway

The upshot is: mira036 was not involved. mira039 was unresponsive and, on reboot, spent a long time determining that drive 5 had failed; it is now just in a "needs drive 5" state.

Actions #5

Updated by Dan Mick about 8 years ago

It's also behaving badly on reboot, still spending a lot of time in the RAID BIOS. Maybe the drive has failed in a way that poisons attempts to probe it, and the machine would behave better with the drive removed or replaced, but be prepared for something deeper being wrong.

Actions #6

Updated by David Galloway about 8 years ago

  • Category set to Test Node
Actions #7

Updated by David Galloway about 8 years ago

  • Status changed from New to In Progress

Updated the RAID controller firmware from V1.49 (2011-08-24) to V1.52 (2015-11-20); the system now gets through the RAID BIOS considerably faster during boot.

Ticket is on hold till we get more drives.

Actions #8

Updated by David Galloway about 8 years ago

Something's up with this machine. I started a reimage on 5 miras at the same time. The other four are done and this one hasn't gotten through package installation yet.

Maybe try replacing the RAID controller after all.

Actions #9

Updated by David Galloway about 8 years ago

  • Status changed from In Progress to Resolved

I'm unable to reproduce any weirdness on this machine now. Releasing it to the pool; will try replacing the RAID controller if issues persist.
