Bug #14432

closed

mira095 got wedged

Added by Greg Farnum about 8 years ago. Updated about 8 years ago.

Status:
Closed
Priority:
Normal
Category:
Test Node
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

It disappeared while running http://pulpito.ceph.com/gregf-2016-01-18_18:01:09-fs-greg-fs-testing-118-1---basic-mira/32812

Attempting to nuke failed:

2016-01-19 18:10:07,417.417 INFO:teuthology.nuke:checking console status of mira095.ipmi.sepia.ceph.com
2016-01-19 18:12:07,922.922 INFO:teuthology.nuke:console ready on mira095.ipmi.sepia.ceph.com
2016-01-19 18:12:07,922.922 INFO:teuthology.task.internal:Checking locks...
2016-01-19 18:12:07,955.955 INFO:teuthology.task.internal:Opening connections...
2016-01-19 18:12:10,959.959 ERROR:teuthology.nuke:Could not nuke {'mira095.front.sepia.ceph.com': 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAXNc0cJz/l/FhKxPhWQbgaEVKqQuhWP/1sVtQb8SyX6T4Tan1lkuxoFRbIa7OYco+vtE1+sD4CeLgXUpWtiPLr+B1a96G7GP8Dv7xtdr5wi0eX2uEWWz+mzk6t/K3ys0KNInJ4Z5BpXL8yL/WJj6abOO1l33FpHGisoLN1YQO4I6MlmfyHPkZun+teywFpEFVqCq2Sk8l2APQHRamZ9jshgwYoNf0Mw2rH6BSB0qdBgpWGTWkIQQVlX8DXZKUjxjHCHRFZ4bSIB2rLw9m0ND1r4srQ/MAowhqHCCLpUOd36au56jLubiHDC84Ssu08uFKjPFGIcd2da5Ur3qRnAtF'}
Traceback (most recent call last):
  File "/home/ubuntu/teuthology/teuthology/nuke.py", line 613, in nuke_one
    nuke_helper(ctx, should_unlock)
  File "/home/ubuntu/teuthology/teuthology/nuke.py", line 665, in nuke_helper
    connect(ctx, None)
  File "/home/ubuntu/teuthology/teuthology/task/internal.py", line 335, in connect
    rem.connect()
  File "/home/ubuntu/teuthology/teuthology/orchestra/remote.py", line 62, in connect
    self.ssh = connection.connect(**args)
  File "/home/ubuntu/teuthology/teuthology/orchestra/connection.py", line 112, in connect
    ssh.connect(**connect_args)
  File "/home/ubuntu/teuthology/virtualenv/local/lib/python2.7/site-packages/paramiko/client.py", line 296, in connect
    sock.connect(addr)
  File "/home/ubuntu/teuthology/virtualenv/local/lib/python2.7/site-packages/gevent/socket.py", line 344, in connect
    raise error(err, strerror(err))
error: [Errno 113] No route to host

I wasn't able to try direct IPMI fixes.
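
The traceback above ends in errno 113 (EHOSTUNREACH on Linux), raised from the plain socket connect before any SSH negotiation starts. A minimal sketch of a pre-check that separates "no route to host" from other connection failures is below. The function names `classify_connect_error` and `probe_ssh` are hypothetical and are not part of teuthology; port 22 and the 5 second timeout are assumptions.

```python
import errno
import socket


def classify_connect_error(exc):
    """Map a socket-level exception to a short, human-readable reason."""
    # Check the specific subclasses first; both derive from OSError.
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "dns failure"
    if exc.errno == errno.EHOSTUNREACH:
        # Errno 113 on Linux: the failure seen in the nuke traceback.
        return "no route to host"
    if exc.errno == errno.ECONNREFUSED:
        return "connection refused"
    return "other error: %s" % exc


def probe_ssh(host, port=22, timeout=5.0):
    """Try a bare TCP connect to host:port and return (reachable, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "ok"
    except OSError as exc:
        return False, classify_connect_error(exc)
```

Calling `probe_ssh("mira095.front.sepia.ceph.com")` from the teuthology host would presumably have returned `(False, "no route to host")` here. That result says nothing about why the node is gone, only that the console (IPMI) is the next place to look.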

Actions #1

Updated by Dan Mick about 8 years ago

Maybe a hardware fault?

[3]kdb> bt
Stack traceback for pid 21299
0xffff8803a2bf3000    21299    21294  1    3   R  0xffff8803a2bf34e8 *log
 ffff88043fcc8d50 0000000000000018
Call Trace:
 <#DB>  <<EOE>>  <#MC>  [<ffffffff81102c79>] ? kgdb_panic_event+0x29/0x30
 [<ffffffff8173125c>] ? notifier_call_chain+0x4c/0x70
 [<ffffffff817312ba>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff8171da17>] ? panic+0xec/0x1d7
 [<ffffffff8171e51e>] ? printk+0x67/0x69
 [<ffffffff81036e5a>] ? mce_panic+0x1fa/0x210
 [<ffffffff81038ca4>] ? do_machine_check+0xaa4/0xab0
 [<ffffffff8172d43f>] ? machine_check+0x1f/0x30
 [<ffffffff811f248b>] ? generic_write_end+0x2b/0xb0
 <<EOE>>  [<ffffffff8124702f>] ? ext4_da_write_end+0x9f/0x250
 [<ffffffff8114fa7a>] ? generic_file_buffered_write+0x16a/0x250
 [<ffffffff811510d1>] ? __generic_file_aio_write+0x1c1/0x3d0
 [<ffffffff81151338>] ? generic_file_aio_write+0x58/0xa0
 [<ffffffff8123c482>] ? ext4_file_write+0xa2/0x3f0
 [<ffffffff810d8468>] ? get_futex_key+0x1d8/0x2c0
 [<ffffffff810d96a1>] ? futex_wake+0x1b1/0x1d0
 [<ffffffff811bdd8a>] ? do_sync_write+0x5a/0x90
 [<ffffffff811be514>] ? vfs_write+0xb4/0x1f0
 [<ffffffff811bef49>] ? SyS_write+0x49/0xa0
 [<ffffffff8173575d>] ? system_call_fastpath+0x1a/0x1f
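
The hardware-fault guess rests on the `machine_check`, `do_machine_check` and `mce_panic` frames sitting above the ext4 write path in the trace. A rough sketch of pulling those frames out of a saved console capture follows. The helper names and the `MCE_SYMBOLS` set are assumptions made for illustration, and a match only shows that the CPU raised a machine-check exception; it does not identify the failing component.

```python
import re

# Symbols on the machine-check path in the trace above (assumed list,
# taken from this one backtrace rather than from any kernel reference).
MCE_SYMBOLS = {"machine_check", "do_machine_check", "mce_panic"}

# Matches frames such as "[<ffffffff81036e5a>] ? mce_panic+0x1fa/0x210".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_symbols(trace):
    """Return the symbol name of every call-trace frame, in order."""
    return FRAME_RE.findall(trace)


def mce_frames(trace):
    """Return only the frames that belong to the machine-check path."""
    return [sym for sym in frame_symbols(trace) if sym in MCE_SYMBOLS]


if __name__ == "__main__":
    sample = """
 [<ffffffff8171da17>] ? panic+0xec/0x1d7
 [<ffffffff81036e5a>] ? mce_panic+0x1fa/0x210
 [<ffffffff81038ca4>] ? do_machine_check+0xaa4/0xab0
 [<ffffffff8172d43f>] ? machine_check+0x1f/0x30
 <<EOE>>  [<ffffffff8124702f>] ? ext4_da_write_end+0x9f/0x250
"""
    print(mce_frames(sample))
```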

Actions #2

Updated by Dan Mick about 8 years ago

Nothing more in syslog or kern.log about the fault.

Actions #3

Updated by Dan Mick about 8 years ago

  • Assignee set to David Galloway

Actions #4

Updated by David Galloway about 8 years ago

  • Status changed from New to Closed

Without more logs (or bandwidth) to investigate further, there's not much more I can do to say for sure whether this is a hardware issue.

I always search for past issues when troubleshooting machine problems, though, so if this becomes a recurring issue, I'll devote more time to it.

For now, I've reimaged the machine and released it.
