Bug #18689

closed

mira108 and mira072 inaccessible and can't be nuked

Added by Yuri Weinstein about 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Category: Test Node
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2017-01-26 16:38:08,006.006 INFO:teuthology.orchestra.console:Power cycling mira108
2017-01-26 16:48:08,192.192 ERROR:teuthology.nuke:
Traceback (most recent call last):
  File "/home/yuriw/teuthology/teuthology/nuke/__init__.py", line 312, in nuke_helper
    check_console(host)
  File "/home/yuriw/teuthology/teuthology/nuke/actions.py", line 422, in check_console
    console.power_cycle()
  File "/home/yuriw/teuthology/teuthology/orchestra/console.py", line 203, in power_cycle
    self._wait_for_login(timeout=300)
  File "/home/yuriw/teuthology/teuthology/orchestra/console.py", line 155, in _wait_for_login
    raise ConsoleError("Did not get a login prompt from %s!" % self.name)
ConsoleError: Did not get a login prompt from mira108.front.sepia.ceph.com!
2017-01-26 16:48:08,192.192 INFO:teuthology.nuke:Will attempt to connect via SSH
2017-01-26 16:48:11,738.738 ERROR:teuthology.nuke:Could not nuke {u'mira108.front.sepia.ceph.com': u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDnKiE7063UDRi6+dUhGB49c1jds1c+Gdnbgv5zB8VFt7j51ByjJTFwx46wWJSd3ZFd0yFS+t/aJfNOV61lVhHNI9RClI0ON0W/qupGKenxgsd1ncGKrVMcoKzJ8Khu7XJ7gUS+yFLYbdhJozEW9mYcggp7DYitcD7yHUBahiuDZVrCCYFpi7iGqkcms8mkE1s6UN22nSMyeEoJ3vnN4N8FmduR8HTVOAL/1FxSdRgEd8SHtytg+IKgkKmfuCKqgXSOtjqAQVD5DMEUnzSDAUpNBhZbWDZOrTilgNhWaSD60OCGMWgWjEK26plJQ6YEmJELCSd2T/6H0b3cKT17MHOF'}
Traceback (most recent call last):
  File "/home/yuriw/teuthology/teuthology/nuke/__init__.py", line 281, in nuke_one
    nuke_helper(ctx, should_unlock)
  File "/home/yuriw/teuthology/teuthology/nuke/__init__.py", line 317, in nuke_helper
    remote.connect()
  File "/home/yuriw/teuthology/teuthology/orchestra/remote.py", line 59, in connect
    self.ssh = connection.connect(**args)
  File "/home/yuriw/teuthology/teuthology/orchestra/connection.py", line 104, in connect
    ssh.connect(**connect_args)
  File "/home/yuriw/teuthology/virtualenv/local/lib/python2.7/site-packages/paramiko/client.py", line 324, in connect
    raise NoValidConnectionsError(errors)
NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 172.21.8.104
2017-01-26 16:48:11,740.740 ERROR:teuthology.nuke:Could not nuke the following targets:
targets:
  mira108.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDnKiE7063UDRi6+dUhGB49c1jds1c+Gdnbgv5zB8VFt7j51ByjJTFwx46wWJSd3ZFd0yFS+t/aJfNOV61lVhHNI9RClI0ON0W/qupGKenxgsd1ncGKrVMcoKzJ8Khu7XJ7gUS+yFLYbdhJozEW9mYcggp7DYitcD7yHUBahiuDZVrCCYFpi7iGqkcms8mkE1s6UN22nSMyeEoJ3vnN4N8FmduR8HTVOAL/1FxSdRgEd8SHtytg+IKgkKmfuCKqgXSOtjqAQVD5DMEUnzSDAUpNBhZbWDZOrTilgNhWaSD60OCGMWgWjEK26plJQ6YEmJELCSd2T/6H0b3cKT17MHOF
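The traceback shows the order nuke follows for an unresponsive node. It power cycles the node through the console and waits up to 300 seconds for a login prompt. When that raises ConsoleError, it logs "Will attempt to connect via SSH" and calls remote.connect(). The host is reported as un-nukeable only when SSH also fails.

Below is a minimal sketch of that fallback order, with these assumptions:
- The steps are passed in as callables.
- nuke_host, check_console and ssh_connect are hypothetical stand-ins, not teuthology's actual API.
- Attempting SSH even after a successful console check is also an assumption.

```python
import logging
from typing import Callable

log = logging.getLogger("nuke_sketch")


class ConsoleError(Exception):
    """Hypothetical stand-in for teuthology's console failure."""


def nuke_host(host: str,
              check_console: Callable[[str], None],
              ssh_connect: Callable[[str], None]) -> bool:
    """Return True if the host could be reached over SSH.

    Hypothetical sketch of the fallback seen in the log, not teuthology code.
    """
    try:
        # Best-effort step: power cycle and wait for a login prompt.
        check_console(host)
    except ConsoleError as exc:
        log.error("%s", exc)
        log.info("Will attempt to connect via SSH")
    try:
        # Assumption: SSH is attempted whether or not the console step worked.
        ssh_connect(host)
    except OSError as exc:
        # e.g. nothing listening on port 22, as with mira108.
        log.error("Could not nuke %s: %s", host, exc)
        return False
    return True
```

With mira108, both steps failed, so the sketch would return False, matching "Could not nuke the following targets".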

Actions #1

Updated by David Galloway about 7 years ago

mira108 dropped to initramfs.

dmesg output:

[    2.722018] =================================
[    2.722018] [ INFO: inconsistent lock state ]
[    2.722019] 4.10.0-rc5-ceph-g10f34ebb06e7 #1 Not tainted
[    2.722020] ---------------------------------
[    2.722021] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[    2.722022] cpuhp/5/37 [HC0[0]:SC0[0]:HE1:SE1] takes:
[    2.722023]  (tick_broadcast_lock){?.....}, at: [<ffffffffb1126aee>] tick_broadcast_control+0x4e/0x180
[    2.722031] {IN-HARDIRQ-W} state was registered at:
[    2.722031]   
[    2.722034] [<ffffffffb10eac52>] __lock_acquire+0x762/0x1250
[    2.722034]   
[    2.722035] [<ffffffffb10ebba0>] lock_acquire+0x100/0x1f0
[    2.722035]   
[    2.722038] [<ffffffffb189eb80>] _raw_spin_lock_irqsave+0x50/0x70
[    2.722039]   
[    2.722040] [<ffffffffb1126c36>] tick_broadcast_switch_to_oneshot+0x16/0x50
[    2.722040]   
[    2.722042] [<ffffffffb1126fda>] tick_switch_to_oneshot+0x4a/0xc0
[    2.722042]   
[    2.722043] [<ffffffffb11270d5>] tick_init_highres+0x15/0x20
[    2.722044]   
[    2.722046] [<ffffffffb1116c8b>] hrtimer_run_queues+0x8b/0xd0
[    2.722046]   
[    2.722048] [<ffffffffb111519e>] run_local_timers+0x1e/0x50
[    2.722048]   
[    2.722049] [<ffffffffb11151f7>] update_process_times+0x27/0x60
[    2.722049]   
[    2.722051] [<ffffffffb11253af>] tick_periodic+0x2f/0xc0
[    2.722051]   
[    2.722052] [<ffffffffb1125465>] tick_handle_periodic+0x25/0x70
[    2.722053]   
[    2.722056] [<ffffffffb105cd78>] local_apic_timer_interrupt+0x38/0x60
[    2.722056]   
[    2.722058] [<ffffffffb18a1928>] smp_apic_timer_interrupt+0x38/0x50
[    2.722059]   
[    2.722060] [<ffffffffb18a0ab3>] apic_timer_interrupt+0x93/0xa0
[    2.722060]   
[    2.722062] [<ffffffffb189dabc>] mwait_idle+0x6c/0x220
[    2.722062]   
[    2.722065] [<ffffffffb103f93f>] arch_cpu_idle+0xf/0x20
[    2.722065]   
[    2.722066] [<ffffffffb189df33>] default_idle_call+0x23/0x40
[    2.722067]   
[    2.722069] [<ffffffffb10de4ca>] do_idle+0x16a/0x200
[    2.722069]   
[    2.722070] [<ffffffffb10de8e2>] cpu_startup_entry+0x62/0x70
[    2.722070]   
[    2.722072] [<ffffffffb105b2be>] start_secondary+0x14e/0x180
[    2.722072]   
[    2.722074] [<ffffffffb10001c4>] verify_cpu+0x0/0xfc
[    2.722074] irq event stamp: 95
[    2.722076] hardirqs last  enabled at (95): [<ffffffffb189e52c>] _raw_spin_unlock_irq+0x2c/0x40
[    2.722079] hardirqs last disabled at (94): [<ffffffffb18953ea>] __schedule+0xca/0xb60
[    2.722082] softirqs last  enabled at (0): [<ffffffffb108a5e6>] copy_process+0x576/0x20a0
[    2.722083] softirqs last disabled at (0): [<          (null)>]           (null)
[    2.722083] 
[    2.722083] other info that might help us debug this:
[    2.722083]  Possible unsafe locking scenario:
[    2.722083] 
[    2.722084]        CPU0
[    2.722084]        ----
[    2.722084]   lock(tick_broadcast_lock);
[    2.722085]   <Interrupt>
[    2.722085]     lock(tick_broadcast_lock);
[    2.722086] 
[    2.722086]  *** DEADLOCK ***
[    2.722086] 
[    2.722087] no locks held by cpuhp/5/37.
[    2.722087] 
[    2.722087] stack backtrace:
[    2.722089] CPU: 5 PID: 37 Comm: cpuhp/5 Not tainted 4.10.0-rc5-ceph-g10f34ebb06e7 #1
[    2.722089] Hardware name: Supermicro X8SIL/X8SIL, BIOS 1.0c 02/25/2010
[    2.722090] Call Trace:
[    2.722093]  dump_stack+0x85/0xc2
[    2.722094]  print_usage_bug+0x1e1/0x1f0
[    2.722096]  mark_lock+0x526/0x5d0
[    2.722098]  ? check_usage_forwards+0x100/0x100
[    2.722099]  __lock_acquire+0x5ce/0x1250
[    2.722101]  lock_acquire+0x100/0x1f0
[    2.722102]  ? tick_broadcast_control+0x4e/0x180
[    2.722106]  ? backlight_resume+0x90/0x90
[    2.722107]  _raw_spin_lock+0x38/0x50
[    2.722108]  ? tick_broadcast_control+0x4e/0x180
[    2.722109]  tick_broadcast_control+0x4e/0x180
[    2.722111]  ? backlight_resume+0x90/0x90
[    2.722112]  intel_idle_cpu_online+0x22/0x100
[    2.722114]  cpuhp_invoke_callback+0x1f2/0x810
[    2.722116]  cpuhp_thread_fun+0x4a/0x110
[    2.722118]  smpboot_thread_fn+0x11a/0x1e0
[    2.722120]  kthread+0x10c/0x140
[    2.722121]  ? sort_range+0x30/0x30
[    2.722122]  ? kthread_stop+0x2b0/0x2b0
[    2.722124]  ret_from_fork+0x31/0x40

Actions #2

Updated by David Galloway about 7 years ago

  • Category set to Test Node
  • Status changed from New to In Progress

mira072 had locked up and wouldn't power on. I power cycled its PDU port and it came back. I will reimage it and flash the latest firmware.

mira108 has been reimaged and released.

Actions #3

Updated by David Galloway about 7 years ago

  • Status changed from In Progress to Resolved

mira072 had its firmware updated, was reimaged and released.
