Bug #2761

closed

osd: failed to recover before timeout expired

Added by Tamilarasi muthamizhan almost 12 years ago. Updated about 11 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A

Description

Logs: ubuntu@teuthology:/a/teuthology-2012-07-09_05:00:08-regression-stable-master-basic/8039

Excerpts from teuthology.log, config.yaml, and summary.yaml follow for reference:

2012-07-09T06:46:07.157 ERROR:teuthology.run_tasks:Manager failed: <contextlib.GeneratorContextManager object at 0x1449e90>
Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/teuthology/teuthology/task/thrashosds.py", line 85, in task
    manager.wait_for_recovery(config.get('timeout', 360))
  File "/var/lib/teuthworker/teuthology/teuthology/task/ceph_manager.py", line 309, in wait_for_recovery
    'failed to recover before timeout expired'
AssertionError: failed to recover before timeout expired
2012-07-09T06:46:07.158 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x137d950>
2012-07-09T06:46:07.158 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/var/lib/teuthworker/teuthology/teuthology/contextutil.py", line 27, in nested
    yield vars
  File "/var/lib/teuthworker/teuthology/teuthology/task/ceph.py", line 1020, in task
    yield
  File "/var/lib/teuthworker/teuthology/teuthology/run_tasks.py", line 45, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/var/lib/teuthworker/teuthology/teuthology/task/thrashosds.py", line 85, in task
    manager.wait_for_recovery(config.get('timeout', 360))
  File "/var/lib/teuthworker/teuthology/teuthology/task/ceph_manager.py", line 309, in wait_for_recovery
    'failed to recover before timeout expired'
AssertionError: failed to recover before timeout expired
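
For readers unfamiliar with the failing check, the traceback suggests a poll-until-deadline pattern: keep checking cluster state and raise an AssertionError once the timeout passes. The following is only a minimal sketch of that pattern, not the actual code in ceph_manager.py. The `is_recovered`, `poll_interval`, `clock`, and `sleep` parameters are hypothetical names introduced for illustration; only the function name, the timeout argument, and the error message come from the log.

```python
import time


def wait_for_recovery(is_recovered, timeout=360, poll_interval=3,
                      clock=time.time, sleep=time.sleep):
    """Poll is_recovered() until it returns True or `timeout` seconds pass.

    Hypothetical stand-in for the helper named in the traceback; the real
    teuthology implementation may check cluster state differently.
    """
    start = clock()
    while not is_recovered():
        if clock() - start >= timeout:
            # Same message as the AssertionError in the log excerpt.
            raise AssertionError('failed to recover before timeout expired')
        sleep(poll_interval)
```

Passing `clock` and `sleep` in as arguments is a design choice for this sketch only: it lets the timeout path be exercised with a fake clock instead of waiting in real time.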

config.yaml

kernel: &id001
  kdb: true
  sha1: 26ce171915f348abd1f41da1ed139d93750d987f
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        rbd cache: true
        rbd cache max dirty: 0
    fs: xfs
    log-whitelist:
    - slow request
    sha1: 03c2dc244af11b711e2514fd5f32b9bfa34183f6
roles:
- - mon.a
  - osd.0
  - osd.1
  - osd.2
- - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
targets:
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOTCMIScDTmD9NkfsWU7xeyZ+WOXai5izYeliiXDSjJC3bT6r8Fp+rhPfcHCVHiw++VsbvKZtkhjCSnJTVPWCdpRDghzJ3nZUBImWRo3PmHo1etQpCeimaOrIJ2q0ChN5jmSOqy5B+Z4om2vXBtBY6nkdTxDOr2+MH3NrSPkQSFB0zO+VPuwKXsemeUC6urb2IZZpxY3cxNq4fafTF9PROpgOnIA+o3igyU4duKEjnCzTHZjw/PL7Eph/7p6+UQgrUwe7pgVzT+2MM0zcBtBSXNqs3dCGmpvUapOkBlDoIX02EkWRNpkM3vfeFt1EFC17B5vd61Kg40bYUG8qWGR0T
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1atTvrZo1Sr3dY/WzNvE8gg/UeFV/U2NdmR9UlTkihYWK9nqdjirHhN/MNYVMkjY+yP3zWnbkQNx89J+XTvf1ROM+CsT3n885LCMxDtzzjwU2/x6vjwkLKSm43e8QMOfsCVl4jTniK3godrIJIg/pvUwD+dnkV/qmx3SpMnRTwzwgpfFwtFVENu7k519POG1jVrQ1tpksAke6s4/gWNsAeYxoDzP4tegWPmWIu7qHcpH8X5t/ClHWeMV6ur0yVRk/6rX0Jve98PFrsW5vFJvbZpMNMvV4ei/g8jyeCUS3OBqanRnSBgn3geLTngw4y32+squLtIKBeojNQ+8u/YNL
: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDk4GmsUmC8svnRI6Xd+mRX2MwKb4RHECAeLfqTm2COfqfolS2wKGw3U92eJcyvpZ+2p82X7uBrimjZh5JgRtxJ1aGUG4Pi60+JBYF0WpohM/3aYISFegVNET9rcapdDaAi6fFB5vhT06Q/cYEO0tPrdqGb/O3oiDSurtqtfOzkdwSPWSTY/hSegXgOeG6EjuEfvnU4BbgXWkLlDQRXCdgQd35F0SlKJVgMo+J1MgMCEK4qnBMFN614P1gBSzZCBsSUGQdjYBOzZfCRlI2bUdPDtB0kyjp7o5Ns9gLd07TLw8h9oxvI7wxG16XnLOAIzPBNOaH4OztTMGg3wJ/1e26t
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- clock: null
- ceph:
    log-whitelist:
    - wrongly marked me down or wrong addr
    - objects unfound and apparently lost
- thrashosds:
    timeout: 1200
- rbd_fsx:
    clients:
    - client.0
    ops: 20000
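
Note on the timeout in effect: the traceback shows `config.get('timeout', 360)`, while this job sets `timeout: 1200` under `thrashosds`. Assuming the task receives its own mapping from the `tasks:` list as `config` (an assumption, not confirmed by the log), the lookup would resolve to 1200 seconds rather than the 360-second default. A small sketch of that lookup, where `task_config` is a hypothetical helper:

```python
# Subset of the task entries listed under `tasks:` in config.yaml.
tasks = [
    {'chef': None},
    {'thrashosds': {'timeout': 1200}},
    {'rbd_fsx': {'clients': ['client.0'], 'ops': 20000}},
]


def task_config(tasks, name):
    """Return the mapping for the first task called `name`.

    Hypothetical helper: yields {} when the task is absent or set to null.
    """
    for entry in tasks:
        if name in entry:
            return entry[name] or {}
    return {}


# Mirrors the call in the traceback: config.get('timeout', 360)
timeout = task_config(tasks, 'thrashosds').get('timeout', 360)
print(timeout)  # 1200
```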

ubuntu@teuthology:/a/teuthology-2012-07-09_05:00:08-regression-stable-master-basic/8039$ cat summary.yaml
ceph-sha1: 03c2dc244af11b711e2514fd5f32b9bfa34183f6
description: collection:thrash clusters:6-osd-3-machine.yaml fs:xfs.yaml thrashers:default.yaml
  workloads:rbd_fsx_cache_writethrough.yaml
duration: 2710.1456501483917
failure_reason: failed to recover before timeout expired
flavor: basic
owner: scheduled_teuthology@teuthology
success: false
