Bug #6141
closed
Added by Niklas Goerke over 10 years ago.
Updated almost 10 years ago.
Description
After (mistakenly) executing "echo 2 > /proc/sys/vm/drop_caches" instead of "echo 1 > /proc/sys/vm/drop_caches" to clear the filesystem caches for performance testing, my (test) cluster crashed. I finally got it back online about 24 hours later, after discovering that my machines did not have enough PIDs for the 1.4 million threads that Ceph spawned (of which 33k seem to still be running).
(No real Ceph problem up to this point.)
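For reference, the values understood by /proc/sys/vm/drop_caches are documented in the kernel's sysctl/vm documentation; a read-only sketch (the actual writes, shown as comments, require root):

```shell
#!/bin/sh
# Semantics of /proc/sys/vm/drop_caches (Documentation/sysctl/vm.txt):
#   echo 1 > /proc/sys/vm/drop_caches  -> free the page cache only (the intended command)
#   echo 2 > /proc/sys/vm/drop_caches  -> free reclaimable slab objects, i.e. dentries
#                                         and inodes (the command run by mistake)
#   echo 3 > /proc/sys/vm/drop_caches  -> free both
# Running 'sync' first flushes dirty pages so more of the page cache is reclaimable.
# Read-only check that the knob exists on this kernel:
if [ -e /proc/sys/vm/drop_caches ]; then
    echo "drop_caches available"
fi
```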
But now my OSDs are failing one after another (see the log files attached). Of my 180 OSDs, roughly one fails every 10 minutes. A failed OSD won't come back online on its own, but will when started manually.
The machine from which this log file and objdump were taken hosts 15 OSDs, of which no. 160 was down when the objdump was created.
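A likely reason the nodes ran out of PIDs: every thread consumes a PID, and kernel.pid_max defaults to 32768 on most systems, far below 1.4 million. A hedged sketch for inspecting (and, as root, raising) the relevant limits:

```shell
#!/bin/sh
# Each thread consumes a PID, so kernel.pid_max caps the total thread count;
# kernel.threads-max is a separate system-wide thread limit.
cat /proc/sys/kernel/pid_max       # current PID limit (often 32768)
cat /proc/sys/kernel/threads-max   # system-wide thread limit

# Raising the PID limit requires root; 4194304 is the kernel maximum on 64-bit:
#   sysctl -w kernel.pid_max=4194304
```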
What was happening on your cluster at the time you dropped the caches? There are internal and external limits well below a million threads, although 33k across 15 OSDs is possible due to our network architecture. And dropping the dentry cache should definitely do nasty things to performance (especially under load), but I can't think of how that would inflate the thread count by much of anything.
Dropping the caches itself was not the problem. Freeing the dentries and inodes took about 30 minutes, and I guess Ceph was unable to access its files during that time, ran into a timeout, and crashed. I did not record any load data, though.
But that is only the story that led to the crash of my cluster in the first place; I don't want to blame Ceph for crashing that time. I'll still attach the log file of that crash, though.
The log files attached are NOT (directly) related to this bug.
The bug is that my OSDs now keep crashing even though there is no unusual activity on my hosts (except for a recovering Ceph).
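For context, the timeout that marks a stalled OSD down is governed by the OSD heartbeat settings. A hedged ceph.conf sketch (option names as in the Ceph configuration reference; the values shown are the documented defaults, illustrative only):

```ini
[osd]
; How often an OSD pings its peers, in seconds (Ceph default: 6).
osd heartbeat interval = 6
; How long a peer may go unresponsive before it is reported down
; to the monitors, in seconds (Ceph default: 20).
osd heartbeat grace = 20
```

Raising the grace period can paper over slow disks or cache-starvation stalls, at the cost of slower failure detection.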
- Assignee set to Greg Farnum
- Assignee deleted (Greg Farnum)
- Status changed from New to Can't reproduce