Bug #10435 (Closed): ceph-osd stops with "Caught signal (Aborted)" or "osd/PG.cc: 2683: FAILED assert(values.size() == 1)"

Added by Jamin Collins over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While my production Ceph cluster was recovering from a power outage, a few of my OSDs started flapping and eventually went down. Previously, I've simply removed the affected OSDs entirely, re-added them fresh, and allowed the cluster to recover. However, the cluster is currently reporting a few objects as "unfound" (3/939435 unfound (0.000%)), and I'm leery of completely removing OSDs in this state as I don't want to incur any data loss.

Digging through the archives and bug reports, I've found a similar case [1] with a request for reproduction at increased logging levels. I believe I've managed to gather the requested level of detail and will attach the logs to this report.

[1] - https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01034.html
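
The linked thread asks for logs gathered at raised debug levels. A minimal sketch of the kind of ceph.conf settings typically used for this (the exact values and section name are assumptions, not quoted from the thread):

# Hedged example: raise debug logging for the affected OSD before restarting it.
# The section name and levels are assumptions; adjust to the OSD in question.
[osd.6]
    debug osd = 20
    debug filestore = 20
    debug ms = 1

Restarting the ceph-osd daemon with settings like these produces logs of the kind attached below.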


Files

ceph-osd.6.log.lzma (14.2 MB) -- attempted ceph-osd startup with debug options -- Caught signal (Aborted) -- Jamin Collins, 12/27/2014 12:30 PM
ceph-osd.11.log.lzma (13.7 MB) -- attempted ceph-osd startup with debug options -- osd/PG.cc: 2683: FAILED assert(values.size() == 1) -- Jamin Collins, 12/27/2014 12:33 PM
ceph-locate-unfound (419 Bytes) -- script used to check storage node for unfound objects -- Jamin Collins, 12/27/2014 01:17 PM
#2 - Updated by Jamin Collins over 9 years ago

Near as I can tell, all the unfound objects reside on osd.6:

$ ./ceph-locate-unfound
/var/lib/ceph/osd/ceph-6/current/3.2ba_head/DIR_A/DIR_B/DIR_2/rb.0.1da2e.238e1f29.000000000178__head_F23D22BA__3
/var/lib/ceph/osd/ceph-6/current/3.25f_head/DIR_F/DIR_5/DIR_E/rb.0.1175.2ae8944a.0000000024e0__head_B0B2CE5F__3
/var/lib/ceph/osd/ceph-6/current/3.199_head/DIR_9/DIR_9/DIR_D/rb.0.1da2e.238e1f29.0000000000b3__head_76DA7D99__3
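
The attached ceph-locate-unfound script isn't reproduced in this report; a rough sketch of what a script like it might do (an assumption, not the actual attachment) is:

#!/bin/bash
# Hedged sketch, not the attached script: enumerate the object IDs the cluster
# reports as unfound, then search the local OSD filestore for matching files.

# PGs that report unfound objects, taken from the health summary.
pgs=$(ceph health detail | awk '$1 == "pg" && /unfound/ {print $2}')

for pg in $pgs; do
    # list_missing prints the missing/unfound objects of a PG as JSON;
    # the inner "oid" fields carry names like rb.0.1da2e.238e1f29.000000000178.
    ceph pg "$pg" list_missing
done | grep -o '"oid": "[^"]*"' | cut -d'"' -f4 | sort -u |
while read -r oid; do
    [ -n "$oid" ] || continue
    # Look for matching object files under the local OSD data directories.
    find /var/lib/ceph/osd/ceph-*/current -name "${oid}__*" 2>/dev/null
done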

Is there any way to move these objects to a working OSD or get osd.6 back to a point where ceph-osd can start on it?

#3 - Updated by Jamin Collins over 9 years ago

I've removed, erased, and re-added osd.11 to the ceph cluster.
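
For reference, removing an OSD before re-adding it fresh is normally a sequence along these lines (a sketch of the standard procedure, not necessarily the exact commands used here):

# Hedged sketch of the usual OSD removal steps, using osd.11 from this comment.
# Stop the ceph-osd daemon first, then:
ceph osd out 11                 # mark the OSD out so its data is remapped
ceph osd crush remove osd.11    # remove it from the CRUSH map
ceph auth del osd.11            # delete its authentication key
ceph osd rm 11                  # remove the OSD id from the cluster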

#4 - Updated by Jamin Collins over 9 years ago

Having determined which RBD volumes these unfound OIDs belonged to, I've decided to remove osd.6, zero the drive, and re-add it.
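
The mapping from an unfound object name back to an RBD volume can be made by matching the object prefix (e.g. rb.0.1da2e.238e1f29) against each image's block_name_prefix as shown by rbd info. A minimal sketch, assuming the images live in a pool named "rbd":

# Hedged sketch: find which RBD image owns objects with a given prefix.
# The pool name "rbd" is an assumption; the prefix comes from the paths above.
prefix="rb.0.1da2e.238e1f29"
for img in $(rbd ls rbd); do
    if rbd info "rbd/$img" | grep -q "block_name_prefix: $prefix"; then
        echo "$prefix belongs to image $img"
    fi
done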

#5 - Updated by Sage Weil over 9 years ago

  • Status changed from New to Closed

In certain cases it is possible to move the file, but in general, no. We're working on a tool to move entire PGs at a time.
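
For context, PG-level export/import later became available through ceph-objectstore-tool. A hedged sketch of that workflow (the target OSD, journal paths, and export file name are assumptions; the source OSD and PG id come from this report):

# On the node holding osd.6, with the osd.6 daemon stopped:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 \
    --journal-path /var/lib/ceph/osd/ceph-6/journal \
    --pgid 3.2ba --op export --file /tmp/pg.3.2ba.export

# On the node holding the target OSD (osd.2 here, hypothetical), also stopped:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
    --journal-path /var/lib/ceph/osd/ceph-2/journal \
    --op import --file /tmp/pg.3.2ba.export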
