Bug #10435

closed

ceph-osd stops with "Caught signal (Aborted)" or "osd/PG.cc: 2683: FAILED assert(values.size() == 1)"

Added by Jamin Collins over 9 years ago. Updated over 9 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Severity: 2 - major

Description

While my production Ceph cluster was recovering from a power outage, a few of my OSDs started flapping and eventually went down. In the past, I've simply removed the affected OSDs entirely, re-added them fresh, and let the cluster recover. This time, however, the cluster is reporting a few objects as "unfound" (3/939435 unfound (0.000%)), and I'm leery of completely removing OSDs in this state because I don't want to incur any data loss.

Digging through the archives and bug reports, I found a similar case [1] in which reproduction with increased logging levels was requested. I believe I've managed to gather the requested level of detail and will attach it to this report.

[1] - https://www.mail-archive.com/ceph-users@lists.ceph.com/msg01034.html


Files

ceph-osd.6.log.lzma (14.2 MB): attempted ceph-osd startup with debug options -- Caught signal (Aborted). Jamin Collins, 12/27/2014 12:30 PM
ceph-osd.11.log.lzma (13.7 MB): attempted ceph-osd startup with debug options -- osd/PG.cc: 2683: FAILED assert(values.size() == 1). Jamin Collins, 12/27/2014 12:33 PM
ceph-locate-unfound (419 Bytes): script used to check storage node for unfound objects. Jamin Collins, 12/27/2014 01:17 PM