Bug #3676

closed

osd keeps crashing at ReplicatedPG::scan_range()

Added by Xiaopong Tran over 11 years ago. Updated over 6 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This specific osd (osd.17) keeps crashing at the same location every time I try to bring it back. It starts peering and recovery, but sooner or later it crashes again, and it has never finished the recovery.

Our other, larger cluster, with 76 osds, also crashed and we could not bring it back up. At the rate the recovery was running, it would probably have taken months to complete, so we shut it down and now access the data directly, since our data are immutable. We then built a smaller one.

This is a small cluster with only 30 osds. The log file is attached. Here's the version information:

ceph version 0.55.1-1-g7c7469a (7c7469a19b0d563a448486adce9f326c6e5bd66d)

Running on Debian Wheezy. This is a build from Samuel Just, with a backport that fixed the constant osd crash.

Some more information:

root@s1:/tmp# ceph -s
   health HEALTH_WARN 251 pgs backfill; 38 pgs backfilling; 289 pgs degraded; 1 pgs recovering; 6 pgs recovery_wait; 34 pgs stale; 34 pgs stuck stale; 296 pgs stuck unclean; recovery 76564/2415766 degraded (3.169%)
   monmap e1: 3 mons at {a=10.1.3.1:6789/0,b=10.1.3.2:6789/0,c=10.1.3.3:6789/0}, election epoch 14, quorum 0,1,2 a,b,c
   osdmap e540: 30 osds: 29 up, 29 in
    pgmap v8905: 11452 pgs: 11122 active+clean, 98 active+degraded+wait_backfill, 6 active+recovery_wait, 34 stale+active+clean, 37 active+degraded+backfilling, 153 active+degraded+remapped+wait_backfill, 1 active+degraded+remapped+backfilling, 1 active+recovering; 2745 GB data, 7581 GB used, 46419 GB / 54001 GB avail; 76564/2415766 degraded (3.169%)
   mdsmap e1: 0/0/1 up
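
As a sanity check on the status output above, the per-state PG counts on the pgmap line can be tallied and compared with the totals on the health line. The following is only a minimal sketch: `summarize_pgmap` and `count_with_flag` are hypothetical helper names, not part of any Ceph tool, and the parsing assumes the pgmap line format exactly as pasted above.

```python
import re

# The pgmap line copied from the `ceph -s` output above.
PGMAP_LINE = (
    "pgmap v8905: 11452 pgs: 11122 active+clean, "
    "98 active+degraded+wait_backfill, 6 active+recovery_wait, "
    "34 stale+active+clean, 37 active+degraded+backfilling, "
    "153 active+degraded+remapped+wait_backfill, "
    "1 active+degraded+remapped+backfilling, 1 active+recovering; "
    "2745 GB data, 7581 GB used, 46419 GB / 54001 GB avail; "
    "76564/2415766 degraded (3.169%)"
)


def summarize_pgmap(line):
    """Hypothetical helper: split one pgmap line into totals and per-state counts."""
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # Everything between "pgs:" and the first ";" is the list of "<count> <state>" items.
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for item in states_part.split(","):
        count, state = item.split()
        states[state] = int(count)
    degraded, objects = map(int, re.search(r"(\d+)/(\d+) degraded", line).groups())
    return {
        "total": total,
        "states": states,
        "degraded_pct": round(100.0 * degraded / objects, 3),
    }


def count_with_flag(states, flag):
    """Sum the PGs whose combined state (flags joined by '+') includes `flag`."""
    return sum(n for state, n in states.items() if flag in state.split("+"))


summary = summarize_pgmap(PGMAP_LINE)
states = summary["states"]
print("sum of states:", sum(states.values()), "of", summary["total"])
print("degraded:", count_with_flag(states, "degraded"))
print("wait_backfill:", count_with_flag(states, "wait_backfill"))
print("backfilling:", count_with_flag(states, "backfilling"))
print("not clean:", summary["total"] - count_with_flag(states, "clean"))
print("degraded objects (%):", summary["degraded_pct"])
```

Under these assumptions the tallies agree with the health line: 289 degraded, 251 waiting for backfill, 38 backfilling, and 296 PGs that are not clean.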

I tried over 10 times to bring it back, but every time it crashed without finishing recovery. Please let me know if I can provide more information.


Files

osd.17.log (4.07 MB) Xiaopong Tran, 12/22/2012 09:22 AM