Bug #1617

closed

pgs stuck down and peering with only one osd down and out

Added by Josh Durgin over 12 years ago. Updated over 12 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From teuthology:~teuthworker/archive/nightly_coverage_2011-10-13/491/teuthology.log:

2011-10-13T15:46:33.996 INFO:teuthology.task.thrashosds.ceph_manager:2011-10-13 15:40:58.190910    pg v1117: 144 pgs: 141 active+clean, 3 down+peering; 27400 MB data, 105 GB used, 719 GB / 869 GB avail
2011-10-13 15:40:58.191662   mds e5: 1/1/1 up {0=0=up:active}
2011-10-13 15:40:58.191713   osd e125: 8 osds: 7 up, 7 in
2011-10-13 15:40:58.191783   log 2011-10-13 12:39:15.188577 mon.0 10.3.14.194:6791/0 63 : [INF] osd.3 out (down for 300.693807)
2011-10-13 15:40:58.191857   mon e1: 3 mons at {0=10.3.14.194:6791/0,1=10.3.14.198:6789/0,2=10.3.14.184:6790/0}
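
The symptom is visible in the pg summary line: 3 of 144 PGs remain down+peering while only one OSD (osd.3) is down and out. The following is a minimal Python sketch for pulling per-state PG counts out of such a line, assuming the 2011-era `ceph -s` summary format shown above ("pg vN: T pgs: n1 state1, n2 state2; ..."). The helpers `pg_state_counts` and `stuck_pgs` are hypothetical illustrations, not part of Ceph or teuthology.

```python
import re

# Assumed format (from the log above):
# "pg v1117: 144 pgs: 141 active+clean, 3 down+peering; 27400 MB data, ..."
PG_LINE = re.compile(r"pg v\d+: (\d+) pgs: ([^;]+);")


def pg_state_counts(line):
    """Hypothetical helper: return (total_pgs, {state: count}) from a pg summary line."""
    m = PG_LINE.search(line)
    if m is None:
        raise ValueError("no pg summary found in line")
    total = int(m.group(1))
    counts = {}
    # The state list is comma-separated "count state" pairs.
    for part in m.group(2).split(","):
        num, state = part.strip().split(" ", 1)
        counts[state] = int(num)
    return total, counts


def stuck_pgs(counts):
    """Hypothetical helper: keep only states other than active+clean."""
    return {state: n for state, n in counts.items() if state != "active+clean"}


if __name__ == "__main__":
    line = ("2011-10-13 15:40:58.190910    pg v1117: 144 pgs: 141 active+clean, "
            "3 down+peering; 27400 MB data, 105 GB used, 719 GB / 869 GB avail")
    total, counts = pg_state_counts(line)
    print(total, counts, stuck_pgs(counts))
```

Applied to the 2011-11-03 output, the same parsing would report 2 down+peering PGs out of 144. A regular expression fits here because the summary is a single line with a fixed prefix. Later Ceph releases changed the status layout, so this sketch should not be expected to parse them.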

Actions #1

Updated by Josh Durgin over 12 years ago

Happened in run 494 as well. Both were rados bench runs with thrashing.

Actions #2

Updated by Sage Weil over 12 years ago

  • Status changed from New to Rejected

Non-specific, and predates the prior set refactor.

Actions #3

Updated by Josh Durgin over 12 years ago

  • Status changed from Rejected to New
  • Target version changed from v0.38 to v0.39

Happened again today in teuthology:~teuthworker/archive/nightly_coverage_2011-11-03/1433:

$ LD_LIBRARY_PATH=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -c /tmp/cephtest/ceph.conf -s
2011-11-03 14:02:00.316911    pg v6925: 144 pgs: 142 active+clean, 2 down+peering; 126 MB data, 15506 MB used, 3141 GB / 3172 GB avail
2011-11-03 14:02:00.317602   mds e5: 1/1/1 up {0=0=up:active}
2011-11-03 14:02:00.317658   osd e1645: 8 osds: 7 up, 7 in
2011-11-03 14:02:00.317768   log 2011-11-03 14:01:52.600845 osd.6 10.3.14.191:6803/8573 460 : [INF] 0.0p6 scrub ok
2011-11-03 14:02:00.317852   mon e1: 3 mons at {0=10.3.14.133:6791/0,1=10.3.14.167:6789/0,2=10.3.14.170:6790/0}

Actions #4

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.39 to v0.40

Actions #5

Updated by Sage Weil over 12 years ago

  • Status changed from New to Won't Fix

The new code will have an explicit 'incomplete' state when peering fails, instead of leaving the PGs 'stuck'. Let's ignore this and see how the new code fares.

Actions #6

Updated by Sage Weil over 12 years ago

  • Position set to 1
  • Position changed from 1 to 1049