Bug #1617
closed
pgs stuck down and peering with only one osd down and out
% Done: 0%
Description
From teuthology:~teuthworker/archive/nightly_coverage_2011-10-13/491/teuthology.log:
2011-10-13T15:46:33.996 INFO:teuthology.task.thrashosds.ceph_manager:2011-10-13 15:40:58.190910 pg v1117: 144 pgs: 141 active+clean, 3 down+peering; 27400 MB data, 105 GB used, 719 GB / 869 GB avail
2011-10-13 15:40:58.191662 mds e5: 1/1/1 up {0=0=up:active}
2011-10-13 15:40:58.191713 osd e125: 8 osds: 7 up, 7 in
2011-10-13 15:40:58.191783 log 2011-10-13 12:39:15.188577 mon.0 10.3.14.194:6791/0 63 : [INF] osd.3 out (down for 300.693807)
2011-10-13 15:40:58.191857 mon e1: 3 mons at {0=10.3.14.194:6791/0,1=10.3.14.198:6789/0,2=10.3.14.184:6790/0}
Updated by Josh Durgin over 12 years ago
Happened in run 494 as well. These were both rados bench with thrashing.
Updated by Sage Weil over 12 years ago
- Status changed from New to Rejected
Non-specific, and predates the prior set refactor.
Updated by Josh Durgin over 12 years ago
- Status changed from Rejected to New
- Target version changed from v0.38 to v0.39
Happened again today in teuthology:~teuthworker/archive/nightly_coverage_2011-11-03/1433:
$ LD_LIBRARY_PATH=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -c /tmp/cephtest/ceph.conf -s
2011-11-03 14:02:00.316911 pg v6925: 144 pgs: 142 active+clean, 2 down+peering; 126 MB data, 15506 MB used, 3141 GB / 3172 GB avail
2011-11-03 14:02:00.317602 mds e5: 1/1/1 up {0=0=up:active}
2011-11-03 14:02:00.317658 osd e1645: 8 osds: 7 up, 7 in
2011-11-03 14:02:00.317768 log 2011-11-03 14:01:52.600845 osd.6 10.3.14.191:6803/8573 460 : [INF] 0.0p6 scrub ok
2011-11-03 14:02:00.317852 mon e1: 3 mons at {0=10.3.14.133:6791/0,1=10.3.14.167:6789/0,2=10.3.14.170:6790/0}
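For triage, the pg summary line in status output like this can be checked mechanically. What follows is a minimal Python sketch, not code from teuthology or ceph: the helper names `parse_pg_summary` and `count_down_pgs` are hypothetical, and the regular expression assumes the exact `pg vN: N pgs: ...;` layout seen in this report.

```python
import re


def parse_pg_summary(line):
    """Return (total_pgs, {state: count}) from a 'pg vN: ...' status line.

    Assumes the line layout shown in this report; other ceph versions
    may print the summary differently.
    """
    match = re.search(r"pg v\d+: (\d+) pgs: ([^;]+);", line)
    if match is None:
        raise ValueError("no pg summary found in line")
    total = int(match.group(1))
    states = {}
    for part in match.group(2).split(","):
        # Each part looks like "142 active+clean".
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return total, states


def count_down_pgs(states):
    """Count pgs whose combined state includes the 'down' flag."""
    return sum(n for state, n in states.items() if "down" in state.split("+"))


# Summary line copied from the 2011-11-03 run above.
line = ("2011-11-03 14:02:00.316911 pg v6925: 144 pgs: 142 active+clean, "
        "2 down+peering; 126 MB data, 15506 MB used, 3141 GB / 3172 GB avail")
total, states = parse_pg_summary(line)
print(total, states, count_down_pgs(states))
```

A check like this only flags that some pgs are down at one point in time; it does not tell whether they are stuck, which would need repeated samples.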
Updated by Sage Weil over 12 years ago
- Target version changed from v0.39 to v0.40
Updated by Sage Weil over 12 years ago
- Status changed from New to Won't Fix
The new code will have an explicit 'incomplete' state when peering fails, instead of leaving pgs 'stuck'. Let's ignore this and see how the new code fares.