Bug #1617
closed
pgs stuck down and peering with only one osd down and out
Added by Josh Durgin over 12 years ago.
Updated over 12 years ago.
Description
From teuthology:~teuthworker/archive/nightly_coverage_2011-10-13/491/teuthology.log:
2011-10-13T15:46:33.996 INFO:teuthology.task.thrashosds.ceph_manager:2011-10-13 15:40:58.190910 pg v1117: 144 pgs: 141 active+clean, 3 down+peering; 27400 MB data, 105 GB used, 719 GB / 869 GB avail
2011-10-13 15:40:58.191662 mds e5: 1/1/1 up {0=0=up:active}
2011-10-13 15:40:58.191713 osd e125: 8 osds: 7 up, 7 in
2011-10-13 15:40:58.191783 log 2011-10-13 12:39:15.188577 mon.0 10.3.14.194:6791/0 63 : [INF] osd.3 out (down for 300.693807)
2011-10-13 15:40:58.191857 mon e1: 3 mons at {0=10.3.14.194:6791/0,1=10.3.14.198:6789/0,2=10.3.14.184:6790/0}
Happened in run 494 as well. These were both rados bench with thrashing.
- Status changed from New to Rejected
Non-specific, and from before the prior set refactor.
- Status changed from Rejected to New
- Target version changed from v0.38 to v0.39
Happened again today in teuthology:~teuthworker/archive/nightly_coverage_2011-11-03/1433:
$ LD_LIBRARY_PATH=/tmp/cephtest/binary/usr/local/lib /tmp/cephtest/binary/usr/local/bin/ceph-coverage /tmp/cephtest/archive/coverage /tmp/cephtest/binary/usr/local/bin/ceph -c /tmp/cephtest/ceph.conf -s
2011-11-03 14:02:00.316911 pg v6925: 144 pgs: 142 active+clean, 2 down+peering; 126 MB data, 15506 MB used, 3141 GB / 3172 GB avail
2011-11-03 14:02:00.317602 mds e5: 1/1/1 up {0=0=up:active}
2011-11-03 14:02:00.317658 osd e1645: 8 osds: 7 up, 7 in
2011-11-03 14:02:00.317768 log 2011-11-03 14:01:52.600845 osd.6 10.3.14.191:6803/8573 460 : [INF] 0.0p6 scrub ok
2011-11-03 14:02:00.317852 mon e1: 3 mons at {0=10.3.14.133:6791/0,1=10.3.14.167:6789/0,2=10.3.14.170:6790/0}
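For anyone checking later runs for the same symptom, here is a minimal sketch that pulls the per-state PG counts out of the `pg v...` summary line quoted above. It assumes the line format shown in these two reports. The names `parse_pg_states` and `count_not_clean` are hypothetical helpers made up for this sketch; they are not part of teuthology or ceph.

```python
import re

# Matches the pg summary line as it appears in the status output quoted above,
# e.g. "pg v6925: 144 pgs: 142 active+clean, 2 down+peering; 126 MB data, ..."
# Assumption: the format is exactly as seen in this report.
PG_LINE = re.compile(r"pg v(?P<version>\d+): (?P<total>\d+) pgs: (?P<states>[^;]+);")


def parse_pg_states(line):
    """Return {state: count} for a pg summary line, or None for any other line."""
    match = PG_LINE.search(line)
    if match is None:
        return None
    states = {}
    for part in match.group("states").split(","):
        # Each part looks like "142 active+clean".
        count, state = part.strip().split(" ", 1)
        states[state] = int(count)
    return states


def count_not_clean(states):
    """Count PGs in any state other than active+clean."""
    return sum(n for state, n in states.items() if state != "active+clean")


if __name__ == "__main__":
    sample = (
        "2011-11-03 14:02:00.316911 pg v6925: 144 pgs: 142 active+clean, "
        "2 down+peering; 126 MB data, 15506 MB used, 3141 GB / 3172 GB avail"
    )
    states = parse_pg_states(sample)
    print(states, count_not_clean(states))
```

A single non-zero count is not enough to call the PGs stuck, since PGs pass through peering during normal thrashing. The count would have to stay non-zero across repeated status checks, as it did in the runs above.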
- Target version changed from v0.39 to v0.40
- Status changed from New to Won't Fix
The new code will have an explicit 'incomplete' state when peering fails, instead of leaving PGs 'stuck'. Let's ignore this and see how the new code fares.