Ceph https://tracker.ceph.com/ 2017-08-01T19:02:40Z
RADOS - Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml https://tracker.ceph.com/issues/20844?journal_id=96299 2017-08-01T19:02:40Z Sage Weil sage@newdream.net
<p>/a/sage-2017-08-01_15:32:10-rados-wip-sage-testing-distro-basic-smithi/1469176</p>
<p>rados/thrash-erasure-code/{ceph.yaml clusters/{fixed-2.yaml openstack.yaml} d-require-luminous/at-end.yaml fast/fast.yaml leveldb.yaml msgr-failures/osd-delay.yaml objectstore/filestore-xfs.yaml rados.yaml thrashers/mapgap.yaml thrashosds-health.yaml workloads/ec-rados-plugin=jerasure-k=3-m=1.yaml}</p>
RADOS - Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml https://tracker.ceph.com/issues/20844?journal_id=96300 2017-08-01T19:03:47Z Sage Weil sage@newdream.net
<pre>
root@smithi200:~# ceph tell 2.1b query
{
"state": "incomplete",
"snap_trimq": "[]",
"epoch": 3426,
"up": [
2,
3,
0,
5
],
"acting": [
2,
3,
0,
5
],
...
"probing_osds": [
"0(2)",
"2(0)",
"3(1)",
"5(3)"
],
"down_osds_we_would_probe": [
1
],
"peering_blocked_by": [],
"peering_blocked_by_detail": [
{
"detail": "peering_blocked_by_history_les_bound"
}
]
},
{
"name": "Started",
"enter_time": "2017-08-01 16:36:39.822172"
}
],
"agent_state": {}
}
</pre>
RADOS - Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml https://tracker.ceph.com/issues/20844?journal_id=96333 2017-08-02T14:11:51Z Sage Weil sage@newdream.net
<p>/a/sage-2017-08-02_01:58:49-rados-wip-sage-testing-distro-basic-smithi/1470073</p>
<p>pg 2.d on [5,1,4]</p>
RADOS - Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml https://tracker.ceph.com/issues/20844?journal_id=96373 2017-08-02T15:33:46Z Sage Weil sage@newdream.net
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Immediate</i></li></ul>
RADOS - Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml https://tracker.ceph.com/issues/20844?journal_id=96421 2017-08-03T13:51:38Z Sage Weil sage@newdream.net
<p>This appears to be a test problem:</p>
<p>- the thrashosds task has 'chance_test_map_discontinuity: 0.5', which will mark an osd down, wait for things to go clean, and then bring it back up.<br />- the workload creates an ec pool with the teuthologyprofile profile, which is apparently k=2 m=1. That means min_size=k+1=2+1=3, which equals the pool size k+m=3, so <strong>any</strong> osd down will usually prevent us from going clean.</p>
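<p>The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not Ceph code: the helper name <code>ec_pool_limits</code> and the <code>spare_osds</code> field are hypothetical, and the rule min_size = k + 1 is taken from the comment above rather than from the Ceph source.</p>

```python
def ec_pool_limits(k: int, m: int) -> dict:
    """Return size, min_size and spare OSDs for an EC profile.

    Assumes min_size = k + 1, as stated in the comment above.
    """
    if k < 1 or m < 1:
        raise ValueError("k and m must both be at least 1")
    size = k + m      # total shards (data + coding) per PG
    min_size = k + 1  # assumed rule, not read from a real cluster
    # spare_osds: how many OSDs a PG can lose and still meet min_size
    return {"size": size, "min_size": min_size, "spare_osds": size - min_size}


for k, m in [(2, 1), (2, 2)]:
    print(f"k={k} m={m}: {ec_pool_limits(k, m)}")
```

<p>Under that assumption, k=2 m=1 leaves no spare OSD (size equals min_size), while k=2 m=2 leaves one, which is why the larger m is suggested for a thrasher that deliberately marks an osd down.</p>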
<p>I'm not sure why we're seeing this now when we weren't before. It seems like the fix is to use something like k=2 m=2, though?</p>
RADOS - Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml https://tracker.ceph.com/issues/20844?journal_id=96424 2017-08-03T14:49:23Z Sage Weil sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/16789">https://github.com/ceph/ceph/pull/16789</a></p>
RADOS - Bug #20844: peering_blocked_by_history_les_bound on workloads/ec-snaps-few-objects-overwrites.yaml https://tracker.ceph.com/issues/20844?journal_id=96449 2017-08-03T18:22:03Z Sage Weil sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>