https://tracker.ceph.com/ 2013-08-27T08:26:05Z Ceph
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=26748 2013-08-27T08:26:05Z Sage Weil sage@newdream.net
<p>ubuntu@teuthology:/a/teuthology-2013-08-26_15:47:58-rados-next-testing-basic-plana/6694</p>
<p>cluster is still hung</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=26762 2013-08-27T09:38:47Z Ian Colle icolle@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Samuel Just</i></li></ul><p>Sam, please take a look.</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=26821 2013-08-29T07:40:26Z Sage Weil sage@newdream.net
<p>ubuntu@teuthology://a/teuthology-2013-08-28_01:00:04-rados-master-testing-basic-plana/10150</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27376 2013-09-10T14:09:01Z Samuel Just sjust@redhat.com
<p>time: 2717s<br />log: <a class="external" href="http://qa-proxy.ceph.com/teuthology/teuthology-2013-09-09_20:00:20-rados-dumpling-testing-basic-plana/27708/">http://qa-proxy.ceph.com/teuthology/teuthology-2013-09-09_20:00:20-rados-dumpling-testing-basic-plana/27708/</a></p>
<pre><code>failed to become clean before timeout expired</code></pre>
<p>Hung</p>
<p>2013-09-09 22:33:32.641520 mon.0 10.214.131.15:6789/0 3025 : [INF] pgmap v1622: 172 pgs: 171 active+clean, 1 incomplete; 21590 bytes data, 892 MB used, 2174 GB / 2291 GB avail</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27377 2013-09-10T14:10:32Z Samuel Just sjust@redhat.com
<p>Hmm, the last osd log entry indicates that the pg in question may have gone clean?<br />2013-09-09 22:27:19.022997 7f1724a49700 5 osd.3 pg_epoch: 1049 pg[2.1e( empty local-les=1044 n=0 ec=1 les/c 1044/972 1043/1043/1043) [3,0] r=0 lpr=1043 pi=791-1042/8 bft=0 mlcod 0'0 active] enter Started/Primary/Active/Clean</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27379 2013-09-10T14:12:10Z Samuel Just sjust@redhat.com
<p>The task was in the process of letting the cluster recover with osd.2 down.</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27380 2013-09-10T14:13:12Z Samuel Just sjust@redhat.com
<p>There appear to be no pgs in incomplete state according to the osd log. Issue notifying the mon?</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27381 2013-09-10T14:16:25Z Samuel Just sjust@redhat.com
<p>From the mon logs, last reported seems to be<br />2013-09-09 22:31:28.047555 7f56db94d700 15 mon.a@0(leader).pg v1614 got 1.3f reported at 1348:307 state incomplete -> incomplete</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27382 2013-09-10T14:18:01Z Samuel Just sjust@redhat.com
<p>1.3f does appear to be incomplete in the osd log.</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27383 2013-09-10T14:27:20Z Samuel Just sjust@redhat.com
<p>Ok, there are enough logs to confirm that this is the primary-thinks-it's-clean vs backfill-peer-thinks-it's-clean race.</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27387 2013-09-10T15:16:58Z Samuel Just sjust@redhat.com
<ul><li><strong>Tracker</strong> changed from <i>Bug</i> to <i>Fix</i></li><li><strong>Target version</strong> set to <i>v0.70</i></li></ul>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=27533 2013-09-13T13:39:22Z Samuel Just sjust@redhat.com
<ul><li><strong>Target version</strong> deleted (<del><i>v0.70</i></del>)</li></ul>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=28829 2013-10-23T10:55:20Z Samuel Just sjust@redhat.com
<p>The workaround I put into teuthology was inadequate. I'm going to put this in the backlog and downgrade it now that it should stop messing up the nightlies.</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=28832 2013-10-23T11:03:38Z Samuel Just sjust@redhat.com
<ul><li><strong>Target version</strong> set to <i>v0.73</i></li></ul>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=28833 2013-10-23T11:03:44Z Samuel Just sjust@redhat.com
<ul><li><strong>Story points</strong> set to <i>5.0</i></li></ul>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=29100 2013-10-30T12:24:02Z Samuel Just sjust@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li></ul><p>I was way off on this one. We do ack the backfill completion. I suspect that the actual problem was fixed by the 6585 fixes (backfill_pos vs last_backfill confusion).</p>
Ceph - Fix #6116: osd: incomplete pg from thrashing on next https://tracker.ceph.com/issues/6116?journal_id=29101 2013-10-30T12:26:28Z Samuel Just sjust@redhat.com
<p>Removed the teuthology workaround as well.</p>