<p>Ceph RADOS - Bug #23267: scrub errors not cleared on replicas can cause inconsistent pg state when replica takes over primary</p>
<p>David Zafman (dzafman@redhat.com), 2018-03-19T16:00:10Z, https://tracker.ceph.com/issues/23267?journal_id=109296</p>
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>12</i></li></ul>
<p>David Zafman (dzafman@redhat.com), 2018-03-19T16:26:05Z, https://tracker.ceph.com/issues/23267?journal_id=109297</p>
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>In Progress</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul>
<p>David Zafman (dzafman@redhat.com), 2018-03-22T02:55:59Z, https://tracker.ceph.com/issues/23267?journal_id=109489</p>
<p>I reproduced this by creating an inconsistent pg and then causing it to split.</p>
<p>I used a pool of size 2 with 1 pg and created an inconsistency in one object (it had 2 scrub errors: size and data_digest_mismatch).</p>
<p>After increasing pg_num/pgp_num to 8, only 1 of the resulting pgs had a non-zero num_scrub_errors, on a single replica, and the value was 1.</p>
<pre>
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
1.7 12 0 0 0 0 20048 101 101 active+clean 2018-03-21 19:46:17.366706 11'101 20:169 [1,2] 1 [1,2] 1 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
1.6 12 0 0 0 0 20048 101 101 active+clean 2018-03-21 19:46:15.176818 11'101 19:146 [1,0] 1 [1,0] 1 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
1.5 13 0 0 0 0 20048 101 101 active+clean 2018-03-21 19:46:18.414012 11'101 20:54 [2,0] 2 [2,0] 2 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
1.4 13 0 0 0 0 20049 101 101 active+clean 2018-03-21 19:46:15.181264 11'101 19:146 [1,0] 1 [1,0] 1 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
1.3 13 0 0 0 0 20049 101 101 active+clean 2018-03-21 19:46:18.724337 11'101 20:173 [1,2] 1 [1,2] 1 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
1.2 13 0 0 0 0 20049 101 101 active+clean 2018-03-21 19:46:16.187580 11'101 20:136 [0,1] 0 [0,1] 0 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
1.1 13 0 0 0 0 20049 101 101 active+clean 2018-03-21 19:46:19.378158 11'101 20:30 [2,0] 2 [2,0] 2 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
1.0 12 0 0 0 0 20048 101 101 active+clean 2018-03-21 19:46:15.174846 11'101 19:147 [1,0] 1 [1,0] 1 11'101 2018-03-21 19:43:22.172793 11'101 2018-03-21 19:43:22.172793 0
</pre>
<p>Interestingly, all the primaries have num_scrub_errors == 0, but a single post-split pg replica has a non-zero value.</p>
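<p>A minimal sketch of this check, not part of the fix itself: it takes the per-pg num_scrub_errors values from this reproduction and flags any copy that reports a non-zero count while the primary reports zero. The function name <code>stale_replica_errors</code> is hypothetical, and the sketch assumes that the first value listed for each pg comes from the primary and the second from the replica.</p>

```python
from typing import Dict, List, Tuple

# num_scrub_errors values copied from this reproduction.
# ASSUMPTION: for each pg, the first value is the primary's and the
# second is the replica's; the original listing does not label them.
NUM_SCRUB_ERRORS: Dict[str, List[int]] = {
    "1.0": [0, 0],
    "1.1": [0, 0],
    "1.2": [0, 1],
    "1.3": [0, 0],
    "1.4": [0, 0],
    "1.5": [0, 0],
    "1.6": [0, 0],
    "1.7": [0, 0],
}


def stale_replica_errors(stats: Dict[str, List[int]]) -> List[Tuple[str, int, int]]:
    """Return (pgid, copy_index, errors) for every non-primary copy that
    reports scrub errors while the primary of the same pg reports none."""
    found = []
    for pgid, counts in sorted(stats.items()):
        primary, replicas = counts[0], counts[1:]
        if primary != 0:
            # The primary already reports the errors, so the pg is
            # visibly inconsistent and not the case discussed here.
            continue
        for idx, errors in enumerate(replicas, start=1):
            if errors != 0:
                found.append((pgid, idx, errors))
    return found


if __name__ == "__main__":
    for pgid, idx, errors in stale_replica_errors(NUM_SCRUB_ERRORS):
        print(f"pg {pgid}: copy {idx} reports num_scrub_errors={errors} "
              f"while the primary reports 0")
```

<p>On the values from this reproduction, the only pg flagged is 1.2.</p>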
<pre>
1.0
"num_scrub_errors": 0,
"num_scrub_errors": 0,
1.1
"num_scrub_errors": 0,
"num_scrub_errors": 0,
1.2
"num_scrub_errors": 0,
"num_scrub_errors": 1,
1.3
"num_scrub_errors": 0,
"num_scrub_errors": 0,
1.4
"num_scrub_errors": 0,
"num_scrub_errors": 0,
1.5
"num_scrub_errors": 0,
"num_scrub_errors": 0,
1.6
"num_scrub_errors": 0,
"num_scrub_errors": 0,
1.7
"num_scrub_errors": 0,
"num_scrub_errors": 0,
</pre>
<p>David Zafman (dzafman@redhat.com), 2018-03-28T17:27:56Z, https://tracker.ceph.com/issues/23267?journal_id=109764</p>
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Fix Under Review</i></li><li><strong>Backport</strong> set to <i>luminous, jewel</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/21101">https://github.com/ceph/ceph/pull/21101</a></p>
<p>David Zafman (dzafman@redhat.com), 2018-03-28T17:42:38Z, https://tracker.ceph.com/issues/23267?journal_id=109765</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-6 priority-high2 closed" href="/issues/23485">Backport #23485</a>: luminous: scrub errors not cleared on replicas can cause inconsistent pg state when replica takes over primary</i> added</li></ul>
<p>Nathan Cutler (ncutler@suse.cz), 2018-03-28T20:26:54Z, https://tracker.ceph.com/issues/23267?journal_id=109775</p>
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-9 status-3 priority-6 priority-high2 closed" href="/issues/23486">Backport #23486</a>: jewel: scrub errors not cleared on replicas can cause inconsistent pg state when replica takes over primary</i> added</li></ul>
<p>Greg Farnum (gfarnum@redhat.com), 2018-04-04T21:07:12Z, https://tracker.ceph.com/issues/23267?journal_id=110171</p>
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul>
<p>David Zafman (dzafman@redhat.com), 2018-04-11T22:59:30Z, https://tracker.ceph.com/issues/23267?journal_id=110954</p>
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-9 priority-4 priority-default closed" href="/issues/23576">Bug #23576</a>: osd: active+clean+inconsistent pg will not scrub or repair</i> added</li></ul>
<p>Yuri Weinstein (yweinste@redhat.com), 2018-05-24T20:41:13Z, https://tracker.ceph.com/issues/23267?journal_id=114025</p>
<p>merged <a class="external" href="https://github.com/ceph/ceph/pull/21194">https://github.com/ceph/ceph/pull/21194</a></p>
<p>David Zafman (dzafman@redhat.com), 2018-05-25T03:38:03Z, https://tracker.ceph.com/issues/23267?journal_id=114056</p>
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>Resolved</i></li></ul>