https://tracker.ceph.com/https://tracker.ceph.com/favicon.ico2019-04-17T14:06:57ZCeph RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1348812019-04-17T14:06:57ZSage Weilsage@newdream.net
<ul></ul><p>ceph osd pool set <pool> recovery_priority <value></p>
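<p>For anyone scripting this, a minimal sketch of building that command line in Python follows. The helper name <code>recovery_priority_command</code> and the pool name <code>cephfs_metadata</code> are hypothetical, chosen only for illustration. The sketch builds and prints the command and does not run it.</p>

```python
import shlex
from typing import List


def recovery_priority_command(pool: str, value: int) -> List[str]:
    """Build the argv for `ceph osd pool set <pool> recovery_priority <value>`.

    Hypothetical helper: it only assembles the argument list and does not
    contact a cluster or validate the value against any allowed range.
    """
    if not pool:
        raise ValueError("pool name must be non-empty")
    return ["ceph", "osd", "pool", "set", pool, "recovery_priority", str(int(value))]


if __name__ == "__main__":
    # "cephfs_metadata" is an assumed pool name, used here only as an example.
    argv = recovery_priority_command("cephfs_metadata", 1)
    print(shlex.join(argv))
```

<p>The resulting list could be passed to <code>subprocess.run</code> on a host with admin credentials, if that is how the cluster is managed.</p>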
<p>I think a value of 1 or 2 makes sense (default if unset is 0).</p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1348972019-04-17T15:38:40ZBen Englandbengland@redhat.com
<ul></ul><p>Is backfill any different from recovery priority? If not, should it be? By "backfill" I mean the emergency situation where you lose replicas of an object, whereas by "recovery" I mean that you restore an OSD to operational state and bring data back onto it, but the data is already at the proper level of replication.</p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1348992019-04-17T15:54:52ZBen Englandbengland@redhat.com
<ul></ul><p>Also, this ceph command requires the operator to run it. The point of this tracker is that this should be the default behavior. Does anyone disagree with that? If people agree, where does this get implemented? For example, rook.io seems like the wrong place, because anything that isn't a Kubernetes cluster won't benefit and this default has nothing to do with Kubernetes.</p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1349072019-04-17T18:39:51ZDavid Zafmandzafman@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed" href="/issues/39011">Documentation #39011</a>: Document how get_recovery_priority() and get_backfill_priority() impacts recovery order</i> added</li></ul> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1349092019-04-17T18:50:03ZDavid Zafmandzafman@redhat.com
<ul></ul><p>Recovery is also about restoring objects to the right level of replication. Because the log is known to represent a complete picture of the contents, it is used to identify the objects that need recovery. Backfill is considered another form of recovery. In that case the log isn't enough and we must iterate all objects on all replicas to find the objects to be restored.</p>
<p>In the code, PG::get_recovery_priority() and PG::get_backfill_priority() compute the value based on multiple factors. A basic recovery is prioritized over backfill, presumably because it can get PGs active+clean the quickest. In the case where objects are below min_size, client I/O is blocked and data is more at risk than when simply degraded, so the priority is even higher.</p>
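<p>To make the interaction of these factors concrete, here is a toy model in Python. It is only a sketch of the ordering discussed in this thread, not the logic of PG::get_recovery_priority() or PG::get_backfill_priority(). The constants, the <code>PGState</code> class, and the <code>toy_priority</code> function are all hypothetical, and the numeric values are not Ceph's.</p>

```python
from dataclasses import dataclass

# Illustrative constants only; these are NOT the values used in Ceph.
RECOVERY_BASE = 200   # log-based recovery starts higher than backfill
BACKFILL_BASE = 100
INACTIVE_BOOST = 50   # extra priority when below min_size (client I/O blocked)
PRIORITY_MAX = 255


@dataclass
class PGState:
    """Hypothetical summary of the factors mentioned in this thread."""
    needs_backfill: bool
    missing_replicas: int
    below_min_size: bool
    pool_recovery_priority: int = 0


def toy_priority(pg: PGState) -> int:
    """Combine the factors into one number; higher means scheduled sooner."""
    base = BACKFILL_BASE if pg.needs_backfill else RECOVERY_BASE
    prio = base + pg.missing_replicas + pg.pool_recovery_priority
    if pg.below_min_size:
        prio += INACTIVE_BOOST
    # Clamp to an assumed valid range.
    return max(1, min(prio, PRIORITY_MAX))


if __name__ == "__main__":
    data_pg = PGState(needs_backfill=True, missing_replicas=2, below_min_size=False)
    meta_pg = PGState(needs_backfill=True, missing_replicas=1, below_min_size=False,
                      pool_recovery_priority=2)
    print("data pool PG:", toy_priority(data_pg))
    print("metadata pool PG:", toy_priority(meta_pg))
```

<p>In this toy model a small pool-level boost can be cancelled out by a data pool PG that is missing more replicas, which is exactly the kind of trade-off that has to be decided for metadata pools.</p>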
<p>It isn't totally clear how all these factors should interact with pools that store metadata. I understand that metadata pools should have priority, but how much? Should they override all other considerations? Should they boost priority the same way the pool recovery priority does? Since the code adds priority for how many missing replicas there are, what priority should be used for a data pool which is down more replicas than a metadata pool?</p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1349102019-04-17T18:52:07ZDavid Zafmandzafman@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed" href="/issues/39099">Bug #39099</a>: Give recovery for inactive PGs a higher priority</i> added</li></ul> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1349122019-04-17T18:53:13ZDavid Zafmandzafman@redhat.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-6 status-3 priority-4 priority-default closed" href="/issues/23999">Documentation #23999</a>: osd_recovery_priority is not documented (but osd_recovery_op_priority is)</i> added</li></ul> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1349152019-04-17T19:49:09ZDavid Zafmandzafman@redhat.com
<ul></ul><p>I forgot that it is possible that backfill/recovery could be moving data around for several reasons. In those cases the lowest priority is appropriate without needing a boost for metadata pools.</p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1412492019-07-23T00:15:47ZNeha Ojhanojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/29180">https://github.com/ceph/ceph/pull/29180</a><br /><a class="external" href="https://github.com/ceph/ceph/pull/29181">https://github.com/ceph/ceph/pull/29181</a></p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1414292019-07-23T15:02:24ZNeha Ojhanojha@redhat.com
<ul><li><strong>Assignee</strong> set to <i>Sage Weil</i></li><li><strong>Backport</strong> set to <i>nautilus</i></li></ul> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1416132019-07-24T22:03:10ZNeha Ojhanojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Pending Backport</i></li></ul><p>One backport for nautilus: <a class="external" href="https://github.com/ceph/ceph/pull/29275">https://github.com/ceph/ceph/pull/29275</a></p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1416142019-07-24T22:06:47ZNeha Ojhanojha@redhat.com
<ul><li><strong>Status</strong> changed from <i>Pending Backport</i> to <i>In Progress</i></li></ul><p>Sorry, <a class="external" href="https://github.com/ceph/ceph/pull/29181">https://github.com/ceph/ceph/pull/29181</a> is yet to merge.</p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1416712019-07-25T10:32:48ZNathan Cutlerncutler@suse.cz
<ul><li><strong>Backport</strong> deleted (<del><i>nautilus</i></del>)</li></ul><p>Since this is only going to be backported to nautilus, since there are two PRs involved, and since one of those PRs already has a backport PR open, I suggest we handle the backporting right here in the master issue. I.e., let's not set the status to Pending Backport, because that will cause a backport issue to be opened, which won't add any value in this case and will just muddy the waters.</p> RADOS - Feature #39339: prioritize backfill of metadata pools, automaticallyhttps://tracker.ceph.com/issues/39339?journal_id=1855522021-02-19T22:29:28ZDavid Zafmandzafman@redhat.com
<ul></ul><p>I think this tracker can be marked resolved since pull request 29181 merged.</p>