Bug #52884

osd: optimize pg peering latency when add new osd that need backfill

Added by jianwei zhang over 2 years ago. Updated almost 2 years ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: Peering
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed: 10/08/2021
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Reproduce:
(1) The Ceph cluster is not running any client IO.
(2) The only operation performed is "ceph osd in osd.14" (adding a new OSD to the cluster).

Reason:
(1) The new OSD is empty; there is no data on it.
(2) The new OSD therefore requires a full copy from the primary OSD.
(3) If the primary OSD's PGlog entry count is below osd_min_pg_log_entries (3000), /// this is the important case
the primary OSD thinks the new OSD can recover via the PGlog, because the empty PGlog (0,0) on the new OSD is still contiguous with the authoritative log.
(4) The primary OSD sends the PGlog to the new OSD.
(5) The new OSD receives the PGlog and has two ways to handle it:
/// one way is backfill: claim the PGlog directly
/// the other way is recovery: loop over the PGlog entries and merge them
(6) The recovery way (loop over the PGlog to merge) has an average handling latency of 109 ms.
The backfill way (claim the PGlog directly) has an average handling latency of 14 ms.

(7) So we should use backfill instead of recovery when adding a new OSD.
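
To illustrate why the two handling paths in step (5) can differ in cost, here is a minimal C++ sketch. It is not Ceph's PGLog code: SimpleLog, LogEntry, merge_log, claim_log and make_log are hypothetical, simplified stand-ins, and the only per-entry work modelled is an append. The point is only that the merge path visits every entry of the authoritative log, while the claim path adopts the log wholesale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT Ceph's real PGLog types.
struct LogEntry {
  std::uint64_t version;
};

struct SimpleLog {
  std::uint64_t tail = 0;  // version just before the oldest entry
  std::uint64_t head = 0;  // version of the newest entry
  std::vector<LogEntry> entries;
};

// Build a log holding versions tail+1 .. head (illustration helper).
SimpleLog make_log(std::uint64_t tail, std::uint64_t head) {
  SimpleLog log;
  log.tail = tail;
  log.head = head;
  for (std::uint64_t v = tail + 1; v <= head; ++v) {
    log.entries.push_back(LogEntry{v});
  }
  return log;
}

// Recovery-style handling: walk the authoritative log entry by entry and
// append every entry newer than the local head. Returns entries visited.
std::size_t merge_log(SimpleLog& local, const SimpleLog& auth) {
  std::size_t visited = 0;
  for (const LogEntry& e : auth.entries) {
    ++visited;
    if (e.version > local.head) {
      local.entries.push_back(e);
      local.head = e.version;
    }
  }
  return visited;
}

// Backfill-style handling: an empty local log adopts the authoritative log
// as a whole, without looking at individual entries. Returns entries visited.
std::size_t claim_log(SimpleLog& local, SimpleLog auth) {
  assert(local.entries.empty());  // only valid for an empty local log
  local = std::move(auth);
  return 0;
}
```

With an authoritative log of 2999 entries (just under the 3000 threshold), merge_log on an empty local log visits all 2999 entries, while claim_log visits none; both leave the local log with the same head. The 109 ms and 14 ms figures above are the reporter's measurements on a real cluster and are not reproduced by this toy model.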

Signed-off-by: Jianwei Zhang

History

#2 Updated by Neha Ojha over 2 years ago

  • Status changed from New to Fix Under Review

#3 Updated by Loïc Dachary over 2 years ago

  • Target version deleted (v15.2.15)

#4 Updated by jianwei zhang almost 2 years ago

https://github.com/ceph/ceph/pull/46281

Added the code for the master branch.

#5 Updated by jianwei zhang almost 2 years ago

    osd: optimize pg peering latency when add new osd that need backfill

    Set last_backfill to MIN when creating a pg.

    This applies when a newly created pg (caused by adding or deleting an osd)
    has an empty pglog, (head, tail) = (0,0),
    that is still contiguous with the authoritative log (0, 3000).
    In this case, we use backfill instead of recovery.

    If the authoritative log is (6000,9000), that is, tail > 3000,
    then the newly created pg (0,0) naturally goes directly to the backfill path,
    because there is no intersection between the two logs.

    If an osd is offline for only a short time,
    the pg on it is still contiguous with the authoritative log.
    In this case, the recovery path is still taken, not backfill.

    There are 2 benefits:
    1. Backfill has lower latency than recovery when processing the pglog during peering.
    2. In choose_acting, the backfill osd is not considered,
       so when osds are continuously added/deleted, the number of acting changes can be reduced.

    Fixes: https://tracker.ceph.com/issues/52884

    Signed-off-by: Jianwei Zhang <jianwei1216@qq.com>
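
The three scenarios in the commit message above can be sketched as a small decision function. This is an illustrative model only, not the code from the pull request: LogBounds, Path, choose_path and the peer_pg_newly_created flag are hypothetical names, and the contiguity test (peer head not older than authoritative tail) is an assumption about how log overlap is judged.

```cpp
#include <cassert>
#include <cstdint>

enum class Path { Recovery, Backfill };

// Hypothetical (tail, head) bounds of a pglog, as written in the commit
// message, e.g. (0, 3000) or (6000, 9000).
struct LogBounds {
  std::uint64_t tail;
  std::uint64_t head;
};

// Assumed model of the decision described in the commit message.
// peer_pg_newly_created is a hypothetical flag standing in for
// "last_backfill was set to MIN when the pg was created".
Path choose_path(const LogBounds& auth, const LogBounds& peer,
                 bool peer_pg_newly_created) {
  assert(auth.tail <= auth.head && peer.tail <= peer.head);

  // Proposed behaviour: a newly created pg with an empty log (0,0)
  // goes to backfill even if its log is contiguous with the
  // authoritative log.
  if (peer_pg_newly_created && peer.tail == 0 && peer.head == 0) {
    return Path::Backfill;
  }

  // No intersection: the peer's head is older than the authoritative
  // tail, so the log cannot bring the peer up to date.
  if (peer.head < auth.tail) {
    return Path::Backfill;
  }

  // Logs are contiguous: the peer can catch up from the log.
  return Path::Recovery;
}
```

Under these assumptions, a new pg (0,0) against an authoritative log (0,3000) takes the backfill path with the flag set and the recovery path without it, which is the behaviour change the commit describes. A new pg (0,0) against (6000,9000) backfills either way, and a pg that was briefly offline, for example (0,2900) against (0,3000), still takes the recovery path.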

#6 Updated by Radoslaw Zarzynski almost 2 years ago

  • Pull request ID changed from 43482 to 46281
