Support #64378

open

Slow / Single backfilling on Reef (18.2.1-pve2)

Added by Pivert Dubuisson 3 months ago. Updated 23 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Tags:
Reviewed:
Affected Versions:
Pull request ID:

Description

Hi,

Despite running:

ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'

I reweighted several OSDs, so I would expect backfills running on at least 2 OSDs,
but I'm still stuck with at most one PG backfilling at a time.
Why?
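
(For reference: the value a running OSD actually uses can be checked with `ceph config show`, or via the admin socket on the OSD's host; osd.0 below is just an example id.)

ceph config show osd.0 osd_max_backfills
ceph daemon osd.0 config get osd_max_backfills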

The setup is 3 nodes with 3 fast SSDs (WD_Black SN850X) and a full-mesh 1GB full-duplex network (with FRR). Recovery is typically between 5 and 20 MB/s, probably because it's limited to a single PG backfill at a time.

How can I parallelize backfilling on Reef?

root@pve1:~# ceph status
  cluster:
    id:     e7628d51-32b5-4f5c-8eec-1cafb41ead74
    health: HEALTH_WARN
            Degraded data redundancy: 4510616/37577132 objects degraded (12.004%), 39 pgs degraded, 42 pgs undersized
            101 pgs not deep-scrubbed in time
            77 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum pve3,pve2,pve1 (age 16h)
    mgr: pve1(active, since 19h), standbys: pve3, pve2
    mds: 1/1 daemons up, 2 standby
    osd: 5 osds: 4 up (since 3h), 3 in (since 3h); 64 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 179 pgs
    objects: 12.55M objects, 1.2 TiB
    usage:   3.1 TiB used, 4.2 TiB / 7.3 TiB avail
    pgs:     4510616/37577132 objects degraded (12.004%)
             6561953/37577132 objects misplaced (17.463%)
             115 active+clean
             38  active+undersized+degraded+remapped+backfill_wait
             21  active+remapped+backfill_wait
             3   active+undersized+remapped+backfill_wait
             1   active+clean+remapped
             1   active+undersized+degraded+remapped+backfilling

  io:
    client:   6.6 KiB/s rd, 2.6 MiB/s wr, 10 op/s rd, 241 op/s wr
    recovery: 14 MiB/s, 3 objects/s

Actions #1

Updated by Pivert Dubuisson 3 months ago

The problem seems to be that even with commands like

ceph tell 'osd.*' injectargs '--osd-max-backfills=3 --osd-recovery-max-active=9'

osd_max_backfills remains at 1:

root@pve1:~# ceph-conf --show-config | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills" 
osd_max_backfills = 1
osd_recovery_max_active = 0
osd_recovery_max_active_hdd = 3
osd_recovery_max_active_ssd = 10
osd_recovery_op_priority = 3
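
Side note: ceph-conf --show-config only parses the local ceph.conf plus compiled-in defaults; it does not reflect values injected at runtime or stored in the monitors' configuration database. The value a running OSD reports can be checked with, for example (osd.0 as a placeholder id):

ceph tell osd.0 config get osd_max_backfills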
Actions #2

Updated by Pivert Dubuisson 3 months ago

Replying to myself:

It seems injectargs --osd-max-backfills does not work on Reef.

The workaround I used (see the sketch below):
  • Set `osd_max_backfills = 4` in the [global] section of ceph.conf
  • Restart each OSD
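
A minimal sketch of that change (the ceph.conf path and the OSD id are assumptions based on this Proxmox setup; adjust as needed):

# in ceph.conf (on Proxmox VE this is typically a symlink from /etc/ceph/ceph.conf to /etc/pve/ceph.conf)
[global]
    osd_max_backfills = 4

# then restart each OSD in turn, e.g. for osd.0:
systemctl restart ceph-osd@0.service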

Is there a better solution?

Actions #3

Updated by Pivert Dubuisson 3 months ago

With osd_max_backfills = 9, the numbers are very different:

root@pve1:~# ceph status
  cluster:
    id:     e7628d51-32b5-4f5c-8eec-1cafb41ead74
    health: HEALTH_WARN
            Degraded data redundancy: 3893818/37590920 objects degraded (10.358%), 31 pgs degraded, 31 pgs undersized
            100 pgs not deep-scrubbed in time
            77 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum pve3,pve2,pve1 (age 17h)
    mgr: pve1(active, since 20h), standbys: pve3, pve2
    mds: 1/1 daemons up, 2 standby
    osd: 5 osds: 5 up (since 82s), 3 in (since 4h); 55 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 179 pgs
    objects: 12.56M objects, 1.2 TiB
    usage:   3.3 TiB used, 4.0 TiB / 7.3 TiB avail
    pgs:     3893818/37590920 objects degraded (10.358%)
             6376952/37590920 objects misplaced (16.964%)
             124 active+clean
             22  active+undersized+degraded+remapped+backfill_wait
             19  active+remapped+backfill_wait
             9   active+undersized+degraded+remapped+backfilling
             4   active+remapped+backfilling
             1   active+clean+remapped

  io:
    client:   12 KiB/s rd, 3.6 MiB/s wr, 3 op/s rd, 401 op/s wr
    recovery: 279 MiB/s, 175 objects/s
Actions #4

Updated by Niklas Hambuechen 24 days ago

I observe the same problem on 18.2.1:

    pgs:     136112881/1202666364 objects misplaced (11.318%)
             240 active+clean
             75  active+remapped+backfill_wait
             24  active+clean+scrubbing+deep
             21  active+clean+scrubbing
             1   active+remapped+backfilling

Only 1 PG is backfilling and 75 are waiting in backfill_wait, no matter what configs I set.

And `ceph config set osd osd_max_backfills` does not work:

# ceph config set osd osd_max_backfills 6 && ceph config get osd osd_max_backfills 
1
Actions #5

Updated by Niklas Hambuechen 23 days ago

Aha, there's a new feature in Ceph that auto-resets these values:

https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#recovery-backfill-options

The following recovery and backfill related Ceph options are overridden to mClock defaults:

    osd_max_backfills
    osd_recovery_max_active
    osd_recovery_max_active_hdd
    osd_recovery_max_active_ssd

@Pivert Dubuisson: Check whether the

ceph config set osd osd_mclock_override_recovery_settings true

mentioned in the linked https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#steps-to-modify-mclock-max-backfills-recovery-limits helps you.

It does work for me:

# ceph config set osd osd_mclock_override_recovery_settings true
# ceph config set osd osd_max_backfills 6 && ceph config get osd osd_max_backfills
6
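
Once recovery finishes, the override can presumably be reverted so mClock manages these limits again (a sketch, assuming the values were only set at the osd level):

ceph config rm osd osd_max_backfills
ceph config set osd osd_mclock_override_recovery_settings false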

This speeds up my recovery drastically for one cluster, but not for the cluster mentioned in my post above, which still has

             1   active+remapped+backfilling

That problematic cluster is the one mentioned in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YOBS2MZS6D5CYABLZD4GMLXV6Z6RIN4W/, where I'm trying to move 130M inode backtrace information RADOS objects from replicated-HDD to replicated-SSD.
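
One more thing worth checking on the stuck cluster (a guess, not verified here): backfill reservations are taken per OSD, so if all the waiting PGs map to the same target OSD, at most osd_max_backfills of them can backfill onto it at once. The up/acting sets of the waiting PGs show whether that's the case:

ceph pg dump pgs_brief | grep backfill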
