Support #64378
Slow / Single backfilling on Reef (18.2.1-pve2)
Status: open
Description
Hi,
Despite running:
ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
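For reference, a quick way to check the value a running OSD daemon actually reports (osd.0 is just an example ID):
ceph config show osd.0 osd_max_backfills
ceph daemon osd.0 config get osd_max_backfills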
I reweighted several OSDs, so I expect concurrent backfills on at least 2 OSDs.
I'm still stuck with at most one PG backfilling at a time.
Why?
The setup is 3 nodes with 3 fast SSDs (WD_Black SN850X) on a full-mesh 1 Gb/s full-duplex network (with FRR). Recovery is typically between 5 and 20 MB/s, probably because it's limited to a single PG backfill at a time.
How can I parallelize backfilling on Reef?
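One way to see this, listing the PGs currently in a backfilling state:
ceph pg ls backfilling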
root@pve1:~# ceph status
  cluster:
    id:     e7628d51-32b5-4f5c-8eec-1cafb41ead74
    health: HEALTH_WARN
            Degraded data redundancy: 4510616/37577132 objects degraded (12.004%), 39 pgs degraded, 42 pgs undersized
            101 pgs not deep-scrubbed in time
            77 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum pve3,pve2,pve1 (age 16h)
    mgr: pve1(active, since 19h), standbys: pve3, pve2
    mds: 1/1 daemons up, 2 standby
    osd: 5 osds: 4 up (since 3h), 3 in (since 3h); 64 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 179 pgs
    objects: 12.55M objects, 1.2 TiB
    usage:   3.1 TiB used, 4.2 TiB / 7.3 TiB avail
    pgs:     4510616/37577132 objects degraded (12.004%)
             6561953/37577132 objects misplaced (17.463%)
             115 active+clean
             38  active+undersized+degraded+remapped+backfill_wait
             21  active+remapped+backfill_wait
             3   active+undersized+remapped+backfill_wait
             1   active+clean+remapped
             1   active+undersized+degraded+remapped+backfilling

  io:
    client:   6.6 KiB/s rd, 2.6 MiB/s wr, 10 op/s rd, 241 op/s wr
    recovery: 14 MiB/s, 3 objects/s
Updated by Pivert Dubuisson 3 months ago
The problem seems to be that even with commands like
ceph tell 'osd.*' injectargs --osd-max-backfills=3 --osd-recovery-max-active=9
osd_max_backfills remains at 1:
root@pve1:~# ceph-conf --show-config | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills"
osd_max_backfills = 1
osd_recovery_max_active = 0
osd_recovery_max_active_hdd = 3
osd_recovery_max_active_ssd = 10
osd_recovery_op_priority = 3
Updated by Pivert Dubuisson 3 months ago
Replying to myself:
It seems that injectargs with --osd-max-backfills does not work on Reef.
The workaround I used:
- Set `osd_max_backfills = 4` in the global section of ceph.conf
- Restart each OSD
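A minimal sketch of that workaround (the OSD ID below is an example; on Proxmox the OSDs run as ceph-osd@<id> systemd units):
# /etc/ceph/ceph.conf
[global]
    osd_max_backfills = 4
# then restart each OSD in turn, e.g.:
systemctl restart ceph-osd@0.service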
Is there a better solution?
Updated by Pivert Dubuisson 3 months ago
With osd_max_backfills = 9, the numbers are very different:
root@pve1:~# ceph status
  cluster:
    id:     e7628d51-32b5-4f5c-8eec-1cafb41ead74
    health: HEALTH_WARN
            Degraded data redundancy: 3893818/37590920 objects degraded (10.358%), 31 pgs degraded, 31 pgs undersized
            100 pgs not deep-scrubbed in time
            77 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum pve3,pve2,pve1 (age 17h)
    mgr: pve1(active, since 20h), standbys: pve3, pve2
    mds: 1/1 daemons up, 2 standby
    osd: 5 osds: 5 up (since 82s), 3 in (since 4h); 55 remapped pgs

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 179 pgs
    objects: 12.56M objects, 1.2 TiB
    usage:   3.3 TiB used, 4.0 TiB / 7.3 TiB avail
    pgs:     3893818/37590920 objects degraded (10.358%)
             6376952/37590920 objects misplaced (16.964%)
             124 active+clean
             22  active+undersized+degraded+remapped+backfill_wait
             19  active+remapped+backfill_wait
             9   active+undersized+degraded+remapped+backfilling
             4   active+remapped+backfilling
             1   active+clean+remapped

  io:
    client:   12 KiB/s rd, 3.6 MiB/s wr, 3 op/s rd, 401 op/s wr
    recovery: 279 MiB/s, 175 objects/s
Updated by Niklas Hambuechen 24 days ago
I observe the same problem on 18.2.1:
pgs: 136112881/1202666364 objects misplaced (11.318%)
     240 active+clean
     75  active+remapped+backfill_wait
     24  active+clean+scrubbing+deep
     21  active+clean+scrubbing
     1   active+remapped+backfilling
Only 1 PG backfilling and 75 backfill_wait, no matter what configs I set.
And `ceph config set osd osd_max_backfills` does not work:
# ceph config set osd osd_max_backfills 6 && ceph config get osd osd_max_backfills
1
Updated by Niklas Hambuechen 23 days ago
Aha, there's a new feature in Ceph that auto-resets these values:
https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#recovery-backfill-options
The following recovery and backfill related Ceph options are overridden to mClock defaults: osd_max_backfills, osd_recovery_max_active, osd_recovery_max_active_hdd, osd_recovery_max_active_ssd
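This only applies when the OSDs use the mClock scheduler, which has been the default since Quincy; a quick way to check which op queue is active:
ceph config get osd osd_op_queue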
@Pivert Dubuisson: Check if the
ceph config set osd osd_mclock_override_recovery_settings true
mentioned in the linked https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#steps-to-modify-mclock-max-backfills-recovery-limits helps you.
It does work for me:
# ceph config set osd osd_mclock_override_recovery_settings true
# ceph config set osd osd_max_backfills 6 && ceph config get osd osd_max_backfills
6
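For completeness, the same doc page also describes the built-in mClock profiles; switching to high_recovery_ops is an alternative way to prioritize recovery without overriding the individual limits (I haven't benchmarked it on this cluster):
ceph config set osd osd_mclock_profile high_recovery_ops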
Overriding osd_max_backfills this way drastically speeds up recovery for one cluster. But not for the cluster mentioned in my post above, which still has
1 active+remapped+backfilling
That problematic cluster is the one mentioned in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/YOBS2MZS6D5CYABLZD4GMLXV6Z6RIN4W/, where I'm trying to move 130M RADOS objects holding inode backtrace information from replicated-HDD to replicated-SSD.