Bug #23595 (closed)

osd: recovery/backfill is extremely slow

Added by Niklas Hambuechen about 6 years ago. Updated about 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I set up a Ceph 12.2.4 (luminous, stable) cluster of 3 machines with 10-gigabit networking on Ubuntu 16.04, using almost entirely default settings.

I put CephFS on it, filling it with 6 large files (1 GB each) and 270k empty files (just `touch`ed).

I removed one OSD, wiped its data, created a new one, and added it back to the cluster.

After doing so, I observed that the recovery speed is extremely slow.

In particular, the logs show almost exactly 10 objects being recovered per second:

2018-04-08 19:48:41.871692 mon.ceph2 [WRN] Health check update: Degraded data redundancy: 258294/830889 objects degraded (31.086%), 133 pgs degraded, 133 pgs undersized (PG_DEGRADED)
2018-04-08 19:48:46.872108 mon.ceph2 [WRN] Health check update: Degraded data redundancy: 258247/830889 objects degraded (31.081%), 133 pgs degraded, 133 pgs undersized (PG_DEGRADED)
2018-04-08 19:48:51.872489 mon.ceph2 [WRN] Health check update: Degraded data redundancy: 258200/830889 objects degraded (31.075%), 133 pgs degraded, 133 pgs undersized (PG_DEGRADED)
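The rate can be read straight off the three health-check samples above, which are 5 seconds apart (a quick sanity check, nothing more):

```python
# Each sample: (seconds since the first log line, degraded object count)
# taken from the three PG_DEGRADED health-check updates above.
samples = [(0, 258294), (5, 258247), (10, 258200)]

elapsed = samples[-1][0] - samples[0][0]    # 10 seconds
recovered = samples[0][1] - samples[-1][1]  # 94 objects
rate = recovered / elapsed                  # objects recovered per second

print(rate)  # → 9.4
```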

At this speed, my recovery will take years.

What is going on here? Why does Ceph recover so slowly?
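One plausible explanation (an assumption, not confirmed in this ticket) is that luminous's default recovery throttling is the bottleneck: `osd_max_backfills` defaults to 1, `osd_recovery_max_active` to 3, and `osd_recovery_sleep_hdd` to 0.1 s. If that is the cause, the limits can be loosened at runtime while the cluster heals, for example:

```shell
# Temporarily loosen recovery throttling on all OSDs (luminous syntax).
# These are illustrative values; revert to the defaults once HEALTH_OK.
ceph tell 'osd.*' injectargs '--osd_max_backfills 4 --osd_recovery_max_active 8 --osd_recovery_sleep_hdd 0'
```

Raising these values trades client I/O latency for recovery throughput, so they are usually restored to the defaults once the cluster is healthy again.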

