Bug #43311
asynchronous recovery + backfill might spin pg undersized for a long time
Status: Resolved
Priority: Normal
Assignee:
Category: Backfill/Recovery
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: mimic,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
When an osd that is part of the current up set is chosen as an
async_recovery_target, it is removed from the acting set.
Since we don't allow any want set that is larger than the pool size,
the pg must transition into UNDERSIZED when asynchronous recovery
eventually happens.
However, if that pg also has one or more backfill targets, it might
stay UNDERSIZED for a long time, during which the mon keeps issuing
"PG_AVAILABILITY" warnings until all backfill targets finally complete.
Below is an example:
[root@host-192-168-9-13 ~]# ceph pg 1.1fc query
{
    "state": "active+undersized+remapped+backfilling",
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "epoch": 16777,
    "up": [
        29,
        15,
        0
    ],
    "acting": [
        29,
        54
    ],
    "backfill_targets": [
        "15"
    ],
    "async_recovery_targets": [
        "0"
    ],
    "acting_recovery_backfill": [
        "0",
        "15",
        "29",
        "54"
    ],
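The query output above can be read mechanically. Below is a minimal Python sketch, not part of Ceph, that takes the fields shown in the excerpt and reports which up-set osds are missing from the acting set and why. `explain_undersized` is a hypothetical helper, and the pool size of 3 is an assumption inferred from the three-entry up set, since the truncated output does not show it.

```python
import json


def osd_id(target):
    # pg query prints targets as strings such as "15"; stripping a
    # possible "(shard)" suffix is an assumption for EC-style output.
    return int(str(target).split("(")[0])


def explain_undersized(pg, pool_size):
    """Summarize why a pg has fewer acting osds than the pool size."""
    up = pg["up"]
    acting = pg["acting"]
    backfill = {osd_id(t) for t in pg.get("backfill_targets", [])}
    async_rec = {osd_id(t) for t in pg.get("async_recovery_targets", [])}

    # osds that should serve the pg (up) but are not serving it yet (acting)
    up_not_acting = [o for o in up if o not in acting]
    return {
        "undersized": len(acting) < pool_size,
        "up_not_acting": up_not_acting,
        "waiting_on_backfill": sorted(backfill.intersection(up_not_acting)),
        "waiting_on_async_recovery": sorted(async_rec.intersection(up_not_acting)),
    }


# Fields copied from the `ceph pg 1.1fc query` excerpt in this report.
pg = json.loads("""
{
    "state": "active+undersized+remapped+backfilling",
    "up": [29, 15, 0],
    "acting": [29, 54],
    "backfill_targets": ["15"],
    "async_recovery_targets": ["0"]
}
""")

# pool_size=3 is assumed, not taken from the output.
report = explain_undersized(pg, pool_size=3)
print(report)
```

For this pg the sketch flags osd.15 as waiting on backfill and osd.0 as waiting on asynchronous recovery, leaving only two osds in the acting set, which matches the `undersized` flag in the reported state.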