Bug #62338

closed

osd: choose_async_recovery_ec may select an acting set < min_size

Added by Samuel Just 9 months ago. Updated 3 days ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: EC Pools
Target version:
% Done: 100%
Source: Community (user)
Tags: backport_processed
Backport: pacific, quincy, reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

choose_async_recovery_ec may remove OSDs from the acting set as long as PeeringState::recoverable evaluates to true. Prior to 90022b35 (merge of PR 17619), the condition was PeeringState::recoverable_and_ge_min_size, which behaved as the name indicates. 7cb818a85 weakened the condition in PeeringState::recoverable_and_ge_min_size to only check min_size if !cct->_conf.get_val<bool>("osd_allow_recovery_below_min_size") (the function was renamed to PeeringState::recoverable in a subsequent commit in that PR, e4c8bee88). PeeringState::recoverable_and_ge_min_size had (and its successor has) two callers: choose_acting and choose_async_recovery_ec. For choose_acting, this change is correct. However, for choose_async_recovery_ec, we don't want to reduce the acting set size below min_size, as doing so would prevent the PG from doing IO during recovery.

The main observable symptom is a PG that ends up in the peered state during recovery (peered+recovering, peered+recovery_wait), unable to do IO until recovery completes, even though there are enough nearly up-to-date OSDs available.
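
For illustration only, here is a minimal sketch of the guard the description says is missing. The names (Candidate, select_async_recovery, the parameters) are hypothetical and this is not the actual Ceph code; the real logic lives in PeeringState::choose_async_recovery_ec.

// Illustrative sketch, not the actual Ceph implementation.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Candidate {
  int osd;        // OSD id of an acting-set shard
  uint64_t cost;  // approx. log entries behind + missing objects
};

// Pick shards for async recovery, but never let the acting set shrink
// below min_size; without the min_size guard the PG can end up peered
// and unable to serve IO until recovery completes.
std::vector<int> select_async_recovery(std::vector<Candidate> candidates,
                                       size_t acting_size,
                                       size_t min_size,
                                       uint64_t min_cost) {
  std::sort(candidates.begin(), candidates.end(),
            [](const Candidate& a, const Candidate& b) {
              return a.cost > b.cost;  // costliest shards first
            });
  std::vector<int> async_targets;
  for (const auto& c : candidates) {
    if (c.cost < min_cost)
      break;  // remaining shards are cheap enough to recover in line
    size_t remaining = acting_size - async_targets.size();
    if (remaining <= min_size)
      break;  // removing another shard would drop the acting set below min_size
    async_targets.push_back(c.osd);
  }
  return async_targets;
}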


Related issues 3 (1 open, 2 closed)

Copied to RADOS - Backport #62817: quincy: osd: choose_async_recovery_ec may select an acting set < min_size (In Progress, Konstantin Shalygin)
Copied to RADOS - Backport #62818: pacific: osd: choose_async_recovery_ec may select an acting set < min_size (Resolved, Konstantin Shalygin)
Copied to RADOS - Backport #62819: reef: osd: choose_async_recovery_ec may select an acting set < min_size (Resolved, Konstantin Shalygin)
#1

Updated by Samuel Just 9 months ago

  • Description updated (diff)
#2

Updated by Prashant D 9 months ago

The workaround for this issue is to set osd_async_recovery_min_cost to a very large value:

# ceph config set osd osd_async_recovery_min_cost 1099511627776

Notes from Sam: the async recovery cost is the number of PG log entries the replica is behind plus the number of missing objects. osd_target_pg_log_entries_per_osd is 30000, so an OSD with a single PG could have 30000 entries. osd_async_recovery_min_cost is a 64-bit integer, so set it to 2^40 (1<<40), i.e. 1099511627776, a value the cost can never reach.
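
A hypothetical standalone snippet (not cluster code) just to sanity-check that arithmetic; the cost figures come from the note above.

// Hypothetical check of the suggested threshold.
#include <cstdint>
#include <iostream>

int main() {
  const uint64_t threshold = 1ULL << 40;           // 1099511627776
  const uint64_t rough_worst_cost = 30000 + 30000; // log entries behind + missing objects (order of magnitude)
  std::cout << "threshold = " << threshold
            << ", rough worst-case cost = " << rough_worst_cost << "\n";
  // The threshold is roughly 18 million times larger, so async recovery is effectively disabled.
  return 0;
}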
#3

Updated by Radoslaw Zarzynski 9 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 52823
#4

Updated by Neha Ojha 9 months ago

  • Project changed from Ceph to RADOS
#5

Updated by Radoslaw Zarzynski 8 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from octopus, pacific, quincy, reef to pacific,quincy,reef
#6

Updated by Backport Bot 8 months ago

  • Copied to Backport #62817: quincy: osd: choose_async_recovery_ec may select an acting set < min_size added
#7

Updated by Backport Bot 8 months ago

  • Copied to Backport #62818: pacific: osd: choose_async_recovery_ec may select an acting set < min_size added
#8

Updated by Backport Bot 8 months ago

  • Copied to Backport #62819: reef: osd: choose_async_recovery_ec may select an acting set < min_size added
#9

Updated by Backport Bot 8 months ago

  • Tags set to backport_processed
#10

Updated by Bartosz Rabiega about 2 months ago

Hello. Just FYI, this fixes a very nasty issue in my EC setup.
Here are some details.

The EC setup and CRUSH rules are defined to have:
3 racks
2 hosts per rack
12 disks per host

EC configuration is 7+5.
The CRUSH rule picks, within each rack, 2 hosts and 2 disks, so 4 chunks of data end up in each rack.

Now, here is the funny situation I end up in thanks to this bug.

1. Start some IO
2. Shut down OSDs from rack A (PGs are active+undersized)
3. Start OSDs from rack A (PGs are active+undersized)
4. Shut down OSDs from rack A again (some PGs are down)

So in theory, when a rack is down, 4 chunks are unavailable but 8 are still present, so all PGs should remain active.

Now, even weirder, continuing the case described above:

4a. Stop IO
5. Disable recovery/rebalance to make sure no chunks are recovered
6. Start OSDs from rack A (again PGs are active+undersized)
7. Shut down OSDs from rack A again (some PGs are down again, but far fewer than in step 4, e.g. 50 instead of 300)
8. Start OSDs from rack A (again PGs are active+undersized)
9. Shut down OSDs from rack A again (some PGs are down again, but far fewer than in step 7, e.g. 5 instead of 50)
10. Start OSDs from rack A (again PGs are active+undersized)
11. Shut down OSDs from rack A again (all PGs are active+undersized)

Further rack restarts no longer cause down PGs, unless there is some IO.

So my guess is that async recovery kicks in when rack A comes up for the first time and messes with the acting set; as a result, when rack A goes down again, some unfortunate PGs end up in the down state.

I retested everything a couple of times with `osd_async_recovery_min_cost 1099511627776` on reef - no more down PGs.

Thank you very very much for the fix.

#11

Updated by Bartosz Rabiega about 2 months ago

Hello again.

Apparently I got a tiny little bit too excited.

I tested the case described above with 16.2.15 and unfortunately the problem still exists.
However, if I disable async recovery (osd_async_recovery_min_cost 1099511627776), the cluster works as desired and all PG states are as expected (active+undersized, never down).

I'd appreciate any tips on how to narrow this bug down.

#12

Updated by Konstantin Shalygin 3 days ago

  • Category set to EC Pools
  • Status changed from Pending Backport to Resolved
  • Target version set to v19.1.0
  • % Done changed from 0 to 100
  • Source set to Community (user)