Bug #7576
closed
osd: large skew in pg epochs (dumpling)
Added by Sage Weil about 10 years ago.
Updated over 9 years ago.
Description
Observed a cluster with pgs at very different pg epochs (~17000 and ~24000). This isn't supposed to happen on start because we flush the peering wq.
Maybe it can still happen while the OSD is active, though? At some point we should mark ourselves down (maybe?) if there are pgs that are so far behind.
Hmm, this is partly deliberate — we allow PGs to move forward "at their own pace", so if they aren't getting any activity they can fall behind a bit so as not to preempt actual work. I don't recall what mechanisms exist to keep them from being completely out of date, though.
- Severity changed from 3 - minor to 2 - major
- Assignee set to Sage Weil
How about this: in OSDService, add
Mutex pg_epoch_lock;
Cond pg_epoch_cond;
multiset<epoch_t> pg_epochs;
map<pg_t,epoch_t> pg_epoch;
and
void pg_update_epoch(pg_t pgid, epoch_t epoch);
that updates the pg_epochs map and multiset. And then a
epoch_t pg_update_get_lower_bound()
that returns the oldest epoch. And a wait(). Then in handle_osd_map, if the lower bound is more than X epochs behind, we block. Or, mark ourselves down until we can catch up.
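A minimal sketch of what that tracking state could look like, using std::mutex / std::condition_variable in place of Ceph's Mutex/Cond and plain unsigned ints in place of pg_t / epoch_t; the struct and method names are illustrative, not the actual Ceph implementation:

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <set>

// Hypothetical sketch of the proposed per-PG epoch tracking in OSDService.
struct PGEpochTracker {
  std::mutex pg_epoch_lock;
  std::condition_variable pg_epoch_cond;
  std::multiset<unsigned> pg_epochs;      // one epoch entry per PG
  std::map<unsigned, unsigned> pg_epoch;  // pg id -> its current epoch

  // Record that a PG has advanced to `epoch`, keeping map and multiset in sync.
  void pg_update_epoch(unsigned pgid, unsigned epoch) {
    std::lock_guard<std::mutex> l(pg_epoch_lock);
    auto it = pg_epoch.find(pgid);
    if (it != pg_epoch.end()) {
      // erase exactly one instance of the PG's old epoch from the multiset
      pg_epochs.erase(pg_epochs.find(it->second));
      it->second = epoch;
    } else {
      pg_epoch[pgid] = epoch;
    }
    pg_epochs.insert(epoch);
    pg_epoch_cond.notify_all();
  }

  // Oldest epoch any PG is still at (0 if no PGs are registered).
  unsigned pg_epoch_lower_bound() {
    std::lock_guard<std::mutex> l(pg_epoch_lock);
    return pg_epochs.empty() ? 0 : *pg_epochs.begin();
  }

  // Block until every PG has caught up to at least `min_epoch`.
  void wait_for_min_epoch(unsigned min_epoch) {
    std::unique_lock<std::mutex> l(pg_epoch_lock);
    pg_epoch_cond.wait(l, [&] {
      return pg_epochs.empty() || *pg_epochs.begin() >= min_epoch;
    });
  }
};
```

The multiset makes the "oldest PG" query O(log n) on update and O(1) to read, which is what handle_osd_map would need on every incoming map.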
That doesn't seem like it's addressing the issue the right way. We've deliberately set it so that PGs which don't get activity won't wake up and process new maps as frequently; the solution needs to adjust waking those PGs up, not simply blocking if nothing's done so. So perhaps if we're processing a map and we have PGs that are 100 epochs behind, we issue them a null event (to bring them up to date); but we don't want to block (or mark ourselves down) until they're much farther behind (eg 1000 epochs).
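The two-threshold policy above could be sketched as follows; the thresholds (100, 1000) come from the comment, but the enum and function names are hypothetical:

```cpp
// Hypothetical classification of a PG by how far its epoch lags the current
// map epoch: close enough to ignore, behind enough to queue a null event,
// or so far behind that the OSD should block (or mark itself down).
enum class CatchUpAction {
  None,       // PG is close enough to current
  QueueNull,  // issue a null event so the PG processes newer maps
  Block       // PG is far enough behind to block / mark ourselves down
};

const unsigned WAKE_THRESHOLD = 100;    // queue a null event past this lag
const unsigned BLOCK_THRESHOLD = 1000;  // block past this lag

CatchUpAction classify_pg(unsigned pg_epoch, unsigned cur_epoch) {
  unsigned lag = cur_epoch > pg_epoch ? cur_epoch - pg_epoch : 0;
  if (lag >= BLOCK_THRESHOLD)
    return CatchUpAction::Block;
  if (lag >= WAKE_THRESHOLD)
    return CatchUpAction::QueueNull;
  return CatchUpAction::None;
}
```

The point of keeping the two thresholds far apart is that waking a lagging PG is cheap and routine, while blocking map processing is a last resort.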
Honestly I thought we had some mechanism like that already, but maybe not (or maybe it's not functioning properly)?
We looked at this in standup today. There is a queue_null on every PG in OSD::consume_map(), so they should be getting woken up. (ie, I was mistaken about the current state of affairs.) I'm not sure what else is going on around here.
- Status changed from 12 to In Progress
- Status changed from In Progress to Fix Under Review
- Status changed from Fix Under Review to Pending Backport
- Priority changed from Urgent to High
- Status changed from Pending Backport to Resolved
- Status changed from Resolved to Pending Backport
still want to backport this to firefly ...
- Priority changed from High to Normal
- Status changed from Pending Backport to Resolved