Bug #63881
Inaccurate pg splits/merges and pool deletion/creation on OSD mapgap
Updated by Matan Breizman 5 months ago
- Subject changed from Inaccurate split handling to Inaccurate pg splits/merges on OSD mapgap
- Assignee set to Matan Breizman
- Backport set to quincy,reef
When handling an MOSDMap that carries OSDMaps for epochs [first, last], we add each one to `added_maps`.
Later on, we use `added_maps` to calculate the "diff" in terms of pool deletion/creation and pg_num changes.
To find the "diff", we compare each OSDMap with its previous map (see the assertion below).
This tracking (pg_num_history) is mainly used by OSDService::identify_splits_and_merges.
void OSD::handle_osd_map(MOSDMap *m) {
  ..
  // check for pg_num changes and deleted pools
  OSDMapRef lastmap;
  for (auto& i : added_maps) {
    if (!lastmap) {
      if (!(lastmap = service.try_get_map(i.first - 1))) {
        dout(10) << __func__ << " can't get previous map " << i.first - 1
                 << " probably first start of this osd" << dendl;
        continue;
      }
    }
    ceph_assert(lastmap->get_epoch() + 1 == i.second->get_epoch());
  }
If we can't find the map preceding the first entry of `added_maps`, we assume this is the first start of the OSD and skip this iteration.
As a result, we never compute the "diff" between the last epoch at which the OSD was up before the gap (n) and the epoch of the first added map (m).
The information the OSD loads about its pgs on boot may therefore remain stale and cause issues.
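The effect of the early `continue` can be reproduced with a toy model. The sketch below is not Ceph code: `ToyMap`, `ToyStore`, `PgNumHistory`, and `record_pg_num_changes` are hypothetical names, and an OSDMap is reduced to an epoch plus a pool-to-pg_num table. It assumes, as in the excerpt above, that the first added map is skipped when its predecessor cannot be loaded, and that the store also holds the added maps so that later iterations can find their predecessor.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Toy stand-in for an OSDMap: an epoch and the pg_num of each pool (hypothetical type).
struct ToyMap {
  uint32_t epoch;
  std::map<int64_t, uint32_t> pg_num;  // pool id -> pg_num
};
using ToyMapRef = std::shared_ptr<const ToyMap>;

// Toy stand-in for the map store; returns nullptr when an epoch is missing.
struct ToyStore {
  std::map<uint32_t, ToyMapRef> maps;
  ToyMapRef try_get_map(uint32_t e) const {
    auto it = maps.find(e);
    return it == maps.end() ? nullptr : it->second;
  }
};

// pool id -> (epoch -> new pg_num), recorded only when consecutive maps differ.
using PgNumHistory = std::map<int64_t, std::map<uint32_t, uint32_t>>;

PgNumHistory record_pg_num_changes(const ToyStore& store,
                                   const std::vector<ToyMapRef>& added_maps) {
  PgNumHistory history;
  ToyMapRef lastmap;
  for (const auto& cur : added_maps) {
    if (!lastmap) {
      lastmap = store.try_get_map(cur->epoch - 1);
      if (!lastmap) {
        // Mirrors the `continue` in the excerpt: no diff is computed for `cur`.
        continue;
      }
    }
    // Diffs are only ever taken between consecutive epochs.
    assert(lastmap->epoch + 1 == cur->epoch);
    for (const auto& kv : cur->pg_num) {
      auto prev = lastmap->pg_num.find(kv.first);
      if (prev != lastmap->pg_num.end() && prev->second != kv.second) {
        history[kv.first][cur->epoch] = kv.second;
      }
    }
    lastmap = cur;
  }
  return history;
}
```

In this model, an OSD that last saw epoch 10 (pool 1 with pg_num 8) and then receives maps 20 and 21 (pool 1 with pg_num 16) ends up with an empty history, because epoch 19 is unavailable and the 10-to-20 change is never diffed. Without a gap, the same change is recorded at the epoch where it appears.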
int OSD::init()
{
  ..
  osdmap = get_map(superblock.current_epoch);
  set_osdmap(osdmap);
  ..
  // initialize osdmap references in sharded wq
  for (auto& shard : shards) {
    std::lock_guard l(shard->osdmap_lock);
    shard->shard_osdmap = osdmap;
  }
  // load up pgs (as they previously existed)
  load_pgs();
  ..
}
The "previously existed" pgs are loaded and `shard_osdmap` is set to the map of superblock.current_epoch (n).
When identify_splits_and_merges is called, we compare `shard_osdmap` with m without
having tracked any of the corresponding changes (deletion/creation of pools and pg splits and merges) across map epochs [ n + 1, m + 1].
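To illustrate why a missing history entry matters, here is a minimal sketch of a history-driven lookup. It is a toy model and not the actual OSDService::identify_splits_and_merges: `PgNumHistory`, `PgNumChange`, and `changes_between` are hypothetical names, and it assumes that split/merge detection walks a per-pool record of pg_num changes between the epoch a pg was loaded at and the target epoch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// pool id -> (epoch -> pg_num that took effect at that epoch); hypothetical layout.
using PgNumHistory = std::map<int64_t, std::map<uint32_t, uint32_t>>;

// One detected pg_num transition: a split if new > old, a merge if new < old.
struct PgNumChange {
  uint32_t epoch;
  uint32_t old_pg_num;
  uint32_t new_pg_num;
};

// Report the recorded pg_num changes for `pool` in (from_epoch, to_epoch].
// Only what the history contains can be reported; unrecorded changes are invisible.
std::vector<PgNumChange> changes_between(const PgNumHistory& history,
                                         int64_t pool,
                                         uint32_t from_epoch,
                                         uint32_t to_epoch,
                                         uint32_t pg_num_at_from) {
  std::vector<PgNumChange> out;
  auto p = history.find(pool);
  if (p == history.end()) {
    return out;  // nothing recorded for this pool
  }
  uint32_t cur = pg_num_at_from;
  for (auto q = p->second.upper_bound(from_epoch);
       q != p->second.end() && q->first <= to_epoch; ++q) {
    if (q->second != cur) {
      out.push_back({q->first, cur, q->second});
      cur = q->second;
    }
  }
  return out;
}
```

With a complete history such as `{1: {15: 16}}`, a pg loaded at epoch 10 with pg_num 8 and compared against epoch 20 yields one split at epoch 15. With the empty history produced by a map gap, the same call yields nothing, even though the pool's pg_num differs between the two maps.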
Updated by Matan Breizman 5 months ago
- Subject changed from Inaccurate pg splits/merges on OSD mapgap to Inaccurate pg splits/merges and pool deletion/creation on OSD mapgap
Updated by Matan Breizman about 1 month ago
- Status changed from New to Fix Under Review
Updated by Matan Breizman about 1 month ago
- Related to Bug #57628: osd:PeeringState.cc: FAILED ceph_assert(info.history.same_interval_since != 0) added