Bug #43580
pg: fastinfo incorrect when last_update moves backward in time
Status:
closed
% Done:
0%
Source:
Tags:
Backport:
luminous,mimic,nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
If, during peering, last_update moves backwards, we may rewrite the full info but leave a fastinfo record in place with a newer last_update.
From ML:
In an EC deployment, suppose peering for a PG changes one shard's last_update from lu1 (e1'3) to lu2 (e1'2). lu1 was written as fastinfo and lu2 was written as info. If we then restart this OSD and load the PGs again, the PG info read from disk is lu1 applied on top of lu2, which is incorrect; the true value should be lu2. This may cause the next peering to execute incorrectly and result in unfound objects. I am currently considering two options: 1. delete the fastinfo whenever we need to rewrite the info; 2. add an extra sequence number to the fastinfo and info structures to keep them in the right order.
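The failure and the two proposed options can be illustrated with a toy model. This is a minimal sketch, not Ceph code: the dict-backed store, the key names, and every function name below are hypothetical, and versions are modelled as (epoch, version) tuples. It assumes only what the report states, namely that on load the fastinfo record is applied on top of the full info record. It does not claim to show the fix that was actually merged.

```python
# Toy model of the info/fastinfo ordering problem. All names are hypothetical.
INFO_KEY = "info"
FASTINFO_KEY = "fastinfo"


def write_fastinfo(store, last_update, seq=0):
    """Persist only the frequently changing fields (the 'fast' record)."""
    store[FASTINFO_KEY] = {"last_update": last_update, "seq": seq}


def write_info_buggy(store, last_update, seq=0):
    """Rewrite the full info but leave any existing fastinfo in place."""
    store[INFO_KEY] = {"last_update": last_update, "seq": seq}


def write_info_option1(store, last_update, seq=0):
    """Option 1: drop the fastinfo whenever the full info is rewritten."""
    store[INFO_KEY] = {"last_update": last_update, "seq": seq}
    store.pop(FASTINFO_KEY, None)


def load_last_update(store):
    """Assumed load path: fastinfo, if present, overrides the full info."""
    info = dict(store[INFO_KEY])
    fast = store.get(FASTINFO_KEY)
    if fast is not None:
        info.update(fast)
    return info["last_update"]


def load_last_update_option2(store):
    """Option 2: apply fastinfo only if it was written after the full info."""
    info = store[INFO_KEY]
    fast = store.get(FASTINFO_KEY)
    if fast is not None and fast["seq"] > info["seq"]:
        return fast["last_update"]
    return info["last_update"]


lu1 = (1, 3)  # e1'3, written as fastinfo before peering
lu2 = (1, 2)  # e1'2, written as full info after last_update moved backward

# Current behaviour: the stale fastinfo wins after a restart.
buggy = {}
write_fastinfo(buggy, lu1)
write_info_buggy(buggy, lu2)
buggy_result = load_last_update(buggy)

# Option 1: the rewrite removes the fastinfo, so the info is authoritative.
opt1 = {}
write_fastinfo(opt1, lu1)
write_info_option1(opt1, lu2)
opt1_result = load_last_update(opt1)

# Option 2: sequence numbers order the two records; fastinfo (seq 1) is
# older than the info (seq 2), so it is ignored on load.
opt2 = {}
write_fastinfo(opt2, lu1, seq=1)
write_info_buggy(opt2, lu2, seq=2)
opt2_result = load_last_update_option2(opt2)
```

In this model option 1 needs no on-disk format change, while option 2 adds a field to both records but keeps the stale fastinfo harmless even if it is never deleted.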