Failure on restart after repairing corrupted PG
1. Start 2 OSDs, 8 PGs, pool size = 2
2. Run write workload for some time.
3. Stop workload
4. rm -rf dev/osd0/current/0.2_head/*
5. ceph osd scrub 0
6. ceph pg repair 0.2
7. Restart OSD.
8. Get "error (17) File exists not handled on operation"
The root cause is that the "head" meta file wasn't restored by pg repair, so
all omap_get/setkeys operations fail for that PG.
On restart, load_pgs skips that PG because it can't read its metadata. Later, when
the OSD tries to recreate the PG, it hits the error from the test case because
all the data files are already in place, restored by repair.
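The sequence above can be illustrated with a small toy model. This is only a sketch and not Ceph code: the `load_pgs` and `create_pg` functions here are simplified stand-ins, the `meta` file name is a hypothetical placeholder for the "head" meta file, and a plain directory stands in for the PG collection. It only shows why skipping the PG at load time and then recreating it ends in EEXIST (17).

```python
import errno
import os
import tempfile


def load_pgs(current_dir):
    """Toy model: return PG ids whose metadata file is readable, skip the rest."""
    loaded = []
    for name in sorted(os.listdir(current_dir)):
        if not name.endswith("_head"):
            continue
        # "meta" is a hypothetical stand-in for the PG's "head" meta file.
        meta = os.path.join(current_dir, name, "meta")
        if os.path.isfile(meta):
            loaded.append(name[: -len("_head")])
    return loaded


def create_pg(current_dir, pgid):
    """Toy model: create the PG collection; fails with EEXIST if it is already there."""
    os.mkdir(os.path.join(current_dir, pgid + "_head"))


def simulate_restart():
    """Reproduce the post-repair state and return (loaded PGs, errno or None)."""
    with tempfile.TemporaryDirectory() as root:
        current = os.path.join(root, "current")
        pg_dir = os.path.join(current, "0.2_head")
        os.makedirs(pg_dir)
        # State after repair: data files are back, the metadata file is not.
        with open(os.path.join(pg_dir, "object_a"), "w") as f:
            f.write("data")

        loaded = load_pgs(current)  # PG 0.2 is skipped: no readable metadata
        try:
            if "0.2" not in loaded:
                create_pg(current, "0.2")  # collection already exists on disk
        except OSError as e:
            return loaded, e.errno
        return loaded, None


if __name__ == "__main__":
    loaded, err = simulate_restart()
    print("loaded PGs:", loaded)
    print("create failed with EEXIST:", err == errno.EEXIST)
```

In this model the failure disappears if either repair also restores the metadata file or the create path tolerates an existing collection; which of these the actual fix takes is not stated here.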
#2 Updated by Evgeniy Firsov about 5 years ago
The problem is that the corruption is silent: scrub and repair do not report any errors. After a year of running, all replicas may be affected, so a short, planned downtime may turn into a disaster in which no node can start and there are no replicas to recover from.
Leaving the fix here for reference: https://github.com/ceph/ceph/pull/7470