Bug #14521

closed

Failure on restart after repairing corrupted PG

Added by Evgeniy Firsov over 8 years ago. Updated about 8 years ago.

Status: Won't Fix
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source: other
Tags: repair, meta, corruption, file exists
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Test case:
1. Start 2 OSDs with 8 PGs and pool size = 2.
2. Run a write workload for some time.
3. Stop the workload.
4. rm -rf dev/osd0/current/0.2_head/*
5. ceph osd scrub 0
6. ceph pg repair 0.2
7. Restart the OSD.
8. Observe "error (17) File exists not handled on operation".

The root cause is that the "head" meta file was not restored by pg repair, so
all omap_get/setkeys operations fail for that PG.

On restart, load_pgs skips that PG because it cannot read its metadata. Later,
when the OSD tries to recreate the PG, it hits the error from the test case,
because all the data files are already in place, restored by the repair.
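
The restart sequence described above can be illustrated with a toy model. This is only a sketch using plain directories, not Ceph code: the name load_pgs is taken from this report, while create_pg, the "meta" marker file and the "object_data" file are hypothetical stand-ins chosen for illustration. It assumes that skipping a PG at load time leads to a later create attempt on a directory that already exists.

```python
import errno
import os
import tempfile


def load_pgs(current_dir):
    """Toy model: return the PG ids whose directory still holds a metadata marker."""
    loaded = []
    for name in sorted(os.listdir(current_dir)):
        if not name.endswith("_head"):
            continue
        # Hypothetical marker file standing in for the PG's "head" meta file.
        if os.path.exists(os.path.join(current_dir, name, "meta")):
            loaded.append(name[: -len("_head")])
    return loaded


def create_pg(current_dir, pgid):
    """Toy model: creating a PG directory fails if it is already present."""
    os.mkdir(os.path.join(current_dir, pgid + "_head"))


def simulate_restart():
    with tempfile.TemporaryDirectory() as current_dir:
        # State after repair: data restored, metadata marker still missing.
        pg_dir = os.path.join(current_dir, "0.2_head")
        os.mkdir(pg_dir)
        open(os.path.join(pg_dir, "object_data"), "w").close()

        loaded = load_pgs(current_dir)  # PG 0.2 is skipped: no marker file
        try:
            if "0.2" not in loaded:
                create_pg(current_dir, "0.2")  # directory already exists
        except OSError as exc:
            return loaded, exc.errno
        return loaded, None


loaded, err = simulate_restart()
print(loaded, errno.errorcode.get(err))
```

In this model the PG is absent from the loaded list, and the create step raises an OSError carrying errno.EEXIST, which is the "File exists" condition reported in step 8.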
