Bug #22881

closed

scrub interaction with HEAD boundaries and snapmapper repair is broken

Added by Sage Weil over 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Backport: luminous
Regression: No
Severity: 3 - minor

Description

Symptom:

2018-02-01T12:08:44.630 INFO:tasks.ceph.osd.4.smithi135.stderr:2018-02-01 12:08:44.620649 7f14491a2700 -1 osd.4 pg_epoch: 752 pg[3.2( v 752'2572 (630'2174,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=0 lpr=751 crt=752'2572 lcod 752'2570 mlcod 752'2570 active+clean+scrubbing] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)
2018-02-01T12:08:44.632 INFO:tasks.ceph.osd.2.smithi097.stderr:2018-02-01 12:08:44.630462 7fd0eed21700 -1 osd.2 pg_epoch: 752 pg[3.2( v 752'2572 (653'2262,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=1 lpr=751 luod=0'0 crt=752'2572 lcod 752'2570 active] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)
2018-02-01T12:08:44.633 INFO:tasks.ceph.osd.0.smithi097.stderr:2018-02-01 12:08:44.631970 7fcccfd05700 -1 osd.0 pg_epoch: 752 pg[3.2( v 752'2572 (630'2174,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=2 lpr=751 luod=0'0 crt=752'2572 lcod 752'2570 active] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)

This leads to:

Scrubbing terminated -- not all pgs were active and clean.

Scrub accumulates a cleaned_meta_map, aligned to object boundaries, which is used for the PG-type-specific scrub step (scrub_snapshot_metadata). This part is fine.
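A minimal C++ sketch of object-aligned accumulation, under simplifying assumptions: object versions are modeled as hypothetical (name, snap) pairs sorted by name and then snap, with the head given the largest snap id so it sorts after its clones (real hobject_t ordering is more involved), and MetaMapAccumulator is an illustrative name, not Ceph's ScrubMap code. Each chunk is appended to a pending buffer, and everything except the entries of the last object is released, because that object's remaining clones or head may still arrive in the next chunk.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Simplified model (assumption): an object version is (name, snap), and the
// head uses the largest snap id so it sorts after all of its clones.
constexpr std::uint64_t kHead = UINT64_MAX;
using ObjKey = std::pair<std::string, std::uint64_t>;

// Hypothetical accumulator: chunks may split an object, but the entries it
// releases always end on an object boundary.
class MetaMapAccumulator {
 public:
  // Append one sorted chunk and return the entries that are now complete.
  std::vector<ObjKey> add_chunk(const std::vector<ObjKey>& chunk, bool last_chunk) {
    // Chunks are expected to arrive in order, each continuing the previous one.
    assert(pending_.empty() || chunk.empty() || pending_.back() < chunk.front());
    pending_.insert(pending_.end(), chunk.begin(), chunk.end());

    std::vector<ObjKey> ready;
    if (last_chunk) {
      ready.swap(pending_);  // nothing more will arrive: release everything
      return ready;
    }
    if (pending_.empty()) return ready;

    // Hold back every entry of the last object: its remaining clones or its
    // head may still be in the next chunk.
    const std::string tail = pending_.back().first;
    auto split = pending_.end();
    while (split != pending_.begin() && std::prev(split)->first == tail) --split;

    ready.assign(pending_.begin(), split);
    pending_.erase(pending_.begin(), split);
    return ready;
  }

 private:
  std::vector<ObjKey> pending_;  // entries not yet known to be complete
};
```

For example, a chunk holding only clones of one object releases nothing; the next chunk, which starts with that object's head, releases the whole object.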

However, _scan_snaps() is also run on each chunk, and chunks do not align to object boundaries (nor should they!). This method does the snapmapper repair and assumes that the head object is in the same chunk as its clones. That is not true in general, so the repair is basically broken.
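The failure mode can be modeled the same way. The sketch below, again using hypothetical (name, snap) pairs rather than Ceph's real types, mimics a per-chunk scan that walks a chunk from the end so each head is seen before its clones. When a chunk boundary falls between an object's clones and its head, those clones are reported without a head, analogous to the "no head for ..." messages in the log above. clones_without_head is an illustrative helper, not the actual _scan_snaps implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Simplified model (assumption): (name, snap) pairs, head = largest snap id,
// so within one object the clones sort before the head.
constexpr std::uint64_t kHead = UINT64_MAX;
using ObjKey = std::pair<std::string, std::uint64_t>;

// Hypothetical per-chunk snap scan. Walking the chunk backwards means each
// head is visited before its clones; a clone whose head is not in the same
// chunk cannot be checked.
std::vector<ObjKey> clones_without_head(const std::vector<ObjKey>& chunk) {
  assert(std::is_sorted(chunk.begin(), chunk.end()));
  std::vector<ObjKey> orphans;
  const std::string* head = nullptr;  // name of the most recent head seen
  for (auto it = chunk.rbegin(); it != chunk.rend(); ++it) {
    if (it->second == kHead) {
      head = &it->first;
    } else if (head == nullptr || *head != it->first) {
      orphans.push_back(*it);  // no head in this chunk: mapping not repairable here
    }
  }
  return orphans;
}
```

Splitting one object's three entries after its second clone turns a clean scan into one that reports both clones as orphans, even though nothing is wrong with the object itself.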

The problem, unfortunately, is that _scan_snaps runs locally on the primary and on the replicas, while scrub_snapshot_metadata (and the accumulation of cleaned_meta_map) happens only on the primary. That means the primary's object-aligned map cannot be used to repair the snapmapper on a replica.

We probably need to accumulate a cleaned_meta_map on the replicas too, and gate _scan_snaps on it in the same way.
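One way to read this proposal, sketched in C++ under the same simplified (name, snap) model: each OSD, primary or replica, keeps its own object-aligned buffer and runs the snap check only on objects whose clones and head have all arrived. LocalSnapScrubber and its methods are hypothetical, and the point where a real implementation would verify or repair the snap mapping is only marked with a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Simplified model (assumption): (name, snap) pairs, head = largest snap id.
constexpr std::uint64_t kHead = UINT64_MAX;
using ObjKey = std::pair<std::string, std::uint64_t>;

// Hypothetical per-OSD state: buffer chunks locally, then scan only whole objects.
class LocalSnapScrubber {
 public:
  // Feed one sorted chunk; returns how many clones were checked against a head.
  std::size_t on_chunk(const std::vector<ObjKey>& chunk, bool last_chunk) {
    assert(pending_.empty() || chunk.empty() || pending_.back() < chunk.front());
    pending_.insert(pending_.end(), chunk.begin(), chunk.end());
    auto split = pending_.end();
    if (!last_chunk && !pending_.empty()) {
      // Keep the last object back: its head may be in the next chunk.
      const std::string tail = pending_.back().first;
      while (split != pending_.begin() && std::prev(split)->first == tail) --split;
    }
    std::vector<ObjKey> ready(pending_.begin(), split);
    pending_.erase(pending_.begin(), split);
    return scan(ready);
  }
  std::size_t orphans() const { return orphans_; }

 private:
  // Reverse walk so each head is seen before its clones.
  std::size_t scan(const std::vector<ObjKey>& objs) {
    std::size_t checked = 0;
    const std::string* head = nullptr;
    for (auto it = objs.rbegin(); it != objs.rend(); ++it) {
      if (it->second == kHead) {
        head = &it->first;
      } else if (head != nullptr && *head == it->first) {
        ++checked;  // placeholder for verifying/repairing the snap mapping
      } else {
        ++orphans_;  // clone with no head in a complete object
      }
    }
    return checked;
  }

  std::vector<ObjKey> pending_;
  std::size_t orphans_ = 0;
};
```

Because the buffer is local, a replica needs no primary-only state to do this; the cost is holding back at most one object's entries between chunks.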


Related issues: 4 (0 open, 4 closed)

Related to RADOS - Bug #23646: scrub interaction with HEAD boundaries and clones is broken (Resolved, 04/10/2018)
Related to RADOS - Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r -2...repaired (Resolved, David Zafman, 04/27/2018)
Related to RADOS - Bug #24045: Eviction still raced with scrub due to preemption (Resolved, David Zafman, 05/07/2018)
Copied to RADOS - Backport #24016: luminous: scrub interaction with HEAD boundaries and snapmapper repair is broken (Resolved, David Zafman)
