Bug #22881

closed

scrub interaction with HEAD boundaries and snapmapper repair is broken

Added by Sage Weil about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: David Zafman
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

symptom:

2018-02-01T12:08:44.630 INFO:tasks.ceph.osd.4.smithi135.stderr:2018-02-01 12:08:44.620649 7f14491a2700 -1 osd.4 pg_epoch: 752 pg[3.2( v 752'2572 (630'2174,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=0 lpr=751 crt=752'2572 lcod 752'2570 mlcod 752'2570 active+clean+scrubbing] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)
2018-02-01T12:08:44.632 INFO:tasks.ceph.osd.2.smithi097.stderr:2018-02-01 12:08:44.630462 7fd0eed21700 -1 osd.2 pg_epoch: 752 pg[3.2( v 752'2572 (653'2262,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=1 lpr=751 luod=0'0 crt=752'2572 lcod 752'2570 active] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)
2018-02-01T12:08:44.633 INFO:tasks.ceph.osd.0.smithi097.stderr:2018-02-01 12:08:44.631970 7fcccfd05700 -1 osd.0 pg_epoch: 752 pg[3.2( v 752'2572 (630'2174,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=2 lpr=751 luod=0'0 crt=752'2572 lcod 752'2570 active] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)

leading to
Scrubbing terminated -- not all pgs were active and clean.

scrub accumulates a cleaned_meta_map that is aligned to object boundaries and is used for the PG-type-specific scrub step (scrub_snapshot_metadata). this is all good.

however, for each chunk (which does not align to an object boundary, and should not!), we also call _scan_snaps(). this method does the snapmapper repair, and it assumes that the head object is in the same chunk as its clones. that is not true in general, so the repair is basically broken.
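
to make the failure mode concrete, here is a toy model (not Ceph code). it assumes objects are sorted so that an object's clones come before its head, and that chunks are cut at a fixed size regardless of object boundaries. `HEAD`, `split_into_chunks` and `clones_missing_head` are hypothetical names made up for this sketch.

```python
from typing import List, Tuple

# Hypothetical stand-in for the head's snap id: a value larger than any
# clone's snap id, so the head sorts after all of its clones.
HEAD = 2**64 - 2

Obj = Tuple[str, int]  # (object name, snap id)


def split_into_chunks(objects: List[Obj], chunk_size: int) -> List[List[Obj]]:
    """Cut the sorted object list into fixed-size chunks, ignoring object boundaries."""
    return [objects[i:i + chunk_size] for i in range(0, len(objects), chunk_size)]


def clones_missing_head(chunk: List[Obj]) -> List[Obj]:
    """Return the clones in this chunk whose head is not in the same chunk."""
    heads = {name for name, snap in chunk if snap == HEAD}
    return [(name, snap) for name, snap in chunk
            if snap != HEAD and name not in heads]


objects = sorted([("a", 1), ("a", HEAD), ("b", 2), ("b", 3), ("b", HEAD)])
chunks = split_into_chunks(objects, 3)
orphans = [clones_missing_head(c) for c in chunks]
# The first chunk ends with clone ("b", 2) while the head of "b" falls into
# the second chunk, so a per-chunk pass sees that clone without its head.
print(orphans)
```

in this model the clone `("b", 2)` is the analogue of the object that triggers the "no head for ..." message above.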

unfortunately, _scan_snaps runs locally on both the primary and the replicas, while scrub_snapshot_metadata (and the accumulation of the cleaned_meta_map) happens only on the primary. that means the object-aligned step cannot repair the snapmapper on a replica.

we probably need to accumulate the cleaned_meta_map on the replicas too and similarly gate _scan_snaps on that...
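
a rough sketch of what "gate on the accumulated map" could look like, again as a toy model and not the actual change: entries from each chunk are buffered, and only the prefix that ends at a head is released for per-object processing. `BoundaryAccumulator`, `add_chunk` and `HEAD` are hypothetical names, and the sketch assumes clones sort before their head.

```python
from typing import List, Tuple

HEAD = 2**64 - 2  # hypothetical stand-in for the head's snap id

Obj = Tuple[str, int]  # (object name, snap id)


class BoundaryAccumulator:
    """Buffer scrub chunk entries and release them only up to the last head seen."""

    def __init__(self) -> None:
        self.pending: List[Obj] = []

    def add_chunk(self, chunk: List[Obj]) -> List[Obj]:
        """Add one chunk; return the entries that now form complete objects."""
        self.pending.extend(chunk)
        last_head = -1
        for i, (_, snap) in enumerate(self.pending):
            if snap == HEAD:
                last_head = i
        ready = self.pending[:last_head + 1]
        # Trailing clones whose head has not arrived yet wait for the next chunk.
        self.pending = self.pending[last_head + 1:]
        return ready


acc = BoundaryAccumulator()
first = acc.add_chunk([("a", 1), ("a", HEAD), ("b", 2)])
second = acc.add_chunk([("b", 3), ("b", HEAD)])
print(first)   # entries for "a" only; the clone of "b" is held back
print(second)  # all entries for "b", including the held-back clone
```

whatever is still pending when the scrub range ends would need a final pass; the sketch leaves that out.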


Related issues 4 (0 open, 4 closed)

Related to RADOS - Bug #23646: scrub interaction with HEAD boundaries and clones is broken (Resolved, 04/10/2018)
Related to RADOS - Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r -2...repaired (Resolved, David Zafman, 04/27/2018)
Related to RADOS - Bug #24045: Eviction still raced with scrub due to preemption (Resolved, David Zafman, 05/07/2018)
Copied to RADOS - Backport #24016: luminous: scrub interaction with HEAD boundaries and snapmapper repair is broken (Resolved, David Zafman)
Actions #1

Updated by Sage Weil about 6 years ago

  • Backport set to luminous
Actions #2

Updated by Sage Weil about 6 years ago

  • Subject changed from scrub interaction with HEAD boundaries is broken to scrub interaction with HEAD boundaries and snapmapper repair is broken
Actions #3

Updated by Sage Weil about 6 years ago

  • Related to Bug #23646: scrub interaction with HEAD boundaries and clones is broken added
Actions #4

Updated by David Zafman about 6 years ago

  • Status changed from 12 to In Progress
  • Assignee set to David Zafman
Actions #5

Updated by David Zafman almost 6 years ago

  • Related to Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r -2...repaired added
Actions #6

Updated by David Zafman almost 6 years ago

  • Status changed from In Progress to Pending Backport
Actions #7

Updated by David Zafman almost 6 years ago

https://github.com/ceph/ceph/pull/21546

Backport the entire pull request which also fixes http://tracker.ceph.com/issues/23909

Actions #8

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #24016: luminous: scrub interaction with HEAD boundaries and snapmapper repair is broken added
Actions #9

Updated by David Zafman almost 6 years ago

  • Related to Bug #24045: Eviction still raced with scrub due to preemption added
Actions #10

Updated by David Zafman almost 6 years ago

  • Status changed from Pending Backport to Resolved