Bug #22881

scrub interaction with HEAD boundaries and snapmapper repair is broken

Added by Sage Weil about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
David Zafman
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

symptom:

2018-02-01T12:08:44.630 INFO:tasks.ceph.osd.4.smithi135.stderr:2018-02-01 12:08:44.620649 7f14491a2700 -1 osd.4 pg_epoch: 752 pg[3.2( v 752'2572 (630'2174,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=0 lpr=751 crt=752'2572 lcod 752'2570 mlcod 752'2570 active+clean+scrubbing] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)
2018-02-01T12:08:44.632 INFO:tasks.ceph.osd.2.smithi097.stderr:2018-02-01 12:08:44.630462 7fd0eed21700 -1 osd.2 pg_epoch: 752 pg[3.2( v 752'2572 (653'2262,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=1 lpr=751 luod=0'0 crt=752'2572 lcod 752'2570 active] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)
2018-02-01T12:08:44.633 INFO:tasks.ceph.osd.0.smithi097.stderr:2018-02-01 12:08:44.631970 7fcccfd05700 -1 osd.0 pg_epoch: 752 pg[3.2( v 752'2572 (630'2174,752'2572] local-lis/les=751/752 n=50 ec=19/19 lis/c 751/751 les/c/f 752/752/0 726/751/572) [4,2,0] r=2 lpr=751 luod=0'0 crt=752'2572 lcod 752'2570 active] _scan_snaps no head for 3:6702c991:::smithi09729659-280:194 (have MIN)

leading to
Scrubbing terminated -- not all pgs were active and clean.

Scrub accumulates a cleaned_meta_map that is aligned to object boundaries and is used for the PG-type-specific scrub step (scrub_snapshot_metadata). This part is all good.

However, we also run _scan_snaps() on each chunk, and a chunk does not align to an object boundary (and should not!). This method does the snapmapper repair and assumes that the head object is in the same chunk. That is not true in general, so the repair is basically broken.
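The chunk-boundary problem can be illustrated with a toy model. This is only a sketch, not Ceph code: `Obj`, `scan_chunk`, and the chunk size are hypothetical names and values, and the model assumes that clones sort immediately before their head object.

```python
from typing import List, NamedTuple, Optional

HEAD = None  # hypothetical marker: a snap id of None stands for the head object


class Obj(NamedTuple):
    name: str
    snap: Optional[int]  # clone snap id, or HEAD


def scan_chunk(chunk: List[Obj]) -> List[Obj]:
    """Return the clones whose head is not present in the same chunk."""
    heads = {o.name for o in chunk if o.snap is HEAD}
    return [o for o in chunk if o.snap is not HEAD and o.name not in heads]


# Toy ordering (an assumption): each object's clones come before its head.
objects = [
    Obj("a", 1), Obj("a", HEAD),
    Obj("b", 2), Obj("b", 3), Obj("b", HEAD),
]

chunk_size = 3  # arbitrary; it splits "b" between its clones and its head
chunks = [objects[i:i + chunk_size] for i in range(0, len(objects), chunk_size)]

# A per-chunk scan wrongly reports clone b:2 as having no head,
# even though the head exists in the next chunk.
orphans_per_chunk = [scan_chunk(c) for c in chunks]
```

In this model the first chunk reports a clone of "b" with no head, which mirrors the "no head for" symptom above, although the object set as a whole is consistent.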

Unfortunately, _scan_snaps runs locally on both the primary and the replicas, while scrub_snapshot_metadata (and the accumulation of the cleaned_meta_map) runs only on the primary. That means the primary-only step cannot repair snapmapper on the replicas.

We probably need to accumulate cleaned_meta_map on the replicas too, and similarly gate _scan_snaps on that...
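A minimal sketch of the gating idea, under the same toy assumptions (clones sort before their head). The names `split_at_last_head` and `scrub` are hypothetical and do not correspond to the actual fix in the pull request. Entries are accumulated across chunks, and only the prefix that ends at a head is handed to the snapshot check.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Obj(NamedTuple):
    name: str
    snap: Optional[int]  # clone snap id, or None for the head object


def split_at_last_head(pending: List[Obj]) -> Tuple[List[Obj], List[Obj]]:
    """Split pending entries into (complete prefix ending at a head, remainder)."""
    last_head = -1
    for i, o in enumerate(pending):
        if o.snap is None:
            last_head = i
    return pending[:last_head + 1], pending[last_head + 1:]


def scrub(chunks: Iterable[List[Obj]]) -> Tuple[List[List[Obj]], List[Obj]]:
    """Accumulate entries across chunks and release only head-aligned batches."""
    pending: List[Obj] = []
    batches: List[List[Obj]] = []
    for chunk in chunks:
        pending.extend(chunk)
        complete, pending = split_at_last_head(pending)
        if complete:
            batches.append(complete)  # safe to run the per-object snap check here
    # Whatever is left after the final chunk really has no head.
    return batches, pending


chunks = [
    [Obj("a", 1), Obj("a", None), Obj("b", 2)],
    [Obj("b", 3), Obj("b", None)],
]
batches, leftover = scrub(chunks)
```

Here the clone of "b" at the end of the first chunk is held back until its head arrives in the second chunk, so no false "no head" report is produced. Running the same accumulation on the replicas is what would let them gate their local repair in the same way.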


Related issues

Related to RADOS - Bug #23646: scrub interaction with HEAD boundaries and clones is broken Resolved 04/10/2018
Related to RADOS - Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r -2...repaired Resolved 04/27/2018
Related to RADOS - Bug #24045: Eviction still raced with scrub due to preemption Resolved 05/07/2018
Copied to RADOS - Backport #24016: luminous: scrub interaction with HEAD boundaries and snapmapper repair is broken Resolved

History

#1 Updated by Sage Weil about 6 years ago

  • Backport set to luminous

#2 Updated by Sage Weil almost 6 years ago

  • Subject changed from scrub interaction with HEAD boundaries is broken to scrub interaction with HEAD boundaries and snapmapper repair is broken

#3 Updated by Sage Weil almost 6 years ago

  • Related to Bug #23646: scrub interaction with HEAD boundaries and clones is broken added

#4 Updated by David Zafman almost 6 years ago

  • Status changed from 12 to In Progress
  • Assignee set to David Zafman

#5 Updated by David Zafman almost 6 years ago

  • Related to Bug #23909: snaps missing in mapper, should be: 188,18f,191,195,197,198,199,19d,19e,1a1,1a3 was r -2...repaired added

#6 Updated by David Zafman almost 6 years ago

  • Status changed from In Progress to Pending Backport

#7 Updated by David Zafman almost 6 years ago

https://github.com/ceph/ceph/pull/21546

Backport the entire pull request which also fixes http://tracker.ceph.com/issues/23909

#8 Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #24016: luminous: scrub interaction with HEAD boundaries and snapmapper repair is broken added

#9 Updated by David Zafman almost 6 years ago

  • Related to Bug #24045: Eviction still raced with scrub due to preemption added

#10 Updated by David Zafman almost 6 years ago

  • Status changed from Pending Backport to Resolved
