Bug #50446: PGs always go into active+clean+scrubbing+deep+repair in the LRC - RADOS - Ceph

Bug #50446

open

PGs always go into active+clean+scrubbing+deep+repair in the LRC

Added by Neha Ojha almost 3 years ago. Updated over 1 year ago.

Status: Pending Backport
Priority: Normal
Assignee: Ronen Friedman
Tags: backport_processed
Backport: pacific, octopus, nautilus
Severity: 3 - minor
Pull request ID: 41258

Description

nojha@reesi001:~$ sudo ceph pg ls repair
PG       OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES         OMAP_BYTES*  OMAP_KEYS*  LOG   STATE                               SINCE  VERSION           REPORTED          UP                           ACTING                       SCRUB_STAMP                      DEEP_SCRUB_STAMP               
119.ba     52175         0          0        0  124169550191            0           0  2662  active+clean+scrubbing+deep+repair     7m  10581216'1694111  10581216:7779621  [139,150,11,124,39,140]p139  [139,150,11,124,39,140]p139  2021-04-19T17:40:50.162166+0000  2021-04-13T02:28:36.538165+0000
119.2f1    51675         0          0        0  123382978322            0           0  2632  active+clean+scrubbing+deep+repair    14m  10581216'1687023  10581216:7327261     [74,25,75,112,104,99]p74     [74,25,75,112,104,99]p74  2021-04-19T21:07:22.915737+0000  2021-04-13T17:31:33.995392+0000
119.304    52065         0          0        0  124614229368            0           0  2689  active+clean+scrubbing+deep+repair     5m  10581216'1703214  10581216:7525521       [137,21,83,91,7,6]p137       [137,21,83,91,7,6]p137  2021-04-19T20:30:27.242688+0000  2021-04-19T20:30:27.242688+0000

Related issues 3 (0 open — 3 closed)

Updated by Neha Ojha almost 3 years ago

Status changed from New to Triaged

nojha@reesi001:~$ sudo ceph pg 119.ba query|grep repair
    "state": "active+clean+scrubbing+deep+repair",
            "state": "active+clean+scrubbing+deep+repair",
                "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
        "must_repair": false,
        "auto_repair": true,
        "check_repair": false,

nojha@reesi001:~$ sudo ceph pg 119.ba query|grep errors
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
        "shallow_errors": 0,
        "deep_errors": 0,

nojha@reesi001:~$ sudo ceph pg 119.ba query|grep auto
        "need_auto": false,
        "auto_repair": true,

The important fields to note above are:

1. num_deep_scrub_errors is always 0
2. auto_repair is true
3. need_auto is false
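
The three observations above amount to a simple triage check. The following is a minimal Python sketch of it, assuming the output of `sudo ceph pg 119.ba query` has been saved to a file as JSON. The grep excerpts do not show where each key sits in the tree, so the helper searches every level instead of assuming a path. `find_key` and `auto_repair_unexplained` are hypothetical names, and the nesting of the sample document is made up; only its field values come from the output above.

```python
import json
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth of a parsed JSON tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def auto_repair_unexplained(query: dict) -> bool:
    """True when auto_repair is set although nothing in the query justifies it:
    no deep-scrub errors anywhere and need_auto false (the report's three points)."""
    auto_repair = any(v is True for v in find_key(query, "auto_repair"))
    need_auto = any(v is True for v in find_key(query, "need_auto"))
    deep_errors = (sum(find_key(query, "num_deep_scrub_errors"))
                   + sum(find_key(query, "deep_errors")))
    return auto_repair and not need_auto and deep_errors == 0


# Hypothetical, trimmed-down sample: values match the grep output for PG 119.ba,
# but the nesting is invented for illustration.
sample = json.loads("""
{
  "state": "active+clean+scrubbing+deep+repair",
  "info": {"stats": {"stat_sum": {"num_deep_scrub_errors": 0}}},
  "scrubber": {"need_auto": false, "auto_repair": true, "deep_errors": 0}
}
""")
print(auto_repair_unexplained(sample))  # True
```

The sample is flagged because auto_repair is true while every deep-scrub error counter is 0 and need_auto is false, which is exactly the inconsistency described in this comment.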

The problem is that even though there are no deep-scrub errors (which could justify auto_repair=true) and need_auto is false, we still set auto_repair to true. This happens because whenever try_to_auto_repair and planned.time_for_deep are both true, we set planned.auto_repair = true. As is_time_for_deep() shows, time_for_deep can be true for reasons unrelated to deep-scrub errors or need_auto.

This showed up in the LRC and not elsewhere because osd_scrub_auto_repair (default: false) is set to true there, and try_to_auto_repair depends on it.
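
The reasoning above can be modeled in a few lines of Python. This is a sketch, not Ceph's C++ scrub scheduler: `ScrubPlan`, `plan_as_reported` and `plan_requiring_cause` are hypothetical, and try_to_auto_repair is reduced to the osd_scrub_auto_repair option alone, although the real check may depend on more than that. `plan_requiring_cause` shows one way to tie auto_repair to an actual cause; it is not necessarily what PR 41258 implemented.

```python
from dataclasses import dataclass


@dataclass
class ScrubPlan:
    """Hypothetical stand-in for the `planned` flags named in the report."""
    time_for_deep: bool = False
    auto_repair: bool = False


def plan_as_reported(osd_scrub_auto_repair: bool, time_for_deep: bool) -> ScrubPlan:
    # Behaviour described in the report (simplified): try_to_auto_repair follows
    # the option, and every deep scrub that is due then gets auto_repair.
    try_to_auto_repair = osd_scrub_auto_repair
    planned = ScrubPlan(time_for_deep=time_for_deep)
    if try_to_auto_repair and planned.time_for_deep:
        planned.auto_repair = True
    return planned


def plan_requiring_cause(osd_scrub_auto_repair: bool, time_for_deep: bool,
                         need_auto: bool, deep_scrub_errors: int) -> ScrubPlan:
    # Illustrative alternative: request repair only when there is a reason
    # (need_auto or known deep-scrub errors), not merely because a deep scrub is due.
    planned = ScrubPlan(time_for_deep=time_for_deep)
    has_cause = need_auto or deep_scrub_errors > 0
    if osd_scrub_auto_repair and planned.time_for_deep and has_cause:
        planned.auto_repair = True
    return planned


# LRC-like setting: option on, deep scrub due for a reason unrelated to errors.
print(plan_as_reported(True, True).auto_repair)                # True
print(plan_requiring_cause(True, True, False, 0).auto_repair)  # False
# Default setting: option off, so the flag is never set.
print(plan_as_reported(False, True).auto_repair)               # False
```

With osd_scrub_auto_repair at its default of false, the reported logic never sets the flag, which is consistent with the problem appearing only in the LRC.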

Updated by Neha Ojha almost 3 years ago

Status changed from Triaged to Fix Under Review
Pull request ID set to 40949

Updated by Neha Ojha almost 3 years ago

Backport set to pacific

Updated by Neha Ojha almost 3 years ago

Assignee changed from Neha Ojha to Ronen Friedman
Pull request ID changed from 40949 to 41258

Updated by Kefu Chai almost 3 years ago

Status changed from Fix Under Review to Pending Backport

Updated by Backport Bot almost 3 years ago

Copied to Backport #50900: pacific: PGs always go into active+clean+scrubbing+deep+repair in the LRC added

Updated by Neha Ojha almost 3 years ago

Backport changed from pacific to pacific, octopus, nautilus

This issue exists in nautilus and octopus as well. We might want to take a less intrusive approach for the backports.

Updated by Backport Bot almost 3 years ago

Copied to Backport #50910: octopus: PGs always go into active+clean+scrubbing+deep+repair in the LRC added

Updated by Backport Bot almost 3 years ago

Copied to Backport #50911: nautilus: PGs always go into active+clean+scrubbing+deep+repair in the LRC added

Updated by Backport Bot over 1 year ago

Tags set to backport_processed

Copied to RADOS - Backport #50900: pacific: PGs always go into active+clean+scrubbing+deep+repair in the LRC (Resolved)
Copied to RADOS - Backport #50910: octopus: PGs always go into active+clean+scrubbing+deep+repair in the LRC (Rejected; assignee: Ronen Friedman)
Copied to RADOS - Backport #50911: nautilus: PGs always go into active+clean+scrubbing+deep+repair in the LRC (Rejected)
