Bug #50446

open

PGs always go into active+clean+scrubbing+deep+repair in the LRC

Added by Neha Ojha almost 3 years ago. Updated over 1 year ago.

Status:
Pending Backport
Priority:
Normal
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
backport_processed
Backport:
pacific, octopus, nautilus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

nojha@reesi001:~$ sudo ceph pg ls repair
PG       OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES         OMAP_BYTES*  OMAP_KEYS*  LOG   STATE                               SINCE  VERSION           REPORTED          UP                           ACTING                       SCRUB_STAMP                      DEEP_SCRUB_STAMP               
119.ba     52175         0          0        0  124169550191            0           0  2662  active+clean+scrubbing+deep+repair     7m  10581216'1694111  10581216:7779621  [139,150,11,124,39,140]p139  [139,150,11,124,39,140]p139  2021-04-19T17:40:50.162166+0000  2021-04-13T02:28:36.538165+0000
119.2f1    51675         0          0        0  123382978322            0           0  2632  active+clean+scrubbing+deep+repair    14m  10581216'1687023  10581216:7327261     [74,25,75,112,104,99]p74     [74,25,75,112,104,99]p74  2021-04-19T21:07:22.915737+0000  2021-04-13T17:31:33.995392+0000
119.304    52065         0          0        0  124614229368            0           0  2689  active+clean+scrubbing+deep+repair     5m  10581216'1703214  10581216:7525521       [137,21,83,91,7,6]p137       [137,21,83,91,7,6]p137  2021-04-19T20:30:27.242688+0000  2021-04-19T20:30:27.242688+0000

Related issues 3 (0 open, 3 closed)

Copied to RADOS - Backport #50900: pacific: PGs always go into active+clean+scrubbing+deep+repair in the LRC (Resolved)
Copied to RADOS - Backport #50910: octopus: PGs always go into active+clean+scrubbing+deep+repair in the LRC (Rejected, Ronen Friedman)
Copied to RADOS - Backport #50911: nautilus: PGs always go into active+clean+scrubbing+deep+repair in the LRC (Rejected)
Actions #1

Updated by Neha Ojha almost 3 years ago

  • Status changed from New to Triaged
nojha@reesi001:~$ sudo ceph pg 119.ba query|grep repair
    "state": "active+clean+scrubbing+deep+repair",
            "state": "active+clean+scrubbing+deep+repair",
                "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
                    "num_objects_repaired": 0
        "must_repair": false,
        "auto_repair": true,
        "check_repair": false,

nojha@reesi001:~$ sudo ceph pg 119.ba query|grep errors
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
                    "num_scrub_errors": 0,
                    "num_shallow_scrub_errors": 0,
                    "num_deep_scrub_errors": 0,
        "shallow_errors": 0,
        "deep_errors": 0,

nojha@reesi001:~$ sudo ceph pg 119.ba query|grep auto
        "need_auto": false,
        "auto_repair": true,

The important fields to note above are:

1. num_deep_scrub_errors is always 0
2. auto_repair is true
3. need_auto is false
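
The three fields listed above can also be collected programmatically rather than with repeated grep calls. This is a minimal sketch, assuming the output of `ceph pg <pgid> query` has been saved as JSON. The helper name `collect_key` and the nesting in `sample` are hypothetical and only illustrative; the real query output is much larger and its exact layout is not reproduced here, which is why the helper searches at any depth.

```python
import json


def collect_key(node, key):
    """Return every value stored under `key` anywhere in a parsed JSON tree."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: the same key can appear at several depths.
            found.extend(collect_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_key(item, key))
    return found


# Hypothetical, heavily trimmed stand-in for `ceph pg 119.ba query` output.
# The nesting shown here is an assumption made for the example.
sample = json.loads("""
{
  "state": "active+clean+scrubbing+deep+repair",
  "info": {"stats": {"stat_sum": {"num_deep_scrub_errors": 0}}},
  "scrubber": {"need_auto": false, "auto_repair": true}
}
""")

fields = ("num_deep_scrub_errors", "auto_repair", "need_auto")
summary = {name: collect_key(sample, name) for name in fields}
print(summary)
```

A PG showing this bug would report only zeros for num_deep_scrub_errors and false for need_auto, while auto_repair is still true.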

The problem is that although there are no deep scrub errors (which could legitimately lead to auto_repair=true) and need_auto is also false, we are incorrectly setting auto_repair to true. This happens because we set planned.auto_repair = true whenever both try_to_auto_repair and planned.time_for_deep are true. Looking at is_time_for_deep(), time_for_deep=true can result from reasons unrelated to deep scrub errors or need_auto.

This showed up in the LRC and not elsewhere because osd_scrub_auto_repair (default: false) is set to true there, and try_to_auto_repair relies on it.
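
The condition described in this comment can be modelled in a few lines. This is a simplified sketch in Python, not the actual OSD code: `PlannedScrub` and `plan_scrub` are hypothetical names, and `try_to_auto_repair` is passed in directly as a boolean because the real code may derive it from more than osd_scrub_auto_repair. Only the rule "try_to_auto_repair and time_for_deep implies auto_repair" comes from the analysis above.

```python
from dataclasses import dataclass


@dataclass
class PlannedScrub:
    """Hypothetical stand-in for the planned scrub flags discussed above."""
    time_for_deep: bool
    need_auto: bool
    auto_repair: bool = False


def plan_scrub(try_to_auto_repair, time_for_deep, need_auto):
    """Model of the reported behaviour: need_auto and the scrub error
    counts play no part in deciding auto_repair."""
    planned = PlannedScrub(time_for_deep=time_for_deep, need_auto=need_auto)
    if try_to_auto_repair and planned.time_for_deep:
        planned.auto_repair = True
    return planned


# LRC-like case: auto repair enabled, deep scrub due for an unrelated reason.
lrc = plan_scrub(try_to_auto_repair=True, time_for_deep=True, need_auto=False)

# Default-like case: auto repair not enabled, same deep scrub.
default = plan_scrub(try_to_auto_repair=False, time_for_deep=True, need_auto=False)

print(lrc.auto_repair, default.auto_repair)
```

In this model every deep scrub on a cluster with auto repair enabled is planned as a repair, which matches the active+clean+scrubbing+deep+repair state seen on PGs with zero scrub errors.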

Actions #2

Updated by Neha Ojha almost 3 years ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 40949
Actions #3

Updated by Neha Ojha almost 3 years ago

  • Backport set to pacific
Actions #4

Updated by Neha Ojha almost 3 years ago

  • Assignee changed from Neha Ojha to Ronen Friedman
  • Pull request ID changed from 40949 to 41258
Actions #5

Updated by Kefu Chai almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50900: pacific: PGs always go into active+clean+scrubbing+deep+repair in the LRC added
Actions #7

Updated by Neha Ojha almost 3 years ago

  • Backport changed from pacific to pacific, octopus, nautilus

This issue exists in nautilus and octopus as well. We might want to take a less intrusive approach for the backports.

Actions #8

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50910: octopus: PGs always go into active+clean+scrubbing+deep+repair in the LRC added
Actions #9

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #50911: nautilus: PGs always go into active+clean+scrubbing+deep+repair in the LRC added
Actions #10

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed