Bug #50446
PGs always go into active+clean+scrubbing+deep+repair in the LRC
Description
nojha@reesi001:~$ sudo ceph pg ls repair
PG       OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES         OMAP_BYTES*  OMAP_KEYS*  LOG   STATE                               SINCE  VERSION           REPORTED          UP                           ACTING                       SCRUB_STAMP                      DEEP_SCRUB_STAMP
119.ba   52175    0         0          0        124169550191  0            0           2662  active+clean+scrubbing+deep+repair  7m     10581216'1694111  10581216:7779621  [139,150,11,124,39,140]p139  [139,150,11,124,39,140]p139  2021-04-19T17:40:50.162166+0000  2021-04-13T02:28:36.538165+0000
119.2f1  51675    0         0          0        123382978322  0            0           2632  active+clean+scrubbing+deep+repair  14m    10581216'1687023  10581216:7327261  [74,25,75,112,104,99]p74     [74,25,75,112,104,99]p74     2021-04-19T21:07:22.915737+0000  2021-04-13T17:31:33.995392+0000
119.304  52065    0         0          0        124614229368  0            0           2689  active+clean+scrubbing+deep+repair  5m     10581216'1703214  10581216:7525521  [137,21,83,91,7,6]p137       [137,21,83,91,7,6]p137       2021-04-19T20:30:27.242688+0000  2021-04-19T20:30:27.242688+0000
Updated by Neha Ojha almost 3 years ago
- Status changed from New to Triaged
nojha@reesi001:~$ sudo ceph pg 119.ba query|grep repair
"state": "active+clean+scrubbing+deep+repair",
"state": "active+clean+scrubbing+deep+repair",
"num_objects_repaired": 0
"num_objects_repaired": 0
"num_objects_repaired": 0
"num_objects_repaired": 0
"num_objects_repaired": 0
"num_objects_repaired": 0
"must_repair": false,
"auto_repair": true,
"check_repair": false,
nojha@reesi001:~$ sudo ceph pg 119.ba query|grep errors
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"shallow_errors": 0,
"deep_errors": 0,
nojha@reesi001:~$ sudo ceph pg 119.ba query|grep auto
"need_auto": false,
"auto_repair": true,
The important fields to note above are:
1. num_deep_scrub_errors is always 0
2. auto_repair is true
3. need_auto is false
The problem is that we are incorrectly setting auto_repair to true even though there are no deep scrub errors (which could legitimately lead to auto_repair=true) and need_auto is false. This happens because planned.auto_repair is set to true whenever try_to_auto_repair and planned.time_for_deep are both true. Looking at is_time_for_deep(), time_for_deep can be true for reasons unrelated to deep scrub errors or need_auto.
This showed up in the LRC and not elsewhere because osd_scrub_auto_repair (default: false) is set to true there, and try_to_auto_repair relies on that option.
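The decision described above can be sketched roughly as follows. This is a simplified model for illustration only, not the actual OSD code: the ScrubInputs struct, the function names, and the has_deep_errors field are hypothetical, and try_to_auto_repair is assumed here to depend on nothing but osd_scrub_auto_repair (the real code may check more conditions).

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the inputs to the scrub planning
// decision; these are not the real Ceph types.
struct ScrubInputs {
  bool osd_scrub_auto_repair;  // config option (default: false; true in the LRC)
  bool time_for_deep;          // result of is_time_for_deep()
  bool need_auto;              // "need_auto" from the pg query output
  bool has_deep_errors;        // assumed: num_deep_scrub_errors > 0
};

// Behaviour as described in this report: auto_repair is set as soon as
// try_to_auto_repair and time_for_deep are both true.
inline bool auto_repair_as_reported(const ScrubInputs& in) {
  // Assumption: try_to_auto_repair is modelled as the config option alone.
  const bool try_to_auto_repair = in.osd_scrub_auto_repair;
  return try_to_auto_repair && in.time_for_deep;
}

// The conditions the report names as legitimate reasons for auto_repair.
inline bool repair_is_justified(const ScrubInputs& in) {
  return in.need_auto || in.has_deep_errors;
}

// Mirrors PG 119.ba: option enabled, deep scrub due, no errors, need_auto false.
inline void check_lrc_case() {
  const ScrubInputs lrc{true, true, false, false};
  assert(auto_repair_as_reported(lrc));   // flag gets set...
  assert(!repair_is_justified(lrc));      // ...without a reason to repair
}
```

With the default osd_scrub_auto_repair=false, auto_repair_as_reported() returns false for the same inputs, which matches the observation that the problem only appeared in the LRC.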
Updated by Neha Ojha almost 3 years ago
- Status changed from Triaged to Fix Under Review
- Pull request ID set to 40949
Updated by Neha Ojha almost 3 years ago
- Assignee changed from Neha Ojha to Ronen Friedman
- Pull request ID changed from 40949 to 41258
Updated by Kefu Chai almost 3 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50900: pacific: PGs always go into active+clean+scrubbing+deep+repair in the LRC added
Updated by Neha Ojha almost 3 years ago
- Backport changed from pacific to pacific, octopus, nautilus
This issue exists in nautilus and octopus as well. We might want to take a less intrusive approach for the backports.
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50910: octopus: PGs always go into active+clean+scrubbing+deep+repair in the LRC added
Updated by Backport Bot almost 3 years ago
- Copied to Backport #50911: nautilus: PGs always go into active+clean+scrubbing+deep+repair in the LRC added