Bug #54172

ceph version 16.2.7 PG scrubs not progressing

Added by Daan van Gorkum about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: quincy,pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A week ago I upgraded a 16.2.4 cluster (3 nodes, 33 OSDs) to 16.2.7 using cephadm, and since then we have been experiencing hanging PG scrubs. Because other scrubs are delayed, the cluster is now in WARN with PG_NOT_DEEP_SCRUBBED. The information below was gathered on 7 Feb at 02:32 UTC.

The scrubs on the 4 stuck PGs started at 2022-02-06T05:20 UTC.
Things I've tried so far:
- Restarting the affected OSDs (and later restarting all OSDs).
- Setting/unsetting the nodeep-scrub and noscrub flags.

This is the information I have; if anything is missing, feel free to let me know:

$ sudo ceph pg dump pgs_brief | grep deep
dumped pgs_brief
4.6b     active+clean+scrubbing+deep  [12,24,23]          12  [12,24,23]              12
4.4e     active+clean+scrubbing+deep  [21,14,22]          21  [21,14,22]              21
4.3d     active+clean+scrubbing+deep   [4,25,26]           4   [4,25,26]               4
4.12b    active+clean+scrubbing+deep  [29,19,13]          29  [29,19,13]              29
$ sudo ceph versions
{
    "mon": {
        "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 33
    },
    "mds": {
        "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 3
    },
    "overall": {
        "ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)": 42
    }
}

Scrubber information from the PGs:

$ sudo ceph pg 4.12b query | jq .scrubber
{
  "epoch_start": "11332",
  "active": true,
  "start": "4:d4800000::::0",
  "end": "4:d4800000::::0",
  "m_max_end": "MIN",
  "subset_last_update": "0'0",
  "deep": true,
  "must_scrub": false,
  "must_deep_scrub": false,
  "must_repair": false,
  "need_auto": false,
  "req_scrub": false,
  "time_for_deep": true,
  "auto_repair": false,
  "check_repair": false,
  "deep_scrub_on_error": false,
  "scrub_reg_stamp": "2022-01-30T07:37:41.422075+0000",
  "priority": 5,
  "shallow_errors": 0,
  "deep_errors": 0,
  "fixed": 0,
  "waiting_on_whom": []
}

$ sudo ceph pg 4.4e query | jq .scrubber
{
  "epoch_start": "11321",
  "active": true,
  "start": "4:72000000::::0",
  "end": "4:72000000::::0",
  "m_max_end": "MIN",
  "subset_last_update": "0'0",
  "deep": true,
  "must_scrub": false,
  "must_deep_scrub": false,
  "must_repair": false,
  "need_auto": false,
  "req_scrub": false,
  "time_for_deep": true,
  "auto_repair": false,
  "check_repair": false,
  "deep_scrub_on_error": false,
  "scrub_reg_stamp": "2022-01-29T12:47:32.977259+0000",
  "priority": 5,
  "shallow_errors": 0,
  "deep_errors": 0,
  "fixed": 0,
  "waiting_on_whom": []
}

$ sudo ceph pg 4.3d query | jq .scrubber
{
  "epoch_start": "11330",
  "active": true,
  "start": "4:bc000000::::0",
  "end": "4:bc000000::::0",
  "m_max_end": "MIN",
  "subset_last_update": "0'0",
  "deep": true,
  "must_scrub": false,
  "must_deep_scrub": false,
  "must_repair": false,
  "need_auto": false,
  "req_scrub": false,
  "time_for_deep": true,
  "auto_repair": false,
  "check_repair": false,
  "deep_scrub_on_error": false,
  "scrub_reg_stamp": "2022-01-30T07:34:25.060565+0000",
  "priority": 5,
  "shallow_errors": 0,
  "deep_errors": 0,
  "fixed": 0,
  "waiting_on_whom": []
}

An active+clean PG looks like this:
$ sudo ceph pg 4.bd query | jq .scrubber
{
  "epoch_start": "0",
  "active": false
}

I tried searching the bug tracker for a similar problem but did not find any. If this is already a known problem, feel free to close this ticket.

osd.12.xz (23.9 KB) Daan van Gorkum, 02/11/2022 12:16 AM

osd.24.xz (164 KB) Daan van Gorkum, 02/11/2022 12:16 AM

osd.23.xz (165 KB) Daan van Gorkum, 02/11/2022 12:16 AM


Related issues

Copied to RADOS - Backport #56408: quincy: ceph version 16.2.7 PG scrubs not progressing Resolved
Copied to RADOS - Backport #56409: pacific: ceph version 16.2.7 PG scrubs not progressing Resolved

History

#1 Updated by Daan van Gorkum about 2 years ago

Additional information:

I tried disabling all client I/O, and after that there was zero I/O on the devices hosting the OSDs, so nothing seems to be happening at the device level.

#2 Updated by Neha Ojha about 2 years ago

  • Project changed from Ceph to RADOS

#3 Updated by Neha Ojha about 2 years ago

  • Assignee set to Ronen Friedman

What is osd_scrub_sleep set to?

Ronen, this sounds similar to one of the issues you were looking into, though here it is deep scrubs. Can we use your script to rule out a slow peer in this case as well?

#4 Updated by Daan van Gorkum about 2 years ago

This is set to 0.0. I think we did this to speed up recovery after we did some CRUSH tuning.

$ sudo ceph config get osd osd_scrub_sleep
0.000000

#5 Updated by Daan van Gorkum about 2 years ago

Correction: osd_recovery_sleep_hdd was changed from its original 0.1 to 0.0; osd_scrub_sleep has been untouched.

#6 Updated by Daan van Gorkum about 2 years ago

One more thing to add: when I set the noscrub and nodeep-scrub flags, the PGs don't actually stop scrubbing either (the flags have been set for 30+ minutes at this point):

$ sudo ceph health detail | grep noscrub
HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 43 pgs not deep-scrubbed in time; 15 pgs not scrubbed in time
[WRN] OSDMAP_FLAGS: noscrub,nodeep-scrub flag(s) set
$ sudo ceph pg dump pgs_brief | grep deep
dumped pgs_brief
4.6b     active+clean+scrubbing+deep  [12,24,23]          12  [12,24,23]              12
4.4e     active+clean+scrubbing+deep  [21,14,22]          21  [21,14,22]              21
4.3d     active+clean+scrubbing+deep   [4,25,26]           4   [4,25,26]               4
4.12b    active+clean+scrubbing+deep  [29,19,13]          29  [29,19,13]              29

The number of PGs not (deep-)scrubbed in time is increasing rapidly:

$ sudo ceph health detail | grep PG_NOT
[WRN] PG_NOT_DEEP_SCRUBBED: 43 pgs not deep-scrubbed in time
[WRN] PG_NOT_SCRUBBED: 15 pgs not scrubbed in time

#7 Updated by Daan van Gorkum about 2 years ago

Some more information on the non-default config settings (excluding MGR):

$ sudo ceph config dump | grep -v mgr
WHO                 MASK  LEVEL     OPTION                                   VALUE                                                                                      RO
global                    basic     container_image                          quay.io/ceph/ceph@sha256:bb6a71f7f481985f6d3b358e3b9ef64c6755b3db5aa53198e0aac38be5c8ae54  * 
global                    advanced  objecter_inflight_ops                    819200                                                                                       
global                    basic     osd_memory_target                        8589934592                                                                                   
  mon                     advanced  auth_allow_insecure_global_id_reclaim    false                                                                                        
  mon                     advanced  public_network                           10.0.3.0/24                                                                                * 
  osd                     dev       bluestore_cache_size_hdd                 4294967296                                                                                   
  osd                     advanced  bluestore_default_buffered_write         true                                                                                         
  osd                     dev       osd_memory_cache_min                     4294967296                                                                                   
  mds                     basic     mds_cache_memory_limit                   17179869184                                                                                  
    mds.production        basic     mds_join_fs                              production                                                                                   

#8 Updated by Ronen Friedman about 2 years ago

There was a bug (I'll have to check which Ceph versions are affected) where setting noscrub or nodeep-scrub at the "wrong" time could cause the scrub to hang forever.
Daan - can you share the OSD logs with me?

#9 Updated by Daan van Gorkum about 2 years ago

Added the logs for OSDs 12, 23 and 24, which are part of PG 4.6b. The logs don't go back to when the OSDs booted, and debug logging was not enabled. Let me know if you need logs at a certain log level.

#10 Updated by Daan van Gorkum about 2 years ago

I would just like to add that the scrubs suddenly started progressing again and the cluster is back to HEALTH_OK.

#11 Updated by André Cruz about 2 years ago

I'm also seeing the same issue on 16.2.7, but it's been going on for almost two weeks. I've already set and unset noscrub/nodeep-scrub, and had to restart OSDs to stop some scrubs.

I don't see any disk IO related to these scrubs...

#12 Updated by Wout van Heeswijk almost 2 years ago

We've seen this at a customer's cluster as well. A simple repeer of the PG gets it unstuck. We haven't investigated the root cause yet.

ceph pg repeer x.x
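
For example, applied to one of the stuck PGs reported earlier in this ticket (PG id taken from the pg dump output above):

$ sudo ceph pg repeer 4.6b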

#13 Updated by Neha Ojha almost 2 years ago

  • Priority changed from Normal to High

#14 Updated by Cory Snyder almost 2 years ago

We've experienced this issue as well, on both 16.2.6 and 16.2.7, and I've identified the cause. Here's the scenario:

- Start with a cluster that has the nodeep-scrub and noscrub flags set.
- The nodeep-scrub flag is unset.
- Deep scrubs start and progress to the ActiveScrubbing,PendingTimer state.
- The deep scrubs are aborted because [PgScrubber::should_abort](https://github.com/ceph/ceph/blob/v16.2.6/src/osd/pg_scrubber.cc#L128) returns true for deep scrubs if either nodeep-scrub or noscrub is set (and noscrub is still set). This usually happens while checking whether the PGScrubResched op is still relevant in [PgScrubber::send_scrub_resched](https://github.com/ceph/ceph/blob/v16.2.6/src/osd/pg_scrubber.cc#L211). The scrub state is cleared [up the call stack](https://github.com/ceph/ceph/blob/v16.2.6/src/osd/pg_scrubber.cc#L122).
- The OSD reschedules the deep scrub for the same PG on the next tick. It again progresses to the ActiveScrubbing,PendingTimer state and is aborted again for the same reason. This time, however, the scrub state is not cleared because the epoch has not changed and [this condition](https://github.com/ceph/ceph/blob/v16.2.6/src/osd/pg_scrubber.cc#L121) is false. Instead, the abort only causes the scrub reschedule op to be [deemed irrelevant](https://github.com/ceph/ceph/blob/v16.2.6/src/osd/pg_scrubber.cc#L211), which prevents the InternalSchedScrub FSM event from being sent. At this point, the scrub machine is stuck in the ActiveScrubbing,PendingTimer state indefinitely (until the primary OSD is restarted).
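
As a concrete illustration, this is roughly the flag sequence that triggers the scenario above (a sketch only; the commands are standard flag operations, and it is the timing relative to scrub scheduling that matters):

$ sudo ceph osd set noscrub
$ sudo ceph osd set nodeep-scrub
# ... later, only nodeep-scrub is unset while noscrub remains set:
$ sudo ceph osd unset nodeep-scrub
# Deep scrubs now get scheduled, are repeatedly aborted because noscrub is still set,
# and can end up stuck in ActiveScrubbing,PendingTimer as described above.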

Here are some OSD log snippets (with irrelevant details omitted) that support the explanation above:

Jun 16 16:41:52 ceph20 scrub_send_scrub_resched queued at: 108117
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) scrubber event -->> send_scrub_resched epoch: 108117
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) noscrub set, aborting
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) verify_against_abort aborting. incoming epoch: 108117 vs last-aborted: 108109
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) scrub_clear_state
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) clear_pgscrub_state
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) clear_scrub_reservations
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) reset_internal_state
Jun 16 16:41:52 ceph20 scrubberFSM pg(2.7ebs0) ActiveScrubbing::react(const FullReset&)
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) unreserve_replicas
Jun 16 16:41:52 ceph20 scrubberFSM pg(2.7ebs0) -- state -->> NotActive
Jun 16 16:41:52 ceph20 scrubber pg(2.7ebs0) scrubber event --<< send_scrub_resched

...

Jun 16 16:41:56 ceph20 scrub_send_scrub_resched queued at: 108117
Jun 16 16:41:56 ceph20 scrubber pg(2.7ebs0) scrubber event -->> send_scrub_resched epoch: 108117
Jun 16 16:41:56 ceph20 scrubber pg(2.7ebs0) noscrub set, aborting
Jun 16 16:41:56 ceph20 scrubber pg(2.7ebs0) verify_against_abort aborting. incoming epoch: 108117 vs last-aborted: 108117
Jun 16 16:41:56 ceph20 scrubber pg(2.7ebs0) scrubber event --<< send_scrub_resched

In short, this happens whenever a deep scrub is started while the noscrub flag is set. The stuck scrub can be cleared by restarting the primary OSD associated with the PG, or it may clear on its own if the epoch advances and an update_applied_notification triggered by client I/O causes another abort.
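
As a sketch of the restart workaround described above (assuming a cephadm-managed cluster as in this ticket; the PG and OSD ids are taken from the pg dump output earlier):

$ sudo ceph pg map 4.6b                   # the first OSD in the acting set is the primary
$ sudo ceph orch daemon restart osd.12    # restart that primary OSD to clear the stuck scrub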

I believe that this PR should resolve the issue: https://github.com/ceph/ceph/pull/46727

#15 Updated by Ronen Friedman almost 2 years ago

@Cory Snyder: thanks, your analysis is spot on. The suggested fix solves one issue. There is another problem related to noscrub, but I will keep the solutions separate and help push this one forward.

#16 Updated by Neha Ojha almost 2 years ago

  • Backport set to quincy,pacific
  • Pull request ID set to 46727

#17 Updated by Neha Ojha almost 2 years ago

  • Status changed from New to Fix Under Review

#18 Updated by Cory Snyder over 1 year ago

  • Status changed from Fix Under Review to Pending Backport

#19 Updated by Backport Bot over 1 year ago

  • Copied to Backport #56408: quincy: ceph version 16.2.7 PG scrubs not progressing added

#20 Updated by Backport Bot over 1 year ago

  • Copied to Backport #56409: pacific: ceph version 16.2.7 PG scrubs not progressing added

#22 Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed

#23 Updated by Konstantin Shalygin over 1 year ago

  • Status changed from Pending Backport to Resolved
  • Tags deleted (backport_processed)
