Bug #51581

scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed

Added by Sridhar Seshasayee 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Category: Scrub/Repair
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-07-07T20:15:08.199 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state:  (( i++ ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state:  (( i < 40 ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:371: initiate_and_fetch_state:  echo 'Timeout waiting for deep-scrub of ' 1.0 ' on ' osd.1 ' to start'
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stdout:Timeout waiting for deep-scrub of  1.0  on  osd.1  to start
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:372: initiate_and_fetch_state:  return 1
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:431: TEST_auto_repair_bluestore_tag:  r=1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:432: TEST_auto_repair_bluestore_tag:  echo 'initiate_and_fetch_state ret: ' 1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stdout:initiate_and_fetch_state ret:  1

/a/sseshasa-2021-07-07_19:22:19-rados:standalone-master-distro-basic-smithi/6258019


Related issues

Copied to RADOS - Backport #51766: pacific: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed (Resolved)

History

#1 Updated by Neha Ojha 3 months ago

  • Subject changed from scrub/osd-recovery-scrub.sh: TEST_auto_repair_bluestore_tag failed to scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed

#2 Updated by Neha Ojha 3 months ago

/a/sage-2021-06-12_13:06:29-rados-master-distro-basic-smithi/6168272

#3 Updated by Neha Ojha 3 months ago

  • Assignee set to Ronen Friedman

Looks like an issue with the test that was added in d6eb3e3a3c29a02d6c7c088ef7c8c668a872d16e. Ronen, can you please take a look?

#4 Updated by Ronen Friedman 3 months ago

  • Status changed from New to In Progress

#5 Updated by Ronen Friedman 3 months ago

The bug is triggered when scrubbing is not initiated on the first tick-timer after being requested. That happens if the 'should we scrub' coin-flip fails (which explains why the failure was not consistent).
When the scrub is delayed, we continuously poll the PG status, looking for 'scrubbing'. Every 'N' attempts we 'flush' the status. That flush takes quite a long time, long enough to miss the scrubbing.

The fix being tested now:
- disable the coin flip during the test;
- avoid the flushing: just poll the state and hope for the best.
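
The "just poll the state" idea can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `fetch_pg_state` is a hypothetical stand-in that fakes the PG state so the loop runs without a cluster, and `wait_for_scrub_start` is a name made up for this sketch. The limit of 40 attempts mirrors the `(( i < 40 ))` check visible in the log in the description.

```shell
#!/bin/sh

# Hypothetical stand-in for a PG state query (an assumption for this sketch).
# It sets the global pg_state and reports a scrubbing state only from the
# third call onward, imitating a scrub that starts a little late.
query_count=0
fetch_pg_state() {
    query_count=$((query_count + 1))
    if [ "$query_count" -ge 3 ]; then
        pg_state="active+clean+scrubbing+deep"
    else
        pg_state="active+clean"
    fi
}

# Poll the state up to max_tries times, with no flush between attempts,
# so that a short scrubbing window is less likely to be missed.
# Returns 0 as soon as 'scrubbing' is seen, 1 if every attempt fails.
wait_for_scrub_start() {
    max_tries=$1
    delay=$2
    i=0
    while [ "$i" -lt "$max_tries" ]; do
        fetch_pg_state
        case "$pg_state" in
            *scrubbing*) return 0 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Demo run: 40 attempts, zero delay because the state is faked.
if wait_for_scrub_start 40 0; then
    echo "scrub started after $query_count polls"
else
    echo "Timeout waiting for scrub to start"
fi
```

In a real test the stand-in would be replaced by an actual query of the PG state and the delay would be non-zero; the point of the sketch is only that each iteration is a cheap state check with nothing slow in between.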

#6 Updated by Ronen Friedman 2 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 42410

#7 Updated by Neha Ojha 2 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific

#8 Updated by Backport Bot 2 months ago

  • Copied to Backport #51766: pacific: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed added

#9 Updated by Loïc Dachary 2 months ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
