Bug #51581

closed

scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed

Added by Sridhar Seshasayee almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Category: Scrub/Repair
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-07-07T20:15:08.199 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state:  (( i++ ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state:  (( i < 40 ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:371: initiate_and_fetch_state:  echo 'Timeout waiting for deep-scrub of ' 1.0 ' on ' osd.1 ' to start'
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stdout:Timeout waiting for deep-scrub of  1.0  on  osd.1  to start
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:372: initiate_and_fetch_state:  return 1
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:431: TEST_auto_repair_bluestore_tag:  r=1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:432: TEST_auto_repair_bluestore_tag:  echo 'initiate_and_fetch_state ret: ' 1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stdout:initiate_and_fetch_state ret:  1

/a/sseshasa-2021-07-07_19:22:19-rados:standalone-master-distro-basic-smithi/6258019


Related issues 1 (0 open, 1 closed)

Copied to RADOS - Backport #51766: pacific: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed (Resolved)
Actions #1

Updated by Neha Ojha almost 3 years ago

  • Subject changed from scrub/osd-recovery-scrub.sh: TEST_auto_repair_bluestore_tag failed to scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
Actions #2

Updated by Neha Ojha almost 3 years ago

/a/sage-2021-06-12_13:06:29-rados-master-distro-basic-smithi/6168272

Actions #3

Updated by Neha Ojha almost 3 years ago

  • Assignee set to Ronen Friedman

Looks like an issue with the test that was added in d6eb3e3a3c29a02d6c7c088ef7c8c668a872d16e. Ronen, can you please take a look?

Actions #4

Updated by Ronen Friedman almost 3 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Ronen Friedman almost 3 years ago

The bug is triggered when scrubbing is not initiated on the first tick-timer after being requested. That happens if the 'should we scrub' coin-flip fails (which explains why the failure was not consistent). When scrubbing is delayed, we continuously poll the PG status, looking for 'scrubbing'. Every 'N' attempts we 'flush' the status. That flushing operation takes a pretty long time, enough to miss the scrubbing.

The fix being tested now:
- disable the coin flip during the test;
- avoid the flushing: just poll the state and hope for the best.
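
To illustrate the second point, here is a minimal sketch of a flush-free polling loop. It is not the code from the actual fix: `poll_for_state` and its arguments are hypothetical names, and the command that reports the PG state is passed in by the caller rather than assumed. The only detail taken from the log above is the bound of 40 attempts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of osd-scrub-repair.sh): poll a
# state-reporting command until its output contains the wanted substring.
# No flush is issued between polls, so each iteration stays cheap and a
# short-lived 'scrubbing' state is less likely to be missed.
# Arguments: <wanted-substring> <max-attempts> <sleep-seconds> <command...>
poll_for_state() {
    local wanted=$1 attempts=$2 delay=$3
    shift 3
    local i state
    for ((i = 0; i < attempts; i++)); do
        # Run the caller-supplied command and capture the current state.
        state=$("$@")
        if [[ $state == *"$wanted"* ]]; then
            echo "$state"
            return 0
        fi
        sleep "$delay"
    done
    echo "Timeout waiting for state '$wanted'" >&2
    return 1
}
```

A caller would pass whatever command prints the PG state, for example `poll_for_state scrubbing 40 0.2 my_pg_state_query 1.0`, where `my_pg_state_query` is a placeholder for the test's own query function. A short sleep keeps the polling interval well below the duration of the scrub on a tiny test pool, which is the whole point of dropping the flush.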

Actions #6

Updated by Ronen Friedman almost 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 42410
Actions #7

Updated by Neha Ojha almost 3 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific
Actions #8

Updated by Backport Bot almost 3 years ago

  • Copied to Backport #51766: pacific: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed added
Actions #9

Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
