Bug #51581: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed - RADOS - Ceph

closed

Added by Sridhar Seshasayee almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee: Ronen Friedman
Category: Scrub/Repair
Target version:
% Done:
Source:
Tags:
Backport: pacific
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID: 42410
Crash signature (v1):
Crash signature (v2):

Description

2021-07-07T20:15:08.199 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state:  (( i++ ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state:  (( i < 40 ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:371: initiate_and_fetch_state:  echo 'Timeout waiting for deep-scrub of ' 1.0 ' on ' osd.1 ' to start'
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stdout:Timeout waiting for deep-scrub of  1.0  on  osd.1  to start
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:372: initiate_and_fetch_state:  return 1
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:431: TEST_auto_repair_bluestore_tag:  r=1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:432: TEST_auto_repair_bluestore_tag:  echo 'initiate_and_fetch_state ret: ' 1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stdout:initiate_and_fetch_state ret:  1

/a/sseshasa-2021-07-07_19:22:19-rados:standalone-master-distro-basic-smithi/6258019

Related issues 1 (0 open — 1 closed)

Updated by Neha Ojha almost 3 years ago

Subject changed from scrub/osd-recovery-scrub.sh: TEST_auto_repair_bluestore_tag failed to scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed

Updated by Neha Ojha almost 3 years ago

/a/sage-2021-06-12_13:06:29-rados-master-distro-basic-smithi/6168272

Updated by Neha Ojha almost 3 years ago

Assignee set to Ronen Friedman

Looks like an issue with the test that was added in d6eb3e3a3c29a02d6c7c088ef7c8c668a872d16e. Ronen, can you please take a look?

Updated by Ronen Friedman almost 3 years ago

Status changed from New to In Progress

Updated by Ronen Friedman almost 3 years ago

The bug is triggered when scrubbing is not initiated on the first tick-timer after being requested. That happens if the 'should we scrub' coin-flip fails (which explains why the failure was not consistent). When the scrub is delayed, the test keeps polling the PG status, looking for 'scrubbing'. Every 'N' attempts it 'flushes' the status. That flush operation takes a long time, long enough for the test to miss the 'scrubbing' state entirely.

The fix tested now:
- disable the coin flip during the test;
- avoid the flushing. Just poll the state - and hope for the best.
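
The second point of the fix can be illustrated with a minimal sketch of a polling loop that never flushes. This is not the code from the pull request: `wait_for_scrubbing` is a hypothetical helper, and the command that reports the PG state is passed in by the caller so that nothing about the real test helpers is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an illustration, not the actual fix).
# Polls a state-reporting command until its output contains "scrubbing".
#   $1   maximum number of attempts
#   $2   delay between attempts, in seconds
#   $3.. command that prints the current PG state
wait_for_scrubbing() {
    local max_attempts=$1 delay=$2
    shift 2
    local i state
    for (( i = 0; i < max_attempts; i++ )); do
        # Only read the state; no stats flush between attempts.
        state=$("$@")
        if [[ "$state" == *scrubbing* ]]; then
            echo "$state"
            return 0
        fi
        sleep "$delay"
    done
    # The scrub was never observed within the allowed attempts.
    return 1
}
```

A test could call it as `wait_for_scrubbing 40 0.3 get_pg_state 1.0`, where the limit of 40 mirrors the `(( i < 40 ))` loop in the log above, and `get_pg_state` is an assumed wrapper, for example around `ceph pg 1.0 query`. Keeping each iteration cheap is the point: the shorter the gap between two reads, the less likely a brief 'scrubbing' state goes unobserved.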

Updated by Ronen Friedman almost 3 years ago

Status changed from In Progress to Fix Under Review
Pull request ID set to 42410

Updated by Neha Ojha almost 3 years ago

Status changed from Fix Under Review to Pending Backport
Backport set to pacific

Updated by Backport Bot almost 3 years ago

Copied to Backport #51766: pacific: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed added

Updated by Loïc Dachary over 2 years ago

Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
