Bug #51581
closed
scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
Description
2021-07-07T20:15:08.199 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state: (( i++ ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:351: initiate_and_fetch_state: (( i < 40 ))
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:371: initiate_and_fetch_state: echo 'Timeout waiting for deep-scrub of ' 1.0 ' on ' osd.1 ' to start'
2021-07-07T20:15:08.200 INFO:tasks.workunit.client.0.smithi180.stdout:Timeout waiting for deep-scrub of 1.0 on osd.1 to start
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:372: initiate_and_fetch_state: return 1
2021-07-07T20:15:08.201 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:431: TEST_auto_repair_bluestore_tag: r=1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/scrub/osd-scrub-repair.sh:432: TEST_auto_repair_bluestore_tag: echo 'initiate_and_fetch_state ret: ' 1
2021-07-07T20:15:08.202 INFO:tasks.workunit.client.0.smithi180.stdout:initiate_and_fetch_state ret: 1
/a/sseshasa-2021-07-07_19:22:19-rados:standalone-master-distro-basic-smithi/6258019
Updated by Neha Ojha almost 3 years ago
- Subject changed from scrub/osd-recovery-scrub.sh: TEST_auto_repair_bluestore_tag failed to scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed
Updated by Neha Ojha almost 3 years ago
/a/sage-2021-06-12_13:06:29-rados-master-distro-basic-smithi/6168272
Updated by Neha Ojha almost 3 years ago
- Assignee set to Ronen Friedman
Looks like an issue with the test that was added in d6eb3e3a3c29a02d6c7c088ef7c8c668a872d16e. Ronen, can you please take a look?
Updated by Ronen Friedman almost 3 years ago
- Status changed from New to In Progress
Updated by Ronen Friedman almost 3 years ago
The bug is triggered when scrubbing is not initiated on the first tick-timer after being requested. That happens if the 'should we scrub' coin-flip fails (which explains why the failure was not consistent).
When the scrub is delayed, we keep polling the PG status, looking for 'scrubbing'. Every 'N' attempts we 'flush' the status. That flush operation takes a pretty long time, long enough for us to miss the 'scrubbing' state.
The fix being tested now:
- disable the coin flip during the test;
- avoid the flushing: just poll the state and hope for the best.
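To illustrate the "just poll" approach, here is a minimal sketch of a flush-free polling loop. It is not the code from the actual fix: the function name wait_for_scrubbing and its argument layout are hypothetical, and the command that reports the PG state is passed in by the caller so that the loop itself performs no slow operation between polls.

```shell
# Hypothetical helper (not from the Ceph tree): poll a state-reporting
# command until its output contains "scrubbing". No flush is done between
# polls, so the only delay between two checks is the short sleep.
# Usage: wait_for_scrubbing <max_attempts> <sleep_seconds> <command...>
wait_for_scrubbing() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local i=0
    local state
    while [ "$i" -lt "$max_attempts" ]; do
        # Run the caller-supplied command and capture the reported state.
        state=$("$@")
        case "$state" in
            *scrubbing*) return 0 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Timeout waiting for scrub to start" >&2
    return 1
}

# Example call (assumption: the state is read from the JSON printed by
# 'ceph pg <pgid> query', with jq available on the test node):
# wait_for_scrubbing 40 0.1 sh -c 'ceph pg 1.0 query | jq -r .state'
```

A short sleep keeps the window between two polls small, which is the point of dropping the flush. The loop can still miss a scrub that starts and completes between two checks, hence "hope for the best".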
Updated by Ronen Friedman almost 3 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 42410
Updated by Neha Ojha almost 3 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to pacific
Updated by Backport Bot almost 3 years ago
- Copied to Backport #51766: pacific: scrub/osd-scrub-repair.sh: TEST_auto_repair_bluestore_tag failed added
Updated by Loïc Dachary over 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".