Bug #43807
closedosd-backfill-recovery-log.sh fails
0%
Description
2020-01-24T10:16:33.646 INFO:tasks.workunit.client.0.smithi134.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:82: _common_test: jq '.pg_log_t.log | length' td/osd-backfill-recovery-log/result.log 2020-01-24T10:16:33.646 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:82: _common_test: LOGLEN=2 2020-01-24T10:16:33.647 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:83: _common_test: '[' 2 '!=' 1 ']' 2020-01-24T10:16:33.647 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:84: _common_test: echo 'FAILED: Wrong log length got 2 (expected 1)' 2020-01-24T10:16:33.647 INFO:tasks.workunit.client.0.smithi134.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:85: _common_test: expr 0 + 1 2020-01-24T10:16:33.648 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:85: _common_test: ERRORS=1 2020-01-24T10:16:33.648 INFO:tasks.workunit.client.0.smithi134.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:87: _common_test: jq '.pg_log_t.dups | length' td/osd-backfill-recovery-log/result.log 2020-01-24T10:16:33.648 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:87: _common_test: DUPSLEN=8 2020-01-24T10:16:33.648 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:88: _common_test: '[' 8 '!=' 9 ']' 2020-01-24T10:16:33.648 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:89: _common_test: echo 'FAILED: Wrong dups length got 8 (expected 9)' 2020-01-24T10:16:33.648 INFO:tasks.workunit.client.0.smithi134.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:90: _common_test: expr 1 + 1 2020-01-24T10:16:33.648 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:90: _common_test: ERRORS=2 2020-01-24T10:16:33.649 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:92: _common_test: grep 'copy_up_to\|copy_after' td/osd-backfill-recovery-log/osd.0.log td/osd-backfill-recovery-log/osd.1.log td/osd-backfill-recovery-log/osd.2.log td/osd-backfill-recovery-log/osd.3.log td/osd-backfill-recovery-log/osd.4.log td/osd-backfill-recovery-log/osd.5.log 2020-01-24T10:16:33.649 INFO:tasks.workunit.client.0.smithi134.stdout:FAILED: Wrong log length got 2 (expected 1) 2020-01-24T10:16:33.649 INFO:tasks.workunit.client.0.smithi134.stdout:FAILED: Wrong dups length got 8 (expected 9) ... 2020-01-24T10:16:33.661 INFO:tasks.workunit.client.0.smithi134.stdout:TEST FAILED 2020-01-24T10:16:33.661 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:93: _common_test: rm -f td/osd-backfill-recovery-log/result.log 2020-01-24T10:16:33.661 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:94: _common_test: '[' 2 '!=' 0 ']' 2020-01-24T10:16:33.661 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:95: _common_test: echo TEST FAILED 2020-01-24T10:16:33.661 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:96: _common_test: return 1 2020-01-24T10:16:33.661 INFO:tasks.workunit.client.0.smithi134.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:33: run: return 1
/a/sage-2020-01-24_01:55:08-rados-wip-sage4-testing-2020-01-23-1347-distro-basic-smithi/4698015
Updated by Josh Durgin over 4 years ago
The test needs to be updated due to https://github.com/ceph/ceph/pull/32683 - anything else that sets the log lengths (like the rados suite facet) needs updating too. If the tests set osd_target_pg_log_entries_per_osd they can get a similar effect.
Updated by Sage Weil over 4 years ago
/a/sage-2020-01-24_13:15:58-rados-wip-sage2-testing-2020-01-23-1953-distro-basic-smithi/4701051
Updated by Neha Ojha over 4 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 32851
Updated by Neha Ojha over 4 years ago
- Status changed from Fix Under Review to In Progress
Updated by Neha Ojha over 4 years ago
- Status changed from In Progress to Fix Under Review
- Assignee set to Neha Ojha
Updated by Sage Weil over 4 years ago
//a/sage-2020-01-24_23:29:53-rados-wip-sage2-testing-2020-01-24-1408-distro-basic-smithi/4703160
Updated by Sage Weil over 4 years ago
- Status changed from Fix Under Review to Resolved
Updated by Sage Weil about 4 years ago
/a/sage-2020-03-17_13:59:54-rados-wip-sage-testing-2020-03-17-0740-distro-basic-smithi/4863239
Updated by Neha Ojha about 4 years ago
- Status changed from Resolved to New
Note that this is a resurrection of the same failure with different symptoms
/a/sage-2020-03-17_13:59:54-rados-wip-sage-testing-2020-03-17-0740-distro-basic-smithi/4863239
TEST_backfill_log_1
2020-03-17T16:17:27.682 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:82: _common_test: LOGLEN=50 2020-03-17T16:17:27.682 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:83: _common_test: '[' 50 '!=' 2 ']' 2020-03-17T16:17:27.683 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:84: _common_test: echo 'FAILED: Wrong log length got 50 (expected 2)' 2020-03-17T16:17:27.683 INFO:tasks.workunit.client.0.smithi014.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:85: _common_test: expr 0 + 1 2020-03-17T16:17:27.683 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:85: _common_test: ERRORS=1 2020-03-17T16:17:27.683 INFO:tasks.workunit.client.0.smithi014.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:87: _common_test: jq '.pg_log_t.dups | length' td/osd-backfill-recovery-log/result.log 2020-03-17T16:17:27.683 INFO:tasks.workunit.client.0.smithi014.stdout:FAILED: Wrong log length got 50 (expected 2) 2020-03-17T16:17:27.684 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:87: _common_test: DUPSLEN=7 2020-03-17T16:17:27.684 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:88: _common_test: '[' 7 '!=' 8 ']' 2020-03-17T16:17:27.685 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:89: _common_test: echo 'FAILED: Wrong dups length got 7 (expected 8)' 2020-03-17T16:17:27.685 INFO:tasks.workunit.client.0.smithi014.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:90: _common_test: expr 1 + 1 2020-03-17T16:17:27.685 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:90: _common_test: ERRORS=2 2020-03-17T16:17:27.685 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:92: _common_test: grep 'copy_up_to\|copy_after' td/osd-backfill-recovery-log/osd.0.log td/osd-backfill-recovery-log/osd.1.log td/osd-backfill-recovery-log/osd.2.log td/osd-backfill-recovery-log/osd.3.log td/osd-backfill-recovery-log/osd.4.log td/osd-backfill-recovery-log/osd.5.log 2020-03-17T16:17:27.685 INFO:tasks.workunit.client.0.smithi014.stdout:FAILED: Wrong dups length got 7 (expected 8)
/a/nojha-2020-03-16_17:35:35-rados:standalone-master-distro-basic-smithi/4860657/
TEST_backfill_log_2
2020-03-16T18:53:54.721 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:82: _common_test: LOGLEN=50 2020-03-16T18:53:54.721 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:83: _common_test: '[' 50 '!=' 2 ']' 2020-03-16T18:53:54.721 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:84: _common_test: echo 'FAILED: Wrong log length got 50 (expected 2)' 2020-03-16T18:53:54.721 INFO:tasks.workunit.client.0.smithi001.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:85: _common_test: expr 0 + 1 2020-03-16T18:53:54.721 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:85: _common_test: ERRORS=1 2020-03-16T18:53:54.721 INFO:tasks.workunit.client.0.smithi001.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:87: _common_test: jq '.pg_log_t.dups | length' td/osd-backfill-recovery-log/result.log 2020-03-16T18:53:54.722 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:87: _common_test: DUPSLEN=100 2020-03-16T18:53:54.722 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:88: _common_test: '[' 100 '!=' 148 ']' 2020-03-16T18:53:54.722 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:89: _common_test: echo 'FAILED: Wrong dups length got 100 (expected 148)' 2020-03-16T18:53:54.722 INFO:tasks.workunit.client.0.smithi001.stderr://home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:90: _common_test: expr 1 + 1 2020-03-16T18:53:54.722 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:90: _common_test: ERRORS=2 2020-03-16T18:53:54.722 INFO:tasks.workunit.client.0.smithi001.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:92: _common_test: grep 'copy_up_to\|copy_after' td/osd-backfill-recovery-log/osd.0.log td/osd-backfill-recovery-log/osd.1.log td/osd-backfill-recovery-log/osd.2.log td/osd-backfill-recovery-log/osd.3.log td/osd-backfill-recovery-log/osd.4.log td/osd-backfill-recovery-log/osd.5.log 2020-03-16T18:53:54.722 INFO:tasks.workunit.client.0.smithi001.stdout:FAILED: Wrong log length got 50 (expected 2) 2020-03-16T18:53:54.723 INFO:tasks.workunit.client.0.smithi001.stdout:FAILED: Wrong dups length got 100 (expected 148)
Updated by Neha Ojha about 4 years ago
/a/sage-2020-03-17_13:59:54-rados-wip-sage-testing-2020-03-17-0740-distro-basic-smithi/4863239
Comparing a failed TEST_backfill_log_1 test with one that passed, I see that the "newprimary" is different in both cases. Note that the test marks osd.0,1,2 out and waits for clean.
PASSED test
"up":[4,3,5],"acting":[4,3,5]
../qa/standalone/osd/osd-backfill-recovery-log.sh:78: _common_test: newprimary=4
FAILED test
2020-03-17T16:17:26.220 INFO:tasks.workunit.client.0.smithi014.stderr:/home/ubuntu/cephtest/clone.client.0/qa/standalone/osd/osd-backfill-recovery-log.sh:77: _common_test: newprimary=1
Not sure why osd.1 still remained the primary here.
Updated by Neha Ojha about 4 years ago
Neha Ojha wrote:
/a/sage-2020-03-17_13:59:54-rados-wip-sage-testing-2020-03-17-0740-distro-basic-smithi/4863239
Comparing a failed TEST_backfill_log_1 test with one that passed, I see that the "newprimary" is different in both cases. Note that the test marks osd.0,1,2 out and waits for clean.
PASSED test
"up":[4,3,5],"acting":[4,3,5]
[...]
FAILED test
[...]
Not sure why osd.1 still remained the primary here.
Before wait_for_clean
2020-03-22T02:40:05.571 INFO:tasks.workunit.client.0.smithi001.stdout:{"pg_ready":true,"pg_stats":[{"pgid":"1.0","version":"34'143","reported_seq":"153","reported_epoch":"34","state":"active+clean","last_fresh":"2020-03-22T02:40:01.191973+0000","last_change":"2020-03-22T02:39:49.653126+0000","last_active":"2020-03-22T02:40:01.191973+0000","last_peered":"2020-03-22T02:40:01.191973+0000","last_clean":"2020-03-22T02:40:01.191973+0000","last_became_active":"2020-03-22T02:39:49.652216+0000","last_became_peered":"2020-03-22T02:39:49.652216+0000","last_unstale":"2020-03-22T02:40:01.191973+0000","last_undegraded":"2020-03-22T02:40:01.191973+0000","last_fullsized":"2020-03-22T02:40:01.191973+0000","mapping_epoch":32,"log_start":"34'100","ondisk_log_start":"34'100","created":32,"last_epoch_clean":33,"parent":"0.0","parent_split_bits":0,"last_scrub":"0'0","last_scrub_stamp":"2020-03-22T02:39:49.018635+0000","last_deep_scrub":"0'0","last_deep_scrub_stamp":"2020-03-22T02:39:49.018635+0000","last_clean_scrub_stamp":"2020-03-22T02:39:49.018635+0000","log_size":43,"ondisk_log_size":43,"stats_invalid":false,"dirty_stats_invalid":false,"omap_stats_invalid":false,"hitset_stats_invalid":false,"hitset_bytes_stats_invalid":false,"pin_stats_invalid":false,"manifest_stats_invalid":false,"snaptrimq_len":0,"stat_sum":{"num_bytes":974116,"num_objects":143,"num_object_clones":0,"num_object_copies":429,"num_objects_missing_on_primary":0,"num_objects_missing":0,"num_objects_degraded":0,"num_objects_misplaced":0,"num_objects_unfound":0,"num_objects_dirty":143,"num_whiteouts":0,"num_read":0,"num_read_kb":0,"num_write":143,"num_write_kb":1001,"num_scrub_errors":0,"num_shallow_scrub_errors":0,"num_deep_scrub_errors":0,"num_objects_recovered":0,"num_bytes_recovered":0,"num_keys_recovered":0,"num_objects_omap":0,"num_objects_hit_set_archive":0,"num_bytes_hit_set_archive":0,"num_flush":0,"num_flush_kb":0,"num_evict":0,"num_evict_kb":0,"num_promote":0,"num_flush_mode_high":0,"num_flush_mode_low":0,"num_evict_mode_some":0,"num_evict_mode_full":0,"num_objects_pinned":0,"num_legacy_snapsets":0,"num_large_omap_objects":0,"num_objects_manifest":0,"num_omap_bytes":0,"num_omap_keys":0,"num_objects_repaired":0},"up":[1,0,2],"acting":[1,0,2],"avail_no_missing":[],"object_location_counts":[],"blocked_by":[],"up_primary":1,"acting_primary":1,"purged_snaps":[]}]}
After wait_for_clean
2020-03-22T02:40:10.670 INFO:tasks.workunit.client.0.smithi001.stdout:{"pg_ready":true,"pg_stats":[{"pgid":"1.0","version":"34'150","reported_seq":"160","reported_epoch":"34","state":"active+clean","last_fresh":"2020-03-22T02:40:01.734859+0000","last_change":"2020-03-22T02:39:49.653126+0000","last_active":"2020-03-22T02:40:01.734859+0000","last_peered":"2020-03-22T02:40:01.734859+0000","last_clean":"2020-03-22T02:40:01.734859+0000","last_became_active":"2020-03-22T02:39:49.652216+0000","last_became_peered":"2020-03-22T02:39:49.652216+0000","last_unstale":"2020-03-22T02:40:01.734859+0000","last_undegraded":"2020-03-22T02:40:01.734859+0000","last_fullsized":"2020-03-22T02:40:01.734859+0000","mapping_epoch":32,"log_start":"34'100","ondisk_log_start":"34'100","created":32,"last_epoch_clean":33,"parent":"0.0","parent_split_bits":0,"last_scrub":"0'0","last_scrub_stamp":"2020-03-22T02:39:49.018635+0000","last_deep_scrub":"0'0","last_deep_scrub_stamp":"2020-03-22T02:39:49.018635+0000","last_clean_scrub_stamp":"2020-03-22T02:39:49.018635+0000","log_size":50,"ondisk_log_size":50,"stats_invalid":false,"dirty_stats_invalid":false,"omap_stats_invalid":false,"hitset_stats_invalid":false,"hitset_bytes_stats_invalid":false,"pin_stats_invalid":false,"manifest_stats_invalid":false,"snaptrimq_len":0,"stat_sum":{"num_bytes":1021800,"num_objects":150,"num_object_clones":0,"num_object_copies":450,"num_objects_missing_on_primary":0,"num_objects_missing":0,"num_objects_degraded":0,"num_objects_misplaced":0,"num_objects_unfound":0,"num_objects_dirty":150,"num_whiteouts":0,"num_read":0,"num_read_kb":0,"num_write":150,"num_write_kb":1050,"num_scrub_errors":0,"num_shallow_scrub_errors":0,"num_deep_scrub_errors":0,"num_objects_recovered":0,"num_bytes_recovered":0,"num_keys_recovered":0,"num_objects_omap":0,"num_objects_hit_set_archive":0,"num_bytes_hit_set_archive":0,"num_flush":0,"num_flush_kb":0,"num_evict":0,"num_evict_kb":0,"num_promote":0,"num_flush_mode_high":0,"num_flush_mode_low":0,"num_evict_mode_some":0,"num_evict_mode_full":0,"num_objects_pinned":0,"num_legacy_snapsets":0,"num_large_omap_objects":0,"num_objects_manifest":0,"num_omap_bytes":0,"num_omap_keys":0,"num_objects_repaired":0},"up":[1,0,2],"acting":[1,0,2],"avail_no_missing":[],"object_location_counts":[],"blocked_by":[],"up_primary":1,"acting_primary":1,"purged_snaps":[]}]}
Note that the "reported_epoch" remained the same. This is because mgr_stats_period is 5 seconds, and we have not fetched the latest stats yet.
Updated by Neha Ojha about 4 years ago
- Status changed from In Progress to Fix Under Review
Updated by Sage Weil about 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to octopus
Updated by Nathan Cutler about 4 years ago
- Status changed from Pending Backport to Resolved
- Backport deleted (
octopus)
$ git branch --contains b208177 master * octopus
Updated by Neha Ojha about 4 years ago
- Status changed from Resolved to Pending Backport
Nathan, we need https://github.com/ceph/ceph/pull/34126 as well - See https://tracker.ceph.com/issues/43807#note-15
Updated by Nathan Cutler about 4 years ago
- Copied to Backport #44847: octopus: osd-backfill-recovery-log.sh fails added
Updated by Nathan Cutler about 4 years ago
Neha Ojha wrote:
Nathan, we need https://github.com/ceph/ceph/pull/34126 as well - See https://tracker.ceph.com/issues/43807#note-15
Oops! Sorry! Got it now, and octopus PR is open for review.
Updated by Nathan Cutler about 4 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".