Bug #58089

get_acting_recovery_backfill().empty() assertion

Added by Matan Breizman 2 months ago. Updated about 2 months ago.

Status:
In Progress
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

CRIMSON_COMPAT=true RBD_FEATURES= bin/ceph_test_librbd --gtest_filter=TestLibRBD.TestIO

Single OSD:
[       OK ] TestLibRBD.TestIO (13677 ms)
[----------] 1 test from TestLibRBD (13677 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (14546 ms total)
[  PASSED  ] 1 test.

3 OSDs:
The test hangs because of an OSD hang:

ead+balance_reads+known_if_redirected+supports_pool_eio e16) v8]).0: after await_map stage
DEBUG 2022-11-27 10:09:31,967 [shard 0] osd - client_request(id=504, detail=m=[osd_op(client.4136.0:36 2.11 2:8c2137b3:::rb.0.1028.6d3f116e.000000000000:head {read 0~512} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e16) v8]).0: after wait_for_map
DEBUG 2022-11-27 10:09:31,967 [shard 0] osd - client_request(id=504, detail=m=[osd_op(client.4136.0:36 2.11 2:8c2137b3:::rb.0.1028.6d3f116e.000000000000:head {read 0~512} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e16) v8]).0: after wait_for_active stage                                                                                             
DEBUG 2022-11-27 10:09:31,967 [shard 0] osd - client_request(id=504, detail=m=[osd_op(client.4136.0:36 2.11 2:8c2137b3:::rb.0.1028.6d3f116e.000000000000:head {read 0~512} snapc 0={} ondisk+read+balance_reads+known_if_redirected+supports_pool_eio e16) v8]).0: after wait_for_active
DEBUG 2022-11-27 10:09:31,967 [shard 0] osd - do_recover_missing check for recovery, 2:8c2137b3:::rb.0.1028.6d3f116e.000000000000:head                                                       
ERROR 2022-11-27 10:09:31,968 [shard 0] none - ../src/crimson/osd/pg.cc:1204 : In function 'bool crimson::osd::PG::is_degraded_or_backfilling_object(const hobject_t&) const', ceph_assert(%s)
!get_acting_recovery_backfill().empty()

After some debugging, it looks like the assertion is triggered because the PG is not primary.
Simply skipping the recovery check on a non-primary PG is not enough: the op is then dropped as misdirected, and the test still hangs.
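
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not Crimson code and not the change from the linked pull request: MockPG and needs_recovery_before_op are hypothetical names, and the sketch assumes that the acting/recovery/backfill set is only populated on the primary, which is what the debugging above suggests.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for crimson::osd::PG; not the real class.
struct MockPG {
  bool primary = false;
  // Assumption: this set is only filled in on the primary.
  std::set<int> acting_recovery_backfill;
  std::set<std::string> degraded_objects;

  bool is_primary() const { return primary; }

  // Mirrors the shape of the failing check: the set must not be empty.
  bool is_degraded_or_backfilling_object(const std::string& oid) const {
    assert(!acting_recovery_backfill.empty());
    return degraded_objects.count(oid) > 0;
  }
};

// Hypothetical helper: consult recovery state only on the primary,
// so a replica serving a balanced read never reaches the assertion.
bool needs_recovery_before_op(const MockPG& pg, const std::string& oid) {
  if (!pg.is_primary()) {
    return false;
  }
  return pg.is_degraded_or_backfilling_object(oid);
}
```

With this guard, a default-constructed (non-primary) MockPG returns false instead of asserting. As noted above, the guard alone does not resolve the bug, because the op on the non-primary PG is still dropped as misdirected afterwards.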

History

#2 Updated by Matan Breizman 2 months ago

  • Status changed from New to In Progress
  • Pull request ID set to 49116

#3 Updated by Matan Breizman about 2 months ago

  • Assignee set to Matan Breizman

#4 Updated by Samuel Just about 2 months ago

sjust-2022-12-12_07:11:41-crimson-rados-wip-sjust-testing-2022-12-10-142339-distro-default-smithi/7112880
