Project

General

Profile

Actions

Bug #58893

open

test_map_discontinuity: AssertionError: wait_for_clean: failed before timeout expired

Added by Laura Flores about 1 year ago. Updated 3 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
reef,quincy,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2023-02-24_17:50:19-rados-main-distro-default-smithi/7186711

2023-02-25T00:20:03.741 INFO:tasks.ceph.ceph_manager.ceph:PG 3.15 is not active+clean
2023-02-25T00:20:03.741 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '3.15', 'version': "831'2798", 'reported_seq': 6832, 'reported_epoch': 1624, 'state': 'active+clean+scrubbing', 'last_fresh': '2023-02-25T00:19:54.855315+0000', 'last_change': '2023-02-24T23:51:09.118454+0000', 'last_active': '2023-02-25T00:19:54.855315+0000', 'last_peered': '2023-02-25T00:19:54.855315+0000', 'last_clean': '2023-02-25T00:19:54.855315+0000', 'last_became_active': '2023-02-24T23:48:11.272128+0000', 'last_became_peered': '2023-02-24T23:48:11.272128+0000', 'last_unstale': '2023-02-25T00:19:54.855315+0000', 'last_undegraded': '2023-02-25T00:19:54.855315+0000', 'last_fullsized': '2023-02-25T00:19:54.855315+0000', 'mapping_epoch': 690, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 666, 'last_epoch_clean': 691, 'parent': '0.0', 'parent_split_bits': 5, 'last_scrub': "747'2570", 'last_scrub_stamp': '2023-02-24T23:49:08.414582+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2023-02-24T23:36:38.882905+0000', 'last_clean_scrub_stamp': '2023-02-24T23:49:08.414582+0000', 'objects_scrubbed': 0, 'log_size': 2798, 'log_dups_size': 0, 'ondisk_log_size': 2798, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 1, 'scrub_schedule': 'scrubbing for 1726s', 'scrub_duration': 0.608960773, 'objects_trimmed': 0, 'snaptrim_duration': 0.581131349, 'stat_sum': {'num_bytes': 61734912, 'num_objects': 19, 'num_object_clones': 14, 'num_object_copies': 57, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 19, 'num_whiteouts': 0, 'num_read': 36, 'num_read_kb': 107729, 'num_write': 90, 'num_write_kb': 32280, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [7, 3, 6], 'acting': [7, 3, 6], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 7, 'acting_primary': 7, 'purged_snaps': [{'start': '247', 'length': '1'}, {'start': '254', 'length': '1'}, {'start': '25f', 'length': '1'}, {'start': '262', 'length': '1'}, {'start': '269', 'length': '2'}, {'start': '26f', 'length': '1'}, {'start': '27d', 'length': '1'}, {'start': '280', 'length': '1'}, {'start': '283', 'length': '1'}, {'start': '28e', 'length': '2'}, {'start': '297', 'length': '2'}, {'start': '29d', 'length': '1'}, {'start': '2a1', 'length': '2'}, {'start': '2a4', 'length': '2'}, {'start': '2ab', 'length': '2'}, {'start': '2b0', 'length': '3'}, {'start': '2b4', 'length': '1'}, {'start': '2b7', 'length': '1'}, {'start': '2bc', 'length': '1'}, {'start': '2be', 'length': '1'}, {'start': '2c1', 'length': '2'}, {'start': '2cd', 'length': '1'}]}
2023-02-25T00:20:03.742 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 190, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 1391, in _do_thrash
    self.test_map_discontinuity()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 1175, in test_map_discontinuity
    self.ceph_manager.wait_for_clean(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 2747, in wait_for_clean
    assert time.time() - start < timeout, \
AssertionError: wait_for_clean: failed before timeout expired

Last pg map before Traceback:

{
  "pgs_by_state": [
    {
      "state_name": "active+clean",
      "count": 43
    },
    {
      "state_name": "down+remapped",
      "count": 1
    },
    {
      "state_name": "active+clean+scrubbing",
      "count": 1
    }
  ],
  "num_pgs": 45,
  "num_pools": 3,
  "num_objects": 191,
  "data_bytes": 465383968,
  "bytes_used": 2945564672,
  "bytes_avail": 673511784448,
  "bytes_total": 676457349120,
  "inactive_pgs_ratio": 0.02222222276031971
}


Related issues 1 (0 open1 closed)

Related to RADOS - Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC backend.ResolvedLaura Flores

Actions
Actions #1

Updated by Laura Flores about 1 year ago

  • Related to Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC backend. added
Actions #2

Updated by Laura Flores about 1 year ago

Marking this as related to #51076 since there was a case of `test_map_discontinuity` logged there.

Actions #3

Updated by Laura Flores 12 months ago

/a/yuriw-2023-05-06_14:41:44-rados-pacific-release-distro-default-smithi/7264242

2023-05-06T20:06:37.258 INFO:tasks.ceph.ceph_manager.ceph:PG 4.3 is not active+clean
2023-05-06T20:06:37.258 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '4.3', 'version': "868'9308", 'reported_seq': 13765, 'reported_epoch': 1012, 'state': 'active+clean+scrubbing', 'last_fresh': '2023-05-06T20:06:30.270194+0000', 'last_change': '2023-05-06T20:06:26.499964+0000', 'last_active': '2023-05-06T20:06:30.270194+0000', 'last_peered': '2023-05-06T20:06:30.270194+0000', 'last_clean': '2023-05-06T20:06:30.270194+0000', 'last_became_active': '2023-05-06T19:16:50.458537+0000', 'last_became_peered': '2023-05-06T19:16:50.458537+0000', 'last_unstale': '2023-05-06T20:06:30.270194+0000', 'last_undegraded': '2023-05-06T20:06:30.270194+0000', 'last_fullsized': '2023-05-06T20:06:30.270194+0000', 'mapping_epoch': 545, 'log_start': "521'6307", 'ondisk_log_start': "521'6307", 'created': 24, 'last_epoch_clean': 546, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "868'9308", 'last_scrub_stamp': '2023-05-06T20:05:17.468428+0000', 'last_deep_scrub': "868'9308", 'last_deep_scrub_stamp': '2023-05-06T19:58:16.808394+0000', 'last_clean_scrub_stamp': '2023-05-06T20:05:17.468428+0000', 'log_size': 3001, 'ondisk_log_size': 3001, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 2902340146, 'num_objects': 988, 'num_object_clones': 0, 'num_object_copies': 1976, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 488, 'num_whiteouts': 6, 'num_read': 3249, 'num_read_kb': 67067, 'num_write': 7831, 'num_write_kb': 2128015, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 372, 'num_bytes_recovered': 470734940, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 8, 'num_bytes_hit_set_archive': 3355, 'num_flush': 1103, 'num_flush_kb': 3225344, 'num_evict': 616, 'num_evict_kb': 1801030, 'num_promote': 1601, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [0, 7], 'acting': [0, 7], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 0, 'acting_primary': 0, 'purged_snaps': []}
2023-05-06T20:06:37.258 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 188, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 1410, in _do_thrash
    self.test_map_discontinuity()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 1194, in test_map_discontinuity
    self.ceph_manager.wait_for_clean(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 2752, in wait_for_clean
    assert time.time() - start < timeout, \
AssertionError: wait_for_clean: failed before timeout expired
Actions #4

Updated by Laura Flores 12 months ago

  • Backport set to pacific
Actions #5

Updated by Nitzan Mordechai 9 days ago

/a/yuriw-2024-04-02_15:39:50-rados-wip-yuri2-testing-2024-04-01-1235-quincy-distro-default-smithi/7636676

Actions #6

Updated by Nitzan Mordechai 8 days ago

in: /a/yuriw-2024-04-02_15:39:50-rados-wip-yuri2-testing-2024-04-01-1235-quincy-distro-default-smithi/7636676

thrasher waiting for clean, it then timeout and terminate:

2024-04-03T03:13:46.433 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 1314, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 198, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 1436, in _do_thrash
    self.test_map_discontinuity()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 1218, in test_map_discontinuity
    self.ceph_manager.wait_for_clean(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 2789, in wait_for_clean
    assert time.time() - start < timeout, \
AssertionError: wait_for_clean: failed before timeout expired

pg waiting to be clean:

2024-04-03T02:43:45.986 INFO:tasks.ceph.ceph_manager.ceph:PG 3.5 is not active+clean
2024-04-03T02:43:45.987 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '3.5', 'version': "589'526", 'reported_seq': 2580, 'reported_epoch': 618, 'state': 'down', 'last_fresh': '2024-04-03T02:43:41.886107+0000', 'last_change': '2024-04-03T02:43:29.830663+0000', 'last_active': '2024-04-03T02:43:27.824533+0000', 'last_peered': '2024-04-03T02:43:26.906283+0000', 'last_clean': '2024-04-03T02:43:24.770012+0000', 'last_became_active': '2024-04-03T02:43:26.857213+0000', 'last_became_peered': '2024-04-03T02:43:26.857213+0000', 'last_unstale': '2024-04-03T02:43:41.886107+0000', 'last_undegraded': '2024-04-03T02:43:41.886107+0000', 'last_fullsized': '2024-04-03T02:43:41.886107+0000', 'mapping_epoch': 608, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 21, 'last_epoch_clean': 480, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "589'526", 'last_scrub_stamp': '2024-04-03T02:43:24.769765+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2024-04-03T02:33:41.830220+0000', 'last_clean_scrub_stamp': '2024-04-03T02:43:24.769765+0000', 'objects_scrubbed': 3, 'log_size': 526, 'log_dups_size': 0, 'ondisk_log_size': 526, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 1, 'scrub_schedule': 'periodic scrub scheduled @ 2024-04-03T02:44:53.455297+0000', 'scrub_duration': 0.004198545, 'objects_trimmed': 0, 'snaptrim_duration': 0.001410438, 'stat_sum': {'num_bytes': 1171456, 'num_objects': 3, 'num_object_clones': 2, 'num_object_copies': 9, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 3, 'num_whiteouts': 1, 'num_read': 40, 'num_read_kb': 53686, 'num_write': 76, 'num_write_kb': 9096, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 51, 'num_bytes_recovered': 36487168, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [5, 0, 1], 'acting': [5, 0, 1], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [2], 'up_primary': 5, 'acting_primary': 5, 'purged_snaps': [{'start': '1df', 'length': '1'}, {'start': '1f9', 'length': '1'}, {'start': '206', 'length': '1'}, {'start': '219', 'length': '1'}]}

Actions #7

Updated by Radoslaw Zarzynski 3 days ago

Just a supplement to Nitzan's comment:

  • this PG was down and
  • 'blocked_by': [2].

This brings the question: what's going with osd.2?

Actions #8

Updated by Laura Flores 3 days ago

  • Backport changed from pacific to reef,quincy,squid

/a/yuriw-2024-04-09_01:16:20-rados-reef-release-distro-default-smithi/7647835

Actions

Also available in: Atom PDF