Actions
Bug #58893
opentest_map_discontinuity: AssertionError: wait_for_clean: failed before timeout expired
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
reef,quincy,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
/a/yuriw-2023-02-24_17:50:19-rados-main-distro-default-smithi/7186711
2023-02-25T00:20:03.741 INFO:tasks.ceph.ceph_manager.ceph:PG 3.15 is not active+clean
2023-02-25T00:20:03.741 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '3.15', 'version': "831'2798", 'reported_seq': 6832, 'reported_epoch': 1624, 'state': 'active+clean+scrubbing', 'last_fresh': '2023-02-25T00:19:54.855315+0000', 'last_change': '2023-02-24T23:51:09.118454+0000', 'last_active': '2023-02-25T00:19:54.855315+0000', 'last_peered': '2023-02-25T00:19:54.855315+0000', 'last_clean': '2023-02-25T00:19:54.855315+0000', 'last_became_active': '2023-02-24T23:48:11.272128+0000', 'last_became_peered': '2023-02-24T23:48:11.272128+0000', 'last_unstale': '2023-02-25T00:19:54.855315+0000', 'last_undegraded': '2023-02-25T00:19:54.855315+0000', 'last_fullsized': '2023-02-25T00:19:54.855315+0000', 'mapping_epoch': 690, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 666, 'last_epoch_clean': 691, 'parent': '0.0', 'parent_split_bits': 5, 'last_scrub': "747'2570", 'last_scrub_stamp': '2023-02-24T23:49:08.414582+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2023-02-24T23:36:38.882905+0000', 'last_clean_scrub_stamp': '2023-02-24T23:49:08.414582+0000', 'objects_scrubbed': 0, 'log_size': 2798, 'log_dups_size': 0, 'ondisk_log_size': 2798, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 1, 'scrub_schedule': 'scrubbing for 1726s', 'scrub_duration': 0.608960773, 'objects_trimmed': 0, 'snaptrim_duration': 0.581131349, 'stat_sum': {'num_bytes': 61734912, 'num_objects': 19, 'num_object_clones': 14, 'num_object_copies': 57, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 19, 'num_whiteouts': 0, 'num_read': 36, 'num_read_kb': 107729, 'num_write': 90, 'num_write_kb': 32280, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [7, 3, 6], 'acting': [7, 3, 6], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 7, 'acting_primary': 7, 'purged_snaps': [{'start': '247', 'length': '1'}, {'start': '254', 'length': '1'}, {'start': '25f', 'length': '1'}, {'start': '262', 'length': '1'}, {'start': '269', 'length': '2'}, {'start': '26f', 'length': '1'}, {'start': '27d', 'length': '1'}, {'start': '280', 'length': '1'}, {'start': '283', 'length': '1'}, {'start': '28e', 'length': '2'}, {'start': '297', 'length': '2'}, {'start': '29d', 'length': '1'}, {'start': '2a1', 'length': '2'}, {'start': '2a4', 'length': '2'}, {'start': '2ab', 'length': '2'}, {'start': '2b0', 'length': '3'}, {'start': '2b4', 'length': '1'}, {'start': '2b7', 'length': '1'}, {'start': '2bc', 'length': '1'}, {'start': '2be', 'length': '1'}, {'start': '2c1', 'length': '2'}, {'start': '2cd', 'length': '1'}]}
2023-02-25T00:20:03.742 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 190, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 1391, in _do_thrash
self.test_map_discontinuity()
File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 1175, in test_map_discontinuity
self.ceph_manager.wait_for_clean(
File "/home/teuthworker/src/github.com_ceph_ceph-c_f9d812a56231a14fafcdfb339f87d3d9a9e6e55f/qa/tasks/ceph_manager.py", line 2747, in wait_for_clean
assert time.time() - start < timeout, \
AssertionError: wait_for_clean: failed before timeout expired
Last pg map before Traceback:
{
"pgs_by_state": [
{
"state_name": "active+clean",
"count": 43
},
{
"state_name": "down+remapped",
"count": 1
},
{
"state_name": "active+clean+scrubbing",
"count": 1
}
],
"num_pgs": 45,
"num_pools": 3,
"num_objects": 191,
"data_bytes": 465383968,
"bytes_used": 2945564672,
"bytes_avail": 673511784448,
"bytes_total": 676457349120,
"inactive_pgs_ratio": 0.02222222276031971
}
Updated by Laura Flores about 1 year ago
- Related to Bug #51076: "wait_for_recovery: failed before timeout expired" during thrashosd test with EC backend. added
Updated by Laura Flores about 1 year ago
Marking this as related to #51076 since there was a case of `test_map_discontinuity` logged there.
Updated by Laura Flores 12 months ago
/a/yuriw-2023-05-06_14:41:44-rados-pacific-release-distro-default-smithi/7264242
2023-05-06T20:06:37.258 INFO:tasks.ceph.ceph_manager.ceph:PG 4.3 is not active+clean
2023-05-06T20:06:37.258 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '4.3', 'version': "868'9308", 'reported_seq': 13765, 'reported_epoch': 1012, 'state': 'active+clean+scrubbing', 'last_fresh': '2023-05-06T20:06:30.270194+0000', 'last_change': '2023-05-06T20:06:26.499964+0000', 'last_active': '2023-05-06T20:06:30.270194+0000', 'last_peered': '2023-05-06T20:06:30.270194+0000', 'last_clean': '2023-05-06T20:06:30.270194+0000', 'last_became_active': '2023-05-06T19:16:50.458537+0000', 'last_became_peered': '2023-05-06T19:16:50.458537+0000', 'last_unstale': '2023-05-06T20:06:30.270194+0000', 'last_undegraded': '2023-05-06T20:06:30.270194+0000', 'last_fullsized': '2023-05-06T20:06:30.270194+0000', 'mapping_epoch': 545, 'log_start': "521'6307", 'ondisk_log_start': "521'6307", 'created': 24, 'last_epoch_clean': 546, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "868'9308", 'last_scrub_stamp': '2023-05-06T20:05:17.468428+0000', 'last_deep_scrub': "868'9308", 'last_deep_scrub_stamp': '2023-05-06T19:58:16.808394+0000', 'last_clean_scrub_stamp': '2023-05-06T20:05:17.468428+0000', 'log_size': 3001, 'ondisk_log_size': 3001, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 2902340146, 'num_objects': 988, 'num_object_clones': 0, 'num_object_copies': 1976, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 488, 'num_whiteouts': 6, 'num_read': 3249, 'num_read_kb': 67067, 'num_write': 7831, 'num_write_kb': 2128015, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 372, 'num_bytes_recovered': 470734940, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 8, 'num_bytes_hit_set_archive': 3355, 'num_flush': 1103, 'num_flush_kb': 3225344, 'num_evict': 616, 'num_evict_kb': 1801030, 'num_promote': 1601, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [0, 7], 'acting': [0, 7], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 0, 'acting_primary': 0, 'purged_snaps': []}
2023-05-06T20:06:37.258 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 188, in wrapper
return func(self)
File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 1410, in _do_thrash
self.test_map_discontinuity()
File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 1194, in test_map_discontinuity
self.ceph_manager.wait_for_clean(
File "/home/teuthworker/src/github.com_ceph_ceph-c_8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/qa/tasks/ceph_manager.py", line 2752, in wait_for_clean
assert time.time() - start < timeout, \
AssertionError: wait_for_clean: failed before timeout expired
Updated by Nitzan Mordechai 9 days ago
/a/yuriw-2024-04-02_15:39:50-rados-wip-yuri2-testing-2024-04-01-1235-quincy-distro-default-smithi/7636676
Updated by Nitzan Mordechai 8 days ago
in: /a/yuriw-2024-04-02_15:39:50-rados-wip-yuri2-testing-2024-04-01-1235-quincy-distro-default-smithi/7636676
thrasher waiting for clean, it then timeout and terminate:
2024-04-03T03:13:46.433 ERROR:tasks.thrashosds.thrasher:exception: Traceback (most recent call last): File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 1314, in do_thrash self._do_thrash() File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 198, in wrapper return func(self) File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 1436, in _do_thrash self.test_map_discontinuity() File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 1218, in test_map_discontinuity self.ceph_manager.wait_for_clean( File "/home/teuthworker/src/github.com_ceph_ceph-c_49e8f32b3804bf60194367ce55a8ccc716a4fbdf/qa/tasks/ceph_manager.py", line 2789, in wait_for_clean assert time.time() - start < timeout, \ AssertionError: wait_for_clean: failed before timeout expired
pg waiting to be clean:
2024-04-03T02:43:45.986 INFO:tasks.ceph.ceph_manager.ceph:PG 3.5 is not active+clean 2024-04-03T02:43:45.987 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '3.5', 'version': "589'526", 'reported_seq': 2580, 'reported_epoch': 618, 'state': 'down', 'last_fresh': '2024-04-03T02:43:41.886107+0000', 'last_change': '2024-04-03T02:43:29.830663+0000', 'last_active': '2024-04-03T02:43:27.824533+0000', 'last_peered': '2024-04-03T02:43:26.906283+0000', 'last_clean': '2024-04-03T02:43:24.770012+0000', 'last_became_active': '2024-04-03T02:43:26.857213+0000', 'last_became_peered': '2024-04-03T02:43:26.857213+0000', 'last_unstale': '2024-04-03T02:43:41.886107+0000', 'last_undegraded': '2024-04-03T02:43:41.886107+0000', 'last_fullsized': '2024-04-03T02:43:41.886107+0000', 'mapping_epoch': 608, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 21, 'last_epoch_clean': 480, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "589'526", 'last_scrub_stamp': '2024-04-03T02:43:24.769765+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2024-04-03T02:33:41.830220+0000', 'last_clean_scrub_stamp': '2024-04-03T02:43:24.769765+0000', 'objects_scrubbed': 3, 'log_size': 526, 'log_dups_size': 0, 'ondisk_log_size': 526, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 1, 'scrub_schedule': 'periodic scrub scheduled @ 2024-04-03T02:44:53.455297+0000', 'scrub_duration': 0.004198545, 'objects_trimmed': 0, 'snaptrim_duration': 0.001410438, 'stat_sum': {'num_bytes': 1171456, 'num_objects': 3, 'num_object_clones': 2, 'num_object_copies': 9, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 3, 'num_whiteouts': 1, 'num_read': 40, 'num_read_kb': 53686, 'num_write': 76, 'num_write_kb': 9096, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 51, 'num_bytes_recovered': 36487168, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [5, 0, 1], 'acting': [5, 0, 1], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [2], 'up_primary': 5, 'acting_primary': 5, 'purged_snaps': [{'start': '1df', 'length': '1'}, {'start': '1f9', 'length': '1'}, {'start': '206', 'length': '1'}, {'start': '219', 'length': '1'}]}
Updated by Radoslaw Zarzynski 3 days ago
Just a supplement to Nitzan's comment:
- this PG was
down
and 'blocked_by': [2]
.
This brings the question: what's going with osd.2
?
Updated by Laura Flores 3 days ago
- Backport changed from pacific to reef,quincy,squid
/a/yuriw-2024-04-09_01:16:20-rados-reef-release-distro-default-smithi/7647835
Actions