Bug #63198

open

rados/thrash: AssertionError: wait_for_recovery: failed before timeout expired

Added by Kamoltat (Junior) Sirivadhna 7 months ago. Updated 3 days ago.

Status:
New
Priority:
Normal
Assignee:
Laura Flores
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/ksirivad-2023-10-13_01:58:36-rados-wip-ksirivad-fix-63183-distro-default-smithi/7423809/teuthology.log

2023-10-13T05:29:39.337 DEBUG:teuthology.orchestra.run:got remote process result: 124
2023-10-13T05:29:39.338 ERROR:teuthology.run_tasks:Manager failed: thrashosds
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_8cdab074dcca9a68965bc5a50e9c30b691949723/teuthology/run_tasks.py", line 154, in run_tasks
    suppress = manager.__exit__(*exc_info)
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/thrashosds.py", line 215, in task
    cluster_manager.wait_for_all_osds_up()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 2792, in wait_for_all_osds_up
    while not self.are_all_osds_up():
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 2782, in are_all_osds_up
    x = self.get_osd_dump()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 2545, in get_osd_dump
    return self.get_osd_dump_json()['osds']
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 2537, in get_osd_dump_json
    out = self.raw_cluster_cmd('osd', 'dump', '--format=json')
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 1611, in raw_cluster_cmd
    return self.run_cluster_cmd(**kwargs).stdout.getvalue()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 1602, in run_cluster_cmd
    return self.controller.run(**kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_8cdab074dcca9a68965bc5a50e9c30b691949723/teuthology/orchestra/remote.py", line 522, in run
    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_teuthology_8cdab074dcca9a68965bc5a50e9c30b691949723/teuthology/orchestra/run.py", line 455, in run
    r.wait()
  File "/home/teuthworker/src/git.ceph.com_teuthology_8cdab074dcca9a68965bc5a50e9c30b691949723/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_teuthology_8cdab074dcca9a68965bc5a50e9c30b691949723/teuthology/orchestra/run.py", line 181, in _raise_for_status
    raise CommandFailedError(
teuthology.exceptions.CommandFailedError: Command failed on smithi072 with status 124: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd dump --format=json'
2023-10-13T05:29:39.347 DEBUG:teuthology.run_tasks:Unwinding manager ceph
2023-10-13T05:29:39.356 INFO:tasks.ceph.ceph_manager.ceph:waiting for clean
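
Note on the exit status: 124 is what coreutils timeout(1) returns when the wrapped command exceeds its limit, so the "ceph osd dump" call above did not fail on its own; it simply got no answer within the 120-second window while the cluster was stuck. A minimal, runnable illustration of that convention (plain subprocess here, not the teuthology wrapper; the helper name is made up for this sketch):

import subprocess

def run_with_timeout(cmd, limit_s=120):
    """Run cmd under coreutils timeout(1); report whether it timed out."""
    proc = subprocess.run(["timeout", str(limit_s)] + list(cmd),
                          capture_output=True, text=True)
    # timeout(1) exits with 124 when the command runs past its limit,
    # matching the "status 124" in the failure above.
    return proc.returncode == 124, proc.returncode

if __name__ == "__main__":
    timed_out, rc = run_with_timeout(["sleep", "5"], limit_s=1)
    print(f"timed out: {timed_out}, rc: {rc}")  # timed out: True, rc: 124
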
Actions #1

Updated by Neha Ojha 7 months ago

Looks like the test had already failed much earlier:

2023-10-13T04:28:32.682 INFO:tasks.ceph.ceph_manager.ceph:PG 46.3 is not active+clean
2023-10-13T04:28:32.683 INFO:tasks.ceph.ceph_manager.ceph:PG 22.0 is not active+clean
2023-10-13T04:28:32.683 INFO:tasks.ceph.ceph_manager.ceph:PG 22.12 is not active+clean
2023-10-13T04:28:32.684 INFO:tasks.ceph.ceph_manager.ceph:PG 21.1a is not active+clean
...
2023-10-13T04:28:32.685 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 190, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 1409, in _do_thrash
    self.ceph_manager.wait_for_recovery(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_5d1b5da21591c57cb0cbbbc8775b6ea0ced953a4/qa/tasks/ceph_manager.py", line 2854, in wait_for_recovery
    assert now - start < timeout, \
AssertionError: wait_for_recovery: failed before timeout expired
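
For context, the assertion comes from a simple poll-until-deadline loop; a hedged sketch of the shape of wait_for_recovery in qa/tasks/ceph_manager.py (is_recovered and the constants are illustrative stand-ins, not the real signature):

import time

def wait_for_recovery(is_recovered, timeout=300, poll_s=3):
    """Poll is_recovered() until it returns True or the deadline passes."""
    start = time.time()
    while not is_recovered():
        now = time.time()
        # Mirrors the failing line: once the elapsed time reaches the
        # timeout with recovery still incomplete, the task asserts out.
        assert now - start < timeout, \
            'wait_for_recovery: failed before timeout expired'
        time.sleep(poll_s)

So the message means the cluster never reached a recovered state within the allotted window; the check itself behaved as designed.
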
Actions #2

Updated by Laura Flores 3 months ago

  • Subject changed from "Command failed: ceph --cluster ceph osd dump --format=json" to "rados/thrash: AssertionError: wait_for_recovery: failed before timeout expired"
Actions #3

Updated by Laura Flores 3 months ago

/a/yuriw-2024-02-15_15:09:46-rados-wip-yuri5-testing-2024-02-14-1335-distro-default-smithi/7561084

Actions #4

Updated by Radoslaw Zarzynski 2 months ago

  • Assignee set to Laura Flores
Actions #5

Updated by Aishwarya Mathuria about 2 months ago

/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609848

Actions #6

Updated by Radoslaw Zarzynski about 1 month ago

Bumping the priority, though not terribly high.

Actions #7

Updated by Laura Flores 6 days ago

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664923

2024-04-20T18:32:31.959 INFO:tasks.ceph.ceph_manager.ceph:PG 7.1b is not active+clean
2024-04-20T18:32:31.959 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '7.1b', 'version': "0'0", 'reported_seq': 1304, 'reported_epoch': 1285, 'state': 'active+clean+remapped', 'last_fresh': '2024-04-20T18:32:25.425840+0000', 'last_change': '2024-04-20T18:29:34.627792+0000', 'last_active': '2024-04-20T18:32:25.425840+0000', 'last_peered': '2024-04-20T18:32:25.425840+0000', 'last_clean': '2024-04-20T18:32:25.425840+0000', 'last_became_active': '2024-04-20T18:12:06.247961+0000', 'last_became_peered': '2024-04-20T18:12:06.247961+0000', 'last_unstale': '2024-04-20T18:32:25.425840+0000', 'last_undegraded': '2024-04-20T18:32:25.425840+0000', 'last_fullsized': '2024-04-20T18:32:25.425840+0000', 'mapping_epoch': 649, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 62, 'last_epoch_clean': 650, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2024-04-20T18:29:34.627661+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2024-04-20T18:08:16.466604+0000', 'last_clean_scrub_stamp': '2024-04-20T18:29:34.627661+0000', 'objects_scrubbed': 0, 'log_size': 0, 'log_dups_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 1, 'scrub_schedule': 'periodic scrub scheduled @ 2024-04-20T18:32:27.754898+0000', 'scrub_duration': 3, 'objects_trimmed': 0, 'snaptrim_duration': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [8], 'acting': [8, 9], 'avail_no_missing': ['8', '9'], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 8, 'acting_primary': 8, 'purged_snaps': []}
2024-04-20T18:32:31.960 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_36c371567dddacf6207ea36f2535396ab31415fc/qa/tasks/ceph_manager.py", line 190, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_36c371567dddacf6207ea36f2535396ab31415fc/qa/tasks/ceph_manager.py", line 1453, in _do_thrash
    self.ceph_manager.wait_for_recovery(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_36c371567dddacf6207ea36f2535396ab31415fc/qa/tasks/ceph_manager.py", line 2918, in wait_for_recovery
    assert now - start < timeout, \
AssertionError: wait_for_recovery: failed before timeout expired
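
What stands out in this instance: the PG dump above shows state 'active+clean+remapped', yet the log reports the PG as not active+clean. That is consistent with an exact string comparison against 'active+clean', which any extra flag such as '+remapped' defeats. A hedged sketch of such a check (names illustrative, not necessarily the exact qa/tasks/ceph_manager.py code):

def dump_pgs_not_active_clean(pg_stats, log=print):
    """Log every PG whose state is anything other than exactly 'active+clean'."""
    for pg in pg_stats:
        if pg['state'] != 'active+clean':
            log('PG %s is not active+clean' % pg['pgid'])
            log(pg)

# The PG above would be flagged even though it is clean, because the
# trailing '+remapped' breaks the exact match.
dump_pgs_not_active_clean([{'pgid': '7.1b', 'state': 'active+clean+remapped'}])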

Actions #8

Updated by Aishwarya Mathuria 3 days ago · Edited

/a/yuriw-2024-04-30_14:17:59-rados-wip-yuri5-testing-2024-04-17-1400-distro-default-smithi/7680980/
/a/yuriw-2024-04-30_14:17:59-rados-wip-yuri5-testing-2024-04-17-1400-distro-default-smithi/7680983/
