Bug #58289

open

"AssertionError: wait_for_recovery: failed before timeout expired" from down pg in pacific-p2p-pacific

Added by Laura Flores over 1 year ago. Updated 2 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-12-13_15:58:24-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/7114849

2022-12-13T20:07:22.900 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '3.1f', 'version': "61'142", 'reported_seq': 1477, 'reported_epoch': 856, 'state': 'active+undersized+degraded', 'last_fresh': '2022-12-13T20:07:13.130657+0000', 'last_change': '2022-12-13T19:46:55.610235+0000', 'last_active': '2022-12-13T20:07:13.130657+0000', 'last_peered': '2022-12-13T20:07:13.130657+0000', 'last_clean': '2022-12-13T19:36:58.876248+0000', 'last_became_active': '2022-12-13T19:46:47.528634+0000', 'last_became_peered': '2022-12-13T19:46:47.528634+0000', 'last_unstale': '2022-12-13T20:07:13.130657+0000', 'last_undegraded': '2022-12-13T19:46:47.528019+0000', 'last_fullsized': '2022-12-13T19:46:47.527939+0000', 'mapping_epoch': 444, 'log_start': "61'140", 'ondisk_log_start': "61'140", 'created': 31, 'last_epoch_clean': 419, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2022-12-13T19:33:21.647056+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2022-12-13T19:33:21.647056+0000', 'last_clean_scrub_stamp': '2022-12-13T19:33:21.647056+0000', 'log_size': 2, 'ondisk_log_size': 2, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 6, 'num_object_clones': 0, 'num_object_copies': 12, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 6, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 6, 'num_whiteouts': 6, 'num_read': 979, 'num_read_kb': 2060, 'num_write': 143, 'num_write_kb': 3728, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 12, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 
'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [5], 'acting': [5], 'avail_no_missing': ['5'], 'object_location_counts': [{'shards': '5', 'objects': 6}], 'blocked_by': [], 'up_primary': 5, 'acting_primary': 5, 'purged_snaps': []}
2022-12-13T20:07:22.902 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_68a6e828debb314ae6c812948f6f24a4191862e3/qa/tasks/ceph_manager.py", line 188, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_68a6e828debb314ae6c812948f6f24a4191862e3/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.ceph_manager.wait_for_recovery(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_68a6e828debb314ae6c812948f6f24a4191862e3/qa/tasks/ceph_manager.py", line 2827, in wait_for_recovery
    assert now - start < timeout, \
AssertionError: wait_for_recovery: failed before timeout expired
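For readers unfamiliar with this kind of check, here is a minimal sketch of a polling loop that ends in the assertion shown in the traceback. It is not the code from qa/tasks/ceph_manager.py. The callables `is_recovered` and `is_making_progress`, the `poll`, `clock`, and `sleep` parameters, and the progress-based timer reset are all assumptions made for illustration. Only the assert expression and its message are taken from the traceback above.

```python
import time


def wait_for_recovery(is_recovered, is_making_progress, timeout,
                      poll=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll until is_recovered() returns True.

    Hypothetical sketch: the timer restarts whenever progress is seen,
    so the assertion only fires after `timeout` seconds with no progress.
    """
    start = clock()
    while not is_recovered():
        now = clock()
        if is_making_progress():
            # Assumption: observed progress resets the deadline.
            start = now
        else:
            # Same expression and message as in the traceback.
            assert now - start < timeout, \
                'wait_for_recovery: failed before timeout expired'
        sleep(poll)
```

Under these assumptions, a PG that stays "down" never reports progress, so the timer is never reset and the assertion fires once `timeout` has elapsed, despite the wording of the message.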

Last pg map before assertion:

{
  "pgs_by_state": [
    {
      "state_name": "active+clean",
      "count": 46
    },
    {
      "state_name": "active+undersized+degraded",
      "count": 16
    },
    {
      "state_name": "active+undersized",
      "count": 4
    },
    {
      "state_name": "down",
      "count": 1
    }
  ],
  "num_pgs": 67,
  "num_pools": 4,
  "num_objects": 76608,
  "data_bytes": 5004656683,
  "bytes_used": 18130448384,
  "bytes_avail": 653506707456,
  "bytes_total": 671637155840,
  "inactive_pgs_ratio": 0.014925372786819935,
  "degraded_objects": 14474,
  "degraded_total": 153216,
  "degraded_ratio": 0.09446794068504595
}
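As a quick sanity check on the pg map above, the sketch below lists the states that are not active+clean and recomputes the two ratios from the raw counts. The helper name `summarize` and the rule "a state without an `active` component counts as inactive" are assumptions made for illustration, not part of any Ceph tool. The numbers are copied from the pg map in this report.

```python
import json

# Subset of the pg map quoted in this report.
PG_MAP = json.loads("""
{
  "pgs_by_state": [
    {"state_name": "active+clean", "count": 46},
    {"state_name": "active+undersized+degraded", "count": 16},
    {"state_name": "active+undersized", "count": 4},
    {"state_name": "down", "count": 1}
  ],
  "num_pgs": 67,
  "degraded_objects": 14474,
  "degraded_total": 153216
}
""")


def summarize(pg_map):
    """Return the non-clean PG states and the recomputed ratios."""
    counts = {s["state_name"]: s["count"] for s in pg_map["pgs_by_state"]}
    not_clean = {k: v for k, v in counts.items() if k != "active+clean"}
    # Assumption: a PG is inactive if "active" is not one of its state flags.
    inactive = sum(v for k, v in counts.items()
                   if "active" not in k.split("+"))
    return {
        "not_clean": not_clean,
        "inactive_ratio": inactive / pg_map["num_pgs"],
        "degraded_ratio": pg_map["degraded_objects"] / pg_map["degraded_total"],
    }


if __name__ == "__main__":
    print(summarize(PG_MAP))
```

The recomputed values agree with the reported `inactive_pgs_ratio` and `degraded_ratio` to within rounding: the single "down" PG accounts for the whole inactive ratio (1 of 67).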

This issue is likely related to other "wait_for_recovery" issues such as https://tracker.ceph.com/issues/57599, but note that the traceback here differs slightly from the one in that issue. Also, we don't yet have a documented case for "down" PGs.


Related issues 1 (1 open, 0 closed)

Related to RADOS - Bug #57599: thrash-erasure-code: AssertionError: wait_for_recovery timeout due to "recovering+forced_recovery+undersized+remapped+peered" pg (Status: New, Assignee: Laura Flores)

Actions #1

Updated by Laura Flores over 1 year ago

  • Project changed from Ceph to RADOS
Actions #2

Updated by Laura Flores over 1 year ago

  • Related to Bug #57599: thrash-erasure-code: AssertionError: wait_for_recovery timeout due to "recovering+forced_recovery+undersized+remapped+peered" pg added
Actions #3

Updated by Laura Flores 12 months ago

/a/yuriw-2023-04-25_14:52:56-upgrade:pacific-p2p-pacific-release-distro-default-smithi/7252143

Actions #4

Updated by Matan Breizman 2 months ago

/a/yuriw-2024-02-09_00:15:46-rados-wip-yuri2-testing-2024-02-08-0727-distro-default-smithi/7553209
