Bug #65517

open

rados/thrash-erasure-code-crush-4-nodes: ceph task fails at getting monitors

Added by Laura Flores 14 days ago. Updated about 3 hours ago.

Status: Fix Under Review
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652508

2024-04-12T01:31:02.231 INFO:tasks.ceph:config {'conf': {'osd': {'debug monc': 20, 'bluestore block size': 96636764160, 'bluestore compression algorithm': 'snappy', 'bluestore compression mode': 'aggressive', 'bluestore fsck on mount': True, 'bluestore zero block detection': True, 'debug bluefs': 20, 'debug bluestore': 20, 'debug ms': 1, 'debug osd': 20, 'debug rocksdb': 10, 'mon osd backfillfull_ratio': 0.85, 'mon osd full ratio': 0.9, 'mon osd nearfull ratio': 0.8, 'osd beacon report interval': 30, 'osd blocked scrub grace period': 3600, 'osd debug verify cached snaps': True, 'osd debug verify missing on start': True, 'osd failsafe full ratio': 0.95, 'osd heartbeat use min delay socket': True, 'osd map cache size': 1, 'osd max backfills': 6, 'osd max markdown count': 1000, 'osd mclock override recovery settings': True, 'osd mclock profile': 'high_recovery_ops', 'osd objectstore': 'bluestore', 'osd op queue': 'debug_random', 'osd op queue cut off': 'debug_random', 'osd scrub during recovery': False, 'osd scrub max interval': 120, 'osd scrub min interval': 60}, 'global': {'mon client directed command retry': 5, 'mon election default strategy': 1, 'ms inject socket failures': 5000, 'osd_async_recovery_min_cost': 1}, 'mgr': {'debug mgr': 20, 'debug ms': 1}, 'mon': {'debug mon': 20, 'debug ms': 1, 'debug paxos': 20, 'mon min osdmap epochs': 50, 'mon osdmap full prune interval': 2, 'mon osdmap full prune min': 15, 'mon osdmap full prune txsize': 2, 'mon scrub interval': 300, 'paxos service trim min': 10}}, 'fs': 'xfs', 'mkfs_options': None, 'mount_options': None, 'skip_mgr_daemons': False, 'log_ignorelist': ['\\(MDS_ALL_DOWN\\)', '\\(MDS_UP_LESS_THAN_MAX\\)', '\\(OSD_SLOW_PING_TIME', 'but it is still running', 'objects unfound and apparently lost', 'osd_map_cache_size', 'overall HEALTH_', '\\(OSDMAP_FLAGS\\)', '\\(OSD_', '\\(PG_', '\\(POOL_', '\\(CACHE_POOL_', '\\(SMALLER_PGP_NUM\\)', '\\(OBJECT_', '\\(SLOW_OPS\\)', '\\(REQUEST_SLOW\\)', '\\(TOO_FEW_PGS\\)', 'slow 
request', 'timeout on replica', 'late reservation from', 'MON_DOWN', 'OSDMAP_FLAGS', 'OSD_DOWN', 'PG_DEGRADED', 'PG_AVAILABILITY', 'POOL_APP_NOT_ENABLED', 'mons down', 'mon down', 'out of quorum'], 'cpu_profile': set(), 'cluster': 'ceph', 'mon_bind_msgr2': True, 'mon_bind_addrvec': True}
2024-04-12T01:31:02.231 INFO:tasks.ceph:ctx.config {'arch': 'x86_64', 'archive_path': '/home/teuthworker/archive/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652508', 'branch': 'wip-yuri6-testing-2024-04-02-1310', 'description': 'rados/thrash-erasure-code-crush-4-nodes/{arch/x86_64 ceph mon_election/classic msgr-failures/few objectstore/bluestore-comp-snappy rados recovery-overrides/{more-async-recovery} supported-random-distro$/{ubuntu_latest} thrashers/mapgap thrashosds-health workloads/ec-rados-plugin=jerasure-k=8-m=6-crush}', 'email': None, 'first_in_suite': False, 'job_id': '7652508', 'kernel': {'kdb': 1, 'sha1': 'distro'}, 'ktype': 'distro', 'last_in_suite': False, 'machine_type': 'smithi', 'name': 'yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi', 'no_nested_subset': False, 'nuke-on-error': True, 'os_type': 'ubuntu', 'os_version': '22.04', 'overrides': {'admin_socket': {'branch': 'wip-yuri6-testing-2024-04-02-1310'}, 'ceph': {'conf': {'global': {'mon client directed command retry': 5, 'mon election default strategy': 1, 'ms inject socket failures': 5000, 'osd_async_recovery_min_cost': 1}, 'mgr': {'debug mgr': 20, 'debug ms': 1}, 'mon': {'debug mon': 20, 'debug ms': 1, 'debug paxos': 20, 'mon min osdmap epochs': 50, 'mon osdmap full prune interval': 2, 'mon osdmap full prune min': 15, 'mon osdmap full prune txsize': 2, 'mon scrub interval': 300, 'paxos service trim min': 10}, 'osd': {'bluestore block size': 96636764160, 'bluestore compression algorithm': 'snappy', 'bluestore compression mode': 'aggressive', 'bluestore fsck on mount': True, 'bluestore zero block detection': True, 'debug bluefs': 20, 'debug bluestore': 20, 'debug ms': 1, 'debug osd': 20, 'debug rocksdb': 10, 'mon osd backfillfull_ratio': 0.85, 'mon osd full ratio': 0.9, 'mon osd nearfull ratio': 0.8, 'osd beacon report interval': 30, 'osd blocked scrub grace period': 3600, 'osd debug verify cached snaps': True, 
'osd debug verify missing on start': True, 'osd failsafe full ratio': 0.95, 'osd heartbeat use min delay socket': True, 'osd map cache size': 1, 'osd max backfills': 6, 'osd max markdown count': 1000, 'osd mclock override recovery settings': True, 'osd mclock profile': 'high_recovery_ops', 'osd objectstore': 'bluestore', 'osd op queue': 'debug_random', 'osd op queue cut off': 'debug_random', 'osd scrub during recovery': False, 'osd scrub max interval': 120, 'osd scrub min interval': 60}}, 'flavor': 'default', 'fs': 'xfs', 'log-ignorelist': ['\\(MDS_ALL_DOWN\\)', '\\(MDS_UP_LESS_THAN_MAX\\)', '\\(OSD_SLOW_PING_TIME', 'but it is still running', 'objects unfound and apparently lost', 'osd_map_cache_size', 'overall HEALTH_', '\\(OSDMAP_FLAGS\\)', '\\(OSD_', '\\(PG_', '\\(POOL_', '\\(CACHE_POOL_', '\\(SMALLER_PGP_NUM\\)', '\\(OBJECT_', '\\(SLOW_OPS\\)', '\\(REQUEST_SLOW\\)', '\\(TOO_FEW_PGS\\)', 'slow request', 'timeout on replica', 'late reservation from', 'MON_DOWN', 'OSDMAP_FLAGS', 'OSD_DOWN', 'PG_DEGRADED', 'PG_AVAILABILITY', 'POOL_APP_NOT_ENABLED', 'mons down', 'mon down', 'out of quorum'], 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101'}, 'ceph-deploy': {'conf': {'client': {'log file': '/var/log/ceph/ceph-$name.$pid.log'}, 'mon': {}}}, 'install': {'ceph': {'flavor': 'default', 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101'}}, 'roles': [['mon.a', 'mgr.y', 'osd.0', 'osd.4', 'osd.8', 'osd.12'], ['mon.b', 'osd.1', 'osd.5', 'osd.9', 'osd.13'], ['mon.c', 'osd.2', 'osd.6', 'osd.10', 'osd.14'], ['mgr.x', 'osd.3', 'osd.7', 'osd.11', 'osd.15', 'client.0']], 'thrashosds': {'bdev_inject_crash': 2, 'bdev_inject_crash_probability': 0.5}, 'workunit': {'branch': 'wip-yuri6-testing-2024-04-02-1310', 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101'}}, 'owner': 'scheduled_yuriw@teuthology', 'priority': 99, 'repo': 'https://github.com/ceph/ceph-ci.git', 'seed': 1434, 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101', 'sleep_before_teardown': 0, 'subset': '111/120000', 
'suite': 'rados', 'suite_branch': 'wip-yuri6-testing-2024-04-02-1310', 'suite_path': '/home/teuthworker/src/github.com_ceph_ceph-c_a5074d4516d566e9d8b6aec912f26afd099de101/qa', 'suite_relpath': 'qa', 'suite_repo': 'https://github.com/ceph/ceph-ci.git', 'suite_sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101', 'tasks': [{'internal.check_packages': None}, {'internal.buildpackages_prep': None}, {'internal.save_config': None}, {'internal.add_remotes': None}, {'kernel': {'kdb': 1, 'sha1': 'distro'}}, {'internal.archive_upload': None}, {'internal.timer': None}, {'ansible.cephlab': None}, {'clock': None}, {'install': None}, {'ceph': {'conf': {'osd': {'debug monc': 20, 'bluestore block size': 96636764160, 'bluestore compression algorithm': 'snappy', 'bluestore compression mode': 'aggressive', 'bluestore fsck on mount': True, 'bluestore zero block detection': True, 'debug bluefs': 20, 'debug bluestore': 20, 'debug ms': 1, 'debug osd': 20, 'debug rocksdb': 10, 'mon osd backfillfull_ratio': 0.85, 'mon osd full ratio': 0.9, 'mon osd nearfull ratio': 0.8, 'osd beacon report interval': 30, 'osd blocked scrub grace period': 3600, 'osd debug verify cached snaps': True, 'osd debug verify missing on start': True, 'osd failsafe full ratio': 0.95, 'osd heartbeat use min delay socket': True, 'osd map cache size': 1, 'osd max backfills': 6, 'osd max markdown count': 1000, 'osd mclock override recovery settings': True, 'osd mclock profile': 'high_recovery_ops', 'osd objectstore': 'bluestore', 'osd op queue': 'debug_random', 'osd op queue cut off': 'debug_random', 'osd scrub during recovery': False, 'osd scrub max interval': 120, 'osd scrub min interval': 60}, 'global': {'mon client directed command retry': 5, 'mon election default strategy': 1, 'ms inject socket failures': 5000, 'osd_async_recovery_min_cost': 1}, 'mgr': {'debug mgr': 20, 'debug ms': 1}, 'mon': {'debug mon': 20, 'debug ms': 1, 'debug paxos': 20, 'mon min osdmap epochs': 50, 'mon osdmap full prune interval': 2, 'mon osdmap 
full prune min': 15, 'mon osdmap full prune txsize': 2, 'mon scrub interval': 300, 'paxos service trim min': 10}}, 'flavor': 'default', 'fs': 'xfs', 'log-ignorelist': ['\\(MDS_ALL_DOWN\\)', '\\(MDS_UP_LESS_THAN_MAX\\)', '\\(OSD_SLOW_PING_TIME', 'but it is still running', 'objects unfound and apparently lost', 'osd_map_cache_size', 'overall HEALTH_', '\\(OSDMAP_FLAGS\\)', '\\(OSD_', '\\(PG_', '\\(POOL_', '\\(CACHE_POOL_', '\\(SMALLER_PGP_NUM\\)', '\\(OBJECT_', '\\(SLOW_OPS\\)', '\\(REQUEST_SLOW\\)', '\\(TOO_FEW_PGS\\)', 'slow request', 'timeout on replica', 'late reservation from', 'MON_DOWN', 'OSDMAP_FLAGS', 'OSD_DOWN', 'PG_DEGRADED', 'PG_AVAILABILITY', 'POOL_APP_NOT_ENABLED', 'mons down', 'mon down', 'out of quorum'], 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101', 'cluster': 'ceph'}}, {'thrashosds': {'chance_pgnum_grow': 0.25, 'chance_pgnum_shrink': 0.25, 'chance_pgpnum_fix': 0.25, 'chance_test_map_discontinuity': 2, 'map_discontinuity_sleep_time': 200, 'min_in': 2, 'timeout': 1800}}, {'rados': {'clients': ['client.0'], 'ec_pool': True, 'erasure_code_crush': {'id': 86, 'max_size': 6, 'min_size': 3, 'name': 'jerasure86crush', 'steps': ['set_chooseleaf_tries 5', 'set_choose_tries 100', 'take default class hdd', 'choose indep 4 type host', 'choose indep 4 type osd', 'emit'], 'type': 'erasure'}, 'erasure_code_profile': {'crush-failure-domain': 'osd', 'k': 8, 'm': 6, 'name': 'jerasure86profile', 'plugin': 'jerasure', 'technique': 'reed_sol_van'}, 'max_in_flight': 64, 'max_seconds': 600, 'objects': 1024, 'op_weights': {'append': 100, 'copy_from': 50, 'delete': 50, 'read': 100, 'rmattr': 25, 'rollback': 50, 'setattr': 25, 'snap_create': 50, 'snap_remove': 50, 'write': 0}, 'ops': 400000, 'size': 16384, 'write_append_excl': False}}], 'teuthology': {'fragments_dropped': [], 'meta': {}, 'postmerge': []}, 'teuthology_branch': 'main', 'teuthology_sha1': '6c637841c215537a4502385240412f1966e0faab', 'timestamp': '2024-04-11_17:03:54', 'tube': 'smithi', 'user': 'yuriw', 
'verbose': True, 'worker_log': '/home/teuthworker/archive/worker_logs/dispatcher.smithi.2226885'}
2024-04-12T01:31:02.232 INFO:tasks.ceph:remote_to_roles_to_devs: {}
2024-04-12T01:31:02.232 INFO:tasks.ceph:Generating config...
2024-04-12T01:31:02.232 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/contextutil.py", line 30, in nested
    vars.append(enter())
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_a5074d4516d566e9d8b6aec912f26afd099de101/qa/tasks/ceph.py", line 693, in cluster
    mons = get_mons(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_a5074d4516d566e9d8b6aec912f26afd099de101/qa/tasks/ceph.py", line 510, in get_mons
    assert mons
AssertionError
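
For triage, here is a minimal sketch of how an `assert mons` check can fire. It assumes, hypothetically, that monitors are discovered by scanning a top-level `roles` list for `mon.` entries. This is not the code in qa/tasks/ceph.py, and `mon_roles` is an illustrative helper only. In the ctx.config dump above, `roles` appears nested under `overrides` rather than at the top level, so the example mirrors that layout using the role names from the log.

```python
# Illustrative only: mon_roles is a hypothetical helper, not the real get_mons().
def mon_roles(roles):
    """Return the sorted monitor role names found in a teuthology-style roles list."""
    return sorted(
        role
        for host_roles in roles
        for role in host_roles
        if role.startswith("mon.")
    )


# Trimmed copy of the job config layout seen in the log: 'roles' sits under 'overrides'.
job_config = {
    "overrides": {
        "roles": [
            ["mon.a", "mgr.y", "osd.0", "osd.4", "osd.8", "osd.12"],
            ["mon.b", "osd.1", "osd.5", "osd.9", "osd.13"],
            ["mon.c", "osd.2", "osd.6", "osd.10", "osd.14"],
            ["mgr.x", "osd.3", "osd.7", "osd.11", "osd.15", "client.0"],
        ],
    },
}

# Looking only at the top level finds no monitors, so an `assert mons` would fail.
top_level = mon_roles(job_config.get("roles", []))
# The same roles are visible if the nested location is inspected instead.
nested = mon_roles(job_config.get("overrides", {}).get("roles", []))

print(top_level)  # []
print(nested)     # ['mon.a', 'mon.b', 'mon.c']
```

If the real task reads roles in a similar way, an empty result here would be consistent with the `remote_to_roles_to_devs: {}` line logged just before the failure, but that link is an assumption to be confirmed against the actual fix.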

Actions #1

Updated by Laura Flores 14 days ago

Looks like the change was made in https://github.com/ceph/ceph/pull/53308. That PR did initially pass QA testing, but more commits were pushed afterward and it was merged early.

Actions #2

Updated by Laura Flores 14 days ago

  • Assignee set to Nitzan Mordechai

Hey @Nitzan Mordechai, can you have a look?

Actions #3

Updated by Laura Flores 14 days ago

/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620629
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620562
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620493
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620766
/a/yuriw-2024-03-24_22:19:24-rados-wip-yuri10-testing-2024-03-24-1159-distro-default-smithi/7620696

Actions #4

Updated by Aishwarya Mathuria 13 days ago

/a/yuriw-2024-04-09_14:35:50-rados-wip-yuri5-testing-2024-03-21-0833-distro-default-smithi/7648565
/a/yuriw-2024-04-09_14:35:50-rados-wip-yuri5-testing-2024-03-21-0833-distro-default-smithi/7648768
/a/yuriw-2024-04-09_14:35:50-rados-wip-yuri5-testing-2024-03-21-0833-distro-default-smithi/7648838
/a/yuriw-2024-04-09_14:35:50-rados-wip-yuri5-testing-2024-03-21-0833-distro-default-smithi/7648634
/a/yuriw-2024-04-09_14:35:50-rados-wip-yuri5-testing-2024-03-21-0833-distro-default-smithi/7648701

Actions #5

Updated by Radoslaw Zarzynski 8 days ago

Bump up.

Actions #6

Updated by Nitzan Mordechai about 3 hours ago

  • Pull request ID set to 56983

Actions #7

Updated by Nitzan Mordechai about 3 hours ago

  • Status changed from New to Fix Under Review