



Bug #65517


rados/thrash-erasure-code-crush-4-nodes: ceph task fails at getting monitors

Added by Laura Flores 19 days ago. Updated 3 days ago.

Status: Fix Under Review
Severity: 3 - minor



2024-04-12T01:31:02.231 INFO:tasks.ceph:config {'conf': {'osd': {'debug monc': 20, 'bluestore block size': 96636764160, 'bluestore compression algorithm': 'snappy', 'bluestore compression mode': 'aggressive', 'bluestore fsck on mount': True, 'bluestore zero block detection': True, 'debug bluefs': 20, 'debug bluestore': 20, 'debug ms': 1, 'debug osd': 20, 'debug rocksdb': 10, 'mon osd backfillfull_ratio': 0.85, 'mon osd full ratio': 0.9, 'mon osd nearfull ratio': 0.8, 'osd beacon report interval': 30, 'osd blocked scrub grace period': 3600, 'osd debug verify cached snaps': True, 'osd debug verify missing on start': True, 'osd failsafe full ratio': 0.95, 'osd heartbeat use min delay socket': True, 'osd map cache size': 1, 'osd max backfills': 6, 'osd max markdown count': 1000, 'osd mclock override recovery settings': True, 'osd mclock profile': 'high_recovery_ops', 'osd objectstore': 'bluestore', 'osd op queue': 'debug_random', 'osd op queue cut off': 'debug_random', 'osd scrub during recovery': False, 'osd scrub max interval': 120, 'osd scrub min interval': 60}, 'global': {'mon client directed command retry': 5, 'mon election default strategy': 1, 'ms inject socket failures': 5000, 'osd_async_recovery_min_cost': 1}, 'mgr': {'debug mgr': 20, 'debug ms': 1}, 'mon': {'debug mon': 20, 'debug ms': 1, 'debug paxos': 20, 'mon min osdmap epochs': 50, 'mon osdmap full prune interval': 2, 'mon osdmap full prune min': 15, 'mon osdmap full prune txsize': 2, 'mon scrub interval': 300, 'paxos service trim min': 10}}, 'fs': 'xfs', 'mkfs_options': None, 'mount_options': None, 'skip_mgr_daemons': False, 'log_ignorelist': ['\\(MDS_ALL_DOWN\\)', '\\(MDS_UP_LESS_THAN_MAX\\)', '\\(OSD_SLOW_PING_TIME', 'but it is still running', 'objects unfound and apparently lost', 'osd_map_cache_size', 'overall HEALTH_', '\\(OSDMAP_FLAGS\\)', '\\(OSD_', '\\(PG_', '\\(POOL_', '\\(CACHE_POOL_', '\\(SMALLER_PGP_NUM\\)', '\\(OBJECT_', '\\(SLOW_OPS\\)', '\\(REQUEST_SLOW\\)', '\\(TOO_FEW_PGS\\)', 'slow request', 'timeout on replica', 'late reservation from', 'MON_DOWN', 'OSDMAP_FLAGS', 'OSD_DOWN', 'PG_DEGRADED', 'PG_AVAILABILITY', 'POOL_APP_NOT_ENABLED', 'mons down', 'mon down', 'out of quorum'], 'cpu_profile': set(), 'cluster': 'ceph', 'mon_bind_msgr2': True, 'mon_bind_addrvec': True}
2024-04-12T01:31:02.231 INFO:tasks.ceph:ctx.config {'arch': 'x86_64', 'archive_path': '/home/teuthworker/archive/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652508', 'branch': 'wip-yuri6-testing-2024-04-02-1310', 'description': 'rados/thrash-erasure-code-crush-4-nodes/{arch/x86_64 ceph mon_election/classic msgr-failures/few objectstore/bluestore-comp-snappy rados recovery-overrides/{more-async-recovery} supported-random-distro$/{ubuntu_latest} thrashers/mapgap thrashosds-health workloads/ec-rados-plugin=jerasure-k=8-m=6-crush}', 'email': None, 'first_in_suite': False, 'job_id': '7652508', 'kernel': {'kdb': 1, 'sha1': 'distro'}, 'ktype': 'distro', 'last_in_suite': False, 'machine_type': 'smithi', 'name': 'yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi', 'no_nested_subset': False, 'nuke-on-error': True, 'os_type': 'ubuntu', 'os_version': '22.04', 'overrides': {'admin_socket': {'branch': 'wip-yuri6-testing-2024-04-02-1310'}, 'ceph': {'conf': {'global': {'mon client directed command retry': 5, 'mon election default strategy': 1, 'ms inject socket failures': 5000, 'osd_async_recovery_min_cost': 1}, 'mgr': {'debug mgr': 20, 'debug ms': 1}, 'mon': {'debug mon': 20, 'debug ms': 1, 'debug paxos': 20, 'mon min osdmap epochs': 50, 'mon osdmap full prune interval': 2, 'mon osdmap full prune min': 15, 'mon osdmap full prune txsize': 2, 'mon scrub interval': 300, 'paxos service trim min': 10}, 'osd': {'bluestore block size': 96636764160, 'bluestore compression algorithm': 'snappy', 'bluestore compression mode': 'aggressive', 'bluestore fsck on mount': True, 'bluestore zero block detection': True, 'debug bluefs': 20, 'debug bluestore': 20, 'debug ms': 1, 'debug osd': 20, 'debug rocksdb': 10, 'mon osd backfillfull_ratio': 0.85, 'mon osd full ratio': 0.9, 'mon osd nearfull ratio': 0.8, 'osd beacon report interval': 30, 'osd blocked scrub grace period': 3600, 'osd debug verify cached snaps': True, 'osd debug verify missing on start': True, 'osd failsafe full ratio': 0.95, 'osd heartbeat use min delay socket': True, 'osd map cache size': 1, 'osd max backfills': 6, 'osd max markdown count': 1000, 'osd mclock override recovery settings': True, 'osd mclock profile': 'high_recovery_ops', 'osd objectstore': 'bluestore', 'osd op queue': 'debug_random', 'osd op queue cut off': 'debug_random', 'osd scrub during recovery': False, 'osd scrub max interval': 120, 'osd scrub min interval': 60}}, 'flavor': 'default', 'fs': 'xfs', 'log-ignorelist': ['\\(MDS_ALL_DOWN\\)', '\\(MDS_UP_LESS_THAN_MAX\\)', '\\(OSD_SLOW_PING_TIME', 'but it is still running', 'objects unfound and apparently lost', 'osd_map_cache_size', 'overall HEALTH_', '\\(OSDMAP_FLAGS\\)', '\\(OSD_', '\\(PG_', '\\(POOL_', '\\(CACHE_POOL_', '\\(SMALLER_PGP_NUM\\)', '\\(OBJECT_', '\\(SLOW_OPS\\)', '\\(REQUEST_SLOW\\)', '\\(TOO_FEW_PGS\\)', 'slow request', 'timeout on replica', 'late reservation from', 'MON_DOWN', 'OSDMAP_FLAGS', 'OSD_DOWN', 'PG_DEGRADED', 'PG_AVAILABILITY', 'POOL_APP_NOT_ENABLED', 'mons down', 'mon down', 'out of quorum'], 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101'}, 'ceph-deploy': {'conf': {'client': {'log file': '/var/log/ceph/ceph-$name.$pid.log'}, 'mon': {}}}, 'install': {'ceph': {'flavor': 'default', 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101'}}, 'roles': [['mon.a', 'mgr.y', 'osd.0', 'osd.4', 'osd.8', 'osd.12'], ['mon.b', 'osd.1', 'osd.5', 'osd.9', 'osd.13'], ['mon.c', 'osd.2', 'osd.6', 'osd.10', 'osd.14'], ['mgr.x', 'osd.3', 'osd.7', 'osd.11', 'osd.15', 'client.0']], 'thrashosds': {'bdev_inject_crash': 2, 'bdev_inject_crash_probability': 0.5}, 'workunit': {'branch': 'wip-yuri6-testing-2024-04-02-1310', 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101'}}, 'owner': 'scheduled_yuriw@teuthology', 'priority': 99, 'repo': '', 'seed': 1434, 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101', 'sleep_before_teardown': 0, 'subset': '111/120000', 'suite': 'rados', 'suite_branch': 'wip-yuri6-testing-2024-04-02-1310', 'suite_path': '/home/teuthworker/src/github.com_ceph_ceph-c_a5074d4516d566e9d8b6aec912f26afd099de101/qa', 'suite_relpath': 'qa', 'suite_repo': '', 'suite_sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101', 'tasks': [{'internal.check_packages': None}, {'internal.buildpackages_prep': None}, {'internal.save_config': None}, {'internal.add_remotes': None}, {'kernel': {'kdb': 1, 'sha1': 'distro'}}, {'internal.archive_upload': None}, {'internal.timer': None}, {'ansible.cephlab': None}, {'clock': None}, {'install': None}, {'ceph': {'conf': {'osd': {'debug monc': 20, 'bluestore block size': 96636764160, 'bluestore compression algorithm': 'snappy', 'bluestore compression mode': 'aggressive', 'bluestore fsck on mount': True, 'bluestore zero block detection': True, 'debug bluefs': 20, 'debug bluestore': 20, 'debug ms': 1, 'debug osd': 20, 'debug rocksdb': 10, 'mon osd backfillfull_ratio': 0.85, 'mon osd full ratio': 0.9, 'mon osd nearfull ratio': 0.8, 'osd beacon report interval': 30, 'osd blocked scrub grace period': 3600, 'osd debug verify cached snaps': True, 'osd debug verify missing on start': True, 'osd failsafe full ratio': 0.95, 'osd heartbeat use min delay socket': True, 'osd map cache size': 1, 'osd max backfills': 6, 'osd max markdown count': 1000, 'osd mclock override recovery settings': True, 'osd mclock profile': 'high_recovery_ops', 'osd objectstore': 'bluestore', 'osd op queue': 'debug_random', 'osd op queue cut off': 'debug_random', 'osd scrub during recovery': False, 'osd scrub max interval': 120, 'osd scrub min interval': 60}, 'global': {'mon client directed command retry': 5, 'mon election default strategy': 1, 'ms inject socket failures': 5000, 'osd_async_recovery_min_cost': 1}, 'mgr': {'debug mgr': 20, 'debug ms': 1}, 'mon': {'debug mon': 20, 'debug ms': 1, 'debug paxos': 20, 'mon min osdmap epochs': 50, 'mon osdmap full prune interval': 2, 'mon osdmap full prune min': 15, 'mon osdmap full prune txsize': 2, 'mon scrub interval': 300, 'paxos service trim min': 10}}, 'flavor': 'default', 'fs': 'xfs', 'log-ignorelist': ['\\(MDS_ALL_DOWN\\)', '\\(MDS_UP_LESS_THAN_MAX\\)', '\\(OSD_SLOW_PING_TIME', 'but it is still running', 'objects unfound and apparently lost', 'osd_map_cache_size', 'overall HEALTH_', '\\(OSDMAP_FLAGS\\)', '\\(OSD_', '\\(PG_', '\\(POOL_', '\\(CACHE_POOL_', '\\(SMALLER_PGP_NUM\\)', '\\(OBJECT_', '\\(SLOW_OPS\\)', '\\(REQUEST_SLOW\\)', '\\(TOO_FEW_PGS\\)', 'slow request', 'timeout on replica', 'late reservation from', 'MON_DOWN', 'OSDMAP_FLAGS', 'OSD_DOWN', 'PG_DEGRADED', 'PG_AVAILABILITY', 'POOL_APP_NOT_ENABLED', 'mons down', 'mon down', 'out of quorum'], 'sha1': 'a5074d4516d566e9d8b6aec912f26afd099de101', 'cluster': 'ceph'}}, {'thrashosds': {'chance_pgnum_grow': 0.25, 'chance_pgnum_shrink': 0.25, 'chance_pgpnum_fix': 0.25, 'chance_test_map_discontinuity': 2, 'map_discontinuity_sleep_time': 200, 'min_in': 2, 'timeout': 1800}}, {'rados': {'clients': ['client.0'], 'ec_pool': True, 'erasure_code_crush': {'id': 86, 'max_size': 6, 'min_size': 3, 'name': 'jerasure86crush', 'steps': ['set_chooseleaf_tries 5', 'set_choose_tries 100', 'take default class hdd', 'choose indep 4 type host', 'choose indep 4 type osd', 'emit'], 'type': 'erasure'}, 'erasure_code_profile': {'crush-failure-domain': 'osd', 'k': 8, 'm': 6, 'name': 'jerasure86profile', 'plugin': 'jerasure', 'technique': 'reed_sol_van'}, 'max_in_flight': 64, 'max_seconds': 600, 'objects': 1024, 'op_weights': {'append': 100, 'copy_from': 50, 'delete': 50, 'read': 100, 'rmattr': 25, 'rollback': 50, 'setattr': 25, 'snap_create': 50, 'snap_remove': 50, 'write': 0}, 'ops': 400000, 'size': 16384, 'write_append_excl': False}}], 'teuthology': {'fragments_dropped': [], 'meta': {}, 'postmerge': []}, 'teuthology_branch': 'main', 'teuthology_sha1': '6c637841c215537a4502385240412f1966e0faab', 'timestamp': '2024-04-11_17:03:54', 'tube': 'smithi', 'user': 'yuriw', 'verbose': True, 'worker_log': '/home/teuthworker/archive/worker_logs/dispatcher.smithi.2226885'}
2024-04-12T01:31:02.232 INFO:tasks.ceph:remote_to_roles_to_devs: {}
2024-04-12T01:31:02.232 INFO:tasks.ceph:Generating config...
2024-04-12T01:31:02.232 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_6c637841c215537a4502385240412f1966e0faab/teuthology/", line 30, in nested
  File "/usr/lib/python3.8/", line 113, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_a5074d4516d566e9d8b6aec912f26afd099de101/qa/tasks/", line 693, in cluster
    mons = get_mons(
  File "/home/teuthworker/src/github.com_ceph_ceph-c_a5074d4516d566e9d8b6aec912f26afd099de101/qa/tasks/", line 510, in get_mons
    assert mons
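The traceback ends at "assert mons" inside get_mons, meaning the monitor lookup came back empty. The snippet below is a minimal, hypothetical sketch of that kind of lookup, not the actual qa/tasks code; the function name find_mon_roles, the role-parsing rules, and the default cluster name "ceph" are assumptions made for illustration. Fed the roles list from the ctx.config dump above (mon.a, mon.b and mon.c on the first three nodes), it finds three monitors, and it trips the same kind of assertion only when no mon role matches the requested cluster.

```python
# Hypothetical sketch: NOT the get_mons() implementation in qa/tasks.
from typing import Dict, List

DEFAULT_CLUSTER = "ceph"  # assumption: roles without a cluster prefix belong to "ceph"


def find_mon_roles(roles: List[List[str]], cluster_name: str = DEFAULT_CLUSTER) -> Dict[str, int]:
    """Map each mon role of cluster_name to the index of the node hosting it.

    Assumed role syntax: "type.id" or "cluster.type.id"; anything else is skipped.
    """
    mons: Dict[str, int] = {}
    for node_idx, node_roles in enumerate(roles):
        for role in node_roles:
            parts = role.split(".")
            if len(parts) == 2:
                cluster, role_type = DEFAULT_CLUSTER, parts[0]
            elif len(parts) == 3:
                cluster, role_type = parts[0], parts[1]
            else:
                continue  # unrecognised role format
            if cluster == cluster_name and role_type == "mon":
                mons[role] = node_idx
    # Mirrors the failing line in the traceback: an empty result is treated as fatal.
    assert mons, f"no mon roles found for cluster {cluster_name!r}"
    return mons


# 'roles' copied from the ctx.config dump in this report.
ROLES = [
    ["mon.a", "mgr.y", "osd.0", "osd.4", "osd.8", "osd.12"],
    ["mon.b", "osd.1", "osd.5", "osd.9", "osd.13"],
    ["mon.c", "osd.2", "osd.6", "osd.10", "osd.14"],
    ["mgr.x", "osd.3", "osd.7", "osd.11", "osd.15", "client.0"],
]

if __name__ == "__main__":
    print(find_mon_roles(ROLES))  # {'mon.a': 0, 'mon.b': 1, 'mon.c': 2}
```

For example, find_mon_roles(ROLES, "backup") raises AssertionError, because no role in this list belongs to a cluster named "backup".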

#1

Updated by Laura Flores 19 days ago

Looks like the change was made in, which did initially pass QA testing, but more commits were pushed and it got merged early.

#2

Updated by Laura Flores 19 days ago

  • Assignee set to Nitzan Mordechai

Hey @Nitzan Mordechai, can you have a look?

#3

Updated by Laura Flores 19 days ago


#4

Updated by Aishwarya Mathuria 19 days ago


#5

Updated by Radoslaw Zarzynski 13 days ago

Bump up.

#6

Updated by Nitzan Mordechai 6 days ago

  • Pull request ID set to 56983

#7

Updated by Nitzan Mordechai 6 days ago

  • Status changed from New to Fix Under Review

#8

Updated by Aishwarya Mathuria 3 days ago


