Actions
Bug #53677
closedqa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling
% Done:
0%
Source:
Tags:
Backport:
pacific,octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2021-12-19T02:15:16.582 INFO:tasks.backfill_toofull:pg={'pgid': '2.0', 'version': "23'14551", 'reported_seq': 14906, 'reported_epoch': 33, 'state': 'active+undersized+degraded+remapped+backfill_toofull', 'last_fresh': '2021-12-19T02:14:54.380853+0000', 'last_change': '2021-12-19T02:14:54.380853+0000', 'last_active': '2021-12-19T02:14:54.380853+0000', 'last_peered': '2021-12-19T02:14:54.380853+0000', 'last_clean': '2021-12-19T01:34:28.459149+0000', 'last_became_active': '2021-12-19T01:35:54.242082+0000', 'last_became_peered': '2021-12-19T01:35:54.242082+0000', 'last_unstale': '2021-12-19T02:14:54.380853+0000', 'last_undegraded': '2021-12-19T01:35:54.225980+0000', 'last_fullsized': '2021-12-19T01:35:54.221427+0000', 'mapping_epoch': 28, 'log_start': "23'14469", 'ondisk_log_start': "23'14469", 'created': 15, 'last_epoch_clean': 16, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-12-19T01:33:15.714744+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-12-19T01:33:15.714744+0000', 'last_clean_scrub_stamp': '2021-12-19T01:33:15.714744+0000', 'log_size': 82, 'ondisk_log_size': 82, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 0, 'scrub_schedule': 'periodic scrub scheduled @ 2021-12-20T03:48:01.242929+0000', 'stat_sum': {'num_bytes': 25794969624, 'num_objects': 6151, 'num_object_clones': 0, 'num_object_copies': 18453, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 6151, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 6151, 'num_whiteouts': 0, 'num_read': 1, 'num_read_kb': 1, 'num_write': 14551, 'num_write_kb': 42389506, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [3, 1, 2], 'acting': [3, 2147483647, 2], 'avail_no_missing': ['3(0)', '2(2)'], 'object_location_counts': [{'shards': '2(2),3(0)', 'objects': 6151}], 'blocked_by': [], 'up_primary': 3, 'acting_primary': 3, 'purged_snaps': []} 2021-12-19T02:15:16.582 DEBUG:tasks.backfill_toofull:not backfilling 2021-12-19T02:15:16.583 ERROR:teuthology.run_tasks:Saw exception from tasks. Traceback (most recent call last): File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/run_tasks.py", line 91, in run_tasks manager = run_one_task(taskname, ctx=ctx, config=config) File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/run_tasks.py", line 70, in run_one_task return task(**kwargs) File "/home/teuthworker/src/github.com_ceph_ceph-c_9f128e94b494f6aacaa2c4dc4adc35fe6fbd0e12/qa/tasks/backfill_toofull.py", line 169, in task wait_for_pg_state(manager, pgid, 'backfilling', target) File "/home/teuthworker/src/github.com_ceph_ceph-c_9f128e94b494f6aacaa2c4dc4adc35fe6fbd0e12/qa/tasks/backfill_toofull.py", line 30, in wait_for_pg_state assert False, '%s not in %s' % (pgid, state) AssertionError: 2.0 not in backfilling
/a/yuriw-2021-12-18_18:14:24-rados-wip-yuriw-master-12.18.21-distro-default-smithi/6570206
Mykola: could you please take a look?
Updated by Mykola Golub over 2 years ago
- Status changed from New to In Progress
- Backport set to pacific,octopus
This looks like a false test failure due to the test setting backfillfull ratio too low.
We see that the test calculated the needed backfillfull ratio as 0.14347279565777277, but when passing to the ceph it truncated it to 0.143:
2021-12-19T01:36:06.245 DEBUG:tasks.backfill_toofull:update nearfull ratio to 0.1340676106928349 and backfillfull ratio to 0.14347279565777277 2021-12-19T01:36:06.245 DEBUG:teuthology.orchestra.run.smithi138:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd set-nearfull-ratio 0.134 2021-12-19T01:36:07.348 DEBUG:teuthology.orchestra.run.smithi138:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd set-backfillfull-ratio 0.143
And the osd when checking reservation calculated tentative usage to be 0.143018, i.e. higher than the backfillfull ratio 0.143 and rejected reservation:
2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] do_peering_event: epoch_sent: 31 epoch_requested: 31 RequestBackfillPrio: priority 151 primary bytes 25794969624 local bytes 17611882520 2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] try_reserve_recovery_space primary_bytes 12595200KiB local 8599552KiB pending_adjustments 3995648KiB 2021-12-19T01:37:54.246+0000 7f68a7b4b700 20 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] _tentative_full type backfillfull adjust_used 3995648KiB 2021-12-19T01:37:54.246+0000 7f68a7b4b700 20 osd.1 31 compute_adjusted_ratio Before kb_used() 9405084 2021-12-19T01:37:54.246+0000 7f68a7b4b700 20 osd.1 31 compute_adjusted_ratio After kb_used() 13400732 2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] _tentative_full tentative usage is 0.143018 2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] backfill reservation rejected: backfill full
Note if the backfillfull ratio had been set to the non-truncated value 0.14347279565777277, it would be greater than the tentative usage 0.143018 and the test would pass.
I am going to improve the test.
Updated by Mykola Golub over 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 44385
Updated by Neha Ojha over 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 2 years ago
- Copied to Backport #53701: octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling added
Updated by Backport Bot over 2 years ago
- Copied to Backport #53702: pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling added
Updated by Neha Ojha almost 2 years ago
- Status changed from Pending Backport to Resolved
Actions