Project

General

Profile

Actions

Bug #53677

closed

qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling

Added by Neha Ojha over 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific,octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-12-19T02:15:16.582 INFO:tasks.backfill_toofull:pg={'pgid': '2.0', 'version': "23'14551", 'reported_seq': 14906, 'reported_epoch': 33, 'state': 'active+undersized+degraded+remapped+backfill_toofull', 'last_fresh': '2021-12-19T02:14:54.380853+0000', 'last_change': '2021-12-19T02:14:54.380853+0000', 'last_active': '2021-12-19T02:14:54.380853+0000', 'last_peered': '2021-12-19T02:14:54.380853+0000', 'last_clean': '2021-12-19T01:34:28.459149+0000', 'last_became_active': '2021-12-19T01:35:54.242082+0000', 'last_became_peered': '2021-12-19T01:35:54.242082+0000', 'last_unstale': '2021-12-19T02:14:54.380853+0000', 'last_undegraded': '2021-12-19T01:35:54.225980+0000', 'last_fullsized': '2021-12-19T01:35:54.221427+0000', 'mapping_epoch': 28, 'log_start': "23'14469", 'ondisk_log_start': "23'14469", 'created': 15, 'last_epoch_clean': 16, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-12-19T01:33:15.714744+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-12-19T01:33:15.714744+0000', 'last_clean_scrub_stamp': '2021-12-19T01:33:15.714744+0000', 'log_size': 82, 'ondisk_log_size': 82, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'last_scrub_duration': 0, 'scrub_schedule': 'periodic scrub scheduled @ 2021-12-20T03:48:01.242929+0000', 'stat_sum': {'num_bytes': 25794969624, 'num_objects': 6151, 'num_object_clones': 0, 'num_object_copies': 18453, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 6151, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 6151, 'num_whiteouts': 0, 'num_read': 1, 'num_read_kb': 1, 'num_write': 14551, 'num_write_kb': 42389506, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [3, 1, 2], 'acting': [3, 2147483647, 2], 'avail_no_missing': ['3(0)', '2(2)'], 'object_location_counts': [{'shards': '2(2),3(0)', 'objects': 6151}], 'blocked_by': [], 'up_primary': 3, 'acting_primary': 3, 'purged_snaps': []}
2021-12-19T02:15:16.582 DEBUG:tasks.backfill_toofull:not backfilling
2021-12-19T02:15:16.583 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/run_tasks.py", line 91, in run_tasks
    manager = run_one_task(taskname, ctx=ctx, config=config)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/run_tasks.py", line 70, in run_one_task
    return task(**kwargs)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9f128e94b494f6aacaa2c4dc4adc35fe6fbd0e12/qa/tasks/backfill_toofull.py", line 169, in task
    wait_for_pg_state(manager, pgid, 'backfilling', target)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_9f128e94b494f6aacaa2c4dc4adc35fe6fbd0e12/qa/tasks/backfill_toofull.py", line 30, in wait_for_pg_state
    assert False, '%s not in %s' % (pgid, state)
AssertionError: 2.0 not in backfilling

/a/yuriw-2021-12-18_18:14:24-rados-wip-yuriw-master-12.18.21-distro-default-smithi/6570206

https://sentry.ceph.com/organizations/ceph/issues/15974/events/612b123bed3c427184a6a1e65e3799b6/events/?project=2

Mykola: could you please take a look?


Related issues 2 (0 open2 closed)

Copied to RADOS - Backport #53701: octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfillingResolvedMykola GolubActions
Copied to RADOS - Backport #53702: pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfillingResolvedMykola GolubActions
Actions #1

Updated by Mykola Golub over 2 years ago

  • Status changed from New to In Progress
  • Backport set to pacific,octopus

This looks like a false test failure due to the test setting backfillfull ratio too low.

We see that the test calculated the needed backfillfull ratio as 0.14347279565777277, but when passing to the ceph it truncated it to 0.143:

2021-12-19T01:36:06.245 DEBUG:tasks.backfill_toofull:update nearfull ratio to 0.1340676106928349 and backfillfull ratio to 0.14347279565777277
2021-12-19T01:36:06.245 DEBUG:teuthology.orchestra.run.smithi138:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd set-nearfull-ratio 0.134
2021-12-19T01:36:07.348 DEBUG:teuthology.orchestra.run.smithi138:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd set-backfillfull-ratio 0.143

And the osd when checking reservation calculated tentative usage to be 0.143018, i.e. higher than the backfillfull ratio 0.143 and rejected reservation:

2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] do_peering_event: epoch_sent: 31 epoch_requested: 31 RequestBackfillPrio: priority 151 primary bytes 25794969624 local bytes 17611882520
2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] try_reserve_recovery_space primary_bytes 12595200KiB local 8599552KiB pending_adjustments 3995648KiB
2021-12-19T01:37:54.246+0000 7f68a7b4b700 20 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] _tentative_full type backfillfull adjust_used 3995648KiB
2021-12-19T01:37:54.246+0000 7f68a7b4b700 20 osd.1 31 compute_adjusted_ratio Before kb_used() 9405084
2021-12-19T01:37:54.246+0000 7f68a7b4b700 20 osd.1 31 compute_adjusted_ratio After kb_used() 13400732
2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] _tentative_full tentative usage is 0.143018 
2021-12-19T01:37:54.246+0000 7f68a7b4b700 10 osd.1 pg_epoch: 31 pg[2.0s1( v 23'14551 (23'14546,23'14551] lb MIN local-lis/les=28/29 n=0 ec=15/15 lis/c=28/15 les/c/f=29/16/0 sis=28) [3,1,2]/[3,NONE,2]p3(0) r=-1 lpr=28 pi=[15,28)/1 luod=0'0 crt=23'14551 lcod 0'0 mlcod 0'0 active+remapped mbc={}] backfill reservation rejected: backfill full

Note if the backfillfull ratio had been set to the non-truncated value 0.14347279565777277, it would be greater than the tentative usage 0.143018 and the test would pass.

I am going to improve the test.

Actions #2

Updated by Mykola Golub over 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 44385
Actions #3

Updated by Neha Ojha over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #4

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53701: octopus: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling added
Actions #5

Updated by Backport Bot over 2 years ago

  • Copied to Backport #53702: pacific: qa/tasks/backfill_toofull.py: AssertionError: 2.0 not in backfilling added
Actions #6

Updated by Neha Ojha almost 2 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF