Project

General

Profile

Actions

Bug #51847

open

A PG in "incomplete" state may end up in a backfill loop.

Added by Sridhar Seshasayee over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This was one of the issued observed in:
/a/nojha-2021-07-22_18:00:50-rados-wip-sseshasa-testing-2021-07-14-1320-distro-basic-smithi/6287310

One of the rados API tests did not complete and the test failed due to a timeout in waiting for recovery to complete.

2021-07-22T19:41:18.285 INFO:tasks.ceph.ceph_manager.ceph:PG 7.2 is not active+clean
2021-07-22T19:41:18.285 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '7.2', 'version': "0'0", 'reported_seq': 2181, 'reported_epoch': 2405, 'state': 'active+clean+scrubbing', 'last_fresh': '2021-07-22T19:41:15.062512+0000', 'last_change': '2021-07-22T19:41:15.062512+0000', 'last_active': '2021-07-22T19:41:15.062512+0000', 'last_peered': '2021-07-22T19:41:15.062512+0000', 'last_clean': '2021-07-22T19:41:15.062512+0000', 'last_became_active': '2021-07-22T18:39:48.880762+0000', 'last_became_peered': '2021-07-22T18:39:48.880762+0000', 'last_unstale': '2021-07-22T19:41:15.062512+0000', 'last_undegraded': '2021-07-22T19:41:15.062512+0000', 'last_fullsized': '2021-07-22T19:41:15.062512+0000', 'mapping_epoch': 54, 'log_start': "0'0", 'ondisk_log_start': "0'0", 'created': 54, 'last_epoch_clean': 57, 'parent': '0.0', 'parent_split_bits': 0, 'last_scrub': "0'0", 'last_scrub_stamp': '2021-07-22T18:39:46.618079+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-07-22T18:39:46.618079+0000', 'last_clean_scrub_stamp': '2021-07-22T18:39:46.618079+0000', 'log_size': 0, 'ondisk_log_size': 0, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [4, 6, 7], 'acting': [4, 6, 7], 'avail_no_missing': [], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 4, 'acting_primary': 4, 'purged_snaps': []}
2021-07-22T19:41:18.285 INFO:tasks.ceph.ceph_manager.ceph:PG 1.3 is not active+clean
2021-07-22T19:41:18.286 INFO:tasks.ceph.ceph_manager.ceph:{'pgid': '1.3', 'version': "20'117", 'reported_seq': 2467, 'reported_epoch': 2405, 'state': 'active+remapped+backfilling', 'last_fresh': '2021-07-22T19:41:12.951218+0000', 'last_change': '2021-07-22T18:56:16.734244+0000', 'last_active': '2021-07-22T19:41:12.951218+0000', 'last_peered': '2021-07-22T19:41:12.951218+0000', 'last_clean': '2021-07-22T18:40:15.901391+0000', 'last_became_active': '2021-07-22T18:40:21.271111+0000', 'last_became_peered': '2021-07-22T18:40:21.271111+0000', 'last_unstale': '2021-07-22T19:41:12.951218+0000', 'last_undegraded': '2021-07-22T19:41:12.951218+0000', 'last_fullsized': '2021-07-22T19:41:12.951218+0000', 'mapping_epoch': 85, 'log_start': "19'114", 'ondisk_log_start': "19'114", 'created': 30, 'last_epoch_clean': 66, 'parent': '0.0', 'parent_split_bits': 3, 'last_scrub': "20'117", 'last_scrub_stamp': '2021-07-22T18:39:44.980942+0000', 'last_deep_scrub': "0'0", 'last_deep_scrub_stamp': '2021-07-22T18:38:46.269805+0000', 'last_clean_scrub_stamp': '2021-07-22T18:39:44.980942+0000', 'log_size': 3, 'ondisk_log_size': 3, 'stats_invalid': False, 'dirty_stats_invalid': False, 'omap_stats_invalid': False, 'hitset_stats_invalid': False, 'hitset_bytes_stats_invalid': False, 'pin_stats_invalid': False, 'manifest_stats_invalid': False, 'snaptrimq_len': 0, 'stat_sum': {'num_bytes': 0, 'num_objects': 0, 'num_object_clones': 0, 'num_object_copies': 0, 'num_objects_missing_on_primary': 0, 'num_objects_missing': 0, 'num_objects_degraded': 0, 'num_objects_misplaced': 0, 'num_objects_unfound': 0, 'num_objects_dirty': 0, 'num_whiteouts': 0, 'num_read': 0, 'num_read_kb': 0, 'num_write': 0, 'num_write_kb': 0, 'num_scrub_errors': 0, 'num_shallow_scrub_errors': 0, 'num_deep_scrub_errors': 0, 'num_objects_recovered': 0, 'num_bytes_recovered': 0, 'num_keys_recovered': 0, 'num_objects_omap': 0, 'num_objects_hit_set_archive': 0, 'num_bytes_hit_set_archive': 0, 'num_flush': 0, 'num_flush_kb': 0, 'num_evict': 0, 'num_evict_kb': 0, 'num_promote': 0, 'num_flush_mode_high': 0, 'num_flush_mode_low': 0, 'num_evict_mode_some': 0, 'num_evict_mode_full': 0, 'num_objects_pinned': 0, 'num_legacy_snapsets': 0, 'num_large_omap_objects': 0, 'num_objects_manifest': 0, 'num_omap_bytes': 0, 'num_omap_keys': 0, 'num_objects_repaired': 0}, 'up': [4, 1, 2], 'acting': [4, 2, 7], 'avail_no_missing': ['4', '1', '2', '7'], 'object_location_counts': [], 'blocked_by': [], 'up_primary': 4, 'acting_primary': 4, 'purged_snaps': []}
2021-07-22T19:41:18.286 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 188, in wrapper
    return func(self)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 1404, in _do_thrash
    timeout=self.config.get('timeout')
  File "/home/teuthworker/src/git.ceph.com_ceph-c_f81345ca72228f2b22752a42ef3d4cfb048530bb/qa/tasks/ceph_manager.py", line 2804, in wait_for_recovery
    'wait_for_recovery: failed before timeout expired'
AssertionError: wait_for_recovery: failed before timeout expired

The logs below shows the sequence from when the concerned PG (pg1.3) goes into the "backfill_wait" state and then into "backfilling" on osd.4. osd.1 was the backfill target:

2021-07-22T18:56:16.432+0000 7f1f80833700 10 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] state<Started/Primary/Active/WaitRemoteBackfillReserved>: react num_bytes 0
o2021-07-22T18:56:16.432+0000 7f1f80833700 20 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] get_backfill_priority backfill priority is 110
o2021-07-22T18:56:16.492+0000 7f1f80833700 10 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] do_peering_event: epoch_sent: 866 epoch_requested: 866 DeferBackfill: delay 0
o2021-07-22T18:56:16.733+0000 7f1f80833700 10 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] do_peering_event: epoch_sent: 866 epoch_requested: 866 RemoteBackfillReserved
o2021-07-22T18:56:16.733+0000 7f1f80833700 10 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] state<Started/Primary/Active/WaitRemoteBackfillReserved>: react num_bytes 0
o2021-07-22T18:56:16.733+0000 7f1f80833700  5 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] exit Started/Primary/Active/WaitRemoteBackfillReserved 0.300198 2 0.000083
o2021-07-22T18:56:16.733+0000 7f1f80833700  5 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] enter Started/Primary/Active/Backfilling
o2021-07-22T18:56:16.733+0000 7f1f80833700 10 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfill_wait mbc={}] queue_recovery -- already queued
o2021-07-22T18:56:16.733+0000 7f1f80833700 20 osd.4 pg_epoch: 866 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=85/86 n=0 ec=30/14 lis/c=85/62 les/c/f=86/66/0 sis=85) [4,1,2]/[4,2,7] backfill=[1] r=0 lpr=85 pi=[62,85)/1 crt=20'117 lcod 0'0 mlcod 0'0 active+remapped+backfilling mbc={}] update_calc_stats actingset 2,4,7 upset 1,2,4 acting_recovery_backfill 1,2,4,7

On osd.1 which is the backfill target, the pg 1.3 is in incomplete state indefinitely after processing a delete
a few minutes before the backfill was initiated:

2021-07-22T18:39:21.844+0000 7f94a77af700  1 osd.1 pg_epoch: 36 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=14.941528320s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active pruub 54.482391357s@ mbc={}] start_peering_interval up [5,2,7] -> [5,2,7], acting [5,2,1] -> [5,2,7], acting_primary 5 -> 5, up_primary 5 -> 5, role 2 -> -1, features acting 4540138303579357183 upacting 4540138303579357183
2021-07-22T18:39:21.844+0000 7f94a77af700 10 osd.1 pg_epoch: 36 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=14.941528320s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown pruub 54.482391357s@ mbc={}] clear_primary_state
2021-07-22T18:39:21.844+0000 7f94a77af700 20 osd.1 pg_epoch: 36 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=14.941528320s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown pruub 54.482391357s@ mbc={}] clear_primary_state

...

2021-07-22T18:39:23.103+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] exit Started/Stray 1.258081 6 0.000102
2021-07-22T18:39:23.103+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] enter Started/ToDelete
2021-07-22T18:39:23.103+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] enter Started/ToDelete/WaitDeleteReseved
2021-07-22T18:39:23.103+0000 7f94a77af700 10 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] do_peering_event: epoch_sent: 37 epoch_requested: 37 DeleteReserved
2021-07-22T18:39:23.103+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] exit Started/ToDelete/WaitDeleteReseved 0.000070 1 0.000049
2021-07-22T18:39:23.103+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] enter Started/ToDelete/Deleting

...

2021-07-22T18:39:23.104+0000 7f94a77af700 10 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 DELETING pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] do_peering_event: epoch_sent: 37 epoch_requested: 37 DeleteSome
2021-07-22T18:39:23.104+0000 7f94a77af700 10 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 DELETING pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] do_delete_work
2021-07-22T18:39:23.104+0000 7f94a77af700 20 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 DELETING pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] do_delete_work [#1:c0000000::::head#]
2021-07-22T18:39:23.104+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 DELETING pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] exit Started/ToDelete/Deleting 0.001199 1 0.000238
2021-07-22T18:39:23.104+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] exit Started/ToDelete 0.001302 0 0.000000
2021-07-22T18:39:23.104+0000 7f94a77af700  5 osd.1 pg_epoch: 37 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=34/35 n=0 ec=30/14 lis/c=34/34 les/c/f=35/35/0 sis=36 pruub=13.882313728s) [5,2,7] r=-1 lpr=36 pi=[34,36)/1 crt=20'117 lcod 0'0 mlcod 0'0 unknown NOTIFY pruub 54.482391357s@ mbc={}] exit Started 1.259437 0 0.000000

2021-07-22T18:39:52.398+0000 7f94a77af700  5 osd.1 pg_epoch: 58 pg[1.3(unlocked)] enter Initial
2021-07-22T18:39:52.398+0000 7f94a77af700 20 osd.1 pg_epoch: 58 pg[1.3(unlocked)] enter NotTrimming
2021-07-22T18:39:52.398+0000 7f94a77af700 15 osd.1 pg_epoch: 58 pg[1.3(unlocked)]  scrubberFSM pg(1.3) ScrubMachine created 1.3
2021-07-22T18:39:52.398+0000 7f94a77af700 10 osd.1 pg_epoch: 58 pg[1.3(unlocked)]  scrubberFSM pg(1.3) -- state -->> NotActive
2021-07-22T18:39:52.398+0000 7f94a77af700 10 osd.1 pg_epoch: 58 pg[1.3( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown mbc={}] init role -1 up [4,1,2] acting [4,2,7] history ec=30/14 lis/c=45/45 les/c/f=46/46/0 sis=58 past_intervals ([45,57] all_participants=1,2,4,7 intervals=([45,55] acting 2,4,7),([56,57] acting 1,2,4))
2021-07-22T18:39:52.398+0000 7f94a77af700 20 osd.1 pg_epoch: 58 pg[1.3( empty local-lis/les=0/0 n=0 ec=30/14 lis/c=45/45 les/c/f=46/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=0 pi=[45,58)/2 crt=0'0 mlcod 0'0 unknown mbc={}] on_new_interval
2021-07-22T18:39:52.398+0000 7f94a77af700 20 osd.1 pg_epoch: 58 pg[1.3( empty local-lis/les=0/0 n=0 ec=30/14 lis/c=45/45 les/c/f=46/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=0 pi=[45,58)/2 crt=0'0 mlcod 0'0 unknown mbc={}] on_new_interval upacting_features 0x3f01cfbb7ffdffff from [4,2,7]+[4,1,2]
2021-07-22T18:39:52.398+0000 7f94a77af700 20 osd.1 pg_epoch: 58 pg[1.3( empty local-lis/les=0/0 n=0 ec=30/14 lis/c=45/45 les/c/f=46/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=0 pi=[45,58)/2 crt=0'0 mlcod 0'0 unknown mbc={}] on_new_interval checking missing set deletes flag. missing = missing(0 may_include_deletes = 1)

...

2021-07-22T18:39:54.499+0000 7f94a37a7700 10 osd.1 pg_epoch: 61 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=45/45 les/c/f=46/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] do_peering_event: epoch_sent: 61 epoch_requested: 61 MInfoRec from 4 info: 1.3( v 20'117 (19'114,20'117] local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) pg_lease(ru 85.778335571s ub 88.188583374s int 16.000000000s)
2021-07-22T18:39:54.499+0000 7f94a37a7700 20 osd.1 pg_epoch: 61 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] update_history advanced history from ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58
2021-07-22T18:39:54.499+0000 7f94a37a7700 20 osd.1 pg_epoch: 61 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] on_info_history_change

...

2021-07-22T18:39:55.228+0000 7f94a77af700 10 osd.1 pg_epoch: 62 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] state<Started>: should_restart_peering, transitioning to Reset
2021-07-22T18:39:55.228+0000 7f94a77af700  5 osd.1 pg_epoch: 62 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] exit Started/ReplicaActive/RepNotRecovering 2.830019 9 0.000128
2021-07-22T18:39:55.228+0000 7f94a77af700  5 osd.1 pg_epoch: 62 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] exit Started/ReplicaActive 2.830031 0 0.000000
2021-07-22T18:39:55.228+0000 7f94a77af700  5 osd.1 pg_epoch: 62 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] exit Started 2.830226 0 0.000000
2021-07-22T18:39:55.228+0000 7f94a77af700  5 osd.1 pg_epoch: 62 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] enter Reset
2021-07-22T18:39:55.228+0000 7f94a77af700 20 osd.1 pg_epoch: 62 pg[1.3( v 20'117 (19'114,20'117] lb MIN local-lis/les=58/59 n=0 ec=30/14 lis/c=58/45 les/c/f=59/46/0 sis=58) [4,1,2]/[4,2,7] r=-1 lpr=58 pi=[45,58)/2 luod=0'0 crt=20'117 lcod 0'0 mlcod 0'0 active mbc={}] set_last_peering_reset 62

The PG was in incomplete state as indicated by "lb MIN" above until the test terminated. The backill initiated from osd.4 didn't have any missing or misplaced objects. The backfill process repeats in a loop and the PG never comes out of backfilling state.

The PG is from pool 1 which indicates a mgr pool. This issue is benign and very likely not seen on real world clusters..

The fix could probably add a check for num of objects before attempting a backfill.

No data to display

Actions

Also available in: Atom PDF