Bug #62248
Quincy incorrectly reporting pgs backfill_toofull
Description
Our fault insertion testing with 17.2.6 is showing a cluster status reporting PGs as toofull, but when we dig into those PGs, they don't appear full at all.
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
Low space hindering backfill (add storage if this doesn't resolve itself): 12 pgs backfill_toofull
Degraded data redundancy: 16006187/201134076 objects degraded (7.958%), 2498 pgs degraded, 2482 pgs undersized
# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    361 TiB  174 TiB  187 TiB  187 TiB   51.73
TOTAL  361 TiB  174 TiB  187 TiB  187 TiB   51.73

--- POOLS ---
POOL                        ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr                        1   1     1.8 MiB  2        5.3 MiB  0      38 TiB
.rgw.root                   2   32    1.3 KiB  5        48 KiB   0      38 TiB
default.rgw.log             3   128   29 MiB   212      84 MiB   0      40 TiB
default.rgw.control         4   128   0 B      10       0 B      0      38 TiB
default.rgw.meta            5   128   18 KiB   47       579 KiB  0      38 TiB
default.rgw.buckets.index   6   256   2.8 GiB  220      8.0 GiB  0      40 TiB
default.rgw.buckets.data    7   4096  127 TiB  33.52M   175 TiB  60.26  83 TiB
default.rgw.buckets.non-ec  8   32    13 MiB   0        38 MiB   0      38 TiB
# ceph pg dump | grep _toofull
7.f54 8253 0 8253 0 0 30667702272 0 0 1457 1457 active+undersized+degraded+remapped+backfill_wait+backfill_toofull 2023-07-31T05:40:30.871799+0000 3416'11018 6521:32310 [129,124,181,166,139,165] 129 [NONE,124,181,166,139,165] 124 0'0 2023-07-30T12:52:05.672341+0000 0'0 2023-07-30T12:48:01.120327+0000 0 1 periodic scrub scheduled @ 2023-07-31T22:53:28.604484+0000 0 0
...
7.731 8173 0 8173 0 0 30332157952 0 0 1648 1648 active+undersized+degraded+remapped+backfill_wait+backfill_toofull 2023-07-31T05:40:26.339303+0000 3416'11028 6521:24442 [18,115,116,174,169,23] 18 [18,115,116,174,NONE,23] 18 0'0 2023-07-30T12:52:03.861531+0000 0'0 2023-07-30T12:44:17.866343+0000 0 1 periodic scrub scheduled @ 2023-07-31T20:38:16.524162+0000 0 0
dumped all
# ceph pg 7.731 query
{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "active+undersized+degraded+remapped+backfill_wait+backfill_toofull",
    "epoch": 6527,
    "up": [ 18, 115, 116, 174, 169, 23 ],
    "acting": [ 18, 115, 116, 174, 2147483647, 23 ],
    "backfill_targets": [ "169(4)" ],
    "acting_recovery_backfill": [ "18(0)", "23(5)", "115(1)", "116(2)", "169(4)", "174(3)" ...
# ceph osd df tree | grep 169
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
169  hdd    1.87949  1.00000   1.9 TiB  834 GiB  772 GiB  11 MiB  3.2 GiB  1.1 TiB  43.36  0.84  59   up      osd.169
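For context, osd.169's raw utilization (43.36%) is well below Ceph's stock osd_backfillfull_ratio default of 0.90, which is why the backfill_toofull report looks wrong at first glance. A quick sanity check of the numbers above (a sketch; the 0.90 figure is the stock default and may be tuned differently on a given cluster):

```python
# Sanity check: osd.169's raw usage from `ceph osd df tree` vs. the
# stock backfillfull threshold. The 834 GiB / 1.87949 TiB values come
# from the output above; 0.90 is Ceph's default osd_backfillfull_ratio.

DEFAULT_BACKFILLFULL_RATIO = 0.90

raw_use_gib = 834.0          # RAW USE column for osd.169
size_gib = 1.87949 * 1024    # WEIGHT (TiB) expressed in GiB

pct_used = raw_use_gib / size_gib
print(f"osd.169 raw usage: {pct_used:.2%}")  # ~43.3%, matching the %USE column
print("over backfillfull?", pct_used >= DEFAULT_BACKFILLFULL_RATIO)  # False
```

So raw usage alone cannot explain the warning; the explanation further down in this thread involves projected usage, not current usage.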
Updated by Tim Wilkinson 9 months ago
In time the cluster settled and completed its recovery without issue. The same issue has been observed on a different cluster using the same build.
Updated by Radoslaw Zarzynski 9 months ago
- Status changed from New to Need More Info
1. How reproducible this issue is?
2. May we have logs with debug_osd=20
?
Updated by Tim Wilkinson 9 months ago
It appears to be consistently reproducible. Does debug_osd=20 need to be enabled prior to the issue, or can we enable it for a debug capture after the issue is observed? We'll have space problems keeping that debug level enabled for long periods. Thanks.
Updated by Tim Wilkinson 9 months ago
This issue was reproduced on one of the clusters. A PG dump snapshot was taken at the time and both debug_osd and debug_mon were set to 20 for 3 minutes.
bootstrap: f28-h21-000-r630.rdu2.scalelab.redhat.com (root / 100yard-)
PG dump: /root/pg-dump_230808-1958.txt
The OSD logs should be available on any of the RGW/OSD nodes for that time (230808-1958) ...
Updated by Tim Wilkinson 9 months ago
All ceph logs have been copied to the root dir of each node, in /root/230809-1607_ceph. So even if the cluster is redeployed, the OSD debug logs will still be present.
Updated by Prashant D 9 months ago
We have seen this behavior for an EC 2+2 pool. From the OSD debug logs, the PGs were marked backfill_toofull based on the reserve space currently allocated for backfilling PGs. This is expected behavior: the OSD considers the space reserved for incoming backfill and recovery requests along with its current raw usage. If the OSD's projected usage exceeds the defined backfillfull_ratio, then the PGs backfilling onto it are marked backfill_toofull.
OSD debug logs from the backfill target OSD:
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] do_peering_event: epoch_sent: 17791 epoch_requested: 17791 RequestBackfillPrio: priority 151 primary bytes 6124208128 local bytes 0
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] try_reserve_recovery_space primary_bytes 2990336KiB local 0KiB pending_adjustments 2990336KiB
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] _tentative_full type backfillfull adjust_used 2990336KiB
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 17791 compute_adjusted_ratio Before kb_used() 190615992
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 17791 compute_adjusted_ratio After kb_used() 193606328
....
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 17791 compute_adjusted_ratio backfill adjusted osd_stat(store_statfs(0x9c97e7d50/0x0/0x67ffffe000, data 0x29721a6af7/0x297246e000, compress 0x0/0x0/0x0, omap 0x3aa73ca, meta 0x70658c36), peers [0,2,3,4,6,7,8,10,11,12,14,15,16,18,19,20,22,23,24,26,27,28,30,31,32,34,35] op hist [0,0,0,0,0,2,1,4,1])
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] _tentative_full tentative usage is 0.905893 <---------- hitting backfill_full ratio
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] backfill reservation rejected: backfill full
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] exit Started/ReplicaActive/RepNotRecovering 29.994311 7 0.000045
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] enter Started/ReplicaActive/RepWaitBackfillReserved
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] exit Started/ReplicaActive/RepWaitBackfillReserved 0.000053 0 0.000000
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] enter Started/ReplicaActive/RepNotRecovering
Whenever a PG goes for backfill, the backfill target OSD reserves space for the PG to be backfilled onto it and checks for tentative full status based on the projected additional space required to fulfill the backfill request. The OSD computes an adjusted ratio from its current usage plus the backfill space reserved for PGs that will backfill onto it; if the OSD would cross the backfillfull threshold at that projected adjusted usage, the PGs to be backfilled onto it are marked backfill_toofull.
PG::try_reserve_recovery_space -> OSDService::tentative_backfill_full -> OSDService::_tentative_full -> OSDService::recalc_full_state -> return BACKFILLFULL
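The projection above can be sketched in a few lines. This is illustrative only, not Ceph source: the kb_used and reservation values are taken from the osd.33 log excerpt, while the total capacity and the 0.90 ratio are hypothetical stand-ins chosen to show how the reservation tips the check.

```python
# Simplified model of the PG::try_reserve_recovery_space ->
# OSDService::_tentative_full check described above. Not Ceph source code.
# KB_USED and RESERVED_KB come from the osd.33 log excerpt; KB_TOTAL and
# the 0.90 ratio are hypothetical stand-ins for illustration.

BACKFILLFULL_RATIO = 0.90  # stock osd_backfillfull_ratio default

def reservation_rejected(kb_used: int, kb_total: int,
                         reserved_backfill_kb: int,
                         ratio: float = BACKFILLFULL_RATIO) -> bool:
    """True when projected usage (current use + reserved backfill space)
    crosses the backfillfull ratio, i.e. the backfill reservation is
    rejected and the PG gets marked backfill_toofull."""
    adjusted_kb = kb_used + reserved_backfill_kb  # cf. compute_adjusted_ratio
    return adjusted_kb / kb_total >= ratio

KB_USED = 190_615_992    # "Before kb_used()" in the log
RESERVED_KB = 2_990_336  # "pending_adjustments 2990336KiB"
KB_TOTAL = 213_000_000   # hypothetical capacity near the threshold

print(reservation_rejected(KB_USED, KB_TOTAL, 0))            # False: raw usage alone is under the ratio
print(reservation_rejected(KB_USED, KB_TOTAL, RESERVED_KB))  # True: the reservation pushes it over
```

Note that 190615992 + 2990336 = 193606328, matching the "After kb_used()" line in the log: the OSD itself is not fuller, but its projected usage after accounting for the pending backfill is.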
Updated by Radoslaw Zarzynski 8 months ago
We talked about this bug during the last BZ scrub. IIRC the idea there was to document this behavior.