Bug #62248
Quincy incorrectly reporting pgs backfill_toofull
Description
Our fault insertion testing with 17.2.6 is showing a cluster status reporting PGs as toofull, but when we dig into those PGs, they don't appear full at all.
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
Low space hindering backfill (add storage if this doesn't resolve itself): 12 pgs backfill_toofull
Degraded data redundancy: 16006187/201134076 objects degraded (7.958%), 2498 pgs degraded, 2482 pgs undersized
# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    361 TiB  174 TiB  187 TiB  187 TiB   51.73
TOTAL  361 TiB  174 TiB  187 TiB  187 TiB   51.73

--- POOLS ---
POOL                        ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr                        1   1     1.8 MiB  2        5.3 MiB  0      38 TiB
.rgw.root                   2   32    1.3 KiB  5        48 KiB   0      38 TiB
default.rgw.log             3   128   29 MiB   212      84 MiB   0      40 TiB
default.rgw.control         4   128   0 B      10       0 B      0      38 TiB
default.rgw.meta            5   128   18 KiB   47       579 KiB  0      38 TiB
default.rgw.buckets.index   6   256   2.8 GiB  220      8.0 GiB  0      40 TiB
default.rgw.buckets.data    7   4096  127 TiB  33.52M   175 TiB  60.26  83 TiB
default.rgw.buckets.non-ec  8   32    13 MiB   0        38 MiB   0      38 TiB
# ceph pg dump | grep _toofull
7.f54 8253 0 8253 0 0 30667702272 0 0 1457 1457 active+undersized+degraded+remapped+backfill_wait+backfill_toofull 2023-07-31T05:40:30.871799+0000 3416'11018 6521:32310 [129,124,181,166,139,165] 129 [NONE,124,181,166,139,165] 124 0'0 2023-07-30T12:52:05.672341+0000 0'0 2023-07-30T12:48:01.120327+0000 0 1 periodic scrub scheduled @ 2023-07-31T22:53:28.604484+0000 0 0
...
7.731 8173 0 8173 0 0 30332157952 0 0 1648 1648 active+undersized+degraded+remapped+backfill_wait+backfill_toofull 2023-07-31T05:40:26.339303+0000 3416'11028 6521:24442 [18,115,116,174,169,23] 18 [18,115,116,174,NONE,23] 18 0'0 2023-07-30T12:52:03.861531+0000 0'0 2023-07-30T12:44:17.866343+0000 0 1 periodic scrub scheduled @ 2023-07-31T20:38:16.524162+0000 0 0
dumped all
# ceph pg 7.731 query
{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "active+undersized+degraded+remapped+backfill_wait+backfill_toofull",
    "epoch": 6527,
    "up": [ 18, 115, 116, 174, 169, 23 ],
    "acting": [ 18, 115, 116, 174, 2147483647, 23 ],
    "backfill_targets": [ "169(4)" ],
    "acting_recovery_backfill": [ "18(0)", "23(5)", "115(1)", "116(2)", "169(4)", "174(3)" ...
# ceph osd df tree | grep 169
ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
169  hdd    1.87949  1.00000   1.9 TiB  834 GiB  772 GiB  11 MiB  3.2 GiB  1.1 TiB  43.36  0.84  59   up      osd.169
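For context, osd.169's raw utilization (43.36%) is well below Ceph's stock osd_backfillfull_ratio default of 0.90, which is why the backfill_toofull report looks wrong at first glance. A quick sanity check of the numbers above (a sketch; the 0.90 figure is the stock default and may be tuned differently on a given cluster):

```python
# Sanity check: osd.169's raw usage from `ceph osd df tree` vs. the
# stock backfillfull threshold. The 834 GiB / 1.87949 TiB values come
# from the output above; 0.90 is Ceph's default osd_backfillfull_ratio.

DEFAULT_BACKFILLFULL_RATIO = 0.90

raw_use_gib = 834.0          # RAW USE column for osd.169
size_gib = 1.87949 * 1024    # WEIGHT (TiB) expressed in GiB

pct_used = raw_use_gib / size_gib
print(f"osd.169 raw usage: {pct_used:.2%}")  # ~43.3%, matching the %USE column
print("over backfillfull?", pct_used >= DEFAULT_BACKFILLFULL_RATIO)  # False
```

So raw usage alone cannot explain the warning; the explanation further down in this thread involves projected usage, not current usage.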
Updated by Tim Wilkinson 9 months ago
In time the cluster settled and completed its recovery without issue. The same issue has been observed on a different cluster using the same build.
Updated by Radoslaw Zarzynski 9 months ago
- Status changed from New to Need More Info
1. How reproducible this issue is?
2. May we have logs with debug_osd=20
?
Updated by Tim Wilkinson 9 months ago
It appears to be consistently reproducible. Does debug_osd=20 need to be enabled prior to the issue, or can we enable it for a debug capture after the issue is observed? We'll have space problems keeping that debug level enabled for long periods. Thanks.
Updated by Tim Wilkinson 9 months ago
This issue was reproduced on one of the clusters. A PG dump snapshot was taken at the time and both debug_osd and debug_mon were set to 20 for 3 minutes.
bootstrap: f28-h21-000-r630.rdu2.scalelab.redhat.com (root / 100yard-)
PG dump: /root/pg-dump_230808-1958.txt
The OSD logs should be available on any of the RGW/OSD nodes for that time (230808-1958) ...
Updated by Tim Wilkinson 9 months ago
All ceph logs have been copied to the root dir of each node, in /root/230809-1607_ceph. So even if the cluster is redeployed, the OSD debug logs will still be present.
Updated by Prashant D 9 months ago
We have seen this behavior for an EC 2+2 pool. From the OSD debug logs, the PGs were marked backfill_toofull based on the reserve space currently allocated for backfilling PGs. This is expected behavior: the OSD considers the space reserved for incoming backfill and recovery requests along with its current raw usage. If the OSD's projected usage exceeds the defined backfillfull_ratio, then the PGs backfilling onto it are marked backfill_toofull.
OSD debug logs from the backfill target OSD:
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] do_peering_event: epoch_sent: 17791 epoch_requested: 17791 RequestBackfillPrio: priority 151 primary bytes 6124208128 local bytes 0
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] try_reserve_recovery_space primary_bytes 2990336KiB local 0KiB pending_adjustments 2990336KiB
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] _tentative_full type backfillfull adjust_used 2990336KiB
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 17791 compute_adjusted_ratio Before kb_used() 190615992
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 17791 compute_adjusted_ratio After kb_used() 193606328
....
2023-08-11T10:04:03.498+0000 7fb96278e640 20 osd.33 17791 compute_adjusted_ratio backfill adjusted osd_stat(store_statfs(0x9c97e7d50/0x0/0x67ffffe000, data 0x29721a6af7/0x297246e000, compress 0x0/0x0/0x0, omap 0x3aa73ca, meta 0x70658c36), peers [0,2,3,4,6,7,8,10,11,12,14,15,16,18,19,20,22,23,24,26,27,28,30,31,32,34,35] op hist [0,0,0,0,0,2,1,4,1])
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] _tentative_full tentative usage is 0.905893 <---------- hitting backfill_full ratio
2023-08-11T10:04:03.498+0000 7fb96278e640 10 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] backfill reservation rejected: backfill full
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] exit Started/ReplicaActive/RepNotRecovering 29.994311 7 0.000045
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] enter Started/ReplicaActive/RepWaitBackfillReserved
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] exit Started/ReplicaActive/RepWaitBackfillReserved 0.000053 0 0.000000
2023-08-11T10:04:03.498+0000 7fb96278e640 5 osd.33 pg_epoch: 17791 pg[6.16bs3( v 17756'34185 (14498'28579,17756'34185] lb MIN local-lis/les=17788/17789 n=0 ec=16100/247 lis/c=17788/16958 les/c/f=17789/16959/0 sis=17788) [2,27,28,33]/[2,27,28,NONE]p2(0) r=-1 lpr=17788 pi=[16958,17788)/1 luod=0'0 crt=17756'34185 lcod 0'0 mlcod 0'0 active mbc={}] enter Started/ReplicaActive/RepNotRecovering
Whenever a PG goes for backfill, the backfill target OSD reserves space for the PG to be backfilled onto it and checks for tentative full status based on the projected additional space required to fulfill the backfill request. The OSD computes an adjusted ratio from its current usage plus the backfill space reserved for PGs that will backfill onto it; if the OSD would cross the backfillfull threshold at that projected adjusted usage, the PGs to be backfilled onto it are marked backfill_toofull.
PG::try_reserve_recovery_space -> OSDService::tentative_backfill_full -> OSDService::_tentative_full -> OSDService::recalc_full_state -> return BACKFILLFULL
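The projection above can be sketched in a few lines. This is illustrative only, not Ceph source: the kb_used and reservation values are taken from the osd.33 log excerpt, while the total capacity and the 0.90 ratio are hypothetical stand-ins chosen to show how the reservation tips the check.

```python
# Simplified model of the PG::try_reserve_recovery_space ->
# OSDService::_tentative_full check described above. Not Ceph source code.
# KB_USED and RESERVED_KB come from the osd.33 log excerpt; KB_TOTAL and
# the 0.90 ratio are hypothetical stand-ins for illustration.

BACKFILLFULL_RATIO = 0.90  # stock osd_backfillfull_ratio default

def reservation_rejected(kb_used: int, kb_total: int,
                         reserved_backfill_kb: int,
                         ratio: float = BACKFILLFULL_RATIO) -> bool:
    """True when projected usage (current use + reserved backfill space)
    crosses the backfillfull ratio, i.e. the backfill reservation is
    rejected and the PG gets marked backfill_toofull."""
    adjusted_kb = kb_used + reserved_backfill_kb  # cf. compute_adjusted_ratio
    return adjusted_kb / kb_total >= ratio

KB_USED = 190_615_992    # "Before kb_used()" in the log
RESERVED_KB = 2_990_336  # "pending_adjustments 2990336KiB"
KB_TOTAL = 213_000_000   # hypothetical capacity near the threshold

print(reservation_rejected(KB_USED, KB_TOTAL, 0))            # False: raw usage alone is under the ratio
print(reservation_rejected(KB_USED, KB_TOTAL, RESERVED_KB))  # True: the reservation pushes it over
```

Note that 190615992 + 2990336 = 193606328, matching the "After kb_used()" line in the log: the OSD itself is not fuller, but its projected usage after accounting for the pending backfill is.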
Updated by Radoslaw Zarzynski 8 months ago
We talked about this bug during the last BZ scrub. IIRC the idea there was to document this behavior.