Bug #54263: cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for cephfs meta pool - RADOS - Ceph

Bug #54263

closed

cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for cephfs meta pool

Added by Vikhyat Umrao about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Assignee: Kamoltat (Junior) Sirivadhna
Backport: quincy, pacific
Severity: 3 - minor
Affected Versions: Ceph - v17.0.0
Pull request ID: 45200

Description

Pacific version - 16.2.7-34.el8cp
Quincy version - 17.0.0-10315-ga00e8b31

After some analysis, it looks like the autoscaler TARGET RATIO for the cephfs meta pool got set to 4.0 during the upgrade to the quincy version.

- Output of some commands after the upgrade:

# ceph osd pool ls detail
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 25006 flags hashpspool,backfillfull stripe_width 0 pg_num_min 1 application mgr,mgr_devicehealth
pool 2 'rbd' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode on last_change 25006 lfor 0/0/1324 flags hashpspool,backfillfull,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32768 pgp_num 32768 autoscale_mode on last_change 25006 lfor 0/0/9281 flags hashpspool,backfillfull stripe_width 0 pg_num_min 16 recovery_priority 5 target_size_ratio 4 application cephfs
pool 4 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4435 pgp_num 4214 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 25817 lfor 0/25815/25813 flags hashpspool,backfillfull stripe_width 0 application cephfs

# ceph osd pool autoscale-status
POOL                  SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK   
.mgr                448.5k                3.0        12506G  0.0000                                  1.0       1              on         False  
rbd                 40985M                3.0        12506G  0.0096                                  1.0     256              on         False  
cephfs.cephfs.meta  102.9M                3.0        12506G  1.0000        4.0000           1.0000   1.0   32768              on         False  
cephfs.cephfs.data   1733G                3.0        12506G  0.4158                                  1.0      32              on         False

From MGR and system logs:

Before upgrade:

2634769 Feb 11 00:48:44 f03-h02-000-r640 conmon[2849344]: debug 2022-02-11T00:48:44.028+0000 7f3bab474700  0 [pg_autoscaler INFO root] effective_target_ratio 0.0 0.0 0 13428844396544

After upgrade:

2022-02-11T00:57:14.734+0000 7f4ceec03000  0 ceph version 17.0.0-10315-ga00e8b31 (a00e8b315af02865380634f8100dc7d18a18af4f) quincy (dev), process ceph-mgr, pid 7
2022-02-11T00:58:57.186+0000 7f4add690700  0 [pg_autoscaler INFO root] effective_target_ratio 0.0 4.0 0 13428844396544

Related issues 4 (1 open — 3 closed)

Updated by Vikhyat Umrao about 2 years ago

Subject changed from cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 to cephadm upgrade pacific to quincy autoscaler is scaling pgs from 32 -> 32768 for cephfs meta pool

Updated by Vikhyat Umrao about 2 years ago

Related to Bug #54238: cephadm upgrade pacifc to quincy -> causing osd's FULL/cascading failure added

Updated by Vikhyat Umrao about 2 years ago

The following path has MGR logs, Mon logs, Cluster logs, audit logs, and system logs.

/home/core/tracker54263

Updated by Kamoltat (Junior) Sirivadhna about 2 years ago

In summary,
the root cause of the problem is that after the upgrade to quincy, the cephfs metadata pool was somehow given a target_size_ratio of 4.0. This should not happen. With only 4 pools in the same root of the cluster, and with total_target_bytes also 0, the effective ratio for cephfs.cephfs.meta is guaranteed to be 1.0. The effective ratio therefore takes precedence over capacity_ratio, which means the autoscaler gives cephfs.cephfs.meta the maximum number of PGs it is allowed to give, in this case 32768 PGs.

Here is a link to my findings:
https://docs.google.com/document/d/1lpNTXlrgtcQ6tQylHqfRHkeLijU5Af1xjkYa_u7ZmbY/edit#
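
The effective-ratio reasoning in the summary above can be sketched as follows. This is a minimal illustration and not the actual pg_autoscaler code: the function body, the argument order (assumed to match the four numbers in the `effective_target_ratio` log lines), and the `hypothetical_capacity_ratio` value are all assumptions.

```python
def effective_target_ratio(target_ratio, total_target_ratio,
                           total_target_bytes, capacity):
    """Normalize a pool's target ratio against all ratios in the same root.

    Sketch only: assumed to mirror the four values printed in the MGR log.
    """
    ratio = float(target_ratio)
    if total_target_ratio:
        # Each pool gets its share of the sum of all target ratios.
        ratio /= total_target_ratio
    if total_target_bytes and capacity:
        # Capacity reserved via target_size_bytes shrinks what ratios can claim.
        fraction_available = 1.0 - min(1.0, total_target_bytes / capacity)
        ratio *= fraction_available
    return ratio


capacity = 13428844396544  # raw capacity in bytes, taken from the MGR log

# cephfs.cephfs.meta is the only pool with a target ratio, so it owns the
# whole sum (4.0 / 4.0) and nothing is reserved by target_size_bytes.
meta_effective = effective_target_ratio(4.0, 4.0, 0, capacity)

# A pool with no target ratio, as in the "0.0 4.0 0 ..." log line.
other_effective = effective_target_ratio(0.0, 4.0, 0, capacity)

# Hypothetical value: the pool holds very little data compared to capacity.
hypothetical_capacity_ratio = 0.0001
final_ratio = max(hypothetical_capacity_ratio, meta_effective)

print(meta_effective, other_effective, final_ratio)
```

With an effective ratio of 1.0 the pool is treated as if it will fill the whole root, which is consistent with the autoscaler assigning it the maximum PG count.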

Updated by Kamoltat (Junior) Sirivadhna about 2 years ago

Update:

On the monitor side, at pool creation, target_size_ratio cannot be more than 1.0 or less than 0.0, as specified in /src/mon/MonCommands.h.
Therefore, we can rule out the possibility that `target_size_ratio` was set to 4.0 by the command `ceph osd pool create <pool-name> --target_size_ratio <ratio>`. However,
`ceph osd pool set <pool-name> target_size_ratio <ratio>` is able to set target_size_ratio outside the 0.0-1.0 range.

Note:

target_size_ratio is allowed to be more than 1.0, so the bound that is enforced during pool creation in /src/mon/MonCommands.h should be changed.

Updated by Kamoltat (Junior) Sirivadhna about 2 years ago

Related to Backport #54412: pacific:osd:add pg_num_max value added

Updated by Vikhyat Umrao about 2 years ago

Status changed from New to In Progress
Pull request ID set to 45200

Updated by Neha Ojha about 2 years ago

Status changed from In Progress to Fix Under Review

Updated by Kamoltat (Junior) Sirivadhna about 2 years ago

Update:

After recreating the problem by tweaking the upgrade/pacific-x/parallel suite and adding additional logs, we conclude that the problem lies in the declaration of `opt_mapping` in src/osd/osd_types.cc. https://github.com/ceph/ceph/pull/44054 added PG_NUM_MAX to the middle of the list. We found that the order of the list is important, and new options should always be added to the end of the list to preserve the order of options during upgrade. For more information regarding the bug analysis, please see: https://docs.google.com/document/d/10PJDwU2H7uY2o7_1lwTtQHUFoF9Fx7jguKeT-mKR2sA/edit?usp=sharing
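
Why the position in the list matters can be shown with a small sketch. It assumes that pool options are stored under a numeric key derived from their position in the list. The option lists below are simplified and hypothetical, not copied from src/osd/osd_types.cc, and `encode`/`decode` are illustrative helpers, not Ceph functions.

```python
# Hypothetical, simplified option order before the upgrade.
OLD_ORDER = ["pg_num_min", "target_size_bytes",
             "target_size_ratio", "pg_autoscale_bias"]

# New option inserted in the middle: every later option shifts by one.
NEW_ORDER_MIDDLE = ["pg_num_min", "pg_num_max", "target_size_bytes",
                    "target_size_ratio", "pg_autoscale_bias"]

# New option appended at the end: existing positions are unchanged.
NEW_ORDER_END = OLD_ORDER + ["pg_num_max"]


def encode(opts, order):
    """Store each option under its numeric position in `order`."""
    return {order.index(name): value for name, value in opts.items()}


def decode(encoded, order):
    """Map numeric keys back to option names using `order`."""
    return {order[key]: value for key, value in encoded.items()}


# Options written by the old version (values are illustrative).
stored = encode({"pg_num_min": 16, "pg_autoscale_bias": 4.0}, OLD_ORDER)

# Read back by a new version that inserted pg_num_max in the middle:
# key 3 now names a different option than the one that was written.
broken = decode(stored, NEW_ORDER_MIDDLE)

# Read back by a new version that appended pg_num_max at the end.
safe = decode(stored, NEW_ORDER_END)

print(broken)
print(safe)
```

In this sketch a value written under one option name comes back under a neighboring name after the upgrade, while appending at the end leaves every stored key valid.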
