Bug #49364 (closed)

pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed

Added by Daniel Pivonka about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
% Done: 0%
Backport: pacific, octopus
Regression: No
Severity: 1 - critical
Pull request ID: 43999

Description

When trying to deploy rgw on a master or pacific cluster, the device_health_metrics pool is using 128 PGs. When you try to set up an rgw service, it attempts to create its pools, but this fails because there are not enough PGs left to stay under the mon_max_pg_per_osd limit.

This could be the result of changes to the pg_autoscaler: https://github.com/ceph/ceph/pull/38805 and https://github.com/ceph/ceph/pull/39248
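
For illustration, here is a minimal sketch of the budget math behind this failure, assuming mon_max_pg_per_osd at its recent default of 250; the pool list, replica sizes, and the pg_replicas_per_osd helper below are hypothetical, not Ceph code:

    # Hypothetical sketch: why creating the rgw pools trips mon_max_pg_per_osd.
    # Only mon_max_pg_per_osd is a real Ceph option; everything else is made up.
    MON_MAX_PG_PER_OSD = 250

    def pg_replicas_per_osd(pools, num_osds):
        """Projected PG replicas per OSD: sum of pg_num * size over all pools, spread across OSDs."""
        return sum(pg_num * size for pg_num, size in pools) / num_osds

    # Small 3-OSD cluster where the autoscaler pushed device_health_metrics to 128 PGs.
    existing = [(128, 3)]                  # (pg_num, size) for device_health_metrics
    rgw = [(32, 3)] * 4                    # default pools an rgw deployment tries to create

    before = pg_replicas_per_osd(existing, num_osds=3)        # 128 * 3 / 3 = 128
    after = pg_replicas_per_osd(existing + rgw, num_osds=3)   # (384 + 384) / 3 = 256

    print(before, after, MON_MAX_PG_PER_OSD)                  # 128.0 256.0 250
    if after > MON_MAX_PG_PER_OSD:
        print("pool creation refused: projected PGs per OSD exceed mon_max_pg_per_osd")

With device_health_metrics already at 128 PGs, even four modest 32-PG rgw pools push the projection past the limit, matching the reported failure mode.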


Files

LXKPcmdN.txt (3.14 KB), Daniel Pivonka, 02/18/2021 07:45 PM

Related issues (2: 0 open, 2 closed)

Copied to mgr - Backport #52519: pacific: pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed (Resolved, Kamoltat (Junior) Sirivadhna)
Copied to mgr - Backport #52520: octopus: pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed (Rejected)
#1 Updated by Neha Ojha about 3 years ago

reverting in octopus for now: https://github.com/ceph/ceph/pull/39560

#2 Updated by Neha Ojha about 3 years ago

  • Project changed from Ceph to mgr
  • Subject changed from device_health_metrics pool is using 128 pgs preventing rgw from being deployed to pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed
#3 Updated by Neha Ojha about 3 years ago

  • Priority changed from Normal to Urgent
#4 Updated by Yuri Weinstein about 3 years ago

Neha Ojha wrote:

reverting in octopus for now: https://github.com/ceph/ceph/pull/39560

merged

#5 Updated by Kamoltat (Junior) Sirivadhna about 3 years ago

  • Assignee set to Kamoltat (Junior) Sirivadhna
#6 Updated by Daniel Pivonka about 3 years ago

  • Blocks Bug #49435: cephadm: rgw not getting deployed due to HEALTH_WARN added
#7 Updated by Neha Ojha about 3 years ago

  • Backport set to pacific, octopus, nautilus
#8 Updated by Juan Miguel Olmo Martínez about 3 years ago

  • Severity changed from 3 - minor to 1 - critical
#9 Updated by Juan Miguel Olmo Martínez about 3 years ago

Not being able to deploy/use any Ceph service is a critical issue.

#10 Updated by Neha Ojha about 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 39833
#11 Updated by Neha Ojha about 3 years ago

reverting the original patch in pacific for now https://github.com/ceph/ceph/pull/39921

#12 Updated by Sebastian Wagner about 3 years ago

  • Blocks deleted (Bug #49435: cephadm: rgw not getting deployed due to HEALTH_WARN)
#13 Updated by Loïc Dachary over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from pacific, octopus, nautilus to pacific, octopus
#14 Updated by Backport Bot over 2 years ago

  • Copied to Backport #52519: pacific: pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed added
#15 Updated by Backport Bot over 2 years ago

  • Copied to Backport #52520: octopus: pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed added
#16 Updated by Kamoltat (Junior) Sirivadhna over 2 years ago

  • Status changed from Pending Backport to Resolved
  • Pull request ID changed from 39833 to 43999

https://github.com/ceph/ceph/pull/43999 resolved the issue in master and pacific; the PR ensures that newly created default meta pools won't scale their PGs up by a large amount and block the deployment of other services. For octopus, we have reverted the scale-down feature (https://github.com/ceph/ceph/pull/39560), so there won't be a problem there either.
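
For context only, a hypothetical sketch of the capping idea that comment describes; this is not the code from PR 43999, and the function name, pool list, and cap value are invented:

    # Hypothetical illustration of clamping autoscaler suggestions for
    # built-in meta pools; not the actual change in ceph/ceph#43999.
    META_POOL_PG_CAP = 32  # assumed small cap for pools like device_health_metrics

    def clamp_suggestion(pool_name, suggested_pg_num, meta_pools=("device_health_metrics",)):
        """Keep built-in meta pools from scaling to a PG count that starves other pools."""
        if pool_name in meta_pools:
            return min(suggested_pg_num, META_POOL_PG_CAP)
        return suggested_pg_num

    assert clamp_suggestion("device_health_metrics", 128) == 32
    assert clamp_suggestion("default.rgw.buckets.data", 128) == 128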
