Bug #49364

pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed

Added by Daniel Pivonka about 2 months ago. Updated about 1 month ago.

Status: Fix Under Review
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus, nautilus
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When trying to deploy RGW on a master or pacific cluster, the device_health_metrics pool is using 128 PGs. When you try to set up an RGW service, it attempts to create its pools but fails, because there are not enough PGs left to stay under the mon_max_pg_per_osd limit.

This could be the result of recent changes to the pg autoscaler: https://github.com/ceph/ceph/pull/38805 and https://github.com/ceph/ceph/pull/39248
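To illustrate why a 128-PG device_health_metrics pool can block pool creation, here is a rough sketch of the PG budget arithmetic. It assumes the monitor rejects a new pool when the total number of PG replicas (pg_num times replica size, summed over all pools) would exceed mon_max_pg_per_osd times the number of OSDs. The cluster size (3 OSDs), the limit of 250, the replica size of 3, and the RGW pool names and their pg_num of 32 are all illustrative assumptions, not values taken from this report or its attachment.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    pg_num: int
    size: int  # replica count (assumed 3 throughout this example)


def pg_replicas(pools):
    """Total PG replicas that the given pools place on OSDs."""
    return sum(p.pg_num * p.size for p in pools)


def fits_pg_budget(existing, new_pools, num_osds, mon_max_pg_per_osd=250):
    """Return True if creating new_pools keeps the cluster within budget.

    Assumed model of the limit: total PG replicas must not exceed
    mon_max_pg_per_osd * num_osds. The default of 250 is an assumption.
    """
    budget = mon_max_pg_per_osd * num_osds
    return pg_replicas(existing) + pg_replicas(new_pools) <= budget


# Hypothetical RGW pools, each with an assumed pg_num of 32.
rgw_pools = [
    Pool(".rgw.root", 32, 3),
    Pool("default.rgw.log", 32, 3),
    Pool("default.rgw.control", 32, 3),
    Pool("default.rgw.meta", 32, 3),
]

# Reported state: device_health_metrics sized at 128 PGs.
with_128 = fits_pg_budget([Pool("device_health_metrics", 128, 3)], rgw_pools, num_osds=3)

# Same hypothetical cluster with the pool at a single PG.
with_1 = fits_pg_budget([Pool("device_health_metrics", 1, 3)], rgw_pools, num_osds=3)

print(with_128, with_1)
```

Under these assumptions the 128-PG pool alone consumes 384 of the 750 available PG replicas, so the four RGW pools (another 384) no longer fit, while the same pools fit easily when device_health_metrics has a single PG.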

LXKPcmdN.txt View (3.14 KB) Daniel Pivonka, 02/18/2021 07:45 PM

History

#1 Updated by Neha Ojha about 2 months ago

reverting in octopus for now: https://github.com/ceph/ceph/pull/39560

#2 Updated by Neha Ojha about 2 months ago

  • Project changed from Ceph to mgr
  • Subject changed from device_health_metrics pool is using 128 pgs preventing rgw from being deployed to pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed

#3 Updated by Neha Ojha about 2 months ago

  • Priority changed from Normal to Urgent

#4 Updated by Yuri Weinstein about 2 months ago

Neha Ojha wrote:

reverting in octopus for now: https://github.com/ceph/ceph/pull/39560

merged

#5 Updated by Kamoltat Sirivadhna about 2 months ago

  • Assignee set to Kamoltat Sirivadhna

#6 Updated by Daniel Pivonka about 2 months ago

  • Blocks Bug #49435: cephadm: rgw not getting deployed due to HEALTH_WARN added

#7 Updated by Neha Ojha about 2 months ago

  • Backport set to pacific, octopus, nautilus

#8 Updated by Juan Miguel Olmo Martínez about 1 month ago

  • Severity changed from 3 - minor to 1 - critical

#9 Updated by Juan Miguel Olmo Martínez about 1 month ago

Not being able to deploy or use any Ceph service is a critical issue.

#10 Updated by Neha Ojha about 1 month ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 39833

#11 Updated by Neha Ojha about 1 month ago

reverting the original patch in pacific for now https://github.com/ceph/ceph/pull/39921

#12 Updated by Sebastian Wagner about 1 month ago

  • Blocks deleted (Bug #49435: cephadm: rgw not getting deployed due to HEALTH_WARN)
