Bug #49364
pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed
Description
When deploying rgw on a master or pacific cluster, the device_health_metrics pool uses 128 pgs. Setting up an rgw service then tries to create its pools, but pool creation fails because there are not enough pgs left to stay under the mon_max_pg_per_osd limit.
This could be the result of recent changes to the pg autoscaler: https://github.com/ceph/ceph/pull/38805 and https://github.com/ceph/ceph/pull/39248
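To make the failure mode concrete, here is a rough sketch of the arithmetic behind the limit. It is not the monitor's actual code. It assumes the check compares the total number of PG replicas (pg_num times replica size, summed over all pools) against mon_max_pg_per_osd times the number of OSDs. The 3-OSD cluster, the limit of 250, and the four 32-pg, size-3 rgw pools are illustrative assumptions, not values reported in this ticket.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    pg_num: int
    size: int  # replica count


def pg_replicas(pools):
    """Total PG replicas that the given pools place on the cluster."""
    return sum(p.pg_num * p.size for p in pools)


def can_create(existing, new_pool, num_osds, mon_max_pg_per_osd=250):
    """Assumed check: projected PG replicas must fit under the per-OSD limit."""
    projected = pg_replicas(existing) + new_pool.pg_num * new_pool.size
    return projected <= mon_max_pg_per_osd * num_osds


def create_until_blocked(existing, wanted, num_osds, mon_max_pg_per_osd=250):
    """Create pools in order and stop at the first one that would exceed the limit."""
    pools = list(existing)
    created = []
    for pool in wanted:
        if not can_create(pools, pool, num_osds, mon_max_pg_per_osd):
            break
        pools.append(pool)
        created.append(pool.name)
    return created


# Hypothetical rgw pools; names and sizes are assumptions for illustration.
rgw_pools = [
    Pool(".rgw.root", 32, 3),
    Pool("default.rgw.log", 32, 3),
    Pool("default.rgw.control", 32, 3),
    Pool("default.rgw.meta", 32, 3),
]

scaled_up = [Pool("device_health_metrics", 128, 3)]  # the behaviour reported here
minimal = [Pool("device_health_metrics", 1, 3)]      # a small metadata pool

blocked = create_until_blocked(scaled_up, rgw_pools, num_osds=3)
unblocked = create_until_blocked(minimal, rgw_pools, num_osds=3)
print(len(blocked), "of", len(rgw_pools), "rgw pools created with 128 pgs")
print(len(unblocked), "of", len(rgw_pools), "rgw pools created with 1 pg")
```

Under these assumptions the 128-pg pool alone accounts for 384 of the 750 allowed PG replicas, so the fourth rgw pool no longer fits, while a 1-pg device_health_metrics pool leaves room for all four.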
Updated by Neha Ojha about 3 years ago
reverting in octopus for now: https://github.com/ceph/ceph/pull/39560
Updated by Neha Ojha about 3 years ago
- Project changed from Ceph to mgr
- Subject changed from device_health_metrics pool is using 128 pgs preventing rgw from being deployed to pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed
Caused by https://tracker.ceph.com/issues/49118.
Updated by Yuri Weinstein about 3 years ago
Updated by Kamoltat (Junior) Sirivadhna about 3 years ago
- Assignee set to Kamoltat (Junior) Sirivadhna
Updated by Daniel Pivonka about 3 years ago
- Blocks Bug #49435: cephadm: rgw not getting deployed due to HEALTH_WARN added
Updated by Neha Ojha about 3 years ago
- Backport set to pacific, octopus, nautilus
Updated by Juan Miguel Olmo Martínez about 3 years ago
- Severity changed from 3 - minor to 1 - critical
Updated by Juan Miguel Olmo Martínez about 3 years ago
Not being able to deploy or use any Ceph service is a critical issue.
Updated by Neha Ojha about 3 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 39833
Updated by Neha Ojha about 3 years ago
reverting the original patch in pacific for now: https://github.com/ceph/ceph/pull/39921
Updated by Sebastian Wagner about 3 years ago
- Blocks deleted (Bug #49435: cephadm: rgw not getting deployed due to HEALTH_WARN)
Updated by Loïc Dachary over 2 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from pacific, octopus, nautilus to pacific, octopus
Updated by Backport Bot over 2 years ago
- Copied to Backport #52519: pacific: pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed added
Updated by Backport Bot over 2 years ago
- Copied to Backport #52520: octopus: pg_autoscaler causes device_health_metrics pool to use 128 pgs preventing rgw from being deployed added
Updated by Kamoltat (Junior) Sirivadhna over 2 years ago
- Status changed from Pending Backport to Resolved
- Pull request ID changed from 39833 to 43999
https://github.com/ceph/ceph/pull/43999 resolved the issue in master and pacific. The PR ensures that the default meta pools that get created won't scale their pgs by a large amount and block the deployment of other services. For octopus, we have reverted the scale-down feature (https://github.com/ceph/ceph/pull/39560), so there won't be a problem there either.