Bug #52489
Adding a Pacific MON to an Octopus cluster: All PGs inactive
Status: open
Description
I'm in the midst of an upgrade from Octopus to Pacific. Due to issues during the upgrade, rather than simply upgrading the existing mons, I decided to add a new Octopus mon to the existing quorum of 3 and test upgrading that new mon. The upgrade went fine, but a short time after the newly upgraded (now Pacific) mon came up, the health of the cluster was alarming, with ALL PGs showing as inactive / unknown:
$ ceph -s
  cluster:
    id:     c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
    health: HEALTH_WARN
            Reduced data availability: 5721 pgs inactive
            (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)

  services:
    mon: 4 daemons, quorum k2,b2,b4,b5 (age 43m)
    mgr: b5(active, starting, since 40m), standbys: b4, b2
    osd: 78 osds: 78 up (since 4d), 78 in (since 3w)
         flags noout

  data:
    pools:   12 pools, 5721 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5721 unknown

$ ceph health detail
HEALTH_WARN Reduced data availability: 5721 pgs inactive; (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)
(MUTED) [WRN] OSDMAP_FLAGS: noout flag(s) set
[WRN] PG_AVAILABILITY: Reduced data availability: 5721 pgs inactive
    pg 6.fcd is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fce is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fcf is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fd0 is stuck inactive for 41m, current state unknown, last acting []
    ...etc.
Removing the Pacific mon restored the status to HEALTH_OK.
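For reference, backing a mon out of the quorum uses the standard mon-removal commands. This is only a sketch: it assumes a systemd-based deployment and uses "newmon" as a placeholder mon ID (the report doesn't say which of k2/b2/b4/b5 was the new mon).

```shell
# Stop the mon daemon on its host first (unit names follow
# the ceph-mon@<id> convention on systemd deployments).
systemctl stop ceph-mon@newmon    # "newmon" is a placeholder mon ID

# Remove it from the monitor map so the remaining mons
# re-form quorum without it.
ceph mon remove newmon

# Confirm quorum and cluster health afterwards.
ceph quorum_status --format json-pretty
ceph -s
```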
Updated by Sebastian Wagner over 2 years ago
- Project changed from Orchestrator to RADOS
- Subject changed from Ceph health alarming during mon upgrade Octopus to Pacific to Adding a Pacific MON to an Octopus cluster: All PGs inactive
Updated by Sebastian Wagner over 2 years ago
- Related to Bug #52488: Pacific mon won't join Octopus mons added
Updated by Neha Ojha over 2 years ago
- Status changed from New to Duplicate
This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use https://tracker.ceph.com/issues/52488 to debug the issue.
Updated by Chris Dunlop over 2 years ago
Neha Ojha wrote:
This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use https://tracker.ceph.com/issues/52488 to debug the issue.
I'm not sure these are duplicates: at least the symptoms are completely different.
In https://tracker.ceph.com/issues/52488 a new Pacific mon won't join the existing Octopus mon quorum, and the CPU usage on all (Octopus and Pacific) mons goes over 100%.
In this issue a Pacific mon upgraded from Octopus joins the existing quorum but some time later all PGs are shown as inactive.
Also, is this latter case not supposed to be the normal upgrade path, i.e. upgrade each existing mon in turn, letting the mons form a quorum between each upgrade?
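For comparison, the in-place path described above is roughly the following per-host loop. This is a sketch assuming a package-based install with systemd units, not a statement of this cluster's exact procedure:

```shell
# On each monitor host in turn, after installing the Pacific packages:
systemctl restart ceph-mon.target   # restart the mon with the new binaries

# Wait for the restarted mon to rejoin quorum before moving to the next host.
ceph quorum_status --format json-pretty

# Once all mons are restarted, check the per-daemon release mix.
ceph versions
```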
Updated by Neha Ojha over 2 years ago
- Status changed from Duplicate to New
Chris Dunlop wrote:
Neha Ojha wrote:
This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use https://tracker.ceph.com/issues/52488 to debug the issue.
I'm not sure these are duplicates: at least the symptoms are completely different.
In https://tracker.ceph.com/issues/52488 a new Pacific mon won't join the existing Octopus mon quorum, and the CPU usage on all (Octopus and Pacific) mons goes over 100%.
In this issue a Pacific mon upgraded from Octopus joins the existing quorum but some time later all PGs are shown as inactive.
So you had one pacific mon and 3 octopus mons in quorum? What was the CPU utilization like on the mons and mgrs?
Also, is this latter case not supposed to be the normal upgrade path, i.e. upgrade each existing mon in turn, letting the mons form a quorum between each upgrade?