Bug #52489

open

Adding a Pacific MON to an Octopus cluster: All PGs inactive

Added by Chris Dunlop over 2 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm in the midst of an upgrade from Octopus to Pacific. Due to issues during the upgrade, rather than simply upgrading the existing mons I decided to add a new Octopus mon to the existing quorum of 3 and test upgrading that new mon. The upgrade itself went fine, but a short time after the newly upgraded (now Pacific) mon came up, the cluster health was alarming, with ALL PGs showing as inactive / unknown:

$ ceph -s
  cluster:
    id:     c6618970-0ce0-4cb2-bc9a-dd5f29b62e24
    health: HEALTH_WARN
            Reduced data availability: 5721 pgs inactive
            (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)

  services:
    mon: 4 daemons, quorum k2,b2,b4,b5 (age 43m)
    mgr: b5(active, starting, since 40m), standbys: b4, b2
    osd: 78 osds: 78 up (since 4d), 78 in (since 3w)
         flags noout

  data:
    pools:   12 pools, 5721 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5721 unknown

$ ceph health detail
HEALTH_WARN Reduced data availability: 5721 pgs inactive; (muted: OSDMAP_FLAGS POOL_NO_REDUNDANCY)
(MUTED) [WRN] OSDMAP_FLAGS: noout flag(s) set
[WRN] PG_AVAILABILITY: Reduced data availability: 5721 pgs inactive
    pg 6.fcd is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fce is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fcf is stuck inactive for 41m, current state unknown, last acting []
    pg 6.fd0 is stuck inactive for 41m, current state unknown, last acting []
    ...etc.

Removing the Pacific mon restored the status to HEALTH_OK.
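For reference, the sequence described above is roughly the following (a hedged sketch only: the new mon's name is a placeholder, the cephadm-style add command is an assumption about how this cluster is managed, and the exact upgrade mechanism depends on the deployment):

```shell
# Sketch of the reproduction sequence; <new-mon> is a placeholder, and
# "ceph orch daemon add" assumes a cephadm-managed cluster (an assumption).

# 1. Add a fourth (Octopus) mon to the existing 3-mon quorum.
ceph orch daemon add mon <new-mon>

# 2. Upgrade just that mon to Pacific and restart it
#    (mechanism depends on the deployment; not shown here).

# 3. Shortly after the Pacific mon joins, all PGs report unknown/inactive:
ceph -s
ceph health detail

# 4. Workaround from this report: remove the Pacific mon from the monmap,
#    which restored HEALTH_OK.
ceph mon remove <new-mon>
```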


Related issues (1 open, 0 closed)

Related to RADOS - Bug #52488: Pacific mon won't join Octopus mons (New)

Actions #1

Updated by Sebastian Wagner over 2 years ago

  • Project changed from Orchestrator to RADOS
  • Subject changed from Ceph health alarming during mon upgrade Octopus to Pacific to Adding a Pacific MON to an Octopus cluster: All PGs inactive
Actions #2

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #52488: Pacific mon won't join Octopus mons added
Actions #3

Updated by Neha Ojha over 2 years ago

  • Status changed from New to Duplicate

This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use https://tracker.ceph.com/issues/52488 to debug the issue.

Actions #4

Updated by Chris Dunlop over 2 years ago

Neha Ojha wrote:

This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use https://tracker.ceph.com/issues/52488 to debug the issue.

I'm not sure these are duplicates: at least the symptoms are completely different.

In https://tracker.ceph.com/issues/52488 a new Pacific mon won't join the existing Octopus mon quorum, and the CPU usage on all (Octopus and Pacific) mons goes over 100%.

In this issue, a Pacific mon upgraded from Octopus joins the existing quorum, but some time later all PGs are shown as inactive.

Also, is this latter case not supposed to be the normal upgrade path, i.e. upgrade each existing mon in turn, letting the mons form a quorum between each upgrade?
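The rolling upgrade path referred to here can be sketched as follows (hedged: hostnames are this cluster's pre-existing mons per the `ceph -s` output above, the package-upgrade step is deployment-specific, and the systemd unit name is the conventional one for package-based installs):

```shell
# Sketch of the usual Octopus -> Pacific rolling mon upgrade:
# upgrade each existing mon in turn, letting quorum re-form between steps.
ceph osd set noout                     # avoid data movement during restarts

for host in k2 b2 b4; do
    # Upgrade the ceph packages on $host to Pacific (distro-specific, not
    # shown), then restart its mon and wait for it to rejoin the quorum:
    ssh "$host" systemctl restart ceph-mon@"$host"
    ceph mon stat                      # confirm all mons are back in quorum
done

ceph versions                          # confirm all mons now report Pacific
ceph osd unset noout
```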

Actions #5

Updated by Neha Ojha over 2 years ago

  • Status changed from Duplicate to New

Chris Dunlop wrote:

Neha Ojha wrote:

This is expected when mons don't form quorum, here it was caused by https://tracker.ceph.com/issues/52488. Let's use https://tracker.ceph.com/issues/52488 to debug the issue.

I'm not sure these are duplicates: at least the symptoms are completely different.

In https://tracker.ceph.com/issues/52488 a new Pacific mon won't join the existing Octopus mon quorum, and the CPU usage on all (Octopus and Pacific) mons goes over 100%.

In this issue, a Pacific mon upgraded from Octopus joins the existing quorum, but some time later all PGs are shown as inactive.

So you had one pacific mon and 3 octopus mons in quorum? What was the CPU utilization like on the mons and mgrs?

Also, is this latter case not supposed to be the normal upgrade path, i.e. upgrade each existing mon in turn, letting the mons form a quorum between each upgrade?

