Backport #22724

luminous: miscounting degraded objects

Added by David Zafman about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: High
Assignee: David Zafman
Target version:
Release: luminous
Crash signature (v1):
Crash signature (v2):

Description

On bigbang:

    cluster f502e0e8-63e1-42c8-b38b-5b4f8daba3f8
     health HEALTH_WARN
            129288 pgs degraded
            1287 pgs recovering
            128001 pgs recovery_wait
            129288 pgs stuck degraded
            129288 pgs stuck unclean
            recovery 489383396/601270785 objects degraded (81.392%)
     monmap e2: 3 mons, quorum p06636710a37514,p06636710a59202,p06636710a82299
        mgr e502: active: p06636710a59202, standbys: p06636710a82299, p06636710a37514
     osdmap e37010: 6528 osds, 6526 up, 6526 in
      pgmap 262208 pgs, 2 pools, 192 TB data, 191M objects
            948 TB used, 34670 TB / 35618 TB avail
            489383396/601270785 objects degraded (81.392%)
              132920 active+clean
              128001 active+recovery_wait+degraded
                1287 active+recovering+degraded
recovery io 4712 MB/s, 4691 objects/s
  client io 3533 MB/s wr, 0 op/s rd, 7063 op/s wr

Each PG is 3x replicated. Note that almost exactly 1/2 of them are degraded (I did a big PG split that updated pg_num and then pgp_num from 131072 to 262144). So the degraded PGs probably have all 3 replicas in the wrong location, and the active+clean ones are obviously fine.

This should mean that no more than 50% of objects (instances/copies) are degraded... right? Not sure where the 80% arithmetic is coming from.
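
The mismatch can be checked directly from the figures in the status output above. The following is only a rough sketch of that arithmetic, not a reproduction of how Ceph computes the counter. It assumes objects are spread evenly across PGs, which the report does not state, and the variable names are made up for illustration.

```python
# Figures copied from the `ceph -s` output in the description.
pgs_total = 262208
pgs_degraded = 128001 + 1287   # recovery_wait+degraded plus recovering+degraded
objects_degraded = 489383396   # numerator of the "objects degraded" line
object_copies = 601270785      # denominator of the "objects degraded" line

pg_fraction = pgs_degraded / pgs_total
reported_fraction = objects_degraded / object_copies

# Assumption (not stated in the report): objects are spread evenly across PGs,
# so at most pg_fraction of all object copies can sit in degraded PGs.
expected_max_degraded = pg_fraction * object_copies

print(f"degraded PGs:         {pg_fraction:.3%}")
print(f"reported degraded:    {reported_fraction:.3%}")
print(f"expected upper bound: {expected_max_degraded:,.0f} copies")
print(f"overcount factor:     {objects_degraded / expected_max_degraded:.2f}x")
```

Under that even-distribution assumption, the reported degraded count is well above what the share of degraded PGs would allow, which is consistent with the counter being wrong rather than the cluster state.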


Related issues

Related to RADOS - Backport #22387: luminous: PG stuck in recovery_unfound (Resolved)
Copied from RADOS - Bug #20059: miscounting degraded objects (Resolved, 05/23/2017)

History

#1 Updated by David Zafman about 6 years ago

  • Copied from Bug #20059: miscounting degraded objects added

#2 Updated by David Zafman about 6 years ago

#3 Updated by Nathan Cutler about 6 years ago

David, while you're doing this one, can you include https://tracker.ceph.com/issues/22387 as well?

#4 Updated by Nathan Cutler about 6 years ago

  • Status changed from New to Need More Info

#5 Updated by David Zafman about 6 years ago

  • Status changed from Need More Info to Fix Under Review

#6 Updated by David Zafman about 6 years ago

  • Status changed from Fix Under Review to Resolved

#7 Updated by Nathan Cutler about 6 years ago

  • Target version set to v12.2.3
