Bug #21803

objects degraded higher than 100%

Added by David Zafman almost 2 years ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 10/13/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

Original post:
1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.
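Step 5 corresponds to the manual OSD removal sequence described in the Ceph documentation. The sketch below only builds that command list and runs nothing. The helper name and OSD id are hypothetical, and the reporter's exact commands are not given in the thread. The `crush remove` step is the one that changes the CRUSH map, so many PGs get new mappings.

```python
from typing import List


def remove_osd_commands(osd_id: int) -> List[str]:
    """Return (not run) a manual removal sequence matching step 5 above.

    Hypothetical helper: the OSD id is illustrative only.
    """
    return [
        f"systemctl stop ceph-osd@{osd_id}",    # stop the OSD daemon
        f"ceph osd crush remove osd.{osd_id}",  # drop it from the CRUSH map (PGs remap)
        f"ceph auth del osd.{osd_id}",          # remove its cephx key
        f"ceph osd rm {osd_id}",                # remove it from the OSD map
    ]


for cmd in remove_osd_commands(7):  # hypothetical OSD id
    print(cmd)
```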

I reproduced with a simpler test:

1. ceph osd pool create test 1 1
2. ceph osd pool set test size 1
3. rados -p test bench 10 write --no-cleanup
4. ceph osd pool set test size 3


Related issues

Related to RADOS - Bug #21887: degraded calculation is off during backfill Duplicate 10/21/2017
Related to RADOS - Bug #20059: miscounting degraded objects Resolved 05/23/2017
Related to RADOS - Bug #22837: discover_all_missing() not always called during activating Resolved 01/30/2018

History

#1 Updated by David Zafman almost 2 years ago

  • Status changed from New to Testing
  • Assignee set to David Zafman

#2 Updated by David Zafman almost 2 years ago

  • Backport set to luminous, jewel

#3 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21887: degraded calculation is off during backfill added

#4 Updated by David Zafman over 1 year ago

  • Related to Bug #20059: miscounting degraded objects added

#5 Updated by David Zafman over 1 year ago

  • Status changed from Testing to Pending Backport

I'm marking this pending backport. Needs to be backported to luminous BEFORE backporting #20059 ( https://github.com/ceph/ceph/pull/19850 )

#6 Updated by David Zafman over 1 year ago

  • Status changed from Pending Backport to Resolved
  • Backport deleted (luminous, jewel)

It turns out this change is completely superseded by #20059. So I'm switching it to resolved.

I've decided that we won't backport to jewel for now either.

#7 Updated by Florian Haas about 1 year ago

  • Affected Versions v10.2.9, v12.2.8 added

David Zafman wrote:

It turns out this change is completely superseded by #20059. So I'm switching it to resolved.

I created the "original post" referred to in the description (part of a longer thread on the issue):

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021512.html

We are still seeing this reproducibly on current Luminous (upgraded from latest Jewel). So I don't believe that #20059 fixed this. Is there anything users can do to avoid this issue? It can massively lengthen recovery times, rather unexpectedly.

#8 Updated by Nathan Cutler about 1 year ago

  • Status changed from Resolved to Pending Backport
  • Priority changed from Normal to High
  • Backport set to luminous

#9 Updated by Nathan Cutler about 1 year ago

  • Status changed from Pending Backport to Feedback

#10 Updated by David Zafman about 1 year ago

  • Related to Bug #22837: discover_all_missing() not always called during activating added

#11 Updated by David Zafman about 1 year ago

  • Status changed from Feedback to Resolved
  • Backport deleted (luminous)

This change fixes the internal calculation of degraded objects. The _update_calc_stats() function was rewritten by #20059, so this code cannot be backported.

  cluster:
    health: HEALTH_WARN
            3/1524 objects misplaced (0.197%)
            Degraded data redundancy: 197528/1524 objects degraded (12961.155%), 1057 pgs unclean, 1055 pgs degraded, 3 pgs undersized

  data:
    pools:   1 pools, 2048 pgs
    objects: 508 objects, 1467 MB
    usage:   127 GB used, 35639 GB / 35766 GB avail
    pgs:     197528/1524 objects degraded (12961.155%)
             3/1524 objects misplaced (0.197%)
             1042 active+recovery_wait+degraded
             991  active+clean
             8    active+recovering+degraded
             3    active+undersized+degraded+remapped+backfill_wait
             2    active+recovery_wait+degraded+remapped
             2    active+remapped+backfill_wait

  io:
    recovery: 340 kB/s, 80 objects/s
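The percentages in this output can be recomputed directly. The sketch assumes a replicated pool of size 3. That size is inferred from 1524 = 508 × 3 and is not shown in the output. With that denominator, the degraded count works out to roughly 389 missing copies per object, while a size-3 pool can be missing at most 3 copies of any object.

```python
# Figures copied from the status output above.
objects = 508
degraded = 197528
misplaced = 3
pool_size = 3  # assumption: inferred from 1524 / 508, not shown in the output

copies = objects * pool_size  # the "/1524" denominator counts object copies
assert copies == 1524


def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with three decimals, as the status output does."""
    return f"{100 * part / whole:.3f}%"


print(pct(degraded, copies))        # 12961.155%
print(pct(misplaced, copies))       # 0.197%
print(f"{degraded / objects:.1f}")  # 388.8 degraded copies per object (at most 3 are possible)
```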

There are multiple issues reflected by the above status:
  1. There are still 508 objects present (asynchronous deletes still in progress?)
  2. Deleting an OSD from the CRUSH map may have caused many PGs to move around, requiring lots of recovery:
    7 PGs need to be temporarily remapped (state: remapped)
    1052 PGs still need to be recovered (states: recovery_wait or recovering)
    5 PGs need to be backfilled (state: backfill_wait)
  3. Master had an additional pull request, https://github.com/ceph/ceph/pull/20220 (#22837)
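The PG counts in item 2 can be cross-checked against the state list in the status output. The small sketch below, an illustration and not a Ceph tool, sums the PGs carrying each state flag. A PG can carry several flags at once, so the flags are counted separately. It also shows that "1057 pgs unclean" is simply every PG that is not active+clean (2048 - 991).

```python
from collections import Counter

# "pgs:" state lines copied from the status output above.
PG_STATES = """\
1042 active+recovery_wait+degraded
991 active+clean
8 active+recovering+degraded
3 active+undersized+degraded+remapped+backfill_wait
2 active+recovery_wait+degraded+remapped
2 active+remapped+backfill_wait
"""

# Parse each line into (pg_count, state_string).
rows = [(int(count), state) for count, state in
        (line.split() for line in PG_STATES.splitlines())]

# Count PGs per individual flag; one PG may contribute to several flags.
flags = Counter()
for count, state in rows:
    for flag in state.split("+"):
        flags[flag] += count

total = sum(count for count, _ in rows)
clean = sum(count for count, state in rows if state == "active+clean")

print(total)                                         # 2048
print(total - clean)                                 # 1057 ("pgs unclean")
print(flags["degraded"])                             # 1055
print(flags["remapped"])                             # 7
print(flags["recovery_wait"] + flags["recovering"])  # 1052
print(flags["backfill_wait"])                        # 5
```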

#12 Updated by David Zafman about 1 year ago

Tracker #22837, which I'm marking for backport, might account for some of the high degraded count.

#13 Updated by David Zafman about 1 year ago

I think there is a procedure for filestore to bluestore conversion. That conversion should NOT change the CRUSH map: the OSD retains its number, and noout can be set so that PGs don't move while the OSD is down.
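That suggestion can be sketched as a command plan in which the OSD keeps its id and CRUSH position. This is a minimal sketch modelled on the BlueStore migration guide in the Ceph documentation. It is not a procedure confirmed in this thread. The helper name, the OSD id 12, the device /dev/sdX and the default mount path for a cluster named "ceph" are all assumptions. Check every command against the documentation for your release before use. The function only builds the list and runs nothing.

```python
from typing import List


def bluestore_conversion_plan(osd_id: int, device: str) -> List[str]:
    """Build (but do not run) commands to rebuild one filestore OSD as bluestore
    while keeping its id and CRUSH entry.

    Hypothetical helper modelled on the Ceph BlueStore migration guide; the
    mount path assumes the default cluster name "ceph".
    """
    return [
        "ceph osd set noout",                                 # keep the down OSD "in" so PGs are not remapped
        f"systemctl stop ceph-osd@{osd_id}",                   # the OSD must be down before it is destroyed
        f"umount /var/lib/ceph/osd/ceph-{osd_id}",             # release the filestore data device
        f"ceph osd destroy {osd_id} --yes-i-really-mean-it",   # keeps the id and CRUSH entry
        f"ceph-volume lvm zap {device}",                       # wipe the old filestore data
        f"ceph-volume lvm create --bluestore --data {device} --osd-id {osd_id}",
        "ceph osd unset noout",                               # once the OSD is back up and recovering
    ]


for cmd in bluestore_conversion_plan(12, "/dev/sdX"):  # hypothetical OSD id and device
    print(cmd)
```

Step 5 in the original post removes the OSD from CRUSH. `ceph osd destroy` instead leaves the id and CRUSH entry in place, so the CRUSH map is unchanged. With `noout` set, the affected PGs stay mapped to the rebuilt OSD and are recovered onto it. This variant does not drain the OSD first, so those PGs run with one fewer copy until recovery finishes.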
