Bug #21803
objects degraded higher than 100%
Status: Closed (0% done)
Description
Original post:
1. Jewel deployment with filestore.
2. Upgrade to Luminous (including mgr deployment and "ceph osd
require-osd-release luminous"), still on filestore.
3. rados bench with subsequent cleanup.
4. All OSDs up, all PGs active+clean.
5. Stop one OSD. Remove from CRUSH, auth list, OSD map.
6. Reinitialize OSD with bluestore.
7. Start OSD, commencing backfill.
8. Degraded objects above 100%.
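For reference, steps 5–7 above correspond roughly to the following commands (a hedged sketch; $ID and /dev/sdX are placeholders, and ceph-disk was the Luminous-era provisioning tool — this is not from the original report):

```shell
# Hedged sketch of steps 5-7; $ID and /dev/sdX are placeholders.
systemctl stop ceph-osd@$ID              # stop the OSD
ceph osd crush remove osd.$ID            # remove from CRUSH
ceph auth del osd.$ID                    # remove from the auth list
ceph osd rm $ID                          # remove from the OSD map
ceph-disk prepare --bluestore /dev/sdX   # reinitialize with bluestore
# starting the new OSD then triggers backfill (step 7)
```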
I reproduced with a simpler test:
1. ceph osd pool create test 1 1
2. ceph osd pool set test size 1
3. rados -p test bench 10 write --no-cleanup
4. ceph osd pool set test size 3
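With this repro, the degraded fraction can never legitimately exceed 100%: raising size from 1 to 3 leaves each object with 1 existing copy and 2 missing ones, i.e. (size-1)/size of all copies. A quick sanity check of that arithmetic (hypothetical object count, not from the original report):

```shell
# Hedged sketch: the legitimate degraded fraction after size 1 -> 3.
objects=380                             # hypothetical count from the bench run
size=3
missing=$(( objects * (size - 1) ))     # copies that must be recovered
total=$(( objects * size ))             # total copies the pool should hold
pct=$(awk -v m="$missing" -v t="$total" 'BEGIN{printf "%.3f", m*100/t}')
echo "expected degraded: ${pct}%"       # 66.667%, well under 100%
```

Any figure above 100% therefore indicates the miscounting this bug tracks.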
Updated by David Zafman over 6 years ago
- Status changed from New to 7
- Assignee set to David Zafman
Updated by Sage Weil over 6 years ago
- Related to Bug #21887: degraded calculation is off during backfill added
Updated by David Zafman over 6 years ago
- Related to Bug #20059: miscounting degraded objects added
Updated by David Zafman over 6 years ago
- Status changed from 7 to Pending Backport
I'm marking this pending backport. It needs to be backported to luminous BEFORE backporting #20059 (https://github.com/ceph/ceph/pull/19850).
Updated by David Zafman over 6 years ago
- Status changed from Pending Backport to Resolved
- Backport deleted (luminous, jewel)
It turns out this change is completely superseded by #20059. So I'm switching it to resolved.
I've decided that we won't backport to jewel for now either.
Updated by Florian Haas over 5 years ago
- Affected Versions v10.2.9, v12.2.8 added
David Zafman wrote:
It turns out this change is completely superseded by #20059. So I'm switching it to resolved.
I created the "original post" referred to in the description (part of a longer thread on the issue):
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021512.html
We are still seeing this reproducibly on current Luminous (upgraded from latest Jewel). So I don't believe that #20059 fixed this. Is there anything users can do to avoid this issue? It can massively lengthen recovery times, rather unexpectedly.
Updated by Nathan Cutler over 5 years ago
- Status changed from Resolved to Pending Backport
- Priority changed from Normal to High
- Backport set to luminous
Updated by Nathan Cutler over 5 years ago
- Status changed from Pending Backport to 4
Updated by David Zafman over 5 years ago
- Related to Bug #22837: discover_all_missing() not always called during activating added
Updated by David Zafman over 5 years ago
- Status changed from 4 to Resolved
- Backport deleted (luminous)
This change fixes the internal calculation of degraded objects. The _update_calc_stats() function was rewritten by #20059, so this code cannot be backported.
There are multiple issues reflected by the above status:

    cluster:
      health: HEALTH_WARN
              3/1524 objects misplaced (0.197%)
              Degraded data redundancy: 197528/1524 objects degraded
              (12961.155%), 1057 pgs unclean, 1055 pgs degraded, 3 pgs undersized

    data:
      pools:   1 pools, 2048 pgs
      objects: 508 objects, 1467 MB
      usage:   127 GB used, 35639 GB / 35766 GB avail
      pgs:     197528/1524 objects degraded (12961.155%)
               3/1524 objects misplaced (0.197%)
               1042 active+recovery_wait+degraded
               991  active+clean
               8    active+recovering+degraded
               3    active+undersized+degraded+remapped+backfill_wait
               2    active+recovery_wait+degraded+remapped
               2    active+remapped+backfill_wait

    io:
      recovery: 340 kB/s, 80 objects/s
- There are still 508 objects present (asynchronous deletes still in progress?)
- Deleting an OSD from the crush map may have caused many PGs to move around, requiring lots of recovery:
  - Caused 7 PGs to need to be temporarily remapped (state: remapped)
  - Still need to recover 1052 PGs (states: recovery_wait or recovering)
  - Need to backfill 5 PGs (states: backfill_wait)
- Master had an additional pull request https://github.com/ceph/ceph/pull/20220 (#22837)
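For what it's worth, the percentage in the status output above is simply the degraded count divided by the total copy count (508 objects × size 3 = 1524, matching the denominator shown); the anomaly is the numerator, which vastly exceeds the number of copies that could actually be missing. Checking the arithmetic (pool size 3 is an assumption):

```shell
# Reproduce the reported percentage from the status output above.
degraded=197528    # reported degraded count (the anomalous figure)
objects=508
size=3             # assumed replicated pool size
total=$(( objects * size ))   # 1524, matching the denominator shown
pct=$(awk -v d="$degraded" -v t="$total" 'BEGIN{printf "%.3f", d*100/t}')
echo "${pct}%"     # 12961.155%, matching the report
```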
Updated by David Zafman over 5 years ago
The tracker #22837, which I'm marking for backport, might address some of the high degraded count.
Updated by David Zafman over 5 years ago
I think there is a procedure for filestore-to-bluestore conversion. That conversion should NOT change the crush map: the OSD retains its number, and noout can be set so that PGs don't move while the OSD is down.
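Such a noout-based conversion could look roughly like this (a hedged sketch for Luminous with ceph-volume, not the procedure from this tracker; $ID and /dev/sdX are placeholders, and exact zap/create flags may differ by release):

```shell
# Hedged sketch: in-place filestore -> bluestore conversion keeping the OSD id.
ceph osd set noout                           # prevent data movement while down
systemctl stop ceph-osd@$ID
ceph osd destroy $ID --yes-i-really-mean-it  # keeps the id and CRUSH position
ceph-volume lvm zap /dev/sdX --destroy       # wipe the old filestore device
ceph-volume lvm create --bluestore --data /dev/sdX --osd-id $ID
ceph osd unset noout                         # OSD rejoins; only its own PGs recover
```

Because the CRUSH map and OSD id are unchanged, only the converted OSD's own PGs need recovery, avoiding the cluster-wide remapping seen in the original report.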