Bug #19377


mark_unfound_lost revert won't actually recover the objects unless there are some found objects to recover as well

Added by Samuel Just about 7 years ago. Updated over 6 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
Scrub/Repair
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See ReplicatedPG::start_recovery_ops. If num_missing == num_unfound, we don't try to do recovery. This is problematic: mark_unfound_lost adds the log entries needed to enable recovery, but those entries don't actually make the objects not-missing on the holders of the version we are reverting to. You probably want it that way, since you don't want to read from or write to an object in that state prior to recovery. However, it means that start_recovery_ops won't actually do anything unless there is at least one found object to recover to the primary as well. The best fix is probably to rework that logic so that we'll take a crack at recovery if there is mark_unfound_lost revert work to be done. Maybe just eliminate that check and, in the worst case, check all of the unfound objects once (just make sure the recovery task doesn't continually get requeued when there's no work to be done).
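A minimal sketch of the gate and the proposed rework (hypothetical names and signatures for illustration, not the actual Ceph code):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplification of the gate described above in
// ReplicatedPG::start_recovery_ops: when every missing object is also
// unfound, recovery is skipped entirely, so the log entries queued by
// mark_unfound_lost revert are never applied.
bool starts_recovery_current(uint64_t num_missing, uint64_t num_unfound) {
    return num_missing != num_unfound;
}

// Sketch of the proposed rework: also take a crack at recovery when there
// is pending mark_unfound_lost revert work, so the reverts run even with
// zero found objects. has_pending_revert_work is a hypothetical flag.
bool starts_recovery_fixed(uint64_t num_missing, uint64_t num_unfound,
                           bool has_pending_revert_work) {
    return num_missing != num_unfound || has_pending_revert_work;
}
```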

How did this pass tests? I think the test added after the rework focused on ensuring that mark_unfound_lost was safe with concurrent IO (which the old implementation wasn't). Thus, the test tends to always have a mix of unfound and not-unfound objects to make progress on, which avoids this bug (I haven't confirmed this, but that's my guess).

To reproduce:
( ../src/stop.sh; rm -rf dev/*; CEPH_NUM_OSD=3 ../src/vstart.sh --short --localhost -n -x -d ; ) # start a vstart cluster
./ceph osd pool create foo1 1 1 # create a pool with 1 pg
./ceph osd pool set foo1 size 2 # make it size 2
./ceph osd pool set foo1 min_size 1 # and min_size 1
./rbd create -p foo1 test-rbd --size 1G # create an rbd image on that pool
./rbd bench-write -p foo1 test-rbd --io-size 4 --io-threads 4 # write to the image for a while to create objects
./ceph osd set noout # set noout
  1. kill the process for the primary of that pg, then:
    ./rbd bench-write -p foo1 test-rbd --io-size 1024 --io-threads 1 # do writes to the degraded objects
    ./ceph tell osd.\* injectargs '--osd_recovery_delay_start=1000' # delay recovery on the live osds
    ./ceph-osd -i 1 -c /home/sam/git-checkouts/ceph/src/ceph.conf --osd-recovery-delay-start=1000 # start osd.1 with recovery delay
  2. wait for that pg to go active+recovering (it should stay there because of the recovery delay)
  3. kill the other osd for that pg (the one with the new updates)
ceph/src [656b5b6●] » ./ceph -s
  • DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    2017-03-24 13:29:56.682668 7ff65340f700 -1 WARNING: the following dangerous and experimental features are enabled: *
    2017-03-24 13:29:56.682682 7ff65340f700 0 lockdep start
    2017-03-24 13:29:56.686452 7ff65340f700 -1 WARNING: the following dangerous and experimental features are enabled: *
    cluster 23b16c60-b909-4753-8691-e8d5e62261b5
    health HEALTH_WARN
    25 pgs degraded
    1 pgs recovering
    7 pgs stuck unclean
    25 pgs undersized
    recovery 36/78 objects degraded (46.154%)
    recovery 7/29 unfound (24.138%)
    1/3 in osds are down
    noout flag(s) set
    monmap e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
    election epoch 6, quorum 0,1,2 a,b,c
    fsmap e5: 1/1/1 up {0=c=up:active}, 2 up:standby
    osdmap e27: 3 osds: 2 up, 3 in; 25 remapped pgs
    flags noout,sortbitwise,require_jewel_osds
    pgmap v134: 25 pgs, 4 pools, 20482 kB data, 29 objects
    1149 MB used, 298 GB / 299 GB avail
    36/78 objects degraded (46.154%)
    7/29 unfound (24.138%)
    24 active+undersized+degraded
    1 active+recovering+undersized+degraded

Confirm that you now have unfound objects.

./ceph osd lost 2 --yes-i-really-mean-it # mark osd 2 lost
./ceph pg 3.0 mark_unfound_lost revert # tell the pg to revert the objects
  • DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    2017-03-24 13:31:39.610025 7ff01ef56700 -1 WARNING: the following dangerous and experimental features are enabled: *
    2017-03-24 13:31:39.610038 7ff01ef56700 0 lockdep start
    2017-03-24 13:31:39.622171 7ff01ef56700 -1 WARNING: the following dangerous and experimental features are enabled: *
    pg has 7 objects unfound and apparently lost marking
    2017-03-24 13:31:39.741546 7ff01ef56700 0 lockdep stop
ceph/src [656b5b6●] » ./ceph -s
  • DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
    2017-03-24 13:32:49.036955 7fa737c2e700 -1 WARNING: the following dangerous and experimental features are enabled: *
    2017-03-24 13:32:49.036970 7fa737c2e700 0 lockdep start
    2017-03-24 13:32:49.040929 7fa737c2e700 -1 WARNING: the following dangerous and experimental features are enabled: *
    cluster 23b16c60-b909-4753-8691-e8d5e62261b5
    health HEALTH_WARN
    25 pgs degraded
    1 pgs recovering
    7 pgs stuck unclean
    25 pgs undersized
    recovery 31/78 objects degraded (39.744%)
    recovery 2/29 unfound (6.897%)
    1/3 in osds are down
    noout flag(s) set
    monmap e1: 3 mons at {a=127.0.0.1:6789/0,b=127.0.0.1:6790/0,c=127.0.0.1:6791/0}
    election epoch 6, quorum 0,1,2 a,b,c
    fsmap e5: 1/1/1 up {0=c=up:active}, 2 up:standby
    osdmap e28: 3 osds: 2 up, 3 in; 25 remapped pgs
    flags noout,sortbitwise,require_jewel_osds
    pgmap v147: 25 pgs, 4 pools, 20482 kB data, 29 objects
    1143 MB used, 298 GB / 299 GB avail
    31/78 objects degraded (39.744%)
    2/29 unfound (6.897%)
    24 active+undersized+degraded
    1 active+recovering+undersized+degraded

Note that not all unfound objects were recovered.


Related issues 1 (0 open, 1 closed)

Related to RADOS - Bug #22145: PG stuck in recovery_unfound (Resolved, Sage Weil, 11/16/2017)

Actions #1

Updated by Samuel Just about 7 years ago

There is a very clumsy workaround to this issue. Once the mark_unfound_lost revert command claims to have completed, any recovery will actually fix the unfound objects. Thus, all you have to do is create a missing object which isn't unfound. If you have at least 2 replicas left, you can do that by taking down the primary, doing a single write to a single new object, and bringing the primary back up. You may have to reduce min_size to permit writes.
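A sketch of why this works, assuming (per the description in this ticket) that start_recovery_ops skips work whenever num_missing == num_unfound; the name below is hypothetical, not the actual Ceph code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the skip condition: recovery is only attempted
// when some missing object is not unfound. Before the workaround, all
// missing objects are unfound, so the queued reverts never run. Writing
// one new object while the primary is down leaves that object missing on
// the primary but found on a surviving replica, so the counts differ and
// recovery proceeds, processing the reverts along the way.
bool recovery_attempted(uint64_t num_missing, uint64_t num_unfound) {
    return num_missing != num_unfound;
}
```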

Actions #2

Updated by Samuel Just about 7 years ago

  • Priority changed from Urgent to High
Actions #3

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category set to Scrub/Repair
  • Component(RADOS) OSD added
Actions #4

Updated by Sage Weil over 6 years ago

  • Status changed from New to Duplicate
Actions #5

Updated by Sage Weil over 6 years ago

  • Related to Bug #22145: PG stuck in recovery_unfound added