Bug #10017

OSD wrongly marks object as unfound if only the primary is corrupted for EC pool

Added by Guang Yang over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Urgent
Category: OSD
Target version: -
% Done: 50%
Source: other
Severity: 3 - minor

Description

Recently we observed a PG stuck in recovery with one object marked as lost. The scrubbing log showed that only the primary chunk of the object was inconsistent (its stored digest did not match its computed digest); all the other chunks were good.

Looking at the implementation, I think the problem comes from the way PG repairs an object:

void PG::repair_object(
  const hobject_t& soid, ScrubMap::object *po,
  pg_shard_t bad_peer, pg_shard_t ok_peer)
{
  eversion_t v;
  bufferlist bv;
  bv.push_back(po->attrs[OI_ATTR]);
  object_info_t oi(bv);
  if (bad_peer != primary) {
    peer_missing[bad_peer].add(soid, oi.version, eversion_t());
  } else {
    // We should only be scrubbing if the PG is clean.
    assert(waiting_for_unreadable_object.empty());

    pg_log.missing_add(soid, oi.version, eversion_t());
    missing_loc.add_missing(soid, oi.version, eversion_t());
    missing_loc.add_location(soid, ok_peer);

    pg_log.set_last_requested(0);
  }
}

Here we can see that if the primary is corrupted, it calls:

  missing_loc.add_location(soid, ok_peer);

So only one shard (the authoritative one) is registered as a good location. That is fine for a replicated pool, but for EC the recoverability check is

ECRecPred::operator()(const set<pg_shard_t> &_have)

which needs enough good chunks available to decode the object. As a result it always fails for EC, because only one chunk (shard) was added as a location.
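To illustrate (this is only a sketch, not the actual ECRecPred implementation): an erasure-coded object split into k data chunks and m coding chunks can only be decoded if chunks from at least k distinct shards are available, so a predicate along these lines can never be satisfied when a single ok_peer is registered as the only location.

  #include <set>

  // Illustrative only: a simplified recoverability predicate for an EC
  // object that needs k data chunks to decode. The real ECRecPred is more
  // involved, but the effect described above is the same: it needs chunks
  // from at least k distinct shards.
  struct SimpleECRecoverable {
    unsigned k;  // number of data chunks required to decode

    bool operator()(const std::set<int> &available_shards) const {
      // repair_object() registers only the single authoritative shard,
      // so available_shards.size() == 1 < k and the object stays unfound.
      return available_shards.size() >= k;
    }
  };

With k=2, for example, the single location added by repair_object is one shard short of what the predicate needs, which matches the unfound object reported in the tests below.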

Ceph version: 0.80.4
Platform: RHEL6


Related issues (3): 0 open, 3 closed

Related to Ceph - Bug #8588: In the erasure-coded pool, primary OSD will crash at decoding if any data chunk's size is changed (Duplicate, 06/11/2014)

Related to Ceph - Bug #10018: OSD assertion failure if the hinfo_key xattr is not there (corrupted?) during scrubbing (Resolved, Loïc Dachary, 11/05/2014)

Has duplicate Ceph - Bug #10479: Object will be treated as "not exist" if primary shard is lost or failed to get attr when reading this object (Duplicate, 01/08/2015)

Actions #1

Updated by Loïc Dachary over 9 years ago

  • Assignee set to Loïc Dachary
Actions #2

Updated by Loïc Dachary over 9 years ago

  • Priority changed from Normal to Urgent
Actions #3

Updated by Guang Yang over 9 years ago

Besides the code fix, I am wondering what the right way is to fix the PG state (and the object)? Taking the OSD out might work, as it would remap the primary to other, non-corrupted OSDs, but that seems like overkill.

Actions #4

Updated by Samuel Just over 9 years ago

That all looks right. I'd mark the osd down, get the object, re-put it, and mark the osd back up. Should cause recovery. Loic: you'll want to cover this in the same test as the hinfo one.

Actions #5

Updated by Samuel Just over 9 years ago

Actually, the marking down thing won't work.

Actions #6

Updated by Loïc Dachary over 9 years ago

Samuel Just wrote:

Loic: you'll want to cover this in the same test as the hinfo one.

Ack :-)

Actions #7

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to 12
Actions #8

Updated by Loïc Dachary over 9 years ago

  • Status changed from 12 to In Progress
Actions #9

Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to 12
Actions #10

Updated by Loïc Dachary over 9 years ago

  • Category set to OSD
  • Status changed from 12 to In Progress
Actions #11

Updated by Loïc Dachary over 9 years ago

I think https://github.com/ceph/ceph-qa-suite/pull/250 reproduces the problem reliably.

MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB] sched_scrub: reserved 1(0),2(2), waiting for replicas
...
scrub   osd.1 has 1 items
scrub replica 0(1) has 1 items
scrub replica 2(2) has 1 items
...
1.0s0 shard 1(0): soid 3df68405/repair_test_obj/head//1 candidate had a read error, missing attr hinfo_key
...
repair_object 3df68405/repair_test_obj/head//1 bad_peer osd.1(0) ok_peer osd.2(2)

Actions #12

Updated by Loïc Dachary over 9 years ago

The same problem shows up when two OSDs are missing (k=2, m=2).

Actions #14

Updated by Loïc Dachary over 9 years ago

Fixed #10211, which showed up while experimenting.

Actions #15

Updated by Loïc Dachary over 9 years ago

When the primary shard is lost in k=2, m=2, the PG ends up with an unfound object because, as explained in the description, not enough shards are registered as available to rebuild it (u=1).

2014-12-03 14:44:18.926370 7fcfcf8d7700  7 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] sub_op_scrub_map
2014-12-03 14:44:18.926381 7fcfcf8d7700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair]  got 2(2) scrub map
2014-12-03 14:44:18.926402 7fcfcf8d7700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] map version is 26'1
2014-12-03 14:44:18.926414 7fcfcf8d7700 10 osd.3 30 dequeue_op 0x53dbf00 finish
2014-12-03 14:44:18.926429 7fcfcc6f1700 20 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub state WAIT_REPLICAS
2014-12-03 14:44:18.926446 7fcfcc6f1700 20 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub state COMPARE_MAPS
2014-12-03 14:44:18.926452 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub_compare_maps has maps, analyzing
2014-12-03 14:44:18.926462 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub  comparing replica scrub maps
2014-12-03 14:44:18.926472 7fcfcc6f1700  2 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub   osd.3 has 0 items
2014-12-03 14:44:18.926479 7fcfcc6f1700  2 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub replica 0(3) has 1 items
2014-12-03 14:44:18.926486 7fcfcc6f1700  2 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub replica 1(1) has 1 items
2014-12-03 14:44:18.926492 7fcfcc6f1700  2 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub replica 2(2) has 1 items
2014-12-03 14:44:18.926507 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] be_select_auth_object: selecting osd 0(3) for obj 847441d7/SOMETHING/head//2, auth == maps.end()
2014-12-03 14:44:18.926529 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] be_select_auth_object: selecting osd 1(1) for obj 847441d7/SOMETHING/head//2
2014-12-03 14:44:18.926545 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] be_select_auth_object: selecting osd 2(2) for obj 847441d7/SOMETHING/head//2
2014-12-03 14:44:18.926572 7fcfcc6f1700  2 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] be_compare_scrubmaps: 2.0s0 shard 3(0) missing 847441d7/SOMETHING/head//2

2014-12-03 14:44:18.926581 7fcfcc6f1700 -1 log_channel(default) log [ERR] : be_compare_scrubmaps: 2.0s0 shard 3(0) missing 847441d7/SOMETHING/head//2
2014-12-03 14:44:18.926598 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] _scrub
2014-12-03 14:44:18.926623 7fcfcc6f1700 20 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] repair  847441d7/SOMETHING/head//2 847441d7/SOMETHING/head//2(26'1 client.4147.0:1 wrlock_by=unknown.0.0:0 dirty s 7 uv1)
2014-12-03 14:44:18.926636 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] _scrub (repair) finish
2014-12-03 14:44:18.926646 7fcfcc6f1700 15 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair]  requeue_ops 
2014-12-03 14:44:18.926654 7fcfcc6f1700 20 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub state FINISH
2014-12-03 14:44:18.926660 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] repair got 1/1 objects, 0/0 clones, 1/1 dirty, 0/0 omap, 0/0 hit_set_archive, 7/7 bytes,0/0 hit_set_archive bytes.
2014-12-03 14:44:18.926668 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] process_inconsistent() checking authoritative
2014-12-03 14:44:18.926675 7fcfcc6f1700  2 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] 2.0s0 repair 1 missing, 0 inconsistent objects
2014-12-03 14:44:18.926684 7fcfcc6f1700 -1 log_channel(default) log [ERR] : 2.0s0 repair 1 missing, 0 inconsistent objects
2014-12-03 14:44:18.926689 7fcfcc6f1700 10 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active+scrubbing+deep+repair] repair_object 847441d7/SOMETHING/head//2 bad_peer osd.3(0) ok_peer osd.2(2)
2014-12-03 14:44:18.926717 7fcfcc6f1700 -1 log_channel(default) log [ERR] : 2.0 repair 1 errors, 1 fixed
2014-12-03 14:44:18.926762 7fcfcc6f1700 10 log is not dirty
2014-12-03 14:44:18.926814 7fcfcc6f1700 15 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active m=1 u=1] publish_stats_to_osd 30:21
2014-12-03 14:44:18.926827 7fcfcc6f1700 20 osd.3 30 dec_scrubs_active 1 -> 0 (max 1, pending 0)
2014-12-03 14:44:18.926829 7fcfcc6f1700 15 osd.3 pg_epoch: 30 pg[2.0s0( v 26'1 (0'0,26'1] local-les=30 n=1 ec=23 les/c 30/30 29/29/29) [3,1,2,0] r=0 lpr=29 crt=0'0 lcod 0'0 mlcod 0'0 active m=1 u=1]  requeue_ops 

The same situation (primary lost) in a replicated pool shows that after the object is declared "fixed" there is no unfound object, just one missing object, and a copy will be fetched from a replica after the PG goes into recovery.

2014-12-03 15:19:14.689796 7fd29d2cb700  7 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] sub_op_scrub_map
2014-12-03 15:19:14.689823 7fd29d2cb700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair]  got 0 scrub map
2014-12-03 15:19:14.689864 7fd29d2cb700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] map version is 13'1
2014-12-03 15:19:14.689881 7fd29d2cb700 10 osd.1 17 dequeue_op 0x5543d00 finish
2014-12-03 15:19:14.689909 7fd29b9d8700 20 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub state WAIT_REPLICAS
2014-12-03 15:19:14.689939 7fd29b9d8700 20 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub state COMPARE_MAPS
2014-12-03 15:19:14.689948 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub_compare_maps has maps, analyzing
2014-12-03 15:19:14.689957 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub  comparing replica scrub maps
2014-12-03 15:19:14.689972 7fd29b9d8700  2 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub   osd.1 has 0 items
2014-12-03 15:19:14.689982 7fd29b9d8700  2 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub replica 0 has 1 items
2014-12-03 15:19:14.690004 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] be_select_auth_object: selecting osd 0 for obj 847441d7/SOMETHING/head//1, auth == maps.end()
2014-12-03 15:19:14.690032 7fd29b9d8700  2 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] be_compare_scrubmaps: 1.3 shard 1 missing 847441d7/SOMETHING/head//1

2014-12-03 15:19:14.690056 7fd29b9d8700 -1 log_channel(default) log [ERR] : be_compare_scrubmaps: 1.3 shard 1 missing 847441d7/SOMETHING/head//1
2014-12-03 15:19:14.690078 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] _scrub
2014-12-03 15:19:14.690119 7fd29b9d8700 20 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] repair  847441d7/SOMETHING/head//1 847441d7/SOMETHING/head//1(13'1 client.4131.0:1 wrlock_by=unknown.0.0:0 dirty s 7 uv1)
2014-12-03 15:19:14.690139 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] _scrub (repair) finish
2014-12-03 15:19:14.690154 7fd29b9d8700 15 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair]  requeue_ops 
2014-12-03 15:19:14.690166 7fd29b9d8700 20 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] scrub state FINISH
2014-12-03 15:19:14.690175 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] repair got 1/1 objects, 0/0 clones, 1/1 dirty, 0/0 omap, 0/0 hit_set_archive, 7/7 bytes,0/0 hit_set_archive bytes.
2014-12-03 15:19:14.690187 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] process_inconsistent() checking authoritative
2014-12-03 15:19:14.690197 7fd29b9d8700  2 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+clean+scrubbing+deep+repair] 1.3 repair 1 missing, 0 inconsistent objects
2014-12-03 15:19:14.690205 7fd29b9d8700 -1 log_channel(default) log [ERR] : 1.3 repair 1 missing, 0 inconsistent objects
2014-12-03 15:19:14.690213 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active+scrubbing+deep+repair] repair_object 847441d7/SOMETHING/head//1 bad_peer osd.1 ok_peer osd.0
2014-12-03 15:19:14.690253 7fd29b9d8700 -1 log_channel(default) log [ERR] : 1.3 repair 1 errors, 1 fixed
2014-12-03 15:19:14.690318 7fd29b9d8700 10 log is not dirty
2014-12-03 15:19:14.690382 7fd29b9d8700 15 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active m=1] publish_stats_to_osd 17:21
2014-12-03 15:19:14.690401 7fd29b9d8700 20 osd.1 17 dec_scrubs_active 1 -> 0 (max 1, pending 0)
2014-12-03 15:19:14.690404 7fd29b9d8700 15 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active m=1]  requeue_ops 
2014-12-03 15:19:14.690425 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active m=1] scrub requesting unreserve from osd.0
2014-12-03 15:19:14.690456 7fd29b9d8700 20 osd.1 17 share_map_peer 0x5579340 already has epoch 17
2014-12-03 15:19:14.690488 7fd29b9d8700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active m=1] share_pg_info
2014-12-03 15:19:14.690592 7fd29b9d8700 20 osd.1 17 share_map_peer 0x5579340 already has epoch 17
2014-12-03 15:19:14.690640 7fd2a1da4700 10 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active m=1] handle_peering_event: epoch_sent: 17 epoch_requested: 17 DoRecovery
2014-12-03 15:19:14.690684 7fd2a1da4700  5 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active m=1] exit Started/Primary/Active/Clean 7.298244 1 0.000700
2014-12-03 15:19:14.690708 7fd2a1da4700  5 osd.1 pg_epoch: 17 pg[1.3( v 13'1 (0'0,13'1] local-les=17 n=1 ec=3 les/c 17/17 16/16/16) [1,0] r=0 lpr=16 crt=0'0 lcod 0'0 mlcod 0'0 active m=1] enter Started/Primary/Active/WaitLocalRecoveryReserved
2014-12-03 15:19:14.690749 7fd2a1da4700 10 log is not dirty

Actions #16

Updated by Loïc Dachary over 9 years ago

Exploring two options:

  • changing PG::scrub_compare_maps to collect all shards for a given missing object so that PG::scrub_process_inconsistent can feed them to PG::repair_object to accurately reflect what can be recovered
  • ignoring the "unfound" flag in the case of erasure coded objects because recovery will be in a better position to figure out what shards are actually available
Actions #17

Updated by Loïc Dachary over 9 years ago

Here is a tentative approach. The idea is to accumulate authoritative peers instead of just keeping the last one. A replicated pool would still only keep the last; an erasure coded pool would use all of them for reconstruction. Draft code at https://github.com/dachary/ceph/commit/edc132671804457443b22dff79362d804cf528da#diff-dfb9ddca0a3ee32b266623e8fa489626L3600.
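As a rough sketch of that accumulation idea (the types and names below are hypothetical, not the ones in the draft commit): the scrub comparison would remember every shard holding a good copy of a damaged object, and repairing a corrupted primary would then register all of them as locations instead of a single ok_peer.

  #include <map>
  #include <set>
  #include <string>

  // Hypothetical stand-ins for hobject_t and pg_shard_t.
  using ObjectId = std::string;
  using ShardId  = int;

  struct ScrubAuthPeers {
    // Every shard that held an authoritative copy of each damaged object,
    // accumulated while comparing scrub maps rather than keeping the last.
    std::map<ObjectId, std::set<ShardId>> ok_peers;

    void note_good_copy(const ObjectId &oid, ShardId shard) {
      ok_peers[oid].insert(shard);
    }
  };

  // When repairing a corrupted primary, register every good shard as a
  // location so the EC recoverability check can still see >= k shards.
  template <typename AddLocation>
  void add_all_locations(const ScrubAuthPeers &peers, const ObjectId &oid,
                         AddLocation add_location) {
    auto it = peers.ok_peers.find(oid);
    if (it == peers.ok_peers.end())
      return;
    for (ShardId shard : it->second)
      add_location(oid, shard);  // i.e. the equivalent of missing_loc.add_location()
  }

For a replicated pool the set would normally contain a single shard, so behaviour would be unchanged; for an erasure coded pool every surviving shard would become a known location and the object would no longer be reported unfound.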

I'm absolutely not sure this is sane, feedback is very welcome :-)

Actions #18

Updated by Loïc Dachary over 9 years ago

  • % Done changed from 0 to 50
<sjusthm> loicd: that looks sane
Actions #19

Updated by Loïc Dachary over 9 years ago

  • Status changed from In Progress to Fix Under Review
Actions #20

Updated by Guang Yang over 9 years ago

Hi Loic,
I am wondering if this is a good candidate for back-port? Thanks!

Actions #21

Updated by Samuel Just about 9 years ago

  • Status changed from Fix Under Review to Resolved