Bug #1099

osd: handle recovery of lost objects

Added by Sage Weil almost 13 years ago. Updated over 12 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%


Description

osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int)', in thread '0x7fc9a4f66700'
osd/ReplicatedPG.cc: 4306: FAILED assert(latest->is_update())
 ceph version 0.28-25-gbdc371e (commit:bdc371e5936ff21cf96ef94aa7a5ae31fcee8abd)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0x945e5e]
 2: (ReplicatedPG::recover_primary(int)+0x3b5) [0x73311d]
 3: (ReplicatedPG::start_recovery_ops(int)+0xc5) [0x7329df]
 4: (OSD::do_recovery(PG*)+0x242) [0x7bab86]
 5: (OSD::RecoveryWQ::_process(PG*)+0x27) [0x7cb5b5]
 6: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x2e) [0x822704]
 7: (ThreadPool::worker()+0x2bd) [0x9473cb]
 8: (ThreadPool::WorkThread::entry()+0x1c) [0x7c93ec]
 9: (Thread::_entry_func(void*)+0x23) [0x704565]
 10: (()+0x68ba) [0x7fc9b31d68ba]
 11: (clone()+0x6d) [0x7fc9b1e6b02d]
Actions #1

Updated by Sage Weil almost 13 years ago

My hacky workaround was:

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index df27c27..c7d383b 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -4303,7 +4303,7 @@ int ReplicatedPG::recover_primary(int max)

     if (log.objects.count(p->second)) {
       latest = log.objects[p->second];
-      assert(latest->is_update());
+      assert(latest->is_update() || latest->is_lost());
       soid = latest->soid;
     } else {
       latest = 0;
@@ -4332,6 +4332,14 @@ int ReplicatedPG::recover_primary(int max)
       } else if (unfound) {
        ++skipped;
       } else {
+       if (latest && latest->op == Log::Entry::LOST) {
+         ObjectStore::Transaction *t = new ObjectStore::Transaction;
+         mark_obj_as_lost(*t, soid);
+         int tr = osd->store->queue_transaction(&osr, t);
+         assert(tr == 0);
+         continue;
+       }
+
        // is this a clone operation that we can do locally?
        if (latest && latest->op == Log::Entry::CLONE) {
          if (missing.is_missing(head) &&

But there are much larger issues here with LOST objects.
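
The per-object decision the workaround adds to recover_primary() can be sketched as below. This is a minimal, self-contained C++ model, not Ceph code: LogEntry, RecoveryAction and choose_action() are hypothetical names, is_update() is assumed to cover only MODIFY and CLONE, and the real function's other branches (e.g. objects already being pulled, the head-object check for clones) are elided. It shows the two changes: the assert accepts LOST entries, and a LOST entry is handled by marking the object lost locally instead of pulling it.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the PG log entry in the patch.
// Assumption: is_update() covers MODIFY and CLONE only.
struct LogEntry {
  enum Op { MODIFY, CLONE, DELETE, LOST };
  Op op;
  std::string soid;
  bool is_update() const { return op == MODIFY || op == CLONE; }
  bool is_lost() const { return op == LOST; }
};

// What the primary decides to do for one missing object (hypothetical).
enum class RecoveryAction { Skip, MarkLost, CloneLocally, Pull };

RecoveryAction choose_action(const LogEntry *latest, bool unfound) {
  if (latest) {
    // Relaxed invariant from the patch: an update *or* a LOST marker.
    assert(latest->is_update() || latest->is_lost());
  }
  if (unfound)
    return RecoveryAction::Skip;          // no known copy yet; counted as skipped
  if (latest && latest->op == LogEntry::LOST)
    return RecoveryAction::MarkLost;      // patch queues mark_obj_as_lost() and continues
  if (latest && latest->op == LogEntry::CLONE)
    return RecoveryAction::CloneLocally;  // real code also checks the head object
  return RecoveryAction::Pull;            // default: pull the object from a peer
}
```

Returning an action instead of performing it keeps the sketch testable; in the patch the LOST branch queues an ObjectStore transaction and moves on to the next missing object.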

Actions #2

Updated by Sage Weil almost 13 years ago

  • Priority changed from High to Normal
  • Target version changed from v0.29 to 19

For the time being, I disabled automatic marking of lost objects. That makes "recovering" them a less pressing issue (they should only come up in horrible failure scenarios).

Actions #3

Updated by Sage Weil almost 13 years ago

  • Subject changed from osd: FAILED assert(latest->is_update()) in ReplicatedPG::recover_primary(int) to osd: handle recovery of lost objects
Actions #4

Updated by Sage Weil over 12 years ago

  • Target version deleted (19)
Actions #5

Updated by Sage Weil over 12 years ago

  • Position set to 191
Actions #6

Updated by Sage Weil over 12 years ago

  • Status changed from New to Closed

This has been reimplemented (at least the revert case).
