Bug #1099
osd: handle recovery of lost objects
Status:
Closed
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Description
osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int)', in thread '0x7fc9a4f66700'
osd/ReplicatedPG.cc: 4306: FAILED assert(latest->is_update())
 ceph version 0.28-25-gbdc371e (commit:bdc371e5936ff21cf96ef94aa7a5ae31fcee8abd)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0x945e5e]
 2: (ReplicatedPG::recover_primary(int)+0x3b5) [0x73311d]
 3: (ReplicatedPG::start_recovery_ops(int)+0xc5) [0x7329df]
 4: (OSD::do_recovery(PG*)+0x242) [0x7bab86]
 5: (OSD::RecoveryWQ::_process(PG*)+0x27) [0x7cb5b5]
 6: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x2e) [0x822704]
 7: (ThreadPool::worker()+0x2bd) [0x9473cb]
 8: (ThreadPool::WorkThread::entry()+0x1c) [0x7c93ec]
 9: (Thread::_entry_func(void*)+0x23) [0x704565]
 10: (()+0x68ba) [0x7fc9b31d68ba]
 11: (clone()+0x6d) [0x7fc9b1e6b02d]
Updated by Sage Weil almost 13 years ago
My hacky workaround was:

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index df27c27..c7d383b 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -4303,7 +4303,7 @@ int ReplicatedPG::recover_primary(int max)
     if (log.objects.count(p->second)) {
       latest = log.objects[p->second];
-      assert(latest->is_update());
+      assert(latest->is_update() || latest->is_lost());
       soid = latest->soid;
     } else {
       latest = 0;
@@ -4332,6 +4332,14 @@ int ReplicatedPG::recover_primary(int max)
     } else if (unfound) {
       ++skipped;
     } else {
+      if (latest && latest->op == Log::Entry::LOST) {
+        ObjectStore::Transaction *t = new ObjectStore::Transaction;
+        mark_obj_as_lost(*t, soid);
+        int tr = osd->store->queue_transaction(&osr, t);
+        assert(tr == 0);
+        continue;
+      }
+
       // is this a clone operation that we can do locally?
       if (latest && latest->op == Log::Entry::CLONE) {
         if (missing.is_missing(head) &&
but there are much larger issues here with LOST objects.
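To illustrate the shape of the workaround, here is a minimal, self-contained sketch of the recovery-loop logic. The types and names (`Op`, `LogEntry`, `recover_primary`) are simplified stand-ins, not the real Ceph classes: a LOST log entry is marked lost locally and skipped instead of tripping the `is_update()` assert, while normal entries take the usual pull path.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins for PG::Log::Entry and its op kinds (hypothetical
// names, not the actual Ceph types).
enum class Op { MODIFY, CLONE, LOST };

struct LogEntry {
  Op op;
  std::string soid;
  bool is_update() const { return op == Op::MODIFY || op == Op::CLONE; }
  bool is_lost() const { return op == Op::LOST; }
};

// Walk the missing entries: LOST objects are marked lost locally (returned
// in the result set), everything else is queued for a normal recovery pull.
std::set<std::string> recover_primary(const std::vector<LogEntry>& missing,
                                      std::vector<std::string>& pulled) {
  std::set<std::string> lost;
  for (const auto& e : missing) {
    // The pre-fix code asserted is_update() alone and crashed on LOST
    // entries; the workaround widens the assert and handles LOST.
    assert(e.is_update() || e.is_lost());
    if (e.is_lost()) {
      lost.insert(e.soid);   // mark locally; no pull from a peer is needed
      continue;
    }
    pulled.push_back(e.soid); // normal recovery path
  }
  return lost;
}
```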
Updated by Sage Weil almost 13 years ago
- Priority changed from High to Normal
- Target version changed from v0.29 to 19
For the time being I disabled automatic marking of lost objects. That makes dealing with "recovering" them a less pressing issue (they should only come up in catastrophic failure scenarios).
Updated by Sage Weil almost 13 years ago
- Subject changed from osd: FAILED assert(latest->is_update()) in ReplicatedPG::recover_primary(int) to osd: handle recovery of lost objects
Updated by Sage Weil over 12 years ago
- Position set to 191
Updated by Sage Weil over 12 years ago
- Status changed from New to Closed
This has been reimplemented (at least the revert case).