Bug #1099
osd: handle recovery of lost objects
Status:
Closed
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Description
osd/ReplicatedPG.cc: In function 'int ReplicatedPG::recover_primary(int)', in thread '0x7fc9a4f66700'
osd/ReplicatedPG.cc: 4306: FAILED assert(latest->is_update())
 ceph version 0.28-25-gbdc371e (commit:bdc371e5936ff21cf96ef94aa7a5ae31fcee8abd)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0x945e5e]
 2: (ReplicatedPG::recover_primary(int)+0x3b5) [0x73311d]
 3: (ReplicatedPG::start_recovery_ops(int)+0xc5) [0x7329df]
 4: (OSD::do_recovery(PG*)+0x242) [0x7bab86]
 5: (OSD::RecoveryWQ::_process(PG*)+0x27) [0x7cb5b5]
 6: (ThreadPool::WorkQueue<PG>::_void_process(void*)+0x2e) [0x822704]
 7: (ThreadPool::worker()+0x2bd) [0x9473cb]
 8: (ThreadPool::WorkThread::entry()+0x1c) [0x7c93ec]
 9: (Thread::_entry_func(void*)+0x23) [0x704565]
 10: (()+0x68ba) [0x7fc9b31d68ba]
 11: (clone()+0x6d) [0x7fc9b1e6b02d]
Updated by Sage Weil almost 13 years ago
My hacky workaround was:

diff --git a/src/osd/ReplicatedPG.cc b/src/osd/ReplicatedPG.cc
index df27c27..c7d383b 100644
--- a/src/osd/ReplicatedPG.cc
+++ b/src/osd/ReplicatedPG.cc
@@ -4303,7 +4303,7 @@ int ReplicatedPG::recover_primary(int max)
     if (log.objects.count(p->second)) {
       latest = log.objects[p->second];
-      assert(latest->is_update());
+      assert(latest->is_update() || latest->is_lost());
       soid = latest->soid;
     } else {
       latest = 0;
@@ -4332,6 +4332,14 @@ int ReplicatedPG::recover_primary(int max)
     } else if (unfound) {
       ++skipped;
     } else {
+      if (latest && latest->op == Log::Entry::LOST) {
+        ObjectStore::Transaction *t = new ObjectStore::Transaction;
+        mark_obj_as_lost(*t, soid);
+        int tr = osd->store->queue_transaction(&osr, t);
+        assert(tr == 0);
+        continue;
+      }
+
       // is this a clone operation that we can do locally?
       if (latest && latest->op == Log::Entry::CLONE) {
         if (missing.is_missing(head) &&
but there are much larger issues here with LOST objects.
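To illustrate the shape of the workaround, here is a minimal, self-contained sketch of the recovery-loop logic. The types and names (`Op`, `LogEntry`, `recover_primary`) are simplified stand-ins, not the real Ceph classes: a LOST log entry is marked lost locally and skipped instead of tripping the `is_update()` assert, while normal entries take the usual pull path.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins for PG::Log::Entry and its op kinds (hypothetical
// names, not the actual Ceph types).
enum class Op { MODIFY, CLONE, LOST };

struct LogEntry {
  Op op;
  std::string soid;
  bool is_update() const { return op == Op::MODIFY || op == Op::CLONE; }
  bool is_lost() const { return op == Op::LOST; }
};

// Walk the missing entries: LOST objects are marked lost locally (returned
// in the result set), everything else is queued for a normal recovery pull.
std::set<std::string> recover_primary(const std::vector<LogEntry>& missing,
                                      std::vector<std::string>& pulled) {
  std::set<std::string> lost;
  for (const auto& e : missing) {
    // The pre-fix code asserted is_update() alone and crashed on LOST
    // entries; the workaround widens the assert and handles LOST.
    assert(e.is_update() || e.is_lost());
    if (e.is_lost()) {
      lost.insert(e.soid);   // mark locally; no pull from a peer is needed
      continue;
    }
    pulled.push_back(e.soid); // normal recovery path
  }
  return lost;
}
```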
Updated by Sage Weil almost 13 years ago
- Priority changed from High to Normal
- Target version changed from v0.29 to 19
For the time being I disabled automatic marking of lost objects. That makes dealing with "recovering" them a less pressing issue (they should only come up in catastrophic failure scenarios).
Updated by Sage Weil almost 13 years ago
- Subject changed from osd: FAILED assert(latest->is_update()) in ReplicatedPG::recover_primary(int) to osd: handle recovery of lost objects
Updated by Sage Weil over 12 years ago
- Position set to 191
Updated by Sage Weil over 12 years ago
- Status changed from New to Closed
This has been reimplemented (at least the revert case).