Bug #458
closed
OSD::activate_pg
Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Description
On one of my OSDs (osd7) I started to see:
2010-10-04 19:39:42.749102 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] clear_prior
2010-10-04 19:39:42.749134 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] cancel_generate_backlog
2010-10-04 19:39:42.749174 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] write_log
2010-10-04 19:39:42.749207 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] write_log to 0~0
2010-10-04 19:39:42.749236 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] clean_up_local
osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::clean_up_local(ObjectStore::Transaction&)':
osd/ReplicatedPG.cc:3760: FAILED assert(info.last_update >= log.tail)
ceph version 0.22~rc (01ae1be288bae196180ad03065e14be867b5e12e)
1: (PG::activate(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&, std::map<int, MOSDPGInfo*, std::less<int>, std::allocator<std::pair<int const, MOSDPGInfo*> > >*)+0x1a7) [0x539557]
2: (OSD::activate_pg(pg_t, utime_t)+0x238) [0x4c4208]
3: (OSD::check_replay_queue()+0x188) [0x4c4428]
4: (OSD::tick()+0x226) [0x4f4d06]
5: (SafeTimer::EventWrapper::finish(int)+0x269) [0x5bfcd9]
6: (Timer::timer_entry()+0x7bc) [0x5c20ac]
7: (Timer::TimerThread::entry()+0xd) [0x45a21d]
8: (Thread::_entry_func(void*)+0xa) [0x46e82a]
9: (()+0x69ca) [0x7f3c8e0b49ca]
10: (clone()+0x6d) [0x7f3c8d06c6fd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I've uploaded the logs, core and binary to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_activate_pg
Updated by Sage Weil over 13 years ago
- Status changed from New to Won't Fix
This is from the old (broken) recovery code attempting to forget lost objects. The bandaid is to just comment out that assertion. The problem will go away once some object has been updated in that pg. (I don't want to kill the assertion because with the fixed code it's still valid.)
BTW, to answer your question from the other day: yes, a mkcephfs will make all this go away. But if you're up for it, the current state of things is turning up interesting corner cases that can be fixed. Feel free to give up and wipe whenever you need to get work done. :)
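For readers following the trace, here is a rough sketch of why assert(info.last_update >= log.tail) trips on the values printed for pg 0.55. It assumes that in the printed PG state "v 409'847" is info.last_update, that "(11825'1587,409'847]" gives the log tail and head, and that versions are ordered by epoch first and then by version number. EVersion and log_bounds_ok are hypothetical stand-ins written for this illustration, not Ceph's actual types or functions.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical, simplified stand-in for an "epoch'version" pair such as 409'847.
// This is not Ceph's real type; it only models the assumed ordering.
struct EVersion {
    std::uint32_t epoch;    // number before the apostrophe
    std::uint64_t version;  // number after the apostrophe
};

// Assumed ordering: compare epochs first, then versions within the same epoch.
inline bool operator<(const EVersion& a, const EVersion& b) {
    return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}

inline bool operator>=(const EVersion& a, const EVersion& b) {
    return !(a < b);
}

// The condition from the failed assertion: last_update must not be older
// than the tail of the log.
inline bool log_bounds_ok(const EVersion& last_update, const EVersion& log_tail) {
    return last_update >= log_tail;
}
```

With last_update = EVersion{409, 847} and log_tail = EVersion{11825, 1587}, log_bounds_ok returns false because epoch 409 is older than epoch 11825, which would match the "log bound mismatch" flag in the log lines above.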