Bug #458

closed

OSD::activate_pg

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%


Description

On one of my OSDs (osd7) I started to see:

2010-10-04 19:39:42.749102 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] clear_prior
2010-10-04 19:39:42.749134 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] cancel_generate_backlog
2010-10-04 19:39:42.749174 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] write_log
2010-10-04 19:39:42.749207 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] write_log to 0~0
2010-10-04 19:39:42.749236 7f3c82240710 osd7 17871 pg[0.55( v 409'847 (11825'1587,409'847]+backlog n=1496 ec=2 les=17871 17868/17868/17868) [7,4] r=0 (log bound mismatch, empty) lcod 0'0 mlcod 0'0 active] clean_up_local
osd/ReplicatedPG.cc: In function 'virtual void ReplicatedPG::clean_up_local(ObjectStore::Transaction&)':
osd/ReplicatedPG.cc:3760: FAILED assert(info.last_update >= log.tail)
 ceph version 0.22~rc (01ae1be288bae196180ad03065e14be867b5e12e)
 1: (PG::activate(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&, std::map<int, MOSDPGInfo*, std::less<int>, std::allocator<std::pair<int const, MOSDPGInfo*> > >*)+0x1a7) [0x539557]
 2: (OSD::activate_pg(pg_t, utime_t)+0x238) [0x4c4208]
 3: (OSD::check_replay_queue()+0x188) [0x4c4428]
 4: (OSD::tick()+0x226) [0x4f4d06]
 5: (SafeTimer::EventWrapper::finish(int)+0x269) [0x5bfcd9]
 6: (Timer::timer_entry()+0x7bc) [0x5c20ac]
 7: (Timer::TimerThread::entry()+0xd) [0x45a21d]
 8: (Thread::_entry_func(void*)+0xa) [0x46e82a]
 9: (()+0x69ca) [0x7f3c8e0b49ca]
 10: (clone()+0x6d) [0x7f3c8d06c6fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I've uploaded the logs, core and binary to logger.ceph.widodh.nl:/srv/ceph/issues/osd_crash_activate_pg

Actions #1

Updated by Sage Weil over 13 years ago

  • Status changed from New to Won't Fix

This is from the old (broken) recovery code attempting to forget lost objects. The bandaid is to just comment out that assertion. The problem will go away once some object has been updated in that pg. (I don't want to kill the assertion because with the fixed code it's still valid.)

BTW, to answer your question from the other day: yes, a mkcephfs will make all of this go away, but if you're up for it, the current state of things is turning up interesting corner cases that can be fixed. Feel free to give up and wipe whenever you need to get work done. :)
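
For readers following along, here is a minimal sketch of the invariant behind the failed assertion, `info.last_update >= log.tail`. It is not the Ceph source: `EVersion` and `log_bounds_consistent` are hypothetical stand-ins, and it assumes versions are (epoch, version) pairs ordered by epoch first. It also assumes the log lines above print the PG as `v last_update (log_tail,log_head]`, which would make last_update 409'847 and the log tail 11825'1587.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a Ceph-style version stamp: an (epoch, version)
// pair, written epoch'version in the logs. Assumed to order by epoch first.
struct EVersion {
    uint32_t epoch;
    uint64_t version;
};

// Lexicographic comparison: epoch decides, version breaks ties.
inline bool operator>=(const EVersion& a, const EVersion& b) {
    return std::tie(a.epoch, a.version) >= std::tie(b.epoch, b.version);
}

// The invariant from the failed assertion: the PG's last update must not be
// older than the oldest entry (tail) still covered by its log.
inline bool log_bounds_consistent(const EVersion& last_update,
                                  const EVersion& log_tail) {
    return last_update >= log_tail;
}
```

Under those assumptions, `log_bounds_consistent(EVersion{409, 847}, EVersion{11825, 1587})` is false, because epoch 409 is older than epoch 11825. That matches the "log bound mismatch" flag in the log lines. It would also fit the note that the problem clears once an object in the PG is updated, since an update should move last_update past the tail.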
