Bug #8811
Journal corruption during upgrade to 0.82 with standby-replay daemons
Status:
closed
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Two different ceph-users reports of hitting this issue on v0.82:
0> 2014-07-09 23:21:43.385274 7fb7f7b83700 -1 mds/MDLog.cc: In function 'void MDLog::_replay_thread()' thread 7fb7f7b83700 time 2014-07-09 23:21:43.383304
mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
 ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
 1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
 2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
 3: (()+0x8062) [0x7fb7ffda1062]
 4: (clone()+0x6d) [0x7fb7feb35a3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

0> 2014-07-10 11:35:36.107022 7f45f7c57700 -1 mds/MDLog.cc: In function 'void MDLog::_replay_thread()' thread 7f45f7c57700 time 2014-07-10 11:35:36.103147
mds/MDLog.cc: 815: FAILED assert(journaler->is_readable())
 ceph version 0.82 (14085f42ddd0fef4e7e1dc99402d07a8df82c04e)
 1: (MDLog::_replay_thread()+0x197b) [0x85a3cb]
 2: (MDLog::ReplayThread::entry()+0xd) [0x66466d]
 3: (()+0x6b50) [0x7f45ffdd7b50]
 4: (clone()+0x6d) [0x7f45fec000ed]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
I went over the code briefly and it looks good to me, but we only just made the JournalStream changes, so I'm sure that's where the issue is. For context, this MDLog assert follows a loop that waits until the Journaler is readable, so the Journaler appears to be changing its mind between the loop exiting and the assert. Presumably we're incorrectly manipulating the read_buf in some way?
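To make the "changing its mind" scenario concrete, here is a minimal sketch of the wait-then-assert shape described above. It is not the MDLog or Journaler code: `ToyJournaler`, `fetch`, `trim_on_check`, and `wait_then_check` are hypothetical names invented for illustration, and the "bug" shown (a readability check that disturbs its own buffer) is only one assumed way the second check could disagree with the first, not a diagnosis of this issue.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a journal reader. This is NOT Ceph's Journaler;
// every name here is an assumption made for the sake of the example.
struct ToyJournaler {
  std::size_t buffered = 0;    // bytes currently held in the read buffer
  std::size_t entry_size = 8;  // bytes needed for one complete entry
  bool trim_on_check = false;  // when true, simulate a check that mutates state

  // Reports whether a full entry is buffered. With trim_on_check set, the
  // check itself discards the buffer, so two calls in a row can disagree.
  bool is_readable() {
    bool ok = buffered >= entry_size;
    if (trim_on_check && ok) {
      buffered = 0;  // simulated bug: the read buffer is altered by the check
    }
    return ok;
  }

  // Pretend that n more bytes arrived from the backing store.
  void fetch(std::size_t n) { buffered += n; }
};

// Same shape as the pattern in the description: loop until readable, then
// check again. Returns the result of the second check, which is the
// condition the real code asserts on.
bool wait_then_check(ToyJournaler& j) {
  while (!j.is_readable()) {
    j.fetch(4);
  }
  return j.is_readable();
}
```

With `trim_on_check` left at `false`, `wait_then_check` returns `true`; with it set to `true`, the loop still exits but the follow-up check returns `false`, which is the situation the failed assert reports.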