Bug #4061

closed

mds crashed at LogEvent::decode

Added by Tamilarasi muthamizhan about 11 years ago. Updated about 11 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Severity:
3 - minor

Description

Hit this on burnupi60 when upgrading from ceph v0.56-598-gb970d05 to 0.56.2-12-gcc16791 on 4 Feb; it seems to have been happening on the cluster since then.

2013-02-08 08:25:34.226145 7f7bcfea4700 -1 mds/LogEvent.cc: In function 'static LogEvent* LogEvent::decode(ceph::bufferlist&)' thread 7f7bcfea4700 time 2013-02-08 08:25:34.225584
mds/LogEvent.cc: 95: FAILED assert(p.end())

ceph version 0.56.2-12-gcc16791 (cc167914ac9603f87083c63f2cbc8dac9441329f)
1: (LogEvent::decode(ceph::buffer::list&)+0x9ff) [0x6be31f]
2: (MDLog::_replay_thread()+0x2d8) [0x6a9908]
3: (MDLog::ReplayThread::entry()+0xd) [0x4d2e9d]
4: (()+0x7e9a) [0x7f7bd6f0ae9a]
5: (clone()+0x6d) [0x7f7bd56c0cbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
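
For readers unfamiliar with this kind of failure: an assertion such as assert(p.end()) at the end of a decode routine typically checks that the decoder consumed the whole buffer, so it fires when bytes are left over after the event has been parsed. The following is a minimal sketch of that idea under stated assumptions, not Ceph's actual code. Cursor and decode_consumes_all are hypothetical names invented for illustration, and the 4-byte type field is taken from the discussion later in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a buffer iterator; this is NOT Ceph's bufferlist API.
class Cursor {
public:
  explicit Cursor(const std::vector<std::uint8_t>& buf) : buf_(buf), pos_(0) {}

  // Read a 32-bit value (host byte order) and advance past it.
  std::uint32_t read_u32() {
    if (buf_.size() - pos_ < sizeof(std::uint32_t))
      throw std::out_of_range("buffer too short");
    std::uint32_t v;
    std::memcpy(&v, buf_.data() + pos_, sizeof(v));
    pos_ += sizeof(v);
    return v;
  }

  // Skip n bytes, standing in for a real decoder parsing the event's fields.
  void skip(std::size_t n) {
    if (buf_.size() - pos_ < n)
      throw std::out_of_range("buffer too short");
    pos_ += n;
  }

  // True only when every byte of the buffer has been consumed.
  bool end() const {
    assert(pos_ <= buf_.size());  // internal invariant of this sketch
    return pos_ == buf_.size();
  }

private:
  const std::vector<std::uint8_t>& buf_;
  std::size_t pos_;
};

// Returns true when a decoder that expects `expected_payload` bytes after the
// 4-byte type field consumes the entire entry. A false result corresponds to
// the condition that an assert(p.end()) would trip on.
bool decode_consumes_all(const std::vector<std::uint8_t>& entry,
                         std::size_t expected_payload) {
  Cursor p(entry);
  p.read_u32();              // event type
  p.skip(expected_payload);  // event body as the decoder understands it
  return p.end();
}
```

In this sketch, an entry holding more bytes than the decoder expects makes decode_consumes_all return false. That is one way an encoding mismatch across an upgrade could show up, although the thread does not establish that as the cause here.
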
Actions #1

Updated by Greg Farnum about 11 years ago

  • Description updated (diff)

IIRC I was waiting on some other info from Ken for this. Is that coming? :)

Actions #2

Updated by Tamilarasi muthamizhan about 11 years ago

Sorry Greg, I pulled the information from Ken and filed this bug. Please let me know if you need more info.

Actions #3

Updated by Greg Farnum about 11 years ago

Ken, what was the workload you were running on this before the crash?

Actions #4

Updated by Tamilarasi muthamizhan about 11 years ago

Looks like it is the same as bug #3773.

Actions #5

Updated by Ken Franklin about 11 years ago

There was no load yet. I was attempting the ceph-fuse mount test after the update.

Today I have ceph version 0.56.2-12-gcc16791 installed. ceph health reports:
HEALTH_WARN mds a is laggy

There doesn't appear to be a ceph-mds process running and the log ceph-mds.a.log shows it crashed about the time I attempted a ceph-fuse -m ...

Actions #6

Updated by Greg Farnum about 11 years ago

  • Project changed from Ceph to CephFS

Probably the same, Tamil, yes. If we hit it again, it should be a little easier to debug following last night's merge of the MDS encoding changes, which includes some dumping tools and related helpers.
But I'll check and see how much of the state is the same as before.

Actions #7

Updated by Greg Farnum about 11 years ago

Hmm, actually this one might be different from the other. It's a client cap update event, and the event on disk claims to be of length 1390, which I believe is the right length for a log entry of 1394 (4 bytes for the event type).

If we decide this is important I can spend more time on it now, but for the moment my inclination is to wait until we see it under the new encoding stuff.
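
The length arithmetic in the comment above (a 1394-byte log entry made of a 4-byte event type followed by a 1390-byte event) can be written as a small check. This is only an illustrative sketch: kTypeFieldBytes and event_length are hypothetical names, and the layout is assumed from the comment rather than taken from the Ceph source.

```cpp
#include <cassert>
#include <cstdint>

// Size of the event-type field that precedes the event body,
// as described in the comment above (assumed, not read from Ceph headers).
constexpr std::uint32_t kTypeFieldBytes = 4;

// Hypothetical helper: given the total length of a log entry, return the
// length the event body should have once the type field is accounted for.
std::uint32_t event_length(std::uint32_t entry_len) {
  assert(entry_len >= kTypeFieldBytes);  // an entry must at least hold the type
  return entry_len - kTypeFieldBytes;
}
```

With the numbers from this report, event_length(1394) yields 1390, which matches the length the on-disk event claims. The recorded lengths are therefore consistent with each other, and the leftover bytes detected by the assertion would have to come from how the event body itself was decoded.
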

Actions #8

Updated by Greg Farnum about 11 years ago

  • Assignee deleted (Greg Farnum)

Actions #9

Updated by Ken Franklin about 11 years ago

  • Status changed from New to Can't reproduce

Burnupi60 had some hardware issue so I had it reimaged. I reinstalled the master branch and am not able to reproduce the issue.
