Bug #328

closed

MDS crash: MDCache::remove_inode(CInode*)

Added by Wido den Hollander over 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

Today I tried to sync kernel.org again. This went fine until my log partition filled up and mds0 got stuck.

For some reason the kernel client did not switch to mds1, so the sync got stuck as well.

I killed mds0, cleaned up the logs, and tried to start it again, but it then failed with:

mds/MDCache.cc: In function 'void MDCache::remove_inode(CInode*)':
mds/MDCache.cc:230: FAILED assert(o->get_num_ref() == 0)
 1: (EMetaBlob::replay(MDS*, LogSegment*)+0x100d) [0x625d6d]
 2: (EUpdate::replay(MDS*)+0x1f) [0x62b67f]
 3: (MDLog::_replay_thread()+0x700) [0x61a3d0]
 4: (MDLog::ReplayThread::entry()+0xd) [0x4a54bd]
 5: (Thread::_entry_func(void*)+0xa) [0x4883ba]
 6: (()+0x69ca) [0x7f9d8ffe29ca]
 7: (clone()+0x6d) [0x7f9d8f2016fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I tried starting it several times, but it kept crashing with the same error.

The core, binary and logs are available on logger.ceph.widodh.nl in /srv/ceph/issues/cmds_crash_mdcache_remove_inode
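
For readers unfamiliar with the check in the trace above: the message `FAILED assert(o->get_num_ref() == 0)` suggests that `remove_inode` expects the inode to have no remaining references when it is dropped from the cache. The following is only a simplified Python model of that kind of invariant, not the actual Ceph code. The classes `ToyInode` and `ToyCache` and the methods `get`, `put`, and `add_inode` are hypothetical names made up for illustration; only `remove_inode` and `get_num_ref` are taken from the trace.

```python
class ToyInode:
    """Hypothetical stand-in for a cached inode with a reference count."""

    def __init__(self, ino):
        self.ino = ino
        self._num_ref = 0

    def get(self):
        # Take a reference (pin) on the inode.
        self._num_ref += 1

    def put(self):
        # Drop a previously taken reference.
        assert self._num_ref > 0, "put() without matching get()"
        self._num_ref -= 1

    def get_num_ref(self):
        return self._num_ref


class ToyCache:
    """Hypothetical stand-in for a metadata cache keyed by inode number."""

    def __init__(self):
        self.inode_map = {}

    def add_inode(self, inode):
        self.inode_map[inode.ino] = inode

    def remove_inode(self, inode):
        # Same shape as the check that failed in the trace:
        # an inode may only be removed once nothing references it.
        assert inode.get_num_ref() == 0, "inode still referenced"
        del self.inode_map[inode.ino]


if __name__ == "__main__":
    cache = ToyCache()
    inode = ToyInode(1)
    cache.add_inode(inode)

    inode.get()  # something still holds a reference
    try:
        cache.remove_inode(inode)
    except AssertionError as err:
        print("remove_inode failed:", err)

    inode.put()  # reference released
    cache.remove_inode(inode)
    print("inode removed, cache size:", len(cache.inode_map))
```

In this model, removing a still-referenced inode trips the assertion every time until the reference is released, which is consistent with the MDS crashing on every restart while replaying the same journal entries.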

Actions #1

Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved

Fixed the replay workaround in commit:2136ee763659e84f5715974450b89e8dea31a717

The original source of the problem, #329, still needs to be fixed!

Actions #2

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
