Bug #803: mds assert failed replaying journal after respawn - Ceph - Ceph

Bug #803

closed

mds assert failed replaying journal after respawn

Added by John Leach about 13 years ago. Updated about 13 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Greg Farnum

Spent time:

0:15 h

Description

sdc/Journaler.h:225: FAILED assert(readonly || state == STATE_READHEAD)

I created a dir with about 500,000 files, then ran "ls" in it. The ls ran for many minutes, with the mds process at 100% CPU (and the osd processes quite busy too, iirc). While it was running, I used injectargs to experiment with a few debug levels, at which point the mds crashed; it now won't start back up.

Judging from the mds logs, the mds was detected as laggy (possibly due to my messing with debug settings?), was marked down, and respawned, but then failed to replay the journal (log attached).

No other processes crashed or were OOM-killed (all 4 osds and 3 mons stayed up).

It's a build from the master branch, commit da6966958471db1dbf20f30e467221338b2b2e7d.

Possibly related to #777.

Files

mds.replay.fail.log (7.91 KB)

John Leach, 02/13/2011 12:49 PM

Updated by Greg Farnum about 13 years ago

Status changed from New to Resolved
Assignee set to Greg Farnum

This was just a bad assert missing an allowed case. Looks like this got hit while going through error-handling code, though, so there's probably another issue that will need to be dealt with separately.
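
For illustration only, here is a minimal C++ sketch of how a precondition assert can reject a state that is actually legitimate. Only the condition `readonly || state == STATE_READHEAD` comes from this ticket. `ToyJournaler`, the other state names, and `relaxed_precondition` are hypothetical stand-ins, not Ceph's Journaler, and the ticket does not say which case the real fix allowed.

```cpp
#include <cassert>

// Hypothetical, simplified states; only STATE_READHEAD appears in the ticket.
enum State { STATE_UNDEF, STATE_READHEAD, STATE_PROBING, STATE_ACTIVE };

// Toy stand-in for a journal reader. Not Ceph's actual class.
struct ToyJournaler {
    bool readonly = false;
    State state = STATE_UNDEF;

    // The precondition exactly as quoted in the failed assert.
    bool strict_precondition() const {
        return readonly || state == STATE_READHEAD;
    }

    // A relaxed precondition that additionally accepts one more state.
    // Which state the real fix allowed is an assumption left to the caller.
    bool relaxed_precondition(State extra_allowed) const {
        return strict_precondition() || state == extra_allowed;
    }
};

// Returns true when the strict check would fire on a journaler that the
// relaxed check accepts, i.e. when the assert is "missing an allowed case".
inline bool assert_is_too_strict(const ToyJournaler& j, State extra_allowed) {
    return !j.strict_precondition() && j.relaxed_precondition(extra_allowed);
}
```

In this sketch, a writable journaler sitting in the assumed extra state trips the strict check even though nothing is wrong with it. That is the general shape of a "bad assert", as opposed to a genuine invariant violation.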
