Bug #803

closed

mds assert failed replaying journal after respawn

Added by John Leach about 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

osdc/Journaler.h:225: FAILED assert(readonly || state == STATE_READHEAD)

I created a directory with about 500,000 files, then ran "ls" in it. The process ran for many minutes, with the mds process at 100% CPU (and the osd processes quite busy too, iirc). While it was running, I used injectargs to experiment with a few debug levels, at which point the mds crashed; it now won't start back up.

Judging from the mds logs, it was detected as being laggy (possibly due to my messing with debug settings?), got set as down, respawned, but failed to replay the journal (logs attached).

No other processes crashed or were OOM-killed (all 4 osds and 3 mons stayed up).

It's a build from the master branch, commit da6966958471db1dbf20f30e467221338b2b2e7d.

Possibly related to #777.


Files

mds.replay.fail.log (7.91 KB) John Leach, 02/13/2011 12:49 PM
#1

Updated by Greg Farnum about 13 years ago

  • Status changed from New to Resolved
  • Assignee set to Greg Farnum

This was just a bad assert missing an allowed case. Looks like this got hit while going through error-handling code, though, so there's probably another issue that will need to be dealt with separately.
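To illustrate what "a bad assert missing an allowed case" can look like, here is a minimal sketch. It is not the actual Ceph code or the actual fix: only `readonly` and `STATE_READHEAD` come from the assert message in the description. The other enum values, including `STATE_RETRY`, and both function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// Hypothetical state set. Only STATE_READHEAD appears in the reported
// assert; STATE_UNDEF, STATE_RETRY and STATE_ACTIVE are assumptions.
enum State {
    STATE_UNDEF,
    STATE_READHEAD,
    STATE_RETRY,   // assumed: a state reached via an error-handling path
    STATE_ACTIVE
};

// Precondition in the shape quoted in the report. It rejects any
// writable caller that is not in STATE_READHEAD.
inline bool strict_precondition(bool readonly, State state) {
    return readonly || state == STATE_READHEAD;
}

// Relaxed precondition that also admits the assumed extra state.
// Which state the real fix allowed is not stated in this report.
inline bool relaxed_precondition(bool readonly, State state) {
    return readonly || state == STATE_READHEAD || state == STATE_RETRY;
}
```

With the strict form, a writable caller arriving in the extra state trips the assert even though the call is legitimate; the relaxed form accepts it while still rejecting unrelated states.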
