Bug #803

mds assert failed replaying journal after respawn

Added by John Leach over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 02/13/2011
Due date:
% Done: 0%
Spent time:
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:

Description

sdc/Journaler.h:225: FAILED assert(readonly || state == STATE_READHEAD)

I created a directory with about 500,000 files, then ran "ls" in it. The ls ran for many minutes, with the mds process at 100% CPU (and the osd processes quite busy too, iirc). While it was running, I used injectargs to experiment with a few debug levels, at which point the mds crashed; it now won't start back up.

Judging from the mds logs, it was detected as being laggy (possibly due to my messing with debug settings?), got set as down, respawned, but failed to replay the journal (logs attached).

No other processes crashed or were OOM-killed (all 4 osds and 3 mons stayed up).

It's a build from the master branch, commit da6966958471db1dbf20f30e467221338b2b2e7d.

Possibly related to #777.

mds.replay.fail.log (7.91 KB) John Leach, 02/13/2011 12:49 PM

History

#1 Updated by Greg Farnum over 8 years ago

  • Status changed from New to Resolved
  • Assignee set to Greg Farnum

This was just a bad assert missing an allowed case. Looks like this got hit while going through error-handling code, though, so there's probably another issue that will need to be dealt with separately.
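
To illustrate what "a bad assert missing an allowed case" can look like, here is a minimal sketch, not the actual Ceph change. Only the expression `readonly || state == STATE_READHEAD` comes from the report above. The other enum values (in particular `STATE_HYPOTHETICAL_RETRY`) and all function names are assumptions made up for this example, and the real Journaler states and the real fix may differ.

```cpp
#include <cassert>

// Only STATE_READHEAD is named in the report; the other values are
// hypothetical placeholders for this sketch.
enum State {
  STATE_UNDEF,
  STATE_READHEAD,
  STATE_HYPOTHETICAL_RETRY,  // assumed: a legitimate state the assert forgot
  STATE_ACTIVE
};

// The precondition exactly as quoted in the failed assert.
bool strict_precondition(bool readonly, State state) {
  return readonly || state == STATE_READHEAD;
}

// A relaxed precondition that also admits the (hypothetical) extra state.
bool relaxed_precondition(bool readonly, State state) {
  return readonly || state == STATE_READHEAD ||
         state == STATE_HYPOTHETICAL_RETRY;
}

// Hypothetical entry point guarded by the relaxed check.
void enter_checked(bool readonly, State state) {
  assert(relaxed_precondition(readonly, state));
  // ... the guarded work would happen here ...
}
```

With the strict check, a writeable caller in `STATE_HYPOTHETICAL_RETRY` trips the assert even though the state is valid. The relaxed check accepts it and still rejects states such as `STATE_ACTIVE`.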
