Bug #803
mds assert failed replaying journal after respawn
Description
osdc/Journaler.h:225: FAILED assert(readonly || state == STATE_READHEAD)
I created a dir with about 500,000 files, then ran "ls" in it. The process ran for many minutes, with the mds process at 100% cpu (and osd processes quite busy too iirc). Whilst it was running, I used injectargs to experiment with a few debug levels, at which point the mds crashed and now won't start back up.
Judging from the mds logs, it was detected as being laggy (possibly due to my messing with debug settings?), got set as down, respawned, but failed to replay the journal (logs attached).
No other processes crashed or oomed (all 4 osds and 3 mons stayed up).
It's a build from the master branch, commit da6966958471db1dbf20f30e467221338b2b2e7d.
Possibly related to #777.
Updated by Greg Farnum about 13 years ago
- Status changed from New to Resolved
- Assignee set to Greg Farnum
This was just a bad assert missing an allowed case. Looks like this got hit while going through error-handling code, though, so there's probably another issue that will need to be dealt with separately.
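For readers unfamiliar with this kind of failure, here is a rough sketch of what "a bad assert missing an allowed case" can look like. Everything in it is hypothetical: the enum values and function names are made up for illustration and are not the actual Journaler states, and the extra state accepted by the relaxed check is an assumption, not the real fix.

```cpp
#include <cassert>

// Hypothetical state set, for illustration only (not the real Journaler states).
enum class State { Undef, ReadHead, Reread, Active };

// Too-strict precondition: accepts only read-only mode or the ReadHead state,
// mirroring the shape of assert(readonly || state == STATE_READHEAD).
bool strict_precondition(bool readonly, State s) {
  return readonly || s == State::ReadHead;
}

// Relaxed precondition: additionally accepts one more legitimate state.
// Which state the real code needed to allow is an assumption here (Reread).
bool relaxed_precondition(bool readonly, State s) {
  return readonly || s == State::ReadHead || s == State::Reread;
}
```

With the strict version, a writable caller that legitimately arrives in the extra state trips the assert even though nothing is actually wrong; the relaxed version lets that case through while still rejecting unrelated states.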