Bug #1775
closed
mds startup: _replay journaler got error -22, aborting, possible regression?
Added by Szymon Szypulski over 12 years ago.
Updated over 7 years ago.
Description
Ubuntu Natty, kernel 3.2-rc2, ceph 0.38 (stable from git) with the patch from #1756 and the workaround for #1757
setup
s1: mds, osd, mon
s2: mds, osd, mon
s3: mon
In the middle of copying (Sage suggested wiping out the cluster, see #1757), both mds daemons crashed as shown in the logs. This looks similar to #805 and #873, but those were already fixed.
- Category set to 1
- Assignee set to Sage Weil
- Target version set to v0.40
Can you dump the mds journal so we can get a closer look at the corruption? Something like
ceph-mds -i foo --dump-journal 0 /tmp/journal.mds0
Also, did you have any OSD logging enabled at the time of the crash?
No, I didn't have OSD logging enabled. I'll provide you with the journal in a few minutes.
Stick a
continue;
after the set_read_pos() call to avoid the second crash.
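The suggested workaround can be illustrated with a toy replay loop. This is a hypothetical sketch, not actual Ceph source: the Entry struct, replay() function, and journal layout here are invented for illustration only. The point is that a corrupt entry advances the read position and is skipped, instead of aborting replay with an error.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy journal entry: a position plus a validity flag standing in for a
// decode/consistency check that would fail with -22 (EINVAL) in the report.
struct Entry { uint64_t pos; bool valid; };

// Replay entries in order. On a corrupt entry, advance the read position
// (the toy equivalent of set_read_pos()) and continue, rather than
// returning an error and aborting the whole replay.
int replay(const std::vector<Entry>& journal, uint64_t& read_pos,
           int& replayed) {
    for (const Entry& e : journal) {
        if (!e.valid) {
            read_pos = e.pos + 1; // skip past the bad entry
            continue;             // instead of "got error -22, aborting"
        }
        read_pos = e.pos + 1;
        ++replayed;
    }
    return 0;
}
```

This only lets startup proceed past the corruption for debugging; it does not recover whatever the skipped entry contained.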
- Status changed from New to Need More Info
Without logs, it's hard to say, but it looks like something caused the OSD to drop a write (or series of writes). No msgr failures in the log.
Improving msgr qa coverage will help eliminate that possible cause.
- Assignee deleted (Sage Weil)
- Target version deleted (v0.40)
- Status changed from Need More Info to Resolved
Chalking this up to a msgr failure due to one of the zillions of bugs we've fixed in the last few months.
- Project changed from Ceph to CephFS
- Category deleted (1)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.