Bug #1775
mds startup: _replay journaler got error -22, aborting, possible regression?
Description
ubuntu natty, kernel 3.2-rc2, ceph 0.38 (stable from git) with patch from #1756 and workaround for #1757
setup
s1: mds, osd, mon
s2: mds, osd, mon
s3: mon
In the middle of copying (Sage suggested wiping out the cluster in #1757), both mds daemons crashed as shown in the logs. It looks similar to #805 and #873, but those were fixed.
History
#1 Updated by Sage Weil over 12 years ago
- Category set to 1
- Assignee set to Sage Weil
- Target version set to v0.40
Can you dump the mds journal so we can get a closer look at the corruption? Something like
ceph-mds -i foo --dump-journal 0 /tmp/journal.mds0
Also, did you have any OSD logging enabled at the time of the crash?
#2 Updated by Szymon Szypulski over 12 years ago
No, I didn't have osd logging enabled. I'll provide you with the journal in a few minutes.
#3 Updated by Szymon Szypulski over 12 years ago
- File journal.mds0.bz2 added
#4 Updated by Szymon Szypulski over 12 years ago
- File mds.backup1.log.1.gz added
- File mds.backup2.log.1.gz added
#5 Updated by Sage Weil over 12 years ago
stick a
continue;
after the set_read_pos() call to avoid the second crash.
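To illustrate why a `continue;` right after repositioning helps, here is a minimal sketch of a replay loop. It is not the Ceph Journaler or MDLog code: `ToyEntry`, `ToyJournal`, and `replay_sum` are hypothetical names made up for this example, and only the shape of the control flow is meant to carry over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the Ceph classes.
struct ToyEntry {
    bool valid;  // false models an entry that failed to decode
    int value;
};

struct ToyJournal {
    std::vector<ToyEntry> entries;
    std::size_t read_pos = 0;

    bool readable() const { return read_pos < entries.size(); }

    void set_read_pos(std::size_t pos) {
        assert(pos <= entries.size());
        read_pos = pos;
    }
};

// Replays the journal and returns the sum of all valid entries.
// A bad entry is skipped by moving the read position past it.
int replay_sum(ToyJournal& j) {
    int sum = 0;
    while (j.readable()) {
        const ToyEntry& e = j.entries[j.read_pos];
        if (!e.valid) {
            j.set_read_pos(j.read_pos + 1);
            // Restart the loop here. Without this, execution would fall
            // through and process the bad entry anyway.
            continue;
        }
        sum += e.value;
        j.set_read_pos(j.read_pos + 1);
    }
    return sum;
}
```

In this toy version, dropping the `continue;` would make the loop consume the invalid entry and advance the read position twice, which is the kind of follow-on failure the suggestion above is meant to avoid.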
#6 Updated by Sage Weil over 12 years ago
- Status changed from New to Need More Info
Without logs, it's hard to say, but it looks like something caused the OSD to drop a write (or series of writes). No msgr failures in the log.
Improving msgr qa coverage will help eliminate that possible cause.
#7 Updated by Sage Weil over 12 years ago
- Assignee deleted (Sage Weil)
#8 Updated by Sage Weil about 12 years ago
- Target version deleted (v0.40)
- translation missing: en.field_position set to 108
#9 Updated by Sage Weil over 11 years ago
- Status changed from Need More Info to Resolved
Chalking this up to a msgr failure caused by one of the zillions of bugs we've fixed in the last few months.
#10 Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.