Bug #3261

closed

mds crashes in EMetaBlob::replay

Added by Tobias Florek over 11 years ago. Updated over 11 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Development

Description

while testing cephfs using the debian wheezy packages on a fairly large volume (2TB) i ran into random, unreproducible client stalls. now the mds starts but crashes shortly after.

i am willing to debug this further, as ceph was promising before.


Files

ceph-mds.a.log (1.13 MB), Tobias Florek, 10/03/2012 03:46 PM
ceph-mon.a.log (4.11 KB), Tobias Florek, 10/03/2012 03:46 PM
ceph-osd.0.log (4.13 KB), Tobias Florek, 10/03/2012 03:46 PM
ceph-osd.1.log (4.13 KB), Tobias Florek, 10/03/2012 03:46 PM
ceph-mds.a.log.bz2 (7.56 MB), Tobias Florek, 10/03/2012 03:57 PM
Actions #1

Updated by Sage Weil over 11 years ago

  • Status changed from New to Need More Info

can you put 'debug mds = 20' in the ceph.conf, restart ceph-mds, and then attach the resulting log (assuming it crashes again)?

thanks!
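
For reference, a minimal sketch of how the requested setting might look in ceph.conf. Placing it under an `[mds]` section is an assumption; the request above only names the option itself.

```ini
[mds]
    # assumed placement: applies to all MDS daemons on this host
    # enables verbose MDS debug logging, as requested above
    debug mds = 20
```

As the request says, ceph-mds has to be restarted afterwards so the new log level applies to the next crash.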

Actions #2

Updated by Sage Weil over 11 years ago

  • Project changed from Ceph to CephFS
Actions #3

Updated by Tobias Florek over 11 years ago

aww. i had debug ms = 20 in my ceph.conf. sorry.

the new one is attached

Actions #4

Updated by Sage Weil over 11 years ago

This looks like a problem with what's in the journal, but so much MDS code has changed since then that I don't think we can make sense of this report. Are you in a position to retest against latest master?

Actions #5

Updated by Tobias Florek over 11 years ago

should i test the same btrfs volume with a new ceph? if so, i might get to it in the next month. please close with insufficient data. i will reopen when i find the time.

unfortunately i don't have the hardware around anymore to replicate the whole test.

Actions #6

Updated by Sage Weil over 11 years ago

  • Status changed from Need More Info to Rejected

Understood. I'm sorry we weren't able to dig in when it happened. When you do get around to retesting, we should be in a better position to follow up.

Thanks!
