Bug #10025

closed

Journal undump causes MDS to crash when start pos is not on object boundary

Added by John Spray over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Related mailing-list thread from Jasper Siero, who first encountered the issue on firefly: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044383.html

This happens on both firefly and master. To reproduce: do enough file I/O that the original journal segments have been trimmed and expire_pos lies partway through an object. Then stop the clients and the MDS, dump the journal, undump it, start the MDS, and mount a client. The MDS then crashes as follows:

    -5> 2014-11-06 19:05:48.321015 7f71905b0700  1 -- 192.168.1.12:6813/2568 <== osd.1 192.168.1.12:6804/910 18 ==== osd_op_reply(21 200.00000000 [writefull 0~90] v13'1290 uv1290 ondisk = 0) v6 ==== 179+0+0 (397271313 0 0) 0x5c9a680 con 0x5bc23c0
    -4> 2014-11-06 19:05:48.321067 7f718ddab700 10 mds.0.journaler(rw) _finish_write_head loghead(trim 83889983, expire 83889983, write 87175895, stream_format 1)
    -3> 2014-11-06 19:05:48.321075 7f718ddab700 10 mds.0.journaler(rw) trim last_commited head was loghead(trim 83889983, expire 83889983, write 87175895, stream_format 1), can trim to 83886080
    -2> 2014-11-06 19:05:48.376340 7f718a3a2700 10 check_message_signature: seq # = 19 front_crc_ = 562903728 middle_crc = 0 data_crc = 2736439405
    -1> 2014-11-06 19:05:48.376403 7f71905b0700  1 -- 192.168.1.12:6813/2568 <== osd.1 192.168.1.12:6804/910 19 ==== osd_op_reply(19 100.00000000 [omap-get-header 0~0,omap-get-vals 0~16] v0'0 uv1794 ondisk = 0) v6 ==== 221+0+4594 (562903728 0 2736439405) 0x5c98580 con 0x5bc23c0
     0> 2014-11-06 19:05:48.398123 7f718ddab700 -1 osdc/Journaler.cc: In function 'void Journaler::_trim()' thread 7f718ddab700 time 2014-11-06 19:05:48.321079
osdc/Journaler.cc: 1136: FAILED assert(trim_to > trimming_pos)

 ceph version 0.87-569-g29d7786 (29d7786e030efde1a4aff134d9367865d9cc7d33)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x1584318]
 2: (Journaler::_trim()+0x5d3) [0x13bb42f]
 3: (Journaler::_finish_write_head(int, Journaler::Header&, C_OnFinisher*)+0x3a6) [0x13b455e]
 4: (Journaler::C_WriteHead::finish(int)+0x38) [0x13be7ee]
 5: (Context::complete(int)+0x27) [0x1079d51]
 6: (Finisher::finisher_thread_entry()+0x323) [0x1469939]
 7: (Finisher::FinisherThread::entry()+0x1c) [0x107b4e8]
 8: (Thread::entry_wrapper()+0x84) [0x1572684]
 9: (Thread::_entry_func(void*)+0x18) [0x15725f6]
 10: (()+0x7f35) [0x7f7194b09f35]
 11: (clone()+0x6d) [0x7f71930c4c3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
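
The positions in the log above can be checked by hand. The following is a minimal sketch of the arithmetic only, not the actual Journaler code: it assumes a 4 MiB layout period (inferred from "can trim to 83886080", which is 20 * 4194304) and assumes the trim target is expire_pos rounded down to a period boundary. The helper names are hypothetical.

```python
# Simplified model of the trim check; NOT the real Journaler implementation.
# ASSUMPTION: layout period is 4 MiB, inferred from the log values above.
OBJECT_PERIOD = 4 * 1024 * 1024


def round_down_to_period(pos, period=OBJECT_PERIOD):
    """Largest period boundary that is <= pos."""
    return pos - (pos % period)


def offset_in_object(pos, period=OBJECT_PERIOD):
    """How far pos lies past the start of its object."""
    return pos % period


def trim_assert_holds(trimming_pos, expire_pos, period=OBJECT_PERIOD):
    """Model of assert(trim_to > trimming_pos) from the failed check."""
    trim_to = round_down_to_period(expire_pos, period)
    return trim_to > trimming_pos


# Values taken from the loghead line in the crash log.
trim_pos = 83889983
expire_pos = 83889983

print(round_down_to_period(expire_pos))         # 83886080
print(offset_in_object(expire_pos))             # 3903
print(trim_assert_holds(trim_pos, expire_pos))  # False
```

In this model the check can only hold when the trim position sits below the rounded-down boundary. After the undump, the trim position equals expire_pos and lies 3903 bytes into an object, so the rounded-down target falls behind it and the assertion fails.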
Actions #1

Updated by Sage Weil over 9 years ago

  • Project changed from Ceph to CephFS
  • Priority changed from Normal to High
  • Source changed from other to Development
Actions #2

Updated by Greg Farnum over 9 years ago

  • Status changed from In Progress to Resolved

Merged into next in commit:69be8e9b30c18e47c17ff7dafc4ac8fbe00d48e7, and the appropriate backport bits were merged last week.
