Project

General

Profile

Bug #19450

PurgeQueue read journal crash

Added by Zheng Yan 8 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
Start date:
04/03/2017
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Component(FS):
Needs Doc:
No

Description

(gdb) bt
#0  0x00007fa3034f348f in raise () from /lib64/libpthread.so.0
#1  0x000056357fdcf208 in reraise_fatal (signum=6) at /home/zhyan/Ceph/ceph-1/src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=6) at /home/zhyan/Ceph/ceph-1/src/global/signal_handler.cc:138
#3  <signal handler called>
#4  0x00007fa30269391f in raise () from /lib64/libc.so.6
#5  0x00007fa30269551a in abort () from /lib64/libc.so.6
#6  0x000056357fe2251c in ceph::__ceph_assert_fail (assertion=assertion@entry=0x56358017c322 "start_ptr == read_pos", 
    file=file@entry=0x56358017b440 "/home/zhyan/Ceph/ceph-1/src/osdc/Journaler.cc", line=line@entry=1177, 
    func=func@entry=0x56358017c660 <Journaler::try_read_entry(ceph::buffer::list&)::__PRETTY_FUNCTION__> "bool Journaler::try_read_entry(ceph::bufferlist&)")
    at /home/zhyan/Ceph/ceph-1/src/common/assert.cc:77
#7  0x000056357fd8153c in Journaler::try_read_entry (this=this@entry=0x563589796760, bl=...) at /home/zhyan/Ceph/ceph-1/src/osdc/Journaler.cc:1177
#8  0x000056357fc25a44 in PurgeQueue::_consume (this=0x563589796460) at /home/zhyan/Ceph/ceph-1/src/mds/PurgeQueue.cc:296
#9  0x000056357fc2689f in PurgeQueue::<lambda(int)>::operator() (r=<optimized out>, __closure=0x56359403e730) at /home/zhyan/Ceph/ceph-1/src/mds/PurgeQueue.cc:413
#10 boost::detail::function::void_function_obj_invoker1<PurgeQueue::_execute_item(const PurgeItem&, uint64_t)::<lambda(int)>, void, int>::invoke(boost::detail::function::function_buffer &, int) (function_obj_ptr=..., a0=<optimized out>) at /mnt/misc/Ceph/ceph-1/build/boost/include/boost/function/function_template.hpp:159
#11 0x000056357faaae60 in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>)
    at /mnt/misc/Ceph/ceph-1/build/boost/include/boost/function/function_template.hpp:771
#12 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /home/zhyan/Ceph/ceph-1/src/include/Context.h:489
#13 0x000056357faa8811 in Context::complete (this=0x56359403e720, r=<optimized out>) at /home/zhyan/Ceph/ceph-1/src/include/Context.h:70
#14 0x000056357fad816a in C_GatherBase<Context, Context>::delete_me (this=0x563594385fb0) at /home/zhyan/Ceph/ceph-1/src/include/Context.h:300
#15 C_GatherBase<Context, Context>::sub_finish (this=0x563594385fb0, sub=<optimized out>, sub@entry=0x56358b9f1d70, r=0) at /home/zhyan/Ceph/ceph-1/src/include/Context.h:295
#16 0x000056357fad84a9 in C_GatherBase<Context, Context>::C_GatherSub::finish (r=<optimized out>, this=0x56358b9f1d70) at /home/zhyan/Ceph/ceph-1/src/include/Context.h:318
#17 Context::complete (r=<optimized out>, this=0x56358b9f1d70) at /home/zhyan/Ceph/ceph-1/src/include/Context.h:70
#18 C_GatherBase<Context, Context>::C_GatherSub::complete (this=0x56358b9f1d70, r=<optimized out>) at /home/zhyan/Ceph/ceph-1/src/include/Context.h:315
#19 0x000056357ffc1301 in Objecter::handle_osd_op_reply (this=this@entry=0x5635897ce000, m=m@entry=0x56359316e680) at /home/zhyan/Ceph/ceph-1/src/osdc/Objecter.cc:3424
#20 0x000056357ffd3b03 in Objecter::ms_dispatch (this=0x5635897ce000, m=0x56359316e680) at /home/zhyan/Ceph/ceph-1/src/osdc/Objecter.cc:970
#21 0x00005635800830f6 in Messenger::ms_fast_dispatch (m=0x56359316e680, this=0x563589680000) at /home/zhyan/Ceph/ceph-1/src/msg/Messenger.h:573
#22 DispatchQueue::fast_dispatch (this=0x563589680178, m=m@entry=0x56359316e680) at /home/zhyan/Ceph/ceph-1/src/msg/DispatchQueue.cc:71
#23 0x00005635800c9d38 in AsyncConnection::process (this=0x5635897eb800) at /home/zhyan/Ceph/ceph-1/src/msg/async/AsyncConnection.cc:795
#24 0x000056357febb461 in EventCenter::process_events (this=this@entry=0x5635895fa080, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000)
    at /home/zhyan/Ceph/ceph-1/src/msg/async/Event.cc:408
#25 0x000056357febf64f in NetworkStack::<lambda()>::operator()(void) const (__closure=0x5635895ead28) at /home/zhyan/Ceph/ceph-1/src/msg/async/Stack.cc:50
#26 0x00007fa302fff5cf in ?? () from /lib64/libstdc++.so.6
#27 0x00007fa3034e96ca in start_thread () from /lib64/libpthread.so.0
#28 0x00007fa302765f7f in clone () from /lib64/libc.so.6

my local test branch, based on 4100803cfe973baa58e1c869b8ceaff24e86a31f, no change to purgequeue

History

#1 Updated by Zheng Yan 8 months ago

(gdb) f7
Undefined command: "f7".  Try "help".
(gdb) f 
#7  0x000056357fd8153c in Journaler::try_read_entry (this=this@entry=0x563589796760, bl=...) at /home/zhyan/Ceph/ceph-1/src/osdc/Journaler.cc:1177
1177          assert(start_ptr == read_pos);
(gdb) p read_pos
$14 =
(gdb) p start_ptr
$15 = 72127962782105600
zhyan-alpha:~/Ceph/ceph-1 zhyan$ echo "1711275985 % (1024 * 1024 * 4)" | bc
4194257
zhyan-alpha:~/Ceph/ceph-1 zhyan$ dd if=/tmp/504.00000197 bs=1 skip=4194257 | od -x
0000000 9793 5358 5926 3141 0049 0000 0101 0043
0000020 0000 e701 1a6b 0001 0005 0000 0000 0000
0000040 0000 0200 1802 0000 0000 4000 0100 0000
47+0 records in
47+0 records out
0000057
47 bytes copied, 0.000230734 s, 204 kB/s

A log entry needs 93 bytes; 504.00000197 only has 43 bytes left.

zhyan-alpha:~/Ceph/ceph-1 zhyan$ dd if=/tmp/504.00000198 bs=1 skip=0 count=93 | od -x
0000000 9793 5358 5926 3141 0049 0000 0101 0043
0000020 0000 3d01 1a68 0001 0005 2b00 2022 0000
0000040 0000 0200 1802 0000 0000 4000 0100 0000
0000060 0000 4000 4d00 0002 0000 0000 0000 0000
0000100 0000 0000 0100 0000 0000 0000 0000 0000
0000120 0000 0000 0000 0000 0066 0000 0000
0000135

But it seems that 504.00000198 starts with a new entry.

#2 Updated by Zheng Yan 7 months ago

  • Priority changed from Normal to Urgent

hit this again

(gdb) f 7
#7  0x000055e7350d13ac in Journaler::try_read_entry (this=this@entry=0x55e73f5c0760, bl=...) at /home/zhyan/Ceph/ceph/src/osdc/Journaler.cc:1177
1177          assert(start_ptr == read_pos);
(gdb) p start_ptr
$1 = 1729947405886947328
(gdb) p read_pos
$2 = 5620367305
(gdb) p 5620367305 / (4 * 1024 * 1024)
$3 = 1339
(gdb) p 5620367305 % (4 * 1024 * 1024)
$4 = 4194249
[zhyan@zhyan-beta ~]$ dd if=/tmp/500.0000053b ibs=1 skip=4194156 count=93 obs=4k | od -x
93+0 records in
0+1 records out
93 bytes copied, 6.9049e-05 s, 1.3 MB/s
0000000 9793 5358 5926 3141 0049 0000 0101 0043
0000020 0000 a801 b908 0003 0001 8b00 00a1 0000
0000040 0000 0200 1802 0000 0000 4000 0100 0000
0000060 0000 4000 0300 0000 0000 0000 0000 0000
0000100 0000 0000 0100 0000 0000 0000 0000 0000
0000120 0000 0000 6c00 ffff 014e 0000 0000
0000135
[zhyan@zhyan-beta ~]$ dd if=/tmp/500.0000053c ibs=1 count=93 obs=4k | od -x
93+0 records in
0+1 records out
93 bytes copied, 7.1663e-05 s, 1.3 MB/s
0000000 9793 5358 5926 3141 0049 0000 0101 0043
0000020 0000 8101 b915 0003 0001 9d00 00f5 0000
0000040 0000 0200 1802 0000 0000 4000 0100 0000
0000060 0000 4000 0300 0000 0000 0000 0000 0000
0000100 0000 0000 0100 0000 0000 0000 0000 0000
0000120 0000 0000 0000 0000 014f 0000 0000
0000135
// Quoted from src/osdc/Journaler.cc: completion callback for the journal-end
// probe run while opening/recovering a journal. Decides the recovered
// write position from the probed object-store end.
void Journaler::_finish_probe_end(int r, uint64_t end)
{
  lock_guard l(lock);

  assert(state == STATE_PROBING);
  if (r < 0) { // error in probing
    goto out;
  }
  if (((int64_t)end) == -1) {
    // Probe found no data at all; fall back to the header's write_pos.
    end = write_pos;
    ldout(cct, 1) << "_finish_probe_end write_pos = " << end << " (header had " 
                  << write_pos << "). log was empty. recovered." << dendl;
    ceph_abort(); // hrm.
  } else {
    // NOTE(review): the probe can find bytes past the header's write_pos that
    // belong to a partially-written entry (first half landed in one object,
    // second half never made it to the next). This assert only checks
    // end >= write_pos; it does not validate that `end` sits on an entry
    // boundary — suspected root cause of the try_read_entry crash above.
    assert(end >= write_pos);
    ldout(cct, 1) << "_finish_probe_end write_pos = " << end
                  << " (header had " << write_pos << "). recovered." 
                  << dendl;
  }

  state = STATE_ACTIVE;

  // Advance every write-side cursor to the probed end. If `end` includes a
  // trailing partial entry, new entries appended here land after the garbage
  // instead of overwriting it — TODO confirm against the reader's
  // partial-entry-drop path (see analysis in note #3 below is external to
  // this function).
  prezeroing_pos = prezero_pos = write_pos = flush_pos = safe_pos = end;

out:
  // done.  Wake everyone waiting on recovery, propagating the probe result.
  list<Context*> ls;
  ls.swap(waitfor_recover);
  finish_contexts(cct, ls, r);
}

Journaler::_finish_probe_end() looks suspicious. For the log entry at 5620367305, the first part of the data was successfully written to 500.0000053b, but the second part failed to be written to 500.0000053c. When opening the journal, Journaler::_finish_probe_end() found the journal end at 5620367360 and set write_pos to it. No idea why we don't hit this when replaying the mds log.

#3 Updated by Zheng Yan 7 months ago

The mds always first reads all entries in the mds log, then writes new entries to it. A partial entry is detected and dropped during the read. For the purge queue journal, however, the mds writes new entries to the journal before the partial entry has been dropped.

#4 Updated by Zheng Yan 7 months ago

  • Status changed from New to Need Review

#5 Updated by Zheng Yan 6 months ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF