Project

General

Profile

Actions

Bug #36340

closed

common: fix buffer advance length overflow to cause MDS crash

Added by Zhi Zhang over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Buffer advance length was defined as int, but in async msg, the real length of data buffer was defined as unsigned, so this overflow caused MDS crash.

2018-09-28 21:20:21.050388 7fa39eff8700 -1 *** Caught signal (Aborted) **
 in thread 7fa39eff8700 thread_name:ms_async_worker

 1: (()+0x51faca) [0x7fa3a454faca]
 2: (()+0xf370) [0x7fa3a342a370]
 3: (gsignal()+0x37) [0x7fa3a1e2c1d7]
 4: (abort()+0x148) [0x7fa3a1e2d8c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa3a27319d5]
 6: (()+0x5e946) [0x7fa3a272f946]
 7: (()+0x5e973) [0x7fa3a272f973]
 8: (()+0x5eb93) [0x7fa3a272fb93]
 9: (ceph::buffer::list::iterator_impl<false>::advance(int)+0xd0) [0x7fa3a465b940]
 10: (AsyncConnection::process()+0x449) [0x7fa3a47bb359]
 11: (EventCenter::process_events(int)+0x34f) [0x7fa3a476046f]
 12: (Worker::entry()+0x1f0) [0x7fa3a47360e0]
 13: (()+0x7dc5) [0x7fa3a3422dc5]
 14: (clone()+0x6d) [0x7fa3a1eee74d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

From below coredump, we can see "read" is defined as unsigned in asg msg and its real length is far larger than the upper bound of int.

(gdb) bt
#0  0x00007f9fb05f023b in raise () from /lib64/libpthread.so.0
#1  0x00007f9fb1715b95 in reraise_fatal (signum=6) at global/signal_handler.cc:71
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:133
#3  <signal handler called>
#4  0x00007f9faeff21d7 in raise () from /lib64/libc.so.6
#5  0x00007f9faeff38c8 in abort () from /lib64/libc.so.6
#6  0x00007f9faf8f79d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#7  0x00007f9faf8f5946 in ?? () from /lib64/libstdc++.so.6
#8  0x00007f9faf8f5973 in std::terminate() () from /lib64/libstdc++.so.6
#9  0x00007f9faf8f5b93 in __cxa_throw () from /lib64/libstdc++.so.6
#10 0x00007f9fb1821940 in ceph::buffer::list::iterator_impl<false>::advance (this=<optimized out>, o=<optimized out>) at common/buffer.cc:1051
#11 0x00007f9fb1981359 in AsyncConnection::process (this=0x7f9fbb94e000) at msg/async/AsyncConnection.cc:814
#12 0x00007f9fb192646f in EventCenter::process_events (this=this@entry=0x7f9fbb7dc2c8, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000)
    at msg/async/Event.cc:378
#13 0x00007f9fb18fc0e0 in Worker::entry (this=0x7f9fbb7dc280) at msg/async/AsyncMessenger.cc:295
#14 0x00007f9fb05e8dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f9faf0b474d in clone () from /lib64/libc.so.6
(gdb) frame 11
#11 0x00007f9fb1981359 in AsyncConnection::process (this=0x7f9fbb94e000) at msg/async/AsyncConnection.cc:814
814                data_blp.advance(read);
(gdb) p data_blp
$1 = {<ceph::buffer::list::iterator_impl<false>> = {<std::iterator<std::forward_iterator_tag, char, long, char*, char&>> = {<No data fields>}, bl = 0x7f9fbb94f410, 
    ls = 0x7f9fbb94f410, off = 0, p = {_raw = , _off = 0, _len = 3553509376}, p_off = 0}, <No data fields>}

(gdb) p read 
$11 = 3553509376
(gdb) p (int)read 
$12 = -741457920
Actions #1

Updated by Zhi Zhang over 5 years ago

https://github.com/ceph/ceph/pull/24466

After this fix, MDS never gets crashed due to above buffer length overflow.

Actions #2

Updated by Patrick Donnelly over 5 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Zhi Zhang
  • Target version set to v14.0.0
  • Component(FS) MDS added
Actions #3

Updated by Patrick Donnelly over 5 years ago

  • Status changed from Fix Under Review to Resolved
Actions

Also available in: Atom PDF