Project

General

Profile

Bug #36340

common: fix buffer advance length overflow to cause MDS crash

Added by Zhi Zhang 2 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
Start date:
10/08/2018
Due date:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:

Description

Buffer advance length was defined as int, but in async msg, the real length of data buffer was defined as unsigned, so this overflow caused MDS crash.

2018-09-28 21:20:21.050388 7fa39eff8700 -1 *** Caught signal (Aborted) **
 in thread 7fa39eff8700 thread_name:ms_async_worker

 1: (()+0x51faca) [0x7fa3a454faca]
 2: (()+0xf370) [0x7fa3a342a370]
 3: (gsignal()+0x37) [0x7fa3a1e2c1d7]
 4: (abort()+0x148) [0x7fa3a1e2d8c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fa3a27319d5]
 6: (()+0x5e946) [0x7fa3a272f946]
 7: (()+0x5e973) [0x7fa3a272f973]
 8: (()+0x5eb93) [0x7fa3a272fb93]
 9: (ceph::buffer::list::iterator_impl<false>::advance(int)+0xd0) [0x7fa3a465b940]
 10: (AsyncConnection::process()+0x449) [0x7fa3a47bb359]
 11: (EventCenter::process_events(int)+0x34f) [0x7fa3a476046f]
 12: (Worker::entry()+0x1f0) [0x7fa3a47360e0]
 13: (()+0x7dc5) [0x7fa3a3422dc5]
 14: (clone()+0x6d) [0x7fa3a1eee74d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

From below coredump, we can see "read" is defined as unsigned in asg msg and its real length is far larger than the upper bound of int.

(gdb) bt
#0  0x00007f9fb05f023b in raise () from /lib64/libpthread.so.0
#1  0x00007f9fb1715b95 in reraise_fatal (signum=6) at global/signal_handler.cc:71
#2  handle_fatal_signal (signum=6) at global/signal_handler.cc:133
#3  <signal handler called>
#4  0x00007f9faeff21d7 in raise () from /lib64/libc.so.6
#5  0x00007f9faeff38c8 in abort () from /lib64/libc.so.6
#6  0x00007f9faf8f79d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#7  0x00007f9faf8f5946 in ?? () from /lib64/libstdc++.so.6
#8  0x00007f9faf8f5973 in std::terminate() () from /lib64/libstdc++.so.6
#9  0x00007f9faf8f5b93 in __cxa_throw () from /lib64/libstdc++.so.6
#10 0x00007f9fb1821940 in ceph::buffer::list::iterator_impl<false>::advance (this=<optimized out>, o=<optimized out>) at common/buffer.cc:1051
#11 0x00007f9fb1981359 in AsyncConnection::process (this=0x7f9fbb94e000) at msg/async/AsyncConnection.cc:814
#12 0x00007f9fb192646f in EventCenter::process_events (this=this@entry=0x7f9fbb7dc2c8, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000)
    at msg/async/Event.cc:378
#13 0x00007f9fb18fc0e0 in Worker::entry (this=0x7f9fbb7dc280) at msg/async/AsyncMessenger.cc:295
#14 0x00007f9fb05e8dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f9faf0b474d in clone () from /lib64/libc.so.6
(gdb) frame 11
#11 0x00007f9fb1981359 in AsyncConnection::process (this=0x7f9fbb94e000) at msg/async/AsyncConnection.cc:814
814                data_blp.advance(read);
(gdb) p data_blp
$1 = {<ceph::buffer::list::iterator_impl<false>> = {<std::iterator<std::forward_iterator_tag, char, long, char*, char&>> = {<No data fields>}, bl = 0x7f9fbb94f410, 
    ls = 0x7f9fbb94f410, off = 0, p = {_raw = , _off = 0, _len = 3553509376}, p_off = 0}, <No data fields>}

(gdb) p read 
$11 = 3553509376
(gdb) p (int)read 
$12 = -741457920

History

#1 Updated by Zhi Zhang 2 months ago

https://github.com/ceph/ceph/pull/24466

After this fix, MDS never gets crashed due to above buffer length overflow.

#2 Updated by Patrick Donnelly 2 months ago

  • Status changed from New to Need Review
  • Assignee set to Zhi Zhang
  • Target version set to v14.0.0
  • Component(FS) MDS added

#3 Updated by Patrick Donnelly about 2 months ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF