Project

General

Profile

Actions

Bug #13985

closed

Messenger: large message data content length cause int overflow

Added by huang jun over 8 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSD aborted when sending message:
1103> 2015-12-04 16:12:23.691914 7f9282bfe700 1 - 172.19.10.1:6801/186389 --> 172.19.10.1:6800/11348 -- osd_op_reply(5959811 100000003ec.00000000 [omap-get-header 0~0,omap-get-vals 0~16,getxattr parent (125)] v0'0 uv49527 ondisk = 0) v6 -- ?+0 0x7f927a456600 con 0x7f9298825180
1102> 2015-12-04 16:12:23.691968 7f9282bfe700 20 - 172.19.10.1:6801/186389 submit_message osd_op_reply(5959811 100000003ec.00000000 [omap-get-header 0~0,omap-get-vals 0~16,getxattr parent (125)] v0'0 uv49527 ondisk = 0) v6 remote, 172.19.10.1:6800/11348, have pipe.
1100> 2015-12-04 16:12:23.692073 7f9293df6700 10 - 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).writer: state = open policy.server=1
1099> 2015-12-04 16:12:23.692099 7f9293df6700 20 - 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).writer encoding 1 features 1125899906842623 0x7f927a456600 osd_op_reply(5959811 100000003ec.00000000 [omap-get-header 0~0,omap-get-vals 0~16,getxattr parent (125)] v0'0 uv49527 ondisk = 0) v6
2015-12-04 16:12:26.840045 7f9293df6700 20 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).writer signed seq # 1): sig = 0
2015-12-04 16:12:27.661121 7f9293df6700 20 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).writer sending 1 0x7f927a456600
2015-12-04 16:12:27.661125 7f9282bfe700 20 -- 172.19.10.1:6801/186389 submit_message osd_op_reply(5959824 100002b8ee5.00000000 [getxattr parent (169)] v0'0 uv30057 ondisk = 0) v6 remote, 172.19.10.1:6800/11348, have pipe.
2015-12-04 16:12:27.661176 7f9293ef7700 2 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).reader couldn't read tag, (0) Success
2015-12-04 16:12:27.661200 7f9293ef7700 2 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).fault (0) Success
2015-12-04 16:12:27.661238 7f9293ef7700 10 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).fault on lossy channel, failing2015-12-04 16:12:27.661249 7f9293ef7700 10 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=2 pgs=4363 cs=1 l=1 c=0x7f9298825180).stop
2015-12-04 16:12:27.661258 7f9293ef7700 10 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=4 pgs=4363 cs=1 l=1 c=0x7f9298825180).unregister_pipe
2015-12-04 16:12:27.661264 7f9293ef7700 10 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=4 pgs=4363 cs=1 l=1 c=0x7f9298825180).discard_queue
2015-12-04 16:12:27.661268 7f9293ef7700 20 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=4 pgs=4363 cs=1 l=1 c=0x7f9298825180). discard 0x7f927a456c00
2015-12-04 16:12:27.661286 7f9293ef7700 10 -- 172.19.10.1:6801/186389 >> 172.19.10.1:6800/11348 pipe(0x7f9298820000 sd=131 :6801 s=4 pgs=4363 cs=1 l=1 c=0x7f9298825180).reader done
2015-12-04 18:07:53.709777 7f20b2775700 20 -- 172.19.10.1:6841/119941 >> 172.19.10.2:6800/11446 pipe(0x19616000 sd=158 :6841 s=2 pgs=655 cs=1 l=1 c=0x194b0d60).writer sending 1 0x1a009180 blist 2178765965
2015-12-04 18:07:53.709809 7f20b2775700 0 -- 172.19.10.1:6841/119941 >> 172.19.10.2:6800/11446 pipe(0x19616000 sd=158 :6841 s=2 pgs=655 cs=1 l=1 c=0x194b0d60). left -2116201331
2015-12-04 16:12:27.662109 7f9293df6700 -1 msg/simple/Pipe.cc: In function 'int Pipe::write_message(ceph_msg_header&, ceph_msg_footer&, ceph::bufferlist&)' thread

7f9293df6700 time 2015-12-04 16:12:27.661148
msg/simple/Pipe.cc: 2327: FAILED assert(left == 0)

After install package ceph-debuginfo,i reproduce the bug, and gdb stack info below:
(gdb) bt#0 0x000000315f60f5db in raise () from /lib64/libpthread.so.0#1 0x0000000000a48906 in reraise_fatal (signum=6) at global/signal_handler.cc:59#2

handle_fatal_signal (signum=6) at global/signal_handler.cc:109#3 <signal handler called>
#4 0x000000315f232925 in raise () from /lib64/libc.so.6
#5 0x000000315f234105 in abort () from /lib64/libc.so.6
#6 0x00000031616bea5d in _gnu_cxx::_verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#7 0x00000031616bcbe6 in ?? () from /usr/lib64/libstdc++.so.6
#8 0x00000031616bcc13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#9 0x00000031616bcd0e in _cxa_throw () from /usr/lib64/libstdc++.so.6
#10 0x0000000000accfda in ceph::
_ceph_assert_fail (assertion=0x7f927fc67980 "\353JaV\273f\331-", file=<value optimized out>,
line=-729399040, func=0xda59e0 "int Pipe::write_message(ceph_msg_header&, ceph_msg_footer&, ceph::bufferlist&)")
at common/assert.cc:77
#11 0x0000000000c2142c in Pipe::write_message (this=0x7f9298820000, header=<value optimized out>, footer=..., blist=...)
at msg/simple/Pipe.cc:2327
#12 0x0000000000c37a35 in Pipe::writer (this=0x7f9298820000) at msg/simple/Pipe.cc:1828
#13 0x0000000000c3ab4d in Pipe::Writer::entry (this=<value optimized out>) at msg/simple/Pipe.h:62
#14 0x000000315f6079d1 in start_thread () from /lib64/libpthread.so.0
#15 0x000000315f2e8b6d in clone () from /lib64/libc.so.6

Then i add some debug info to print the 'left' value, i got this:
2015-12-05 11:30:09.298753 7faa42076700 20 -- 172.19.10.1:6841/255898 >> 172.19.10.2:6800/11446 pipe(0x2087f000 sd=162 :6841 s=2 pgs=12117 cs=1 l=1 c=0x186c9020).writer signed seq # 1): sig = 0
2015-12-05 11:30:09.298787 7faa42076700 20 -- 172.19.10.1:6841/255898 >> 172.19.10.2:6800/11446 pipe(0x2087f000 sd=162 :6841 s=2 pgs=12117 cs=1 l=1 c=0x186c9020).writer sending 1 0x3ccc940 blist 271
2015-12-05 11:30:09.298792 7faa42076700 20 -- 172.19.10.1:6841/255898 >> 172.19.10.2:6800/11446 pipe(0x2087f000 sd=162 :6841 s=2 pgs=12117 cs=1 l=1 c=0x186c9020).writer sending 1 0x3ccc940 blist 271
2015-12-05 11:30:10.014508 7faa42076700 20 -- 172.19.10.1:6841/255898 >> 172.19.10.2:6800/11446 pipe(0x2087f000 sd=162 :6841 s=2 pgs=12117 cs=1 l=1 c=0x186c9020).writer sending 1 0x3ccc940 blist 2178765965
2015-12-05 11:30:10.014545 7faa42076700 0 -- 172.19.10.1:6841/255898 >> 172.19.10.2:6800/11446 pipe(0x2087f000 sd=162 :6841 s=2 pgs=12117 cs=1 l=1 c=0x186c9020). left -2116201331
2015-12-05 11:30:10.015764 7faa42076700 -1 msg/simple/Pipe.cc: In function 'int Pipe::write_message(ceph_msg_header&,ceph_msg_footer&, ceph::bufferlist&)' thread

7faa42076700 time 2015-12-05 11:30:10.014567
msg/simple/Pipe.cc: 2330: FAILED assert(left == 0)

Summary:
1) we got 2178765965 bytes data in message, but in Pipe.cc::write_message() function

// payload (front+data)
list<bufferptr>::const_iterator pb = blist.buffers().begin();
int b_off = 0; // carry-over buffer offset, if any
int bl_pos = 0; // blist pos
int left = blist.length();

'int' from -2147483648 to 2147483647
'unsigned int' from 0 to 4294967295
we set "unsigned int" to "int", that result the assert(left == 0) failed
i changed it to:
// payload (front+data)
list<bufferptr>::const_iterator pb = blist.buffers().begin();
unsigned b_off = 0; // carry-over buffer offset, if any
unsigned bl_pos = 0; // blist pos
unsigned left = blist.length()

it now works.

Actions

Also available in: Atom PDF