msg/async: crash in the case STATE_OPEN_MESSAGE_READ_DATA
This bug was found in version 12.2.5.
The MDS cannot be started: after a restart, it crashes during the rejoin phase. The MDS log is listed below.
It seems that client x.x.x.16 sent a large message of 2869776384 bytes to the MDS, which caused the crash.
2869776384 exceeds the maximum value of a signed 32-bit int (2147483647). When it is converted to the int parameter of advance(int), the value becomes negative, and the call throws end_of_buffer.
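The narrowing described above can be reproduced in isolation. The following is a minimal sketch, not the Ceph code: `narrow_to_int` and `fits_in_int` are hypothetical helpers invented for illustration, and the sketch assumes a platform where int is 32 bits and out-of-range conversions wrap modulo 2^32 (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Ceph): mimics what happens when a
// 32-bit unsigned length is passed to a function that takes an int.
// Assumes a 32-bit int; the result wraps modulo 2^32.
inline int narrow_to_int(std::uint32_t len) {
    return static_cast<int>(len);
}

// Hypothetical guard: true when the length can be passed as an int
// without changing its value.
inline bool fits_in_int(std::uint32_t len) {
    return len <= static_cast<std::uint32_t>(std::numeric_limits<int>::max());
}
```

Under these assumptions, `narrow_to_int(2869776384u)` yields -1425190912 (2869776384 - 4294967296), while `fits_in_int(2869776384u)` is false. A length such as 509288 passes the guard unchanged.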
2018-11-01 18:25:45.859359 7f87d9c34700 20 -- x.x.x.42:6800/778237716 >> x.x.x.16:0/3867854018 conn(0x558b5f74f000 :6800 s=STATE_OPEN_MESSAGE_READ_DATA pgs=548 cs=1 l=0).process prev state is STATE_OPEN_MESSAGE_READ_DATA
2018-11-01 18:25:45.859362 7f87d9c34700 25 -- x.x.x.42:6800/778237716 >> x.x.x.x:0/3867854018 conn(0x558b5f74f000 :6800 s=STATE_OPEN_MESSAGE_READ_DATA pgs=548 cs=1 l=0).read_until len is 2869776384 state_offset is 2869267096
2018-11-01 18:25:45.859635 7f87d9c34700 25 -- x.x.x.42:6800/778237716 >> x.x.x.16:0/3867854018 conn(0x558b5f74f000 :6800 s=STATE_OPEN_MESSAGE_READ_DATA pgs=548 cs=1 l=0).read_until read_bulk left is 509288 got 509288
2018-11-01 18:25:45.860316 7f87d9c34700 -1 ** Caught signal (Aborted) *
in thread 7f87d9c34700 thread_name:msgr-worker-1
In /var/log/messages, I found the following output:
Nov 1 18:25:40 xxx ceph-mds: tcmalloc: large alloc 2869780480 bytes == 0x558b87dd0000 @ 0x7f87ddd8f4ef 0x7f87dddb0010 0x558b44c5ee94 0x558b44fc57ff 0x558b44d2f5a9 0x558b44d3216e 0x7f87dbe6d2b0 0x7f87dc4f2e25 0x7f87db5d534d
Nov 1 18:25:45 xxx ceph-mds: terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
Nov 1 18:25:45 xxx ceph-mds: what(): buffer::end_of_buffer
Nov 1 18:25:45 xxx ceph-mds: ** Caught signal (Aborted) *
#2 Updated by Patrick Donnelly 3 months ago
- Subject changed from msg/async:Crash in the case STATE_OPEN_MESSAGE_READ_DATA of process(),When read is big unsigned int and will be converted to unsigned indata_blp.advance(), it will exceed the boundary of int. to msg/async: crash in the case STATE_OPEN_MESSAGE_READ_DATA
- Assignee set to shen hang
- Target version set to v14.0.0
- Start date deleted (
- Source set to Community (dev)
- Pull request ID set to 25315
- Affected Versions v12.2.10 added
- Affected Versions deleted (v12.2.5, v12.2.6, v12.2.7, v12.2.8, v12.2.9)
- ceph-qa-suite deleted (