Bug #22480
closed
msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message
Added by Sage Weil over 6 years ago.
Updated over 3 years ago.
Description
2017-12-19T08:12:14.323 INFO:tasks.radosbench.radosbench.0.smithi012.stderr:*** Caught signal (Segmentation fault) **
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: in thread 7f88247f8700 thread_name:ms_pipe_read
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: ceph version 13.0.0-4125-gdc6898e (dc6898ed56bd530acc62c3547d876952ba5d835d) mimic (dev)
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 1: (()+0x59b4c) [0x5629b8b91b4c]
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 2: (()+0x11390) [0x7f8830c1d390]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 3: (ceph::buffer::list::iterator_impl<false>::advance(int)+0x35) [0x7f8831050755]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 4: (Pipe::read_message(Message**, AuthSessionHandler*)+0x129c) [0x7f8831184adc]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 5: (Pipe::reader()+0xc29) [0x7f883118ebe9]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 6: (Pipe::Reader::entry()+0xd) [0x7f883119770d]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 7: (()+0x76ba) [0x7f8830c136ba]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 8: (clone()+0x6d) [0x7f882fc9c3dd]
2017-12-19T08:12:14.348 INFO:tasks.radosbench.radosbench.0.smithi012.stderr:Segmentation fault (core dumped)
/a/sage-2017-12-19_06:01:05-rados-wip-sage2-testing-2017-12-18-2147-distro-basic-smithi/1979727
rados/singleton/{all/thrash-eio.yaml msgr-failures/many.yaml msgr/random.yaml objectstore/bluestore-comp.yaml rados.yaml}
- Project changed from Ceph to RADOS
2018-07-10T22:07:45.103 INFO:tasks.radosbench.radosbench.0.smithi161.stderr:*** Caught signal (Segmentation fault) **
2018-07-10T22:07:45.103 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: in thread 7f7950184700 thread_name:ms_pipe_read
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: ceph version 14.0.0-1233-g3944838 (3944838c7daaf6ab5ff54f23aebca2256e63d795) nautilus (dev)
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 1: (()+0xf6d0) [0x7f7958ccd6d0]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 2: (ceph::buffer::list::iterator_impl<false>::advance(int)+0x3d) [0x7f7963a3bbdd]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 3: (Pipe::read_message(Message**, AuthSessionHandler*)+0x924) [0x7f795a9653a4]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 4: (Pipe::reader()+0xb53) [0x7f795a967483]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 5: (Pipe::Reader::entry()+0xd) [0x7f795a96a56d]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 6: (()+0x7e25) [0x7f7958cc5e25]
/a/sage-2018-07-10_18:05:15-rados-wip-sage3-testing-2018-07-10-1048-distro-basic-smithi/2762968
/a/yuriw-2018-08-01_19:35:55-rados-wip-yuri-testing-2018-08-01-1605-luminous-distro-basic-smithi/2849244/
- Priority changed from Normal to High
2019-02-21T23:37:12.269 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: ceph version 14.0.1-4064-g29c3ee3 (29c3ee3b2ff12b9c71f42161314be14bd122bbda) nautilus (dev)
2019-02-21T23:37:12.269 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 1: (()+0x11390) [0x7f55b3e1c390]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 2: (ceph::buffer::v14_2_0::list::iterator_impl<false>::advance(unsigned int)+0x31) [0x7f55b4661091]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 3: (Pipe::read_message(Message**, AuthSessionHandler*)+0xba8) [0x7f55b45c59e8]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 4: (Pipe::reader()+0xbc1) [0x7f55b45c7ec1]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 5: (Pipe::Reader::entry()+0xd) [0x7f55b45cb2ed]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 6: (()+0x76ba) [0x7f55b3e126ba]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 7: (clone()+0x6d) [0x7f55b343341d]
/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622620
- Subject changed from rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message to msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message
- Priority changed from High to Urgent
/a/sage-2019-02-24_19:27:53-rados-wip-sage-testing-2019-02-24-1127-distro-basic-smithi/3634191
/a/sage-2019-02-24_19:27:53-rados-wip-sage-testing-2019-02-24-1127-distro-basic-smithi/3634199
- Status changed from 12 to In Progress
- Assignee set to Sage Weil
/a/sage-22480-b/3642573
Looks like there was some rx_buffers activity on the connection right before it crashed.
Yep, disabling the rx_buffers lookup made the failures go away:
diff --git a/src/msg/simple/Pipe.cc b/src/msg/simple/Pipe.cc
index 1a06ab04d1..50f2baa11e 100644
--- a/src/msg/simple/Pipe.cc
+++ b/src/msg/simple/Pipe.cc
@@ -2152,7 +2152,7 @@ int Pipe::read_message(Message **pm, AuthSessionHandler* auth_handler)
// get a buffer
connection_state->lock.Lock();
- map<ceph_tid_t,pair<bufferlist,int> >::iterator p = connection_state->rx_buffers.find(header.tid);
+ map<ceph_tid_t,pair<bufferlist,int> >::iterator p = connection_state->rx_buffers.end(); //= connection_state->rx_buffers.find(header.tid);
if (p != connection_state->rx_buffers.end()) {
if (rxbuf.length() == 0 || p->second.second != rxbuf_version) {
ldout(msgr->cct,10) << "reader seleting rx buffer v " << p->second.second
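For readers unfamiliar with this code path, here is a minimal, self-contained sketch of what the workaround above changes. It is not Ceph code: ConnectionModel, RxChoice, select_rx_buffer and the bypass_lookup flag are hypothetical names invented for illustration, and std::string stands in for bufferlist. It only models the decision in the hunk, where the reader either picks a preposted buffer keyed by tid or, with the lookup disabled, always falls back to a freshly allocated one. It makes no claim about the actual root cause of the segfault.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the per-connection state; not Ceph's real class.
struct ConnectionModel {
  std::mutex lock;
  // tid -> (preposted buffer, version), loosely mirroring rx_buffers.
  std::map<std::uint64_t, std::pair<std::string, int>> rx_buffers;
};

// Where the payload of one incoming message will be read.
struct RxChoice {
  bool used_preposted;  // true if a caller-supplied buffer was selected
  std::string buffer;   // private copy taken while the lock was held
  int version;          // version of the selected buffer, or -1 if none
};

// Choose a receive buffer for message `tid`.
// bypass_lookup == true models the workaround in the diff: the iterator is
// forced to end(), so the preposted-buffer branch is never taken.
RxChoice select_rx_buffer(ConnectionModel& conn, std::uint64_t tid,
                          bool bypass_lookup, std::size_t data_len) {
  std::lock_guard<std::mutex> guard(conn.lock);
  auto p = bypass_lookup ? conn.rx_buffers.end() : conn.rx_buffers.find(tid);
  if (p != conn.rx_buffers.end()) {
    // Copy the buffer and its version while still holding the lock, so a
    // concurrent change to the map cannot affect what the reader uses.
    return RxChoice{true, p->second.first, p->second.second};
  }
  // No preposted buffer selected: allocate a fresh one of the needed size.
  return RxChoice{false, std::string(data_len, '\0'), -1};
}

// Small usage example exercising both paths.
void example() {
  ConnectionModel conn;
  conn.rx_buffers[7] = {std::string(16, 'x'), 3};

  RxChoice normal = select_rx_buffer(conn, 7, /*bypass_lookup=*/false, 8);
  assert(normal.used_preposted && normal.version == 3);

  RxChoice bypassed = select_rx_buffer(conn, 7, /*bypass_lookup=*/true, 8);
  assert(!bypassed.used_preposted && bypassed.buffer.size() == 8);
}
```

With the lookup bypassed, every message goes through the freshly allocated path, which matches the observation that the failures disappeared once the preposted rx_buffers branch could no longer be reached.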
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 26696
- Status changed from Fix Under Review to Pending Backport
- Backport set to mimic,luminous
We need to think about how to backport this in the least disruptive way.
- Copied to Backport #38570: luminous: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message added
- Copied to Backport #38571: mimic: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message added
- Project changed from RADOS to Messengers
- Category set to SimpleMessenger
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".