Bug #22480
msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message
Description
2017-12-19T08:12:14.323 INFO:tasks.radosbench.radosbench.0.smithi012.stderr:*** Caught signal (Segmentation fault) **
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: in thread 7f88247f8700 thread_name:ms_pipe_read
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: ceph version 13.0.0-4125-gdc6898e (dc6898ed56bd530acc62c3547d876952ba5d835d) mimic (dev)
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 1: (()+0x59b4c) [0x5629b8b91b4c]
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 2: (()+0x11390) [0x7f8830c1d390]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 3: (ceph::buffer::list::iterator_impl<false>::advance(int)+0x35) [0x7f8831050755]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 4: (Pipe::read_message(Message**, AuthSessionHandler*)+0x129c) [0x7f8831184adc]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 5: (Pipe::reader()+0xc29) [0x7f883118ebe9]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 6: (Pipe::Reader::entry()+0xd) [0x7f883119770d]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 7: (()+0x76ba) [0x7f8830c136ba]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 8: (clone()+0x6d) [0x7f882fc9c3dd]
2017-12-19T08:12:14.348 INFO:tasks.radosbench.radosbench.0.smithi012.stderr:Segmentation fault (core dumped)
/a/sage-2017-12-19_06:01:05-rados-wip-sage2-testing-2017-12-18-2147-distro-basic-smithi/1979727
rados/singleton/{all/thrash-eio.yaml msgr-failures/many.yaml msgr/random.yaml objectstore/bluestore-comp.yaml rados.yaml}
History
#1 Updated by Sage Weil over 6 years ago
- Project changed from Ceph to RADOS
#2 Updated by Sage Weil over 5 years ago
2018-07-10T22:07:45.103 INFO:tasks.radosbench.radosbench.0.smithi161.stderr:*** Caught signal (Segmentation fault) **
2018-07-10T22:07:45.103 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: in thread 7f7950184700 thread_name:ms_pipe_read
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: ceph version 14.0.0-1233-g3944838 (3944838c7daaf6ab5ff54f23aebca2256e63d795) nautilus (dev)
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 1: (()+0xf6d0) [0x7f7958ccd6d0]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 2: (ceph::buffer::list::iterator_impl<false>::advance(int)+0x3d) [0x7f7963a3bbdd]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 3: (Pipe::read_message(Message**, AuthSessionHandler*)+0x924) [0x7f795a9653a4]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 4: (Pipe::reader()+0xb53) [0x7f795a967483]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 5: (Pipe::Reader::entry()+0xd) [0x7f795a96a56d]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 6: (()+0x7e25) [0x7f7958cc5e25]
/a/sage-2018-07-10_18:05:15-rados-wip-sage3-testing-2018-07-10-1048-distro-basic-smithi/2762968
#3 Updated by Neha Ojha over 5 years ago
/a/yuriw-2018-08-01_19:35:55-rados-wip-yuri-testing-2018-08-01-1605-luminous-distro-basic-smithi/2849244/
#4 Updated by Sage Weil about 5 years ago
- Priority changed from Normal to High
2019-02-21T23:37:12.269 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: ceph version 14.0.1-4064-g29c3ee3 (29c3ee3b2ff12b9c71f42161314be14bd122bbda) nautilus (dev)
2019-02-21T23:37:12.269 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 1: (()+0x11390) [0x7f55b3e1c390]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 2: (ceph::buffer::v14_2_0::list::iterator_impl<false>::advance(unsigned int)+0x31) [0x7f55b4661091]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 3: (Pipe::read_message(Message**, AuthSessionHandler*)+0xba8) [0x7f55b45c59e8]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 4: (Pipe::reader()+0xbc1) [0x7f55b45c7ec1]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 5: (Pipe::Reader::entry()+0xd) [0x7f55b45cb2ed]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 6: (()+0x76ba) [0x7f55b3e126ba]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 7: (clone()+0x6d) [0x7f55b343341d]
/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622620
#5 Updated by Sage Weil about 5 years ago
- Subject changed from rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message to msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message
#6 Updated by Sage Weil about 5 years ago
- Priority changed from High to Urgent
/a/sage-2019-02-24_19:27:53-rados-wip-sage-testing-2019-02-24-1127-distro-basic-smithi/3634191
#7 Updated by Sage Weil about 5 years ago
/a/sage-2019-02-24_19:27:53-rados-wip-sage-testing-2019-02-24-1127-distro-basic-smithi/3634199
#8 Updated by Sage Weil about 5 years ago
It's always a standalone test, either thrash-eio or the newer thrash-backfill.
It reproduces very easily; see http://pulpito.ceph.com/sage-22480-a/
#9 Updated by Sage Weil about 5 years ago
- Status changed from 12 to In Progress
- Assignee set to Sage Weil
#10 Updated by Sage Weil about 5 years ago
/a/sage-22480-b/3642573
Looks like there was some rx_buffers activity on the connection right before it crashed.
#11 Updated by Sage Weil about 5 years ago
yep, this made the failures go away:
diff --git a/src/msg/simple/Pipe.cc b/src/msg/simple/Pipe.cc
index 1a06ab04d1..50f2baa11e 100644
--- a/src/msg/simple/Pipe.cc
+++ b/src/msg/simple/Pipe.cc
@@ -2152,7 +2152,7 @@ int Pipe::read_message(Message **pm, AuthSessionHandler* auth_handler)
   // get a buffer
   connection_state->lock.Lock();
-  map<ceph_tid_t,pair<bufferlist,int> >::iterator p = connection_state->rx_buffers.find(header.tid);
+  map<ceph_tid_t,pair<bufferlist,int> >::iterator p = connection_state->rx_buffers.end(); //= connection_state->rx_buffers.find(header.tid);
   if (p != connection_state->rx_buffers.end()) {
     if (rxbuf.length() == 0 || p->second.second != rxbuf_version) {
       ldout(msgr->cct,10) << "reader seleting rx buffer v " << p->second.second
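For anyone reading the diff without the surrounding source, here is a rough standalone model of the lookup it disables. This is only a sketch and not Ceph code: RxBuffers, ReaderState, select_rx_buffer and the use_posted flag are hypothetical names, and std::string stands in for bufferlist. It mirrors only what is visible in the hunk above: a per-connection map from tid to (buffer, version), searched under the connection lock, with the reader reselecting when it has no buffer yet or the version changed. It makes no claim about the actual root cause of the crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a connection's posted receive buffers:
// tid -> (buffer, version). std::string replaces bufferlist here.
struct RxBuffers {
  std::mutex lock;
  std::map<uint64_t, std::pair<std::string, int>> buffers;

  // Post (or replace) a buffer of the given length for a transaction id.
  void post(uint64_t tid, std::size_t len, int version) {
    std::lock_guard<std::mutex> g(lock);
    buffers[tid] = std::make_pair(std::string(len, '\0'), version);
  }

  // Withdraw the posted buffer for a transaction id, if any.
  void revoke(uint64_t tid) {
    std::lock_guard<std::mutex> g(lock);
    buffers.erase(tid);
  }
};

// Reader-side state kept across repeated lookups for one message.
struct ReaderState {
  std::string rxbuf;
  int rxbuf_version = -1;
};

// Returns true if the reader (re)selected a posted buffer for this tid.
// With use_posted == false the lookup is skipped, as in the diff above,
// so the reader never touches the posted buffers.
bool select_rx_buffer(RxBuffers& conn, uint64_t tid, ReaderState& st,
                      bool use_posted) {
  std::lock_guard<std::mutex> g(conn.lock);
  auto p = use_posted ? conn.buffers.find(tid) : conn.buffers.end();
  if (p == conn.buffers.end())
    return false;
  if (st.rxbuf.empty() || p->second.second != st.rxbuf_version) {
    st.rxbuf = p->second.first;            // copy while holding the lock
    st.rxbuf_version = p->second.second;
    return true;
  }
  return false;                            // already using this version
}
```

In this model, passing use_posted = false corresponds to the one-line change in the diff: p is forced to end(), so the posted-buffer branch is never taken.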
#12 Updated by Neha Ojha about 5 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 26696
#13 Updated by Sage Weil about 5 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to mimic,luminous
We need to think about how to backport this in the most non-disruptive way.
#14 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38570: luminous: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message added
#15 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38571: mimic: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message added
#16 Updated by Greg Farnum about 5 years ago
- Project changed from RADOS to Messengers
#17 Updated by Greg Farnum about 5 years ago
- Category set to SimpleMessenger
#18 Updated by Nathan Cutler about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".