Bug #22480

msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message

Added by Sage Weil over 6 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
SimpleMessenger
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-12-19T08:12:14.323 INFO:tasks.radosbench.radosbench.0.smithi012.stderr:*** Caught signal (Segmentation fault) **
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: in thread 7f88247f8700 thread_name:ms_pipe_read
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: ceph version 13.0.0-4125-gdc6898e (dc6898ed56bd530acc62c3547d876952ba5d835d) mimic (dev)
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 1: (()+0x59b4c) [0x5629b8b91b4c]
2017-12-19T08:12:14.324 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 2: (()+0x11390) [0x7f8830c1d390]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 3: (ceph::buffer::list::iterator_impl<false>::advance(int)+0x35) [0x7f8831050755]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 4: (Pipe::read_message(Message**, AuthSessionHandler*)+0x129c) [0x7f8831184adc]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 5: (Pipe::reader()+0xc29) [0x7f883118ebe9]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 6: (Pipe::Reader::entry()+0xd) [0x7f883119770d]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 7: (()+0x76ba) [0x7f8830c136ba]
2017-12-19T08:12:14.325 INFO:tasks.radosbench.radosbench.0.smithi012.stderr: 8: (clone()+0x6d) [0x7f882fc9c3dd]
2017-12-19T08:12:14.348 INFO:tasks.radosbench.radosbench.0.smithi012.stderr:Segmentation fault (core dumped)

/a/sage-2017-12-19_06:01:05-rados-wip-sage2-testing-2017-12-18-2147-distro-basic-smithi/1979727
rados/singleton/{all/thrash-eio.yaml msgr-failures/many.yaml msgr/random.yaml objectstore/bluestore-comp.yaml rados.yaml}

Related issues

Copied to Messengers - Backport #38570: luminous: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message Rejected
Copied to Messengers - Backport #38571: mimic: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message Rejected

History

#1 Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to RADOS

#2 Updated by Sage Weil over 5 years ago

2018-07-10T22:07:45.103 INFO:tasks.radosbench.radosbench.0.smithi161.stderr:*** Caught signal (Segmentation fault) **
2018-07-10T22:07:45.103 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: in thread 7f7950184700 thread_name:ms_pipe_read
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: ceph version 14.0.0-1233-g3944838 (3944838c7daaf6ab5ff54f23aebca2256e63d795) nautilus (dev)
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 1: (()+0xf6d0) [0x7f7958ccd6d0]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 2: (ceph::buffer::list::iterator_impl<false>::advance(int)+0x3d) [0x7f7963a3bbdd]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 3: (Pipe::read_message(Message**, AuthSessionHandler*)+0x924) [0x7f795a9653a4]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 4: (Pipe::reader()+0xb53) [0x7f795a967483]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 5: (Pipe::Reader::entry()+0xd) [0x7f795a96a56d]
2018-07-10T22:07:45.104 INFO:tasks.radosbench.radosbench.0.smithi161.stderr: 6: (()+0x7e25) [0x7f7958cc5e25]

/a/sage-2018-07-10_18:05:15-rados-wip-sage3-testing-2018-07-10-1048-distro-basic-smithi/2762968

#3 Updated by Neha Ojha over 5 years ago

/a/yuriw-2018-08-01_19:35:55-rados-wip-yuri-testing-2018-08-01-1605-luminous-distro-basic-smithi/2849244/

#4 Updated by Sage Weil about 5 years ago

  • Priority changed from Normal to High

2019-02-21T23:37:12.269 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: ceph version 14.0.1-4064-g29c3ee3 (29c3ee3b2ff12b9c71f42161314be14bd122bbda) nautilus (dev)
2019-02-21T23:37:12.269 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 1: (()+0x11390) [0x7f55b3e1c390]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 2: (ceph::buffer::v14_2_0::list::iterator_impl<false>::advance(unsigned int)+0x31) [0x7f55b4661091]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 3: (Pipe::read_message(Message**, AuthSessionHandler*)+0xba8) [0x7f55b45c59e8]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 4: (Pipe::reader()+0xbc1) [0x7f55b45c7ec1]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 5: (Pipe::Reader::entry()+0xd) [0x7f55b45cb2ed]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 6: (()+0x76ba) [0x7f55b3e126ba]
2019-02-21T23:37:12.270 INFO:tasks.radosbench.radosbench.0.smithi087.stderr: 7: (clone()+0x6d) [0x7f55b343341d]

/a/sage-2019-02-21_21:52:17-rados-wip-sage3-testing-2019-02-21-1359-distro-basic-smithi/3622620

#5 Updated by Sage Weil about 5 years ago

  • Subject changed from rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message to msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message

#6 Updated by Sage Weil about 5 years ago

  • Priority changed from High to Urgent

/a/sage-2019-02-24_19:27:53-rados-wip-sage-testing-2019-02-24-1127-distro-basic-smithi/3634191

#7 Updated by Sage Weil about 5 years ago

/a/sage-2019-02-24_19:27:53-rados-wip-sage-testing-2019-02-24-1127-distro-basic-smithi/3634199

#8 Updated by Sage Weil about 5 years ago

It's always a standalone test, either thrash-eio or the newer thrash-backfill.

It reproduces very easily; see http://pulpito.ceph.com/sage-22480-a/

#9 Updated by Sage Weil about 5 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Sage Weil

#10 Updated by Sage Weil about 5 years ago

/a/sage-22480-b/3642573

It looks like there was some rx_buffers activity on the connection right before it crashed.

#11 Updated by Sage Weil about 5 years ago

Yep, disabling the rx_buffers lookup made the failures go away:

diff --git a/src/msg/simple/Pipe.cc b/src/msg/simple/Pipe.cc
index 1a06ab04d1..50f2baa11e 100644
--- a/src/msg/simple/Pipe.cc
+++ b/src/msg/simple/Pipe.cc
@@ -2152,7 +2152,7 @@ int Pipe::read_message(Message **pm, AuthSessionHandler* auth_handler)

       // get a buffer
       connection_state->lock.Lock();
-      map<ceph_tid_t,pair<bufferlist,int> >::iterator p = connection_state->rx_buffers.find(header.tid);
+      map<ceph_tid_t,pair<bufferlist,int> >::iterator p = connection_state->rx_buffers.end(); //= connection_state->rx_buffers.find(header.tid);
       if (p != connection_state->rx_buffers.end()) {
        if (rxbuf.length() == 0 || p->second.second != rxbuf_version) {
          ldout(msgr->cct,10) << "reader seleting rx buffer v " << p->second.second
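
For readers unfamiliar with the idiom in the diff above, here is a minimal sketch of what the one-line change does: initializing the iterator to end() instead of the result of find() means the `p != rx_buffers.end()` branch is never taken, so no preposted rx buffer is ever selected. The types and names below (`RxBuffers`, `selects_rx_buffer`, `check_sketch`) are hypothetical stand-ins, with `std::string` in place of `bufferlist`; this is not the Ceph code and says nothing about the underlying cause of the crash.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the map in the diff:
// tid -> (buffer, version). std::string replaces bufferlist here.
using Tid = std::uint64_t;
using RxBuffers = std::map<Tid, std::pair<std::string, int>>;

// Returns true if the "use the preposted rx buffer" branch would be taken.
// With lookup_enabled == false the iterator starts at end(), which mirrors
// the diagnostic change in the diff: the branch can never be entered.
bool selects_rx_buffer(const RxBuffers& rx_buffers, Tid tid, bool lookup_enabled) {
  RxBuffers::const_iterator p =
      lookup_enabled ? rx_buffers.find(tid) : rx_buffers.end();
  return p != rx_buffers.end();
}

// Small self-check exercising both behaviours.
void check_sketch() {
  RxBuffers rx_buffers;
  rx_buffers[42] = std::make_pair(std::string("preposted"), 1);

  assert(selects_rx_buffer(rx_buffers, 42, true));    // original: buffer found
  assert(!selects_rx_buffer(rx_buffers, 42, false));  // patched: lookup skipped
  assert(!selects_rx_buffer(rx_buffers, 7, true));    // no buffer for this tid
}
```

Calling `check_sketch()` from a `main()` runs the three checks. The point is only that the patched line bypasses the rx_buffers path entirely, which is why it works as a way to confirm that path is involved in the failures.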

#12 Updated by Neha Ojha about 5 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 26696

#13 Updated by Sage Weil about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to mimic,luminous

We need to think about how to backport this in the most non-disruptive way.

#14 Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38570: luminous: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message added

#15 Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38571: mimic: msg/simple: rados bench segv in ceph::buffer::list::iterator_impl::advance(), Pipe::read_message added

#16 Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to Messengers

#17 Updated by Greg Farnum about 5 years ago

  • Category set to SimpleMessenger

#18 Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
