Bug #9582

librados: segmentation fault on timeout

Added by Matthias Kiefer almost 9 years ago. Updated almost 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
librados
Target version:
-
% Done:

0%

Source:
Support
Tags:
Backport:
firefly, dumpling
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Summary: If librados is configured with rados_osd_op_timeout, timeouts sometimes result in a segmentation fault.

Problem description: We configured librados with a rados_osd_op_timeout of 2 seconds on a cluster with 1440 OSDs. For testing purposes we put load on the cluster via 8 tomcat webapps (using librados via rados-java) on 8 hosts, each doing about 15 reads/second and about 4 writes/second to the ceph cluster. Object size is 0-4MB. In normal situations, timeouts happen from time to time and everything works as expected. However, in situations where the ceph cluster becomes unresponsive (due to high CPU load on the OSD hosts), the tomcats crash randomly in librados. I have been able to get the following stack trace of a crash from a core dump:

#0  ceph_crc32c_le_intel (crc=0, data=0xc6bc3c0 <Address 0xc6bc3c0 out of bounds>, length=<optimized out>) at common/crc32c-intel.c:58
#1  0x00007fc715afd185 in ceph_crc32c_le (length=14245, data=0xc6bc3c0 <Address 0xc6bc3c0 out of bounds>, crc=0) at ./include/crc32c.h:16
#2  ceph::buffer::list::crc32c (this=this@entry=0x7fc6c5fde9a0, crc=crc@entry=0) at ./include/buffer.h:428
#3  0x00007fc715af6f32 in decode_message (cct=0x7fc71d180c90, header=..., footer=..., front=..., middle=..., data=...) at msg/Message.cc:267
#4  0x00007fc715b3e170 in Pipe::read_message (this=this@entry=0x51e10c0, pm=pm@entry=0x7fc6c5fded38, auth_handler=auth_handler@entry=0x3b12ed0) at msg/Pipe.cc:1920
#5  0x00007fc715b4fdc1 in Pipe::reader (this=0x51e10c0) at msg/Pipe.cc:1447
#6  0x00007fc715b542dd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:49
#7  0x00007fc738097b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007fc7379c6e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x0000000000000000 in ?? ()

OS: Debian GNU/Linux 7.6 (wheezy)
System: cpuinfo is attached
librados: dumpling branch at commit 3f020443c8d92e61d8593049147a79a6696c9c93, installed from the Debian package http://gitbuilder.ceph.com/ceph-deb-wheezy-x86_64-basic/ref/dumpling/pool/main/c/ceph/librados2_0.67.10-13-g3f02044-1wheezy_amd64.deb (chosen because it includes a fix for http://tracker.ceph.com/issues/9362).

Please let me know if you need more information. I can also provide the full core dump if needed.
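
For context, a minimal sketch (not part of the original report) of the configuration described above, using the librados C++ API; the pool name, object name, and read size are placeholders.

// Minimal sketch: configure librados with a 2-second OSD op timeout, as in
// the setup described above. Pool/object names are placeholders.
#include <rados/librados.hpp>
#include <iostream>

int main() {
  librados::Rados cluster;
  cluster.init(nullptr);                          // default client id
  cluster.conf_read_file(nullptr);                // default ceph.conf search path
  cluster.conf_set("rados_osd_op_timeout", "2");  // the 2s timeout from this report
  if (cluster.connect() < 0) {
    std::cerr << "connect failed" << std::endl;
    return 1;
  }
  librados::IoCtx io;
  cluster.ioctx_create("data", io);               // "data" is a placeholder pool
  librados::bufferlist bl;
  // Reads like this may return -ETIMEDOUT under load; with the bug, a timeout
  // racing with the incoming reply could instead crash in ceph_crc32c_le as
  // shown in the backtrace above.
  int r = io.read("some-object", bl, 4096, 0);
  std::cout << "read returned " << r << std::endl;
  cluster.shutdown();
  return 0;
}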

cpuinfo - CPU info of system where the crash happened (10.9 KB) Matthias Kiefer, 09/24/2014 07:38 AM


Related issues

Related to Ceph - Bug #9613: "Segmentation fault" in upgrade:dumpling-giant-x:parallel-giant-distro-basic-multi run Duplicate 09/27/2014
Related to Ceph - Bug #9650: RWTimer cancel_event is racy Resolved 10/02/2014
Related to Ceph - Bug #5925: hung ceph_test_rados_delete_pools_parallel Can't reproduce 08/09/2013
Related to Ceph - Bug #9845: hung ceph_test_rados_delete_pools_parallel Resolved 08/09/2013
Duplicated by Ceph - Bug #9417: "Segmentation fault" in upgrade:dumpling-giant-x-master-distro-basic-vps run Duplicate 09/10/2014
Duplicated by Ceph - Bug #9515: "Segmentation fault (ceph_test_rados_api_io)" in upgrade:dumpling-giant-x:parallel-giant-distro-basic-vps run Duplicate 09/17/2014

Associated revisions

Revision 126d0b30 (diff)
Added by Sage Weil almost 9 years ago

osdc/Objecter: only post_rx_buffer if no op timeout

If we post an rx buffer and there is a timeout, the revocation can happen
while the reader has consumed the buffers but before it has decoded and
constructed the message. In particular, we calculate a crc32c over the
data portion of the message after we've taken the buffers and dropped the
lock.

Instead of fixing this race (for example, by reverifying rx_buffers under
the lock while calculating the crc.. bleh), just skip the rx buffer
optimization entirely when a timeout is present.

Note that this doesn't cover the op_cancel() paths, but none of those users
provide static buffers to read into.

Fixes: #9582
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>
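
To make the described change concrete, here is a compilable sketch of the guard using stand-in types; these are not the real Objecter/Connection classes (the actual change lives in osdc/Objecter.cc), but the rule is the same: skip the zero-copy rx-buffer optimization whenever an op timeout is configured.

// Compilable sketch with stand-in types (NOT the real Objecter/Connection
// classes) of the guard described in the commit message above.
#include <cstdint>
#include <iostream>
#include <string>

struct Buffer { std::string data; };                 // stand-in for bufferlist

struct Connection {                                   // stand-in for the messenger Connection
  void post_rx_buffer(uint64_t tid, Buffer& b) {
    (void)b;
    std::cout << "posted rx buffer for tid " << tid << "\n";
  }
};

struct Op {                                           // stand-in for Objecter::Op
  uint64_t tid = 0;
  Buffer* outbl = nullptr;    // caller-provided buffer to read the reply into
  Connection* con = nullptr;
};

// The fix: only hand the messenger a preallocated rx buffer when no op timeout
// is configured. Otherwise a timeout can revoke the buffer while the reader
// thread is still computing the data crc / decoding the message into it.
void maybe_post_rx_buffer(Op& op, double osd_op_timeout_secs) {
  if (op.outbl && op.con && osd_op_timeout_secs == 0.0) {
    op.con->post_rx_buffer(op.tid, *op.outbl);
  }
  // else: fall back to the messenger's own allocation; slightly more copying,
  // but nothing the timeout path can yank out from under the reader.
}

int main() {
  Connection con;
  Buffer out;
  Op op{42, &out, &con};
  maybe_post_rx_buffer(op, 0.0);  // no timeout: optimization stays on
  maybe_post_rx_buffer(op, 2.0);  // 2s timeout (as in this report): skipped
  return 0;
}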

Revision 3ebc3502 (diff)
Added by Sage Weil almost 9 years ago

osdc/Objecter: only post_rx_buffer if no op timeout

If we post an rx buffer and there is a timeout, the revocation can happen
while the reader has consumed the buffers but before it has decoded and
constructed the message. In particular, we calculate a crc32c over the
data portion of the message after we've taken the buffers and dropped the
lock.

Instead of fixing this race (for example, by reverifying rx_buffers under
the lock while calculating the crc.. bleh), just skip the rx buffer
optimization entirely when a timeout is present.

Note that this doesn't cover the op_cancel() paths, but none of those users
provide static buffers to read into.

Fixes: #9582
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>

backport of 126d0b30e990519b8f845f99ba893fdcd56de447

Revision 78c3ef90 (diff)
Added by Sage Weil almost 9 years ago

osdc/Objecter: only post_rx_buffer if no op timeout

If we post an rx buffer and there is a timeout, the revocation can happen
while the reader has consumed the buffers but before it has decoded and
constructed the message. In particular, we calculate a crc32c over the
data portion of the message after we've taken the buffers and dropped the
lock.

Instead of fixing this race (for example, by reverifying rx_buffers under
the lock while calculating the crc.. bleh), just skip the rx buffer
optimization entirely when a timeout is present.

Note that this doesn't cover the op_cancel() paths, but none of those users
provide static buffers to read into.

Fixes: #9582
Backport: firefly, dumpling
Signed-off-by: Sage Weil <>

backport of 126d0b30e990519b8f845f99ba893fdcd56de447

Revision 8253ead1 (diff)
Added by Sage Weil almost 9 years ago

osdc/Objecter: use SafeTimer; make callbacks race-tolerant

The RWTimer event cancellation is racy. Instead, just make all of our
callbacks tolerate cancellation races. This is already true of most of
them (in fact, they are probably broken because they try to take a write
lock while holding a read lock). Fix C_CancelOp so that it calls the
other op_cancel (that takes a tid).

Then switch the RWTimer back to a SafeTimer. Put it in unsafe callbacks
mode because we don't want to introduce lock cycles with timer_lock.

Fixes: #9582
See also: #9650
Signed-off-by: Sage Weil <>
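
A sketch of the "race-tolerant callback" idea with stand-in types (not the real SafeTimer/Objecter code): the timeout callback re-checks under the lock that the op is still in flight, so a timer event that fires after the op has already completed or been cancelled is simply a no-op.

// Sketch only: the callback tolerates losing the race with completion instead
// of relying on timer-event cancellation being reliable.
#include <cstdint>
#include <iostream>
#include <map>
#include <mutex>

struct Op { uint64_t tid; };

class FakeObjecter {
  std::mutex lock;
  std::map<uint64_t, Op> ops;   // in-flight ops by tid
public:
  void start_op(uint64_t tid) {
    std::lock_guard<std::mutex> l(lock);
    ops.emplace(tid, Op{tid});
  }
  // Called when the reply arrives or the op is cancelled.
  void finish_op(uint64_t tid) {
    std::lock_guard<std::mutex> l(lock);
    ops.erase(tid);
  }
  // Timer callback: must tolerate firing after the op was already finished,
  // because cancelling the timer event is itself racy.
  void op_timeout(uint64_t tid) {
    std::lock_guard<std::mutex> l(lock);
    auto it = ops.find(tid);
    if (it == ops.end()) {
      return;  // lost the race with finish_op(): nothing to do
    }
    std::cout << "cancelling op " << tid << " with -ETIMEDOUT\n";
    ops.erase(it);
  }
};

int main() {
  FakeObjecter o;
  o.start_op(1);
  o.finish_op(1);   // reply won the race
  o.op_timeout(1);  // stale timer fires afterwards: harmless
  return 0;
}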

History

#1 Updated by JuanJose Galvez almost 9 years ago

  • Source changed from other to Support

#2 Updated by Greg Farnum almost 9 years ago

Okay, looks like this is another race:
1) The message is coming in over the wire, and the Pipe grabs a preallocated buffer to store the message data in
2) The op gets cancelled and any remaining buffers are revoked
3) The on-commit callbacks get run, and presumably the client code frees the buffers
4) The Pipe runs decode_message() and the bufferlists are pointing at invalid memory.

The problem is basically that the messenger doesn't have a real cancellation mechanism, and the Objecter is trying to layer one on top of it, which leaves some pretty rough edges. I think maybe we'd need to have the messenger indicate on op cancellation whether it succeeded in revoking the buffers, and if it fails the Objecter would need to wait for the reply and give up on the cancel? The reserved buffers aren't part of the normal mechanism, though, so somebody would need to remember that extra state.

Or the messenger needs to grow a proper op cancellation system that tracks which ops are currently being satisfied and refuses to revoke them...

#3 Updated by Sage Weil almost 9 years ago

  • Assignee set to Sage Weil

I'm going to see if we can just skip the rx_buffers zero-copy paths when a timeout is present.

#4 Updated by Sage Weil almost 9 years ago

  • Status changed from New to Fix Under Review
  • Priority changed from High to Urgent

#5 Updated by Sage Weil almost 9 years ago

  • Backport set to firefly, dumpling

#6 Updated by Greg Farnum almost 9 years ago

  • Status changed from Fix Under Review to Pending Backport

#7 Updated by Sage Weil almost 9 years ago

In giant and dumpling; still need to merge the firefly backport.

#8 Updated by Matthias Kiefer almost 9 years ago

Thanks for your investigations and the quick fix! We have not been able to test this fix yet, but I will report back the results as soon as possible.

#10 Updated by Sage Weil almost 9 years ago

  • Status changed from Pending Backport to In Progress

Hmm, several failures on giant:

ubuntu@teuthology:/var/lib/teuthworker/archive/samuelj-2014-10-01_18:59:42-rados-giant-wip-testing-old-vanilla-basic-multi/524638

#11 Updated by Sage Weil almost 9 years ago

  • Status changed from In Progress to Fix Under Review

#12 Updated by Sage Weil almost 9 years ago

  • Status changed from Fix Under Review to Pending Backport

#13 Updated by Sage Weil almost 9 years ago

  • Status changed from Pending Backport to Resolved

I believe all patches affecting firefly and dumpling have been backported.
