Bug #9582 (closed)

librados: segmentation fault on timeout

Added by Matthias Kiefer over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: librados
Target version: -
% Done: 0%
Source: Support
Tags:
Backport: firefly, dumpling
Regression:
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Summary: If librados is configured with rados_osd_op_timeout, timeouts will sometimes result in a segmentation fault.
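
For reference, here is a minimal sketch of the configuration in question using the librados C API (our application actually sets the option through rados-java, which forwards it to librados); the ceph.conf path and default client id below are assumptions for illustration only:

/* Minimal sketch: enable a 2-second OSD op timeout on a librados client. */
#include <rados/librados.h>
#include <stdio.h>

int main(void)
{
    rados_t cluster;
    int err;

    /* Create a cluster handle with the default client id (assumption). */
    err = rados_create(&cluster, NULL);
    if (err < 0) { fprintf(stderr, "rados_create: %d\n", err); return 1; }

    /* Load the cluster configuration; the path is an assumption. */
    err = rados_conf_read_file(cluster, "/etc/ceph/ceph.conf");
    if (err < 0) { fprintf(stderr, "rados_conf_read_file: %d\n", err); return 1; }

    /* The 2-second OSD op timeout described in this report. */
    err = rados_conf_set(cluster, "rados_osd_op_timeout", "2");
    if (err < 0) { fprintf(stderr, "rados_conf_set: %d\n", err); return 1; }

    err = rados_connect(cluster);
    if (err < 0) { fprintf(stderr, "rados_connect: %d\n", err); return 1; }

    /* Reads and writes issued through io contexts on this handle are
       expected to fail with a timeout error instead of blocking once an
       OSD op exceeds the configured timeout. */

    rados_shutdown(cluster);
    return 0;
}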

Problem description: We configured librados with a rados_osd_op_timeout of 2 seconds on a cluster with 1440 OSDs. For testing purposes we put load on the cluster via 8 Tomcat webapps (using librados via rados-java) on 8 hosts, each doing about 15 reads/second and about 4 writes/second to the Ceph cluster. Object sizes are 0-4 MB. Under normal conditions timeouts happen from time to time and everything works as expected. However, in situations where the Ceph cluster becomes unresponsive (due to high CPU load on the OSD hosts), the Tomcat instances crash randomly inside librados. I was able to get the following stacktrace of a crash from a core dump:

#0  ceph_crc32c_le_intel (crc=0, data=0xc6bc3c0 <Address 0xc6bc3c0 out of bounds>, length=<optimized out>) at common/crc32c-intel.c:58
#1  0x00007fc715afd185 in ceph_crc32c_le (length=14245, data=0xc6bc3c0 <Address 0xc6bc3c0 out of bounds>, crc=0) at ./include/crc32c.h:16
#2  ceph::buffer::list::crc32c (this=this@entry=0x7fc6c5fde9a0, crc=crc@entry=0) at ./include/buffer.h:428
#3  0x00007fc715af6f32 in decode_message (cct=0x7fc71d180c90, header=..., footer=..., front=..., middle=..., data=...) at msg/Message.cc:267
#4  0x00007fc715b3e170 in Pipe::read_message (this=this@entry=0x51e10c0, pm=pm@entry=0x7fc6c5fded38, auth_handler=auth_handler@entry=0x3b12ed0) at msg/Pipe.cc:1920
#5  0x00007fc715b4fdc1 in Pipe::reader (this=0x51e10c0) at msg/Pipe.cc:1447
#6  0x00007fc715b542dd in Pipe::Reader::entry (this=<optimized out>) at msg/Pipe.h:49
#7  0x00007fc738097b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8  0x00007fc7379c6e6d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x0000000000000000 in ?? ()

OS: Debian GNU/Linux 7.6 (wheezy)
System: cpuinfo is attached
librados: dumpling branch at commit 3f020443c8d92e61d8593049147a79a6696c9c93, installed from the Debian package http://gitbuilder.ceph.com/ceph-deb-wheezy-x86_64-basic/ref/dumpling/pool/main/c/ceph/librados2_0.67.10-13-g3f02044-1wheezy_amd64.deb (chosen because it includes a fix for http://tracker.ceph.com/issues/9362).

Please let me know if you need more information. I can also provide the full core dump if needed.


Files

cpuinfo (10.9 KB) - CPU info of the system where the crash happened. Matthias Kiefer, 09/24/2014 07:38 AM

Related issues (6: 0 open, 6 closed)

Related to Ceph - Bug #9613: "Segmentation fault" in upgrade:dumpling-giant-x:parallel-giant-distro-basic-multi run (Duplicate, 09/27/2014)
Related to Ceph - Bug #9650: RWTimer cancel_event is racy (Resolved, 10/02/2014)
Related to Ceph - Bug #5925: hung ceph_test_rados_delete_pools_parallel (Can't reproduce, David Zafman, 08/09/2013)
Related to Ceph - Bug #9845: hung ceph_test_rados_delete_pools_parallel (Resolved, David Zafman, 08/09/2013)
Has duplicate Ceph - Bug #9417: "Segmentation fault" in upgrade:dumpling-giant-x-master-distro-basic-vps run (Duplicate, Josh Durgin, 09/10/2014)
Has duplicate Ceph - Bug #9515: "Segmentation fault (ceph_test_rados_api_io)" in upgrade:dumpling-giant-x:parallel-giant-distro-basic-vps run (Duplicate, 09/17/2014)
