Bug #443

closed

osd segfault due to pipe->connection_state is NULL.

Added by Henry Chang over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

Hi,

One of my OSDs crashed with a segfault.

Running gdb on the core dump shows:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000462c3c in AO_fetch_and_add_full (this=0x7f46e4003920) at /usr/include/atomic_ops/sysdeps/gcc/x86_64.h:67
67      /usr/include/atomic_ops/sysdeps/gcc/x86_64.h: No such file or directory.
        in /usr/include/atomic_ops/sysdeps/gcc/x86_64.h
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-1.x86_64 libgcc-4.4.2-7.fc12.x86_64 libstdc++-4.4.2-7.fc12.x86_64 openssl-1.0.0-4.fc12.x86_64 zlib-1.2.3-23.fc12.x86_64
(gdb) where
#0  0x0000000000462c3c in AO_fetch_and_add_full (this=0x7f46e4003920) at /usr/include/atomic_ops/sysdeps/gcc/x86_64.h:67
#1  inc (this=0x7f46e4003920) at include/atomic.h:33
#2  get (this=0x7f46e4003920) at msg/Message.h:150
#3  get (this=0x7f46e4003920) at msg/Message.h:181
#4  SimpleMessenger::Pipe::fail (this=0x7f46e4003920) at msg/SimpleMessenger.cc:1393
#5  0x0000000000462e4c in SimpleMessenger::Pipe::fault (this=0x7f46e4003920, onconnect=false, onread=<value optimized out>) at msg/SimpleMessenger.cc:1344
#6  0x0000000000468299 in SimpleMessenger::Pipe::reader (this=0x7f46e4003920) at msg/SimpleMessenger.cc:1552
#7  0x000000000045896d in SimpleMessenger::Pipe::Reader::entry (this=<value optimized out>) at msg/SimpleMessenger.h:192
#8  0x000000000046c49a in Thread::_entry_func (arg=<value optimized out>) at common/Thread.h:39
#9  0x00007f471f3e8a3a in start_thread () from /lib64/libpthread.so.0
#10 0x00007f471e60677d in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()
(gdb) up
#1  inc (this=0x7f46e4003920) at include/atomic.h:33
33          AO_fetch_and_add1(&val);
(gdb) up
#2  get (this=0x7f46e4003920) at msg/Message.h:150
150         nref.inc();
(gdb)
#3  get (this=0x7f46e4003920) at msg/Message.h:181
181         return (Connection *)RefCountedObject::get();
(gdb)
#4  SimpleMessenger::Pipe::fail (this=0x7f46e4003920) at msg/SimpleMessenger.cc:1393
1393        Connection * cstate = connection_state->get();
(gdb) print connection_state
No symbol "connection_state" in current context.
(gdb) print this->connection_state
$1 = (Connection *) 0x0
(gdb) print this->sd
$2 = -1
(gdb) print this->peer_addr
$3 = {type = 0, nonce = 3871338124, {addr = {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, addr4 = {sin_family = 2, sin_port = 0, sin_addr = {
        s_addr = 1895934144}, sin_zero = "\000\000\000\000\000\000\000"}, addr6 = {sin6_family = 2, sin6_port = 0, sin6_flowinfo = 1895934144, sin6_addr = {__in6_u = {
          __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}}}

And, the last few lines of the OSD log:

2010-09-27 02:23:23.536689 7f46d2bed710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4003920 sd=3 pgs=2017 cs=1 l=1).reader got 128 + 0 + 4194304 byte message.. ABORTED
2010-09-27 02:23:23.538907 7f46d2bed710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4003920 sd=3 pgs=2017 cs=1 l=1).reader bad tag 0
2010-09-27 02:23:23.543352 7f46d32f2710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4004290 sd=39 pgs=0 cs=0 l=0).accept peer addr is really 192.168.1.113:0/3871338124 (socket is 192.168.1.113:50930/0)
2010-09-27 02:23:23.543453 7f46d32f2710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4004290 sd=39 pgs=0 cs=0 l=1).accept replacing existing (lossy) channel (new one lossy=1)

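It looks like the accepter thread had already put the connection_state reference and set the pointer to NULL before the reader thread acquired the pipe lock, so Pipe::fail() called get() on a NULL connection_state at msg/SimpleMessenger.cc:1393.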
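To make the suspected race concrete, here is a minimal sketch of a defensive pattern: copy the pointer while holding the lock and check it before use. This is not the Ceph code and not the fix that was applied. Connection, Pipe, detach_connection() and the bool return value of fail() are hypothetical stand-ins, and std::shared_ptr replaces the RefCountedObject get()/put() calls seen in the backtrace.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the messenger's Connection (assumption).
struct Connection {
    int id;
};

// Hypothetical stand-in for SimpleMessenger::Pipe (assumption).
struct Pipe {
    std::mutex lock;                               // plays the role of the pipe lock
    std::shared_ptr<Connection> connection_state;  // may be cleared by another thread

    // What the replacing (accepter) thread does in this sketch:
    // drop the pipe's reference to the connection.
    void detach_connection() {
        std::lock_guard<std::mutex> guard(lock);
        connection_state.reset();
    }

    // Guarded variant of fail(): take a reference under the lock and
    // check it before use. Returns true if a connection was still attached.
    bool fail() {
        std::shared_ptr<Connection> cstate;
        {
            std::lock_guard<std::mutex> guard(lock);
            cstate = connection_state;  // copies the reference, or stays empty
        }
        if (!cstate) {
            return false;  // connection already taken away; nothing to report
        }
        // ... report the failure for *cstate here ...
        return true;
    }
};
```

With this shape, a reader that loses the race sees an empty pointer and returns early instead of dereferencing NULL.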

I am using the unstable branch (commit: f4be4b936fa473a9b16a).

Actions #1

Updated by Greg Farnum over 13 years ago

  • Status changed from New to Resolved

Pushed a change in ab62aabf1f71b21a8f64bd7985119f3341582ff5.
Replacement pipes will now take over the old pipe's connection only if they are not lossy.
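The rule described in this comment can be sketched roughly as follows. This is an illustration only, not the contents of the commit. Connection, Pipe and take_over_connection() are hypothetical names, and the sketch assumes that "they" refers to the replacement pipe's lossy flag.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the messenger's Connection (assumption).
struct Connection {
    int id;
};

// Hypothetical stand-in for a pipe, reduced to the two fields that matter here.
struct Pipe {
    bool lossy;
    std::shared_ptr<Connection> connection_state;
};

// Hand the existing pipe's connection to the replacement only when the
// replacement is not lossy. Returns true if the connection was taken over.
bool take_over_connection(Pipe& existing, Pipe& replacement) {
    if (replacement.lossy) {
        // Lossy replacement: the old pipe keeps its connection_state, so its
        // reader thread can still use it when it fails.
        return false;
    }
    replacement.connection_state = std::move(existing.connection_state);
    existing.connection_state.reset();  // make the empty state explicit
    return true;
}
```

In the lossy case from the log above (new one lossy=1), the old pipe's connection_state stays non-NULL in this sketch, which is the situation the reader thread in the backtrace needed.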
