Bug #443
osd segfault due to pipe->connection_state is NULL.
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Description
Hi,
One of my OSDs failed with a segfault.
gdb of the core dump shows:
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000462c3c in AO_fetch_and_add_full (this=0x7f46e4003920) at /usr/include/atomic_ops/sysdeps/gcc/x86_64.h:67
67      /usr/include/atomic_ops/sysdeps/gcc/x86_64.h: No such file or directory.
        in /usr/include/atomic_ops/sysdeps/gcc/x86_64.h
Missing separate debuginfos, use: debuginfo-install glibc-2.11.2-1.x86_64 libgcc-4.4.2-7.fc12.x86_64 libstdc++-4.4.2-7.fc12.x86_64 openssl-1.0.0-4.fc12.x86_64 zlib-1.2.3-23.fc12.x86_64
(gdb) where
#0  0x0000000000462c3c in AO_fetch_and_add_full (this=0x7f46e4003920) at /usr/include/atomic_ops/sysdeps/gcc/x86_64.h:67
#1  inc (this=0x7f46e4003920) at include/atomic.h:33
#2  get (this=0x7f46e4003920) at msg/Message.h:150
#3  get (this=0x7f46e4003920) at msg/Message.h:181
#4  SimpleMessenger::Pipe::fail (this=0x7f46e4003920) at msg/SimpleMessenger.cc:1393
#5  0x0000000000462e4c in SimpleMessenger::Pipe::fault (this=0x7f46e4003920, onconnect=false, onread=<value optimized out>) at msg/SimpleMessenger.cc:1344
#6  0x0000000000468299 in SimpleMessenger::Pipe::reader (this=0x7f46e4003920) at msg/SimpleMessenger.cc:1552
#7  0x000000000045896d in SimpleMessenger::Pipe::Reader::entry (this=<value optimized out>) at msg/SimpleMessenger.h:192
#8  0x000000000046c49a in Thread::_entry_func (arg=<value optimized out>) at common/Thread.h:39
#9  0x00007f471f3e8a3a in start_thread () from /lib64/libpthread.so.0
#10 0x00007f471e60677d in clone () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()
(gdb) up
#1  inc (this=0x7f46e4003920) at include/atomic.h:33
33      AO_fetch_and_add1(&val);
(gdb) up
#2  get (this=0x7f46e4003920) at msg/Message.h:150
150     nref.inc();
(gdb)
#3  get (this=0x7f46e4003920) at msg/Message.h:181
181     return (Connection *)RefCountedObject::get();
(gdb)
#4  SimpleMessenger::Pipe::fail (this=0x7f46e4003920) at msg/SimpleMessenger.cc:1393
1393    Connection * cstate = connection_state->get();
(gdb) print connection_state
No symbol "connection_state" in current context.
(gdb) print this->connection_state
$1 = (Connection *) 0x0
(gdb) print this->sd
$2 = -1
(gdb) print this->peer_addr
$3 = {type = 0, nonce = 3871338124, {addr = {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, addr4 = {sin_family = 2, sin_port = 0, sin_addr = { s_addr = 1895934144}, sin_zero = "\000\000\000\000\000\000\000"}, addr6 = {sin6_family = 2, sin6_port = 0, sin6_flowinfo = 1895934144, sin6_addr = {__in6_u = { __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}}}
And, the last few lines of the OSD log:
2010-09-27 02:23:23.536689 7f46d2bed710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4003920 sd=3 pgs=2017 cs=1 l=1).reader got 128 + 0 + 4194304 byte message.. ABORTED
2010-09-27 02:23:23.538907 7f46d2bed710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4003920 sd=3 pgs=2017 cs=1 l=1).reader bad tag 0
2010-09-27 02:23:23.543352 7f46d32f2710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4004290 sd=39 pgs=0 cs=0 l=0).accept peer addr is really 192.168.1.113:0/3871338124 (socket is 192.168.1.113:50930/0)
2010-09-27 02:23:23.543453 7f46d32f2710 -- 192.168.1.108:6800/10436 >> 192.168.1.113:0/3871338124 pipe(0x7f46e4004290 sd=39 pgs=0 cs=0 l=1).accept replacing existing (lossy) channel (new one lossy=1)
It looks like the accepter thread had put the connection_state and set it to NULL before the reader thread got the pipe lock, so the reader's failure path in Pipe::fail() then dereferenced the NULL pointer.
I am using the unstable branch (commit: f4be4b936fa473a9b16a).
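To illustrate the suspected interleaving, here is a minimal single-threaded sketch. It is not the Ceph code: Connection, Pipe, drop_connection() and get_connection_guarded() are hypothetical stand-ins, and the ref count is a plain int rather than an atomic. It only shows that once the accepter side has cleared the pointer, the reader side must check it under the lock before calling get().

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for illustration; not the real Ceph classes.
struct Connection {
    int nref = 1;                              // simplified, non-atomic ref count
    Connection* get() { ++nref; return this; }
    void put() { --nref; }                     // no deletion in this sketch
};

struct Pipe {
    std::mutex lock;                           // plays the role of the pipe lock
    Connection* connection_state = nullptr;

    // Accepter side: drop the reference and clear the pointer.
    void drop_connection() {
        std::lock_guard<std::mutex> l(lock);
        if (connection_state) {
            connection_state->put();
            connection_state = nullptr;
        }
    }

    // Reader side: take a reference only if the pointer is still set,
    // returning nullptr instead of dereferencing NULL.
    Connection* get_connection_guarded() {
        std::lock_guard<std::mutex> l(lock);
        return connection_state ? connection_state->get() : nullptr;
    }
};

// Replays the reported ordering: the accepter clears the pointer first,
// then the reader's failure path runs.
bool reader_sees_null_after_accept() {
    Connection con;
    Pipe p;
    p.connection_state = con.get();            // pipe holds a reference (nref == 2)
    p.drop_connection();                       // accepter wins the race
    assert(con.nref == 1);                     // only the local reference remains
    return p.get_connection_guarded() == nullptr;
}
```

An unguarded connection_state->get() at the same point would be the NULL dereference seen in frame #4 of the backtrace.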
Updated by Greg Farnum over 13 years ago
- Status changed from New to Resolved
Pushed a change in ab62aabf1f71b21a8f64bd7985119f3341582ff5
Replacement pipes will only take over the old pipe's connection if they're not lossy.
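As a rough sketch of the rule described above (not the actual change in the commit), the decision could look like the following. Pipe, Connection and maybe_take_over() are hypothetical names, the check is placed on the replacement pipe's lossy flag as the sentence reads, and moving the pointer in the non-lossy case is an assumption, since the ticket does not say what happens to the old pipe there.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real Ceph classes.
struct Connection {};

struct Pipe {
    bool lossy = false;
    Connection* connection_state = nullptr;
};

// Returns true if the replacement adopted the old pipe's connection.
bool maybe_take_over(Pipe& existing, Pipe& replacement) {
    if (replacement.lossy) {
        // Lossy replacement: leave the old pipe's connection_state alone,
        // so a reader still running on the old pipe never sees it cleared.
        return false;
    }
    // Non-lossy replacement: adopt the connection (assumed to be a move).
    replacement.connection_state = existing.connection_state;
    existing.connection_state = nullptr;
    return true;
}
```

With the lossy case from the log (new one lossy=1), the old pipe keeps its pointer and the replacement starts without one.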