Bug #990

osd: PG::replay_queued_ops

Added by Wido den Hollander almost 13 years ago. Updated over 12 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I upgraded my cluster to 24caedc8f549eeeba48b2d4a44927ee16e65c42a. After doing so, one of my OSDs crashed with the following backtrace:

(gdb) bt
#0  0x00007f37c6dde7bb in raise () from /lib/libpthread.so.0
#1  0x0000000000620ce3 in reraise_fatal (signum=18688) at common/signal.cc:63
#2  0x0000000000621a0b in handle_fatal_signal (signum=6) at common/signal.cc:110
#3  <signal handler called>
#4  0x00007f37c59aea75 in raise () from /lib/libc.so.6
#5  0x00007f37c59b25c0 in abort () from /lib/libc.so.6
#6  0x00007f37c62648e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7  0x00007f37c6262d16 in ?? () from /usr/lib/libstdc++.so.6
#8  0x00007f37c6262d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#9  0x00007f37c6262e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#10 0x0000000000607caa in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=0x649dd0 "void PG::replay_queued_ops()") at common/assert.cc:86
#11 0x00000000005649f6 in PG::replay_queued_ops (this=0x1011000) at osd/PG.cc:1984
#12 0x00000000004e468f in OSD::activate_pg (this=0xc3b000, pgid=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at osd/OSD.cc:4903
#13 0x00000000004e62c8 in OSD::check_replay_queue (this=0xc3b000) at osd/OSD.cc:4885
#14 0x000000000051b85d in OSD::tick (this=0xc3b000) at osd/OSD.cc:1754
#15 0x000000000060352b in SafeTimer::timer_thread (this=0xc3b048) at common/Timer.cc:102
#16 0x0000000000605d1d in SafeTimerThread::entry (this=<value optimized out>) at common/Timer.cc:38
#17 0x00007f37c6dd59ca in start_thread () from /lib/libpthread.so.0
#18 0x00007f37c5a6170d in clone () from /lib/libc.so.6
#19 0x0000000000000000 in ?? ()
(gdb)

I upgraded my OSDs machine by machine (four at a time).

When I started the OSD again with full logging, it didn't crash again and the cluster fully recovered.

The core can be found at atom3.ceph.widodh.nl:/core.atom3.18688; logging is available on noisy.ceph.widodh.nl:/var/log/remote/ceph/osd.log

History

#1 Updated by Sage Weil almost 13 years ago

  • Target version set to v0.28

#2 Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.28 to v0.26.1

#3 Updated by Wido den Hollander almost 13 years ago

According to sjust, this should have been fixed in the osd_wip1000 branch, but I just hit the bug again on several OSDs (11 in total).

The backtrace is slightly different (the line numbers in osd/OSD.cc have changed):

(gdb) bt
#0  0x00007fb35a4fa7bb in raise () from /lib/libpthread.so.0
#1  0x000000000061a493 in reraise_fatal (signum=2306) at common/signal.cc:63
#2  0x000000000061b1bb in handle_fatal_signal (signum=6) at common/signal.cc:110
#3  <signal handler called>
#4  0x00007fb3590caa75 in raise () from /lib/libc.so.6
#5  0x00007fb3590ce5c0 in abort () from /lib/libc.so.6
#6  0x00007fb3599808e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7  0x00007fb35997ed16 in ?? () from /usr/lib/libstdc++.so.6
#8  0x00007fb35997ed43 in std::terminate() () from /usr/lib/libstdc++.so.6
#9  0x00007fb35997ee3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#10 0x000000000060034a in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>,
    func=0x643b10 "void PG::replay_queued_ops()") at common/assert.cc:86
#11 0x0000000000557076 in PG::replay_queued_ops (this=0x2f66000) at osd/PG.cc:1984
#12 0x00000000004e29cf in OSD::activate_pg (this=0x2927000, pgid=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece.
) at osd/OSD.cc:4893
#13 0x00000000004e3998 in OSD::check_replay_queue (this=0x2927000) at osd/OSD.cc:4875
#14 0x000000000051b1a2 in OSD::tick (this=0x2927000) at osd/OSD.cc:1742
#15 0x00000000005fbbcb in SafeTimer::timer_thread (this=0x2927048) at common/Timer.cc:102
#16 0x00000000005fe3bd in SafeTimerThread::entry (this=<value optimized out>) at common/Timer.cc:38
#17 0x00007fb35a4f19ca in start_thread () from /lib/libpthread.so.0
#18 0x00007fb35917d70d in clone () from /lib/libc.so.6
#19 0x0000000000000000 in ?? ()
(gdb)

#4 Updated by Sage Weil almost 13 years ago

  • Target version changed from v0.26.1 to v0.27.1

#5 Updated by Sage Weil over 12 years ago

  • Status changed from New to Closed
