Bug #1020
osd: replay_queued_ops crash (closed)
% Done: 0%
Description
2011-04-19 14:32:12.974015 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974021 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974026 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974031 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974039 7f5d2aa3d700 osd13 119 do_mon_report
2011-04-19 14:32:12.974045 7f5d2aa3d700 osd13 119 send_alive up_thru currently 118 want 118
2011-04-19 14:32:12.974064 7f5d2aa3d700 osd13 119 send_pg_stats - 1 pgs updated
2011-04-19 14:32:12.974083 7f5d2aa3d700 -- [2607:f298:cef:2233::5523]:6809/23681 --> mon0 [2607:f298:cef:2233::1321]:6789/0 -- pg_stats(1 pgs v 119) v1 -- ?+0 0x13beb40
2011-04-19 14:32:12.974106 7f5d2aa3d700 -- [2607:f298:cef:2233::5523]:6809/23681 --> mon0 [2607:f298:cef:2233::1321]:6789/0 -- log(1 entries) v1 -- ?+0 0x1a8b000
2011-04-19 14:32:13.074268 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd1 [2607:f298:cef:2233::5522]:6802/9506 52 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (2666712624 0 0) 0x1b878c0 con 0x1b2ca00
2011-04-19 14:32:13.222748 7f5d2412f700 -- [2607:f298:cef:2233::5523]:6809/23681 <== mon0 [2607:f298:cef:2233::1321]:6789/0 49 ==== log(last 16) v1 ==== 24+0+0 (3227908771 0 0) 0x1c52480 con 0x1396c80
2011-04-19 14:32:13.301836 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd19 [2607:f298:cef:2233::5524]:6814/14913 44 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (3008462510 0 0) 0x139a000 con 0x1af4780
2011-04-19 14:32:13.305611 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd16 [2607:f298:cef:2233::5524]:6805/14639 55 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (3188885997 0 0) 0x142e000 con 0x1b41780
2011-04-19 14:32:13.597742 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd15 [2607:f298:cef:2233::5524]:6802/14563 57 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (240984430 0 0) 0x1d2ca80 con 0x1af4140
2011-04-19 14:32:13.733209 7f5d2412f700 -- [2607:f298:cef:2233::5523]:6809/23681 <== mon0 [2607:f298:cef:2233::1321]:6789/0 50 ==== pg_stats_ack(1 pgs) v1 ==== 24+0+0 (1089551253 0 0) 0x1b89380 con 0x1396c80
2011-04-19 14:32:13.733225 7f5d2412f700 osd13 119 handle_pg_stats_ack
2011-04-19 14:32:13.745984 7f5d20127700 osd13 119 heartbeat: stat(2011-04-19 14:32:13.745926 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2011-04-19 14:32:13.746000 7f5d20127700 osd13 119 heartbeat: osd_stat(5464 KB used, 1851 GB avail, 1853 GB total, peers [1,14,15,16,19]/[15,18])
2011-04-19 14:32:13.746015 7f5d20127700 -- [2607:f298:cef:2233::5523]:6811/23681 --> osd15 [2607:f298:cef:2233::5524]:6802/14563 -- osd_ping(e119 as_of 119) v1 -- ?+0 0x1d02540
2011-04-19 14:32:13.746033 7f5d20127700 -- [2607:f298:cef:2233::5523]:6811/23681 --> osd18 [2607:f298:cef:2233::5524]:6811/14846 -- osd_ping(e119 as_of 119) v1 -- ?+0 0x1d2c000
2011-04-19 14:32:13.778261 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd1 [2607:f298:cef:2233::5522]:6802/9506 53 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (3020075899 0 0) 0x1b87000 con 0x1b2ca00
2011-04-19 14:32:13.977946 7f5d2aa3d700 osd13 119 tick
2011-04-19 14:32:13.977993 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978006 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978011 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978017 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978022 7f5d2aa3d700 osd13 119 activate_pg
osd/PG.cc: In function 'void PG::replay_queued_ops()', in thread '0x7f5d2aa3d700'
osd/PG.cc: 1984: FAILED assert(is_replay() && is_active() && !is_crashed())
ceph version 0.26-300-g4428d1e (commit:4428d1ec3bd6407923b5cdcf4f1e02c5586c043e)
1: (PG::replay_queued_ops()+0x3b6) [0x556ba6]
2: (OSD::activate_pg(pg_t, utime_t)+0x1ef) [0x4e219f]
3: (OSD::check_replay_queue()+0x188) [0x4e3db8]
4: (OSD::tick()+0x216) [0x51a0e6]
5: (SafeTimer::timer_thread()+0x36b) [0x5fb7fb]
6: (SafeTimerThread::entry()+0xd) [0x5fdfcd]
7: (()+0x68ba) [0x7f5d2f0968ba]
8: (clone()+0x6d) [0x7f5d2dd2b02d]
ceph version 0.26-300-g4428d1e (commit:4428d1ec3bd6407923b5cdcf4f1e02c5586c043e)
1: (PG::replay_queued_ops()+0x3b6) [0x556ba6]
2: (OSD::activate_pg(pg_t, utime_t)+0x1ef) [0x4e219f]
3: (OSD::check_replay_queue()+0x188) [0x4e3db8]
4: (OSD::tick()+0x216) [0x51a0e6]
5: (SafeTimer::timer_thread()+0x36b) [0x5fb7fb]
6: (SafeTimerThread::entry()+0xd) [0x5fdfcd]
7: (()+0x68ba) [0x7f5d2f0968ba]
8: (clone()+0x6d) [0x7f5d2dd2b02d]
*** Caught signal (Aborted) **
in thread 0x7f5d2aa3d700
ceph version 0.26-300-g4428d1e (commit:4428d1ec3bd6407923b5cdcf4f1e02c5586c043e)
1: /usr/bin/cosd() [0x61a772]
2: (()+0xef60) [0x7f5d2f09ef60]
3: (gsignal()+0x35) [0x7f5d2dc8e165]
4: (abort()+0x180) [0x7f5d2dc90f70]
5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5d2e521dc5]
6: (()+0xcb166) [0x7f5d2e520166]
7: (()+0xcb193) [0x7f5d2e520193]
8: (()+0xcb28e) [0x7f5d2e52028e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x5ffee3]
10: (PG::replay_queued_ops()+0x3b6) [0x556ba6]
11: (OSD::activate_pg(pg_t, utime_t)+0x1ef) [0x4e219f]
12: (OSD::check_replay_queue()+0x188) [0x4e3db8]
13: (OSD::tick()+0x216) [0x51a0e6]
14: (SafeTimer::timer_thread()+0x36b) [0x5fb7fb]
15: (SafeTimerThread::entry()+0xd) [0x5fdfcd]
16: (()+0x68ba) [0x7f5d2f0968ba]
17: (clone()+0x6d) [0x7f5d2dd2b02d]
Updated by Samuel Just almost 13 years ago
- Status changed from New to Duplicate
Duplicates #990. I'm working on a peering/recovery refactor that should take care of this.