Bug #1020

closed

osd: replay_queued_ops crash

Added by Sage Weil almost 13 years ago. Updated almost 13 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
% Done:
0%


Description

2011-04-19 14:32:12.974015 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974021 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974026 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974031 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:12.974039 7f5d2aa3d700 osd13 119 do_mon_report
2011-04-19 14:32:12.974045 7f5d2aa3d700 osd13 119 send_alive up_thru currently 118 want 118
2011-04-19 14:32:12.974064 7f5d2aa3d700 osd13 119 send_pg_stats - 1 pgs updated
2011-04-19 14:32:12.974083 7f5d2aa3d700 -- [2607:f298:cef:2233::5523]:6809/23681 --> mon0 [2607:f298:cef:2233::1321]:6789/0 -- pg_stats(1 pgs v 119) v1 -- ?+0 0x13beb40
2011-04-19 14:32:12.974106 7f5d2aa3d700 -- [2607:f298:cef:2233::5523]:6809/23681 --> mon0 [2607:f298:cef:2233::1321]:6789/0 -- log(1 entries) v1 -- ?+0 0x1a8b000
2011-04-19 14:32:13.074268 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd1 [2607:f298:cef:2233::5522]:6802/9506 52 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (2666712624 0 0) 0x1b878c0 con 0x1b2ca00
2011-04-19 14:32:13.222748 7f5d2412f700 -- [2607:f298:cef:2233::5523]:6809/23681 <== mon0 [2607:f298:cef:2233::1321]:6789/0 49 ==== log(last 16) v1 ==== 24+0+0 (3227908771 0 0) 0x1c52480 con 0x1396c80
2011-04-19 14:32:13.301836 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd19 [2607:f298:cef:2233::5524]:6814/14913 44 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (3008462510 0 0) 0x139a000 con 0x1af4780
2011-04-19 14:32:13.305611 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd16 [2607:f298:cef:2233::5524]:6805/14639 55 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (3188885997 0 0) 0x142e000 con 0x1b41780
2011-04-19 14:32:13.597742 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd15 [2607:f298:cef:2233::5524]:6802/14563 57 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (240984430 0 0) 0x1d2ca80 con 0x1af4140
2011-04-19 14:32:13.733209 7f5d2412f700 -- [2607:f298:cef:2233::5523]:6809/23681 <== mon0 [2607:f298:cef:2233::1321]:6789/0 50 ==== pg_stats_ack(1 pgs) v1 ==== 24+0+0 (1089551253 0 0) 0x1b89380 con 0x1396c80
2011-04-19 14:32:13.733225 7f5d2412f700 osd13 119 handle_pg_stats_ack 
2011-04-19 14:32:13.745984 7f5d20127700 osd13 119 heartbeat: stat(2011-04-19 14:32:13.745926 oprate=0 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
2011-04-19 14:32:13.746000 7f5d20127700 osd13 119 heartbeat: osd_stat(5464 KB used, 1851 GB avail, 1853 GB total, peers [1,14,15,16,19]/[15,18])
2011-04-19 14:32:13.746015 7f5d20127700 -- [2607:f298:cef:2233::5523]:6811/23681 --> osd15 [2607:f298:cef:2233::5524]:6802/14563 -- osd_ping(e119 as_of 119) v1 -- ?+0 0x1d02540
2011-04-19 14:32:13.746033 7f5d20127700 -- [2607:f298:cef:2233::5523]:6811/23681 --> osd18 [2607:f298:cef:2233::5524]:6811/14846 -- osd_ping(e119 as_of 119) v1 -- ?+0 0x1d2c000
2011-04-19 14:32:13.778261 7f5d2312d700 -- [2607:f298:cef:2233::5523]:6811/23681 <== osd1 [2607:f298:cef:2233::5522]:6802/9506 53 ==== osd_ping(e119 as_of 119) v1 ==== 61+0+0 (3020075899 0 0) 0x1b87000 con 0x1b2ca00
2011-04-19 14:32:13.977946 7f5d2aa3d700 osd13 119 tick
2011-04-19 14:32:13.977993 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978006 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978011 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978017 7f5d2aa3d700 osd13 119 activate_pg
2011-04-19 14:32:13.978022 7f5d2aa3d700 osd13 119 activate_pg
osd/PG.cc: In function 'void PG::replay_queued_ops()', in thread '0x7f5d2aa3d700'
osd/PG.cc: 1984: FAILED assert(is_replay() && is_active() && !is_crashed())
 ceph version 0.26-300-g4428d1e (commit:4428d1ec3bd6407923b5cdcf4f1e02c5586c043e)
 1: (PG::replay_queued_ops()+0x3b6) [0x556ba6]
 2: (OSD::activate_pg(pg_t, utime_t)+0x1ef) [0x4e219f]
 3: (OSD::check_replay_queue()+0x188) [0x4e3db8]
 4: (OSD::tick()+0x216) [0x51a0e6]
 5: (SafeTimer::timer_thread()+0x36b) [0x5fb7fb]
 6: (SafeTimerThread::entry()+0xd) [0x5fdfcd]
 7: (()+0x68ba) [0x7f5d2f0968ba]
 8: (clone()+0x6d) [0x7f5d2dd2b02d]
 ceph version 0.26-300-g4428d1e (commit:4428d1ec3bd6407923b5cdcf4f1e02c5586c043e)
 1: (PG::replay_queued_ops()+0x3b6) [0x556ba6]
 2: (OSD::activate_pg(pg_t, utime_t)+0x1ef) [0x4e219f]
 3: (OSD::check_replay_queue()+0x188) [0x4e3db8]
 4: (OSD::tick()+0x216) [0x51a0e6]
 5: (SafeTimer::timer_thread()+0x36b) [0x5fb7fb]
 6: (SafeTimerThread::entry()+0xd) [0x5fdfcd]
 7: (()+0x68ba) [0x7f5d2f0968ba]
 8: (clone()+0x6d) [0x7f5d2dd2b02d]
*** Caught signal (Aborted) **
 in thread 0x7f5d2aa3d700
 ceph version 0.26-300-g4428d1e (commit:4428d1ec3bd6407923b5cdcf4f1e02c5586c043e)
 1: /usr/bin/cosd() [0x61a772]
 2: (()+0xef60) [0x7f5d2f09ef60]
 3: (gsignal()+0x35) [0x7f5d2dc8e165]
 4: (abort()+0x180) [0x7f5d2dc90f70]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5d2e521dc5]
 6: (()+0xcb166) [0x7f5d2e520166]
 7: (()+0xcb193) [0x7f5d2e520193]
 8: (()+0xcb28e) [0x7f5d2e52028e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x373) [0x5ffee3]
 10: (PG::replay_queued_ops()+0x3b6) [0x556ba6]
 11: (OSD::activate_pg(pg_t, utime_t)+0x1ef) [0x4e219f]
 12: (OSD::check_replay_queue()+0x188) [0x4e3db8]
 13: (OSD::tick()+0x216) [0x51a0e6]
 14: (SafeTimer::timer_thread()+0x36b) [0x5fb7fb]
 15: (SafeTimerThread::entry()+0xd) [0x5fdfcd]
 16: (()+0x68ba) [0x7f5d2f0968ba]
 17: (clone()+0x6d) [0x7f5d2dd2b02d]
#1

Updated by Samuel Just almost 13 years ago

  • Status changed from New to Duplicate

Duplicates #990. I'm working on a peering/recovery refactor that should take care of this.
