Bug #1379
osd segfault during end of recovery
% Done: 0%
Regression: No
Severity: 3 - minor
Description
From Huang Jun's email titled 'osd down after adding OSDs':
2011-08-08 20:11:34.506899 7f2f85ff6700 osd10 22 0.87 at 2011-08-08 20:05:16.696631 > 2011-08-07 20:11:34.506050 (86400 seconds ago)
2011-08-08 20:11:34.506903 7f2f85ff6700 osd10 22 sched_scrub done
2011-08-08 20:11:35.191699 7f2f7cee3700 osd10 22 heartbeat_entry woke up
2011-08-08 20:11:35.191724 7f2f7cee3700 osd10 22 heartbeat
2011-08-08 20:11:35.191758 7f2f7cee3700 osd10 22 heartbeat checking stats
2011-08-08 20:11:35.191780 7f2f7cee3700 osd10 22 update_osd_stat osd_stat(1640 KB used, 931 GB avail, 931 GB total, peers []/[])
2011-08-08 20:11:35.191795 7f2f7cee3700 osd10 22 heartbeat: osd_stat(1640 KB used, 931 GB avail, 931 GB total, peers []/[])
2011-08-08 20:11:35.191808 7f2f7cee3700 osd10 22 heartbeat map_locked=1
2011-08-08 20:11:35.191820 7f2f7cee3700 osd10 22 heartbeat check
2011-08-08 20:11:35.191828 7f2f7cee3700 osd10 22 heartbeat lonely?
2011-08-08 20:11:35.191835 7f2f7cee3700 osd10 22 heartbeat put map_lock
2011-08-08 20:11:35.191839 7f2f7cee3700 osd10 22 heartbeat done
2011-08-08 20:11:35.191846 7f2f7cee3700 osd10 22 heartbeat_entry sleeping for 1.1
2011-08-08 20:11:35.506955 7f2f85ff6700 osd10 22 tick
2011-08-08 20:11:35.507011 7f2f85ff6700 osd10 22 scrub_should_schedule loadavg 0.13 < max 0.5 = no, randomly backing off
2011-08-08 20:11:36.001713 7f2f847f3700 filestore(/data/osd10) sync_entry woke after 5.000054
2011-08-08 20:11:36.001745 7f2f847f3700 filestore(/data/osd10) sync_entry committing 2830 sync_epoch 10
2011-08-08 20:11:36.001786 7f2f847f3700 filestore(/data/osd10) sync_entry doing btrfs SYNC
2011-08-08 20:11:36.077118 7f2f847f3700 filestore(/data/osd10) sync_entry commit took 0.075372
2011-08-08 20:11:36.077238 7f2f84ff4700 osd10 22 pg[1.309( empty n=0 ec=2 les/c 6/20 21/21/21) [3] r=-1 stray] _activate_committed 8, that was an old interval
2011-08-08 20:11:36.077278 7f2f84ff4700 osd10 22 pg[1.309( empty n=0 ec=2 les/c 6/20 21/21/21) [3] r=-1 stray] _finish_recovery -- stale
2011-08-08 20:11:36.077291 7f2f847f3700 filestore(/data/osd10) sync_entry committed to op_seq 2830
2011-08-08 20:11:36.077308 7f2f847f3700 filestore(/data/osd10) sync_entry waiting for max_interval 5.000000
2011-08-08 20:11:36.077369 7f2f84ff4700 osd10 22 pg[2.429( empty n=0 ec=2 les/c 6/20 21/21/21) [6] r=-1 stray] _activate_committed 8, that was an old interval
2011-08-08 20:11:36.077412 7f2f84ff4700 osd10 22 pg[2.429( empty n=0 ec=2 les/c 6/20 21/21/21) [6] r=-1 stray] _finish_recovery -- stale
*** Caught signal (Segmentation fault) **
in thread 0x7f2f84ff4700
ceph version 0.32 (commit:c08d08baa6a945d989427563e46c992f757ad5eb)
1: /usr/bin/cosd() [0x581269]
2: (()+0xef60) [0x7f2f8b854f60]
3: (PG::_activate_committed(unsigned int)+0x9c) [0x60bc2c]
4: (Context::complete(int)+0xa) [0x4d9ada]
5: (C_Contexts::finish(int)+0xdb) [0x4dfcdb]
6: (Finisher::finisher_thread_entry()+0x188) [0x6a0288]
7: (()+0x68ba) [0x7f2f8b84c8ba]
8: (clone()+0x6d) [0x7f2f8a2e602d]
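The backtrace shows that PG::_activate_committed() was not called directly by the code that queued it: frames 4 to 6 show it being run through Context::complete() from C_Contexts::finish() on the Finisher thread. A minimal C++ model of that deferred-completion path follows. It is a sketch under assumed semantics (complete() runs finish() and then deletes the context, C_Contexts completes a list of sub-contexts, and the Finisher drains a FIFO queue on a single thread); the class bodies and the C_Record helper are hypothetical and are not Ceph's source.

```cpp
#include <condition_variable>
#include <deque>
#include <list>
#include <mutex>
#include <thread>

// Simplified completion callback (assumed semantics): complete() runs
// finish() once and then frees the heap-allocated object.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  void complete(int r) {
    finish(r);
    delete this;
  }
};

// Completes a batch of sub-contexts, as C_Contexts appears to in the backtrace.
struct C_Contexts : Context {
  std::list<Context*> contexts;
  void finish(int r) override {
    for (Context* c : contexts) c->complete(r);
    contexts.clear();
  }
};

// One background thread that runs queued contexts in FIFO order.
class Finisher {
 public:
  void start() { thread_ = std::thread([this] { entry(); }); }
  void queue(Context* c) {
    std::lock_guard<std::mutex> l(lock_);
    queue_.push_back(c);
    cond_.notify_one();
  }
  // Drains whatever is still queued, then joins the thread.
  void stop() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stopping_ = true;
      cond_.notify_one();
    }
    thread_.join();
  }

 private:
  void entry() {
    std::unique_lock<std::mutex> l(lock_);
    while (true) {
      cond_.wait(l, [this] { return stopping_ || !queue_.empty(); });
      if (queue_.empty()) break;  // stopping and nothing left to run
      Context* c = queue_.front();
      queue_.pop_front();
      l.unlock();
      c->complete(0);  // runs on the finisher thread, not the caller's
      l.lock();
    }
  }
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<Context*> queue_;
  bool stopping_ = false;
  std::thread thread_;
};

// Hypothetical callback that records the epoch it was created for and the
// thread that eventually ran it.
struct C_Record : Context {
  unsigned epoch;
  unsigned* out_epoch;
  std::thread::id* out_thread;
  C_Record(unsigned e, unsigned* oe, std::thread::id* ot)
      : epoch(e), out_epoch(oe), out_thread(ot) {}
  void finish(int) override {
    *out_epoch = epoch;
    *out_thread = std::this_thread::get_id();
  }
};
```

Because the callback runs later and on another thread, anything it dereferences may have changed in the meantime; the "that was an old interval" lines just before the crash are the PG noticing exactly that.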
Associated revisions
osd: fix _activate_committed() crash
Do not dereference acting0 unless we know it is still valid.
Take a reference when scheduling the transaction, and drop it in the
completion, to ensure that the PG isn't removed out from underneath us.
Fixes: #1379
Signed-off-by: Sage Weil <sage@newdream.net>
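The fix pins the PG for as long as the completion is pending. A minimal sketch of that pattern follows, assuming a simplified intrusive reference count: the constructor of a hypothetical C_ActivateCommitted context takes a reference when the transaction is scheduled, and finish() drops it, so a PG removed in the meantime is freed only after the callback has run. The callback also returns early for an epoch older than the current interval before reading any acting state. All member names (same_interval_since, activations, live_count and so on) are illustrative, and the commit's acting0 is modeled only loosely as a guarded acting[0] read.

```cpp
#include <atomic>
#include <vector>

// Hypothetical, simplified placement group with an intrusive reference count.
// Holding a reference is what makes dereferencing the PG safe.
class PG {
 public:
  explicit PG(unsigned interval_start) : same_interval_since(interval_start) {
    ++live_count;
  }
  void get() { ref.fetch_add(1, std::memory_order_relaxed); }
  void put() {
    // Whoever drops the last reference frees the object.
    if (ref.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }

  // Called once the transaction queued in `epoch` has committed.
  void activate_committed(unsigned epoch) {
    if (epoch < same_interval_since) {
      // The PG re-peered after this transaction was queued, so acting state
      // from that epoch is stale: do not touch it.
      ++stale_completions;
      return;
    }
    if (!acting.empty()) last_primary = acting[0];  // only read when present
    ++activations;
  }

  static int live_count;  // number of PG objects not yet destroyed
  std::vector<int> acting;
  unsigned same_interval_since;
  int last_primary = -1;
  int activations = 0;
  int stale_completions = 0;

 private:
  ~PG() { --live_count; }    // only put() may destroy a PG
  std::atomic<int> ref{1};   // the creator holds the initial reference
};

int PG::live_count = 0;

// Simplified completion callback: complete() runs finish(), then frees itself.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  void complete(int r) {
    finish(r);
    delete this;
  }
};

// Takes a PG reference when scheduled and drops it in the completion.
struct C_ActivateCommitted : Context {
  C_ActivateCommitted(PG* p, unsigned e) : pg(p), epoch(e) { pg->get(); }
  void finish(int) override {
    pg->activate_committed(epoch);
    pg->put();  // may free the PG; nothing touches it afterwards
  }
  PG* pg;
  unsigned epoch;
};
```

Taking the reference at scheduling time, not inside the callback, is the important design choice: by the time finish() runs the PG may already have been removed, so the callback could no longer safely acquire a reference itself.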
History
#1 Updated by Josh Durgin over 12 years ago
- File osd.10.log added
#2 Updated by Sage Weil over 12 years ago
- Target version set to v0.34
#3 Updated by Sage Weil over 12 years ago
- Priority changed from Normal to High
#4 Updated by Sage Weil over 12 years ago
slang hit this too:
(10:08:17 AM) slang: http://pastebin.com/raw.php?i=fYFGnVPJ
#5 Updated by Sage Weil over 12 years ago
- Status changed from New to Resolved