Bug #20369

open

segv in OSD::ShardedOpWQ::_process

Added by Sage Weil almost 7 years ago. Updated over 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

(gdb) bt
#0  0x00007faeba1e91fb in raise (sig=11) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x00007faebc68b635 in reraise_fatal (signum=11) at /build/ceph-12.0.3-1957-gcbf8433/src/global/signal_handler.cc:74
#2  handle_fatal_signal (signum=11) at /build/ceph-12.0.3-1957-gcbf8433/src/global/signal_handler.cc:138
#3  <signal handler called>
#4  0x00007faebc1c2165 in internal_visit<boost::intrusive_ptr<OpRequest> > (operand=..., this=<synthetic pointer>) at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:1046
#5  visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, boost::intrusive_ptr<OpRequest> > (storage=0x7fae9bf29630, visitor=<synthetic pointer>) at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/visitation_impl.hpp:114
#6  visitation_impl_invoke<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, boost::intrusive_ptr<OpRequest>, boost::variant<boost::intrusive_ptr<OpRequest>, PGSnapTrim, PGScrub, PGRecovery>::has_fallback_type_> (t=0x0, storage=0x7fae9bf29630, visitor=<synthetic pointer>, 
    internal_which=<optimized out>) at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/visitation_impl.hpp:157
#7  visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<4l>, boost::intrusive_ptr<OpRequest>, boost::mpl::l_item<mpl_::long_<3l>, PGSnapTrim, boost::mpl::l_item<mpl_::long_<2l>, PGScrub, boost::mpl::l_item<mpl_::long_<1l>, PGRecovery, boost::mpl::l_end> > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*, boost::variant<boost::intrusive_ptr<OpRequest>, PGSnapTrim, PGScrub, PGRecovery>::has_fallback_type_> (no_backup_flag=..., storage=0x7fae9bf29630, visitor=<synthetic pointer>, 
    logical_which=<optimized out>, internal_which=<optimized out>) at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/visitation_impl.hpp:238
#8  internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<PGQueueable::RunVis>, void*> (storage=0x7fae9bf29630, visitor=<synthetic pointer>, logical_which=<optimized out>, internal_which=<optimized out>)
    at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:2389
#9  internal_apply_visitor<boost::detail::variant::invoke_visitor<PGQueueable::RunVis> > (visitor=<synthetic pointer>, this=0x7fae9bf29628) at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:2400
#10 apply_visitor<PGQueueable::RunVis> (visitor=..., this=0x7fae9bf29628) at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/variant.hpp:2423
#11 apply_visitor<PGQueueable::RunVis, boost::variant<boost::intrusive_ptr<OpRequest>, PGSnapTrim, PGScrub, PGRecovery> > (visitable=..., visitor=...) at /build/ceph-12.0.3-1957-gcbf8433/obj-x86_64-linux-gnu/boost/include/boost/variant/detail/apply_visitor_unary.hpp:70
#12 run (handle=..., pg=..., osd=<optimized out>, this=0x7fae9bf29628) at /build/ceph-12.0.3-1957-gcbf8433/src/osd/OSD.h:448
#13 OSD::ShardedOpWQ::_process (this=0x7faec6691168, thread_index=<optimized out>, hb=0x7faec674d920) at /build/ceph-12.0.3-1957-gcbf8433/src/osd/OSD.cc:10025
#14 0x00007faebc6ccf3c in ShardedThreadPool::shardedthreadpool_worker (this=0x7faec6690958, thread_index=<optimized out>) at /build/ceph-12.0.3-1957-gcbf8433/src/common/WorkQueue.cc:343
#15 0x00007faebc6cf0c0 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /build/ceph-12.0.3-1957-gcbf8433/src/common/WorkQueue.h:686
#16 0x00007faeba1e1184 in start_thread (arg=0x7fae9bf2b700) at pthread_create.c:312
#17 0x00007faeb92d137d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

log
   -21> 2017-06-21 04:21:34.370381 7fae9bf2b700 10 osd.3 pg_epoch: 58 pg[2.1( v 56'46 (0'0,56'46] local-lis/les=53/55 n=22 ec=14/14 lis/c 53/53 les/c/f 55/55/0 52/53/14) [3,4,0] r=0 lpr=53 crt=56'46 lcod 56'45 mlcod 56'45 active+clean] get_object_context: obc NOT found in cache: 2:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head
   -20> 2017-06-21 04:21:34.370411 7fae9bf2b700 15 filestore(/var/lib/ceph/osd/ceph-3) getattr 2.1_head/#2:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# '_'
   -19> 2017-06-21 04:21:34.370580 7fae9bf2b700 10 filestore(/var/lib/ceph/osd/ceph-3) error opening file /var/lib/ceph/osd/ceph-3/current/2.1_head/smithi183240546-191 ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo_2ba9eba32dd5301ed68c_0_long with flags=2: (2) No such file or directory
   -18> 2017-06-21 04:21:34.370597 7fae9bf2b700 10 filestore(/var/lib/ceph/osd/ceph-3) getattr 2.1_head/#2:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head# '_' = -2
   -17> 2017-06-21 04:21:34.370613 7fae9bf2b700 10 osd.3 pg_epoch: 58 pg[2.1( v 56'46 (0'0,56'46] local-lis/les=53/55 n=22 ec=14/14 lis/c 53/53 les/c/f 55/55/0 52/53/14) [3,4,0] r=0 lpr=53 crt=56'46 lcod 56'45 mlcod 56'45 active+clean] get_object_context: no obc for soid 2:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head and !can_create
   -16> 2017-06-21 04:21:34.370648 7fae9bf2b700 10 osd.3 pg_epoch: 58 pg[2.1( v 56'46 (0'0,56'46] local-lis/les=53/55 n=22 ec=14/14 lis/c 53/53 les/c/f 55/55/0 52/53/14) [3,4,0] r=0 lpr=53 crt=56'46 lcod 56'45 mlcod 56'45 active+clean] agent_choose_mode flush_mode: idle evict_mode: idle num_objects: 22 num_bytes: 50609028 num_objects_dirty: 10 num_objects_omap: 0 num_dirty: 10 num_user_objects: 21 num_user_bytes: 50651756 num_overhead_bytes: 43008 pool.info.target_max_bytes: 0 pool.info.target_max_objects: 250
   -15> 2017-06-21 04:21:34.370672 7fae9bf2b700 20 osd.3 pg_epoch: 58 pg[2.1( v 56'46 (0'0,56'46] local-lis/les=53/55 n=22 ec=14/14 lis/c 53/53 les/c/f 55/55/0 52/53/14) [3,4,0] r=0 lpr=53 crt=56'46 lcod 56'45 mlcod 56'45 active+clean] agent_choose_mode dirty 0.16129 full 0.338709
   -14> 2017-06-21 04:21:34.370703 7fae9bf2b700 25 osd.3 pg_epoch: 58 pg[2.1( v 56'46 (0'0,56'46] local-lis/les=53/55 n=22 ec=14/14 lis/c 53/53 les/c/f 55/55/0 52/53/14) [3,4,0] r=0 lpr=53 crt=56'46 lcod 56'45 mlcod 56'45 active+clean] maybe_handle_cache_detail (no obc) missing_oid 2:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head must_promote 0 in_hit_set 1
   -13> 2017-06-21 04:21:34.370735 7fae9bf2b700 10 osd.3 pg_epoch: 58 pg[2.1( v 56'46 (0'0,56'46] local-lis/les=53/55 n=22 ec=14/14 lis/c 53/53 les/c/f 55/55/0 52/53/14) [3,4,0] r=0 lpr=53 crt=56'46 lcod 56'45 mlcod 56'45 active+clean] do_proxy_read Start proxy read for osd_op(client.4172.0:6010 2.1 2:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [sparse-read 0~3260690] snapc 0=[] ondisk+read+known_if_redirected e58) v8
   -12> 2017-06-21 04:21:34.370801 7fae9bf2b700  1 -- 172.21.15.70:0/11155 --> 172.21.15.183:6808/238371 -- osd_op(unknown.0.5:2950 1.1 1:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head [sparse-read 0~3260690] snapc 0=[] ondisk+read+ignore_cache+ignore_overlay+known_if_redirected e58) v8 -- 0x7faec76f2f00 con 0
   -11> 2017-06-21 04:21:34.370848 7fae9bf2b700 20 osd.3 pg_epoch: 58 pg[2.1( v 56'46 (0'0,56'46] local-lis/les=53/55 n=22 ec=14/14 lis/c 53/53 les/c/f 55/55/0 52/53/14) [3,4,0] r=0 lpr=53 crt=56'46 lcod 56'45 mlcod 56'45 active+clean] maybe_promote missing_oid 2:981aefdf:::smithi183240546-191 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head  in_hit_set 1
   -10> 2017-06-21 04:21:34.370882 7fae9bf2b700 10 osd.3 58 dequeue_op 0x7faec8628520 finish
    -9> 2017-06-21 04:21:34.373687 7fae8ccef700  1 -- 172.21.15.70:6801/11155 <== client.4172 172.21.15.183:0/3004699849 103 ==== osd_op(client.4172.0:6016 2.1 2.e1663bf5 (undecoded) ondisk+read+known_if_redirected e58) v8 ==== 456+0+36 (1298899858 0 2595185706) 0x7faec8095500 con 0x7faec7d51400
    -8> 2017-06-21 04:21:34.373764 7fae8ccef700 15 osd.3 58 enqueue_op 0x7faec8628aa0 prio 63 cost 36 latency 0.000124 epoch 58 osd_op(client.4172.0:6016 2.1 2.e1663bf5 (undecoded) ondisk+read+known_if_redirected e58) v8
    -7> 2017-06-21 04:21:34.373778 7fae8ccef700 20 osd.3 op_wq(1) _enqueue 2.1 PGQueueable(0x7faec8628aa0 prio 63 cost 36 e58)
    -6> 2017-06-21 04:21:34.373832 7fae8ccef700  1 -- 172.21.15.70:6801/11155 <== client.4172 172.21.15.183:0/3004699849 104 ==== osd_op(client.4172.0:6017 2.1 2.e1663bf5 (undecoded) ondisk+read+known_if_redirected e58) v8 ==== 266+0+4 (2548120908 0 3080238136) 0x7faec8097600 con 0x7faec7d51400
    -5> 2017-06-21 04:21:34.373855 7fae8ccef700 15 osd.3 58 enqueue_op 0x7faec8628ec0 prio 63 cost 4 latency 0.000063 epoch 58 osd_op(client.4172.0:6017 2.1 2.e1663bf5 (undecoded) ondisk+read+known_if_redirected e58) v8
    -4> 2017-06-21 04:21:34.373869 7fae8ccef700 20 osd.3 op_wq(1) _enqueue 2.1 PGQueueable(0x7faec8628ec0 prio 63 cost 4 e58)
    -3> 2017-06-21 04:21:34.373915 7fae8ccef700  1 -- 172.21.15.70:6801/11155 <== client.4172 172.21.15.183:0/3004699849 105 ==== osd_op(client.4172.0:6018 2.1 2.e1663bf5 (undecoded) ondisk+read+known_if_redirected e58) v8 ==== 228+0+0 (3210629878 0 0) 0x7faec88f7900 con 0x7faec7d51400
    -2> 2017-06-21 04:21:34.373979 7fae8ccef700 15 osd.3 58 enqueue_op 0x7faec8627b80 prio 63 cost 0 latency 0.000097 epoch 58 osd_op(client.4172.0:6018 2.1 2.e1663bf5 (undecoded) ondisk+read+known_if_redirected e58) v8
    -1> 2017-06-21 04:21:34.374000 7fae8ccef700 20 osd.3 op_wq(1) _enqueue 2.1 PGQueueable(0x7faec8627b80 prio 63 cost 0 e58)
     0> 2017-06-21 04:21:34.374333 7fae9bf2b700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fae9bf2b700 thread_name:tp_osd_tp

 ceph version 12.0.3-1957-gcbf8433 (cbf84334f105a0a28eda18cb283eedfa14618e3f) luminous (dev)
 1: (()+0x9a9567) [0x7faebc68b567]
 2: (()+0x10330) [0x7faeba1e9330]
 3: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xff5) [0x7faebc1c2165]
 4: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8fc) [0x7faebc6ccf3c]
 5: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7faebc6cf0c0]
 6: (()+0x8184) [0x7faeba1e1184]

qi, the item being run in frame #12 (this=0x7fae9bf29628), is a boost::optional<PGQueueable> on the stack of the crashing thread.
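
For readers unfamiliar with the shape of frames #4 through #12, here is a minimal sketch of the pattern they show: an optional slot holding a queue item whose payload is a four-way variant, dispatched through a visitor. This is not the Ceph code. std::optional and std::variant stand in for boost::optional and boost::variant, and QueueItem, OpRequestRef, and process are hypothetical names introduced only for illustration; the alternative names PGSnapTrim, PGScrub, PGRecovery, and RunVis are taken from the backtrace, but their bodies here are placeholders.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Placeholder alternatives. In the backtrace the first alternative is
// boost::intrusive_ptr<OpRequest>; OpRequestRef is a hypothetical stand-in.
struct OpRequestRef { int id; };
struct PGSnapTrim {};
struct PGScrub {};
struct PGRecovery {};

// Hypothetical simplification of a PGQueueable-like type: a variant payload
// plus a run() method that applies a visitor to it.
class QueueItem {
public:
  using Variant = std::variant<OpRequestRef, PGSnapTrim, PGScrub, PGRecovery>;

  explicit QueueItem(Variant v) : payload_(std::move(v)) {}

  // One overload per alternative; the variant's active index selects which
  // overload runs.
  struct RunVis {
    std::string operator()(const OpRequestRef& op) const {
      return "op " + std::to_string(op.id);
    }
    std::string operator()(const PGSnapTrim&) const { return "snaptrim"; }
    std::string operator()(const PGScrub&) const { return "scrub"; }
    std::string operator()(const PGRecovery&) const { return "recovery"; }
  };

  std::string run() const { return std::visit(RunVis{}, payload_); }

private:
  Variant payload_;
};

// Assumed analogue of the worker step: the optional slot lives on the
// caller's stack and is only dereferenced once it holds a value.
inline std::string process(const std::optional<QueueItem>& qi) {
  if (!qi) {
    return "empty";  // nothing was dequeued
  }
  return qi->run();
}
```

In this sketch, process(QueueItem{OpRequestRef{6016}}) takes the first visitor overload, which corresponds to the internal_visit<boost::intrusive_ptr<OpRequest> > instantiation in frame #4. The sketch only illustrates the dispatch path and makes no claim about the cause of the crash.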

/a/sage-2017-06-21_02:01:04-rados-wip-sage-testing2-distro-basic-smithi/1308118

Actions #1

Updated by Sage Weil almost 7 years ago

  • Priority changed from Urgent to Normal

Actions #2

Updated by Patrick Donnelly over 4 years ago

  • Status changed from 12 to New