Bug #12194 (closed)
osd crash: FAILED assert(!parent->get_log().get_missing().is_missing(soid))
Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
other
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Description
After upgrading to Hammer 0.94.2 (via 0.93 -> 0.94), one OSD crashes with the following assert:
-14> 2015-06-30 23:26:17.970533 7f02d0dd9700  1 -- 10.30.1.128:6805/4022 <== osd.63 10.30.1.125:6815/22138 11 ==== pg_info(1 pgs e28954:12.74) v4 ==== 759+0+0 (2625070229 0 0) 0xfe8d280 con 0xf2ac8e0
-13> 2015-06-30 23:26:17.970549 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970359, event: header_read, op: pg_info(1 pgs e28954:12.74)
-12> 2015-06-30 23:26:17.970556 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970360, event: throttled, op: pg_info(1 pgs e28954:12.74)
-11> 2015-06-30 23:26:17.970562 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970380, event: all_read, op: pg_info(1 pgs e28954:12.74)
-10> 2015-06-30 23:26:17.970622 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970548, event: dispatched, op: pg_info(1 pgs e28954:12.74)
 -9> 2015-06-30 23:26:17.970636 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970636, event: waiting_for_osdmap, op: pg_info(1 pgs e28954:12.74)
 -8> 2015-06-30 23:26:17.970650 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970650, event: started, op: pg_info(1 pgs e28954:12.74)
 -7> 2015-06-30 23:26:17.970673 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970673, event: done, op: pg_info(1 pgs e28954:12.74)
 -6> 2015-06-30 23:26:17.970696 7f02d0dd9700  1 -- 10.30.1.128:6805/4022 <== osd.58 10.30.1.37:6802/2027655 9 ==== MRecoveryReserve REQUEST pgid: 6.6e, query_epoch: 28954 v2 ==== 26+0+0 (2409593423 0 0) 0x432cb00 con 0xf2af220
 -5> 2015-06-30 23:26:17.970719 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970049, event: header_read, op: MRecoveryReserve REQUEST pgid: 6.6e, query_epoch: 28954
 -4> 2015-06-30 23:26:17.970726 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970051, event: throttled, op: MRecoveryReserve REQUEST pgid: 6.6e, query_epoch: 28954
 -3> 2015-06-30 23:26:17.970732 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970057, event: all_read, op: MRecoveryReserve REQUEST pgid: 6.6e, query_epoch: 28954
 -2> 2015-06-30 23:26:17.970737 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970716, event: dispatched, op: MRecoveryReserve REQUEST pgid: 6.6e, query_epoch: 28954
 -1> 2015-06-30 23:26:17.970744 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970743, event: waiting_for_osdmap, op: MRecoveryReserve REQUEST pgid: 6.6e, query_epoch: 28954
  0> 2015-06-30 23:26:17.972003 7f02c31c3700 -1 osd/ReplicatedBackend.cc: In function 'void ReplicatedBackend::sub_op_modify_impl(OpRequestRef) [with T = MOSDRepOp, int MSGTYPE = 112]' thread 7f02c31c3700 time 2015-06-30 23:26:17.968440
osd/ReplicatedBackend.cc: 1138: FAILED assert(!parent->get_log().get_missing().is_missing(soid))

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (void ReplicatedBackend::sub_op_modify_impl<MOSDRepOp, 112>(std::tr1::shared_ptr<OpRequest>)+0x1488) [0x9d7898]
 2: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0x56) [0x9bdcb6]
 3: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x37f) [0x9c0d9f]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x184) [0x873b04]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x683318]
 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59e) [0x683cae]
 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x795) [0xafe0f5]
 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb019f0]
 9: /lib64/libpthread.so.0() [0x37632079d1]
 10: (clone()+0x6d) [0x3762ee88fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 5 filestore
   1/ 3 keyvaluestore
   1/ 1 journal
   0/ 5 ms
   1/ 5 mon
   5/20 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.85.log
Each time I launch the OSD, it crashes on a different PG but with the same stack trace. I have attached the full log.