Bug #12194

closed

osd crash FAILED assert(!parent->get_log().get_missing().is_missing(soid))

Added by Ben Hines almost 9 years ago. Updated about 7 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Regression: No
Severity: 1 - critical

Description

After upgrading to Hammer 0.94.2 (by way of 0.93 -> 0.94), one OSD crashes with the following assert:

   -14> 2015-06-30 23:26:17.970533 7f02d0dd9700  1 -- 10.30.1.128:6805/4022 <== osd.63 10.30.1.125:6815/22138 11 ==== pg_info(1 pgs e28954:12.74) v4 ==== 759+0+0 (2625070229 0 0) 0xfe8d280 con 0xf2ac8e0
   -13> 2015-06-30 23:26:17.970549 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970359, event: header_read, op: pg_info(1 pgs e28954:12.74)
   -12> 2015-06-30 23:26:17.970556 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970360, event: throttled, op: pg_info(1 pgs e28954:12.74)
   -11> 2015-06-30 23:26:17.970562 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970380, event: all_read, op: pg_info(1 pgs e28954:12.74)
   -10> 2015-06-30 23:26:17.970622 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970548, event: dispatched, op: pg_info(1 pgs e28954:12.74)
    -9> 2015-06-30 23:26:17.970636 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970636, event: waiting_for_osdmap, op: pg_info(1 pgs e28954:12.74)
    -8> 2015-06-30 23:26:17.970650 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970650, event: started, op: pg_info(1 pgs e28954:12.74)
    -7> 2015-06-30 23:26:17.970673 7f02d0dd9700  5 -- op tracker -- seq: 359, time: 2015-06-30 23:26:17.970673, event: done, op: pg_info(1 pgs e28954:12.74)
    -6> 2015-06-30 23:26:17.970696 7f02d0dd9700  1 -- 10.30.1.128:6805/4022 <== osd.58 10.30.1.37:6802/2027655 9 ==== MRecoveryReserve REQUEST  pgid: 6.6e, query_epoch: 28954 v2 ==== 26+0+0 (2409593423 0 0) 0x432cb00 con 0xf2af220
    -5> 2015-06-30 23:26:17.970719 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970049, event: header_read, op: MRecoveryReserve REQUEST  pgid: 6.6e, query_epoch: 28954
    -4> 2015-06-30 23:26:17.970726 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970051, event: throttled, op: MRecoveryReserve REQUEST  pgid: 6.6e, query_epoch: 28954
    -3> 2015-06-30 23:26:17.970732 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970057, event: all_read, op: MRecoveryReserve REQUEST  pgid: 6.6e, query_epoch: 28954
    -2> 2015-06-30 23:26:17.970737 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970716, event: dispatched, op: MRecoveryReserve REQUEST  pgid: 6.6e, query_epoch: 28954
    -1> 2015-06-30 23:26:17.970744 7f02d0dd9700  5 -- op tracker -- seq: 360, time: 2015-06-30 23:26:17.970743, event: waiting_for_osdmap, op: MRecoveryReserve REQUEST  pgid: 6.6e, query_epoch: 28954
     0> 2015-06-30 23:26:17.972003 7f02c31c3700 -1 osd/ReplicatedBackend.cc: In function 'void ReplicatedBackend::sub_op_modify_impl(OpRequestRef) [with T = MOSDRepOp, int MSGTYPE = 112]' thread 7f02c31c3700 time 2015-06-30 23:26:17.968440
osd/ReplicatedBackend.cc: 1138: FAILED assert(!parent->get_log().get_missing().is_missing(soid))

 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: (void ReplicatedBackend::sub_op_modify_impl<MOSDRepOp, 112>(std::tr1::shared_ptr<OpRequest>)+0x1488) [0x9d7898]
 2: (ReplicatedBackend::sub_op_modify(std::tr1::shared_ptr<OpRequest>)+0x56) [0x9bdcb6]
 3: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x37f) [0x9c0d9f]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x184) [0x873b04]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x178) [0x683318]
 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x59e) [0x683cae]
 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x795) [0xafe0f5]
 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb019f0]
 9: /lib64/libpthread.so.0() [0x37632079d1]
 10: (clone()+0x6d) [0x3762ee88fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 5 filestore
   1/ 3 keyvaluestore
   1/ 1 journal
   0/ 5 ms
   1/ 5 mon
   5/20 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.85.log

Each time I launch the OSD, it crashes on a different PG but with the same stack. I have attached the full log.
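
To check that repeated crashes really share one stack, the assert line and the symbolized frames can be pulled out of each dump and compared. The following is only a rough sketch: it assumes the dump layout shown above, and `summarize_crash`, `ASSERT_RE`, and `FRAME_RE` are hypothetical names for illustration, not part of any Ceph tooling.

```python
import re

# Matches e.g. "osd/ReplicatedBackend.cc: 1138: FAILED assert(<expr>)"
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)
# Matches symbolized frames such as " 2: (Class::method(args)+0x56) [0x9bdcb6]".
# Frames without a symbol (e.g. "9: /lib64/libpthread.so.0() [0x...]") are skipped.
FRAME_RE = re.compile(
    r"^\s*\d+: \((?P<symbol>.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]\s*$"
)


def summarize_crash(log_text):
    """Return ((file, line, expr), [symbols]) for the first assert in log_text.

    The first element is None (and the list empty) if no assert line is found.
    """
    assertion = None
    frames = []
    for line in log_text.splitlines():
        m = ASSERT_RE.match(line)
        if m:
            if assertion is not None:
                break  # a second assert starts a new dump; stop at the first
            assertion = (m.group("file"), int(m.group("line")), m.group("expr"))
            continue
        m = FRAME_RE.match(line)
        if m and assertion is not None:
            frames.append(m.group("symbol"))
    return assertion, frames
```

Running this over the tail of each crashed OSD log and comparing the returned tuples would show whether the assert location and call chain match across restarts. Offsets and addresses are deliberately left out of the comparison, since they are not needed to tell the stacks apart.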


Files

ceph-osd.85.zip (236 KB) Ben Hines, 07/01/2015 06:44 AM