Bug #10524

closed

FAILED assert(peer_missing.count(fromshard))

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/samuelj-2015-01-07_13:21:03-rados-wip-sam-testing-wip-testing-vanilla-fixes-basic-multi/689640/

   -22> 2015-01-07 23:46:39.374580 7f1b2fb28700  5 -- op tracker -- seq: 501, time: 2015-01-07 23:46:39.374436, event: throttled, op: MRecoveryReserve GRANT  pgid: 0.17, query_epoch: 6
   -21> 2015-01-07 23:46:39.374584 7f1b2fb28700  5 -- op tracker -- seq: 501, time: 2015-01-07 23:46:39.374497, event: all_read, op: MRecoveryReserve GRANT  pgid: 0.17, query_epoch: 6
   -20> 2015-01-07 23:46:39.374587 7f1b2fb28700  5 -- op tracker -- seq: 501, time: 2015-01-07 23:46:39.374564, event: dispatched, op: MRecoveryReserve GRANT  pgid: 0.17, query_epoch: 6
   -19> 2015-01-07 23:46:39.374592 7f1b2fb28700  5 -- op tracker -- seq: 501, time: 2015-01-07 23:46:39.374592, event: waiting_for_osdmap, op: MRecoveryReserve GRANT  pgid: 0.17, query_epoch: 6
   -18> 2015-01-07 23:46:39.374596 7f1b2fb28700 15 osd.0 6 require_same_or_newer_map 6 (i am 6) 0x4bb5c20
   -17> 2015-01-07 23:46:39.374618 7f1b2fb28700  5 -- op tracker -- seq: 501, time: 2015-01-07 23:46:39.374618, event: done, op: MRecoveryReserve GRANT  pgid: 0.17, query_epoch: 6
   -16> 2015-01-07 23:46:39.374629 7f1b2fb28700 10 osd.0 6 do_waiters -- start
   -15> 2015-01-07 23:46:39.374630 7f1b2fb28700 10 osd.0 6 do_waiters -- finish
   -14> 2015-01-07 23:46:39.374634 7f1b27b18700 10 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovery_wait m=1] handle_peering_event: epoch_sent: 6 epoch_requested: 6 RemoteRecoveryReserved
   -13> 2015-01-07 23:46:39.374679 7f1b27b18700  5 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovery_wait m=1] exit Started/Primary/Active/WaitRemoteRecoveryReserved 0.001485 1 0.000118
   -12> 2015-01-07 23:46:39.374700 7f1b27b18700  5 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovery_wait m=1] enter Started/Primary/Active/Recovering
   -11> 2015-01-07 23:46:39.374726 7f1b27b18700 10 osd.0 6 queue_for_recovery queued pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1]
   -10> 2015-01-07 23:46:39.374742 7f1b27b18700 10 log is not dirty
    -9> 2015-01-07 23:46:39.374767 7f1b2230d700 10 osd.0 6 do_recovery can start 5 (0/15 rops)
    -8> 2015-01-07 23:46:39.374771 7f1b2230d700 10 osd.0 6 do_recovery starting 5 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1]
    -7> 2015-01-07 23:46:39.374789 7f1b2230d700 10 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1] recover_primary recovering 0 in pg
    -6> 2015-01-07 23:46:39.374800 7f1b2230d700 10 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1] recover_primary missing(1)
    -5> 2015-01-07 23:46:39.374814 7f1b2230d700 10 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1] recover_primary 190db197/benchmark_data_burnupi59_7159_object114/head//0 6'14 (missing) (missing head)
    -4> 2015-01-07 23:46:39.374830 7f1b2230d700 10 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1] start_recovery_op 190db197/benchmark_data_burnupi59_7159_object114/head//0
    -3> 2015-01-07 23:46:39.374883 7f1b2230d700 10 osd.0 6 start_recovery_op pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 rops=1 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1] 190db197/benchmark_data_burnupi59_7159_object114/head//0 (5/15 rops)
    -2> 2015-01-07 23:46:39.374895 7f1b2230d700 10 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 rops=1 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1] recover_object: 190db197/benchmark_data_burnupi59_7159_object114/head//0
    -1> 2015-01-07 23:46:39.374910 7f1b2230d700  7 osd.0 pg_epoch: 6 pg[0.17( v 6'15 (0'0,6'15] local-les=5 n=12 ec=1 les/c 5/6 4/4/2) [0,2] r=0 lpr=4 rops=1 crt=6'13 lcod 6'14 mlcod 6'14 active+recovering m=1] pull 190db197/benchmark_data_burnupi59_7159_object114/head//0 v 6'14 on osds 0 from osd.0
     0> 2015-01-07 23:46:39.378442 7f1b2230d700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedBackend::prepare_pull(eversion_t, const hobject_t&, ObjectContextRef, ReplicatedBackend::RPGHandle*)' thread 7f1b2230d700 time 2015-01-07 23:46:39.374934
osd/ReplicatedPG.cc: 8552: FAILED assert(peer_missing.count(fromshard))

 ceph version 0.90-793-g5f48d50 (5f48d505ab8a08832a65f449c7b927047c910cf9)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0xba5f3b]
 2: (ReplicatedBackend::prepare_pull(eversion_t, hobject_t const&, std::tr1::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0xf2f) [0x85bb2f]
 3: (ReplicatedBackend::recover_object(hobject_t const&, eversion_t, std::tr1::shared_ptr<ObjectContext>, std::tr1::shared_ptr<ObjectContext>, PGBackend::RecoveryHandle*)+0x2ee) [0xa1309e]
 4: (ReplicatedPG::recover_missing(hobject_t const&, eversion_t, int, PGBackend::RecoveryHandle*)+0x5d2) [0x86d222]
 5: (ReplicatedPG::recover_primary(int, ThreadPool::TPHandle&)+0x139e) [0x8742ae]
 6: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*, ThreadPool::TPHandle&, int*)+0x54b) [0x8a729b]
 7: (OSD::do_recovery(PG*, ThreadPool::TPHandle&)+0x293) [0x69b843]
 8: (OSD::RecoveryWQ::_process(PG*, ThreadPool::TPHandle&)+0x17) [0x6fc497]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa46) [0xb970e6]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xb98190]
 11: (()+0x8182) [0x7f1b42384182]
 12: (clone()+0x6d) [0x7f1b408f038d]
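
The last log entry before the assert (`-1>`) shows osd.0 issuing a pull "from osd.0", meaning the logging OSD picked itself as the pull source. The following is a minimal sketch for scanning an OSD log for such self-pulls. It assumes the line layout of that single entry; the regular expression and the name `find_self_pulls` are illustrative and not part of Ceph.

```python
import re

# Captures the logging OSD id, the object, its version, the "on osds" field
# and the pull source. The pattern is an assumption derived from the "-1>"
# log entry in this report, not from a documented log format.
PULL_RE = re.compile(
    r"\bosd\.(\d+) pg_epoch: .*\] pull (\S+) v (\S+) on osds (\S+) from osd\.(\d+)"
)


def find_self_pulls(lines):
    """Return (object, version, osd_id) for pulls whose source is the logging OSD."""
    hits = []
    for line in lines:
        m = PULL_RE.search(line)
        # A self-pull is a line where the OSD writing the log is also the source.
        if m and m.group(1) == m.group(5):
            hits.append((m.group(2), m.group(3), int(m.group(1))))
    return hits
```

To check one of the attached logs, the function could be fed the decompressed lines, for example `find_self_pulls(gzip.open("ceph-osd.0-bad.log.gz", "rt"))`.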


Files

ceph-osd.0-bad.log.gz (711 KB): log of primary osd.0, recovering from the primary: fail. Loïc Dachary, 01/15/2015 12:54 PM
ceph-osd.0-good.log.gz (247 KB): log of primary osd.0, recovering from the replica: ok. Loïc Dachary, 01/15/2015 12:54 PM

Related issues 1 (0 open, 1 closed)

Has duplicate: Ceph - Bug #10566: osd/ReplicatedPG.cc: 8729: FAILED assert(peer_missing.count(fromshard)) from scrub_test.yaml (Duplicate, 01/18/2015)
