Actions
Bug #2126
closedosd: recover_primary did nothing when num_missing==1
Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2012-02-29 14:47:26.032112 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] recover_primary pulling 0 in pg 2012-02-29 14:47:26.032735 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] recover_primary missing(1) 2012-02-29 14:47:26.033327 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] started 0 2012-02-29 14:47:26.034020 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] handle_recovery_complete 2012-02-29 14:47:26.035070 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting osd.1 0.17( v 1522'182 lc 1454'179 (0'0,1522'182] n=10 ec=1 les/c 1430/1430 1514/1518/1518) 2012-02-29 14:47:26.035888 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting osd.2 0.17( v 1454'179 (0'0,1454'179] n=10 ec=1 les/c 1430/1430 1514/1518/1518) 2012-02-29 14:47:26.036691 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting osd.5 0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) 2012-02-29 14:47:26.037384 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting prefer osd.5 because it is current primary 2012-02-29 14:47:26.038195 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting newest update on osd.5 with 0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) 2012-02-29 14:47:26.038903 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] up[0](osd.5) selected as primary 2012-02-29 14:47:26.039721 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting primary is osd.5 with 0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) 2012-02-29 14:47:26.040578 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] osd.1 (stray) accepted 0.17( v 1522'182 lc 1454'179 (0'0,1522'182] n=10 ec=1 les/c 1430/1430 1514/1518/1518) 2012-02-29 14:47:26.056109 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] choose_acting want [5,1] (== acting) 2012-02-29 14:47:26.056974 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] needs_recovery primary has 1 osd/PG.cc: In function 'boost::statechart::result PG::RecoveryState::Active::react(const PG::RecoveryState::RecoveryComplete&)' thread 1685c700 time 2012-02-29 14:47:26.061231 osd/PG.cc: 4120: FAILED assert(!pg->needs_recovery()) ceph version 0.42.2-203-g8a2b764 (commit:8a2b76411e42f0e6404e79b1a240ce502d44de3a) 1: (PG::RecoveryState::Active::react(PG::RecoveryState::RecoveryComplete const&)+0x1b3) [0x73b043] 2: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x139) [0x752109] 3: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x74d33b] 4: (PG::RecoveryState::handle_recovery_complete(PG::RecoveryCtx*)+0x10c) [0x70a7bc] 5: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x32f) [0x532b0f] 6: (OSD::do_recovery(PG*)+0x345) [0x584575] 7: (ThreadPool::worker()+0xa26) [0x6a3706] 8: (ThreadPool::WorkThread::entry()+0xd) [0x5c806d] 9: (()+0x7971) [0x5037971] 10: (clone()+0x6d) [0x679f92d] ceph version 0.42.2-203-g8a2b764 (commit:8a2b76411e42f0e6404e79b1a240ce502d44de3a) 1: (PG::RecoveryState::Active::react(PG::RecoveryState::RecoveryComplete const&)+0x1b3) [0x73b043] 2: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x139) [0x752109] 3: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x74d33b] 4: (PG::RecoveryState::handle_recovery_complete(PG::RecoveryCtx*)+0x10c) [0x70a7bc] 5: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x32f) [0x532b0f] 6: (OSD::do_recovery(PG*)+0x345) [0x584575] 7: (ThreadPool::worker()+0xa26) [0x6a3706] 8: (ThreadPool::WorkThread::entry()+0xd) [0x5c806d] 9: (()+0x7971) [0x5037971] 10: (clone()+0x6d) [0x679f92d]
adding some debug output and reproducing.
Actions