Project

General

Profile

Actions

Bug #2126

closed

osd: recover_primary did nothing when num_missing==1

Added by Sage Weil about 12 years ago. Updated about 12 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
OSD
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2012-02-29 14:47:26.032112 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] recover_primary pulling 0 in pg
2012-02-29 14:47:26.032735 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] recover_primary missing(1)
2012-02-29 14:47:26.033327 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]]  started 0
2012-02-29 14:47:26.034020 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+recovering+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] handle_recovery_complete
2012-02-29 14:47:26.035070 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting osd.1 0.17( v 1522'182 lc 1454'179 (0'0,1522'182] n=10 ec=1 les/c 1430/1430 1514/1518/1518)
2012-02-29 14:47:26.035888 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting osd.2 0.17( v 1454'179 (0'0,1454'179] n=10 ec=1 les/c 1430/1430 1514/1518/1518)
2012-02-29 14:47:26.036691 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting osd.5 0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518)
2012-02-29 14:47:26.037384 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting prefer osd.5 because it is current primary
2012-02-29 14:47:26.038195 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting newest update on osd.5 with 0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518)
2012-02-29 14:47:26.038903 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] up[0](osd.5) selected as primary
2012-02-29 14:47:26.039721 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] calc_acting primary is osd.5 with 0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518)
2012-02-29 14:47:26.040578 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]]  osd.1 (stray) accepted 0.17( v 1522'182 lc 1454'179 (0'0,1522'182] n=10 ec=1 les/c 1430/1430 1514/1518/1518)
2012-02-29 14:47:26.056109 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] choose_acting want [5,1] (== acting)
2012-02-29 14:47:26.056974 1685c700 osd.5 1525 pg[0.17( v 1522'182 lc 1135'140 (0'0,1522'182] n=11 ec=1 les/c 1519/1430 1514/1518/1518) [5]/[5,1] r=0 lpr=1518 mlcod 16'2 active+remapped m=1 snaptrimq=[1~208,20a~1,20c~1,20f~9,219~3,21e~1,220~3,22d~1]] needs_recovery primary has 1
osd/PG.cc: In function 'boost::statechart::result PG::RecoveryState::Active::react(const PG::RecoveryState::RecoveryComplete&)' thread 1685c700 time 2012-02-29 14:47:26.061231
osd/PG.cc: 4120: FAILED assert(!pg->needs_recovery())
 ceph version 0.42.2-203-g8a2b764 (commit:8a2b76411e42f0e6404e79b1a240ce502d44de3a)
 1: (PG::RecoveryState::Active::react(PG::RecoveryState::RecoveryComplete const&)+0x1b3) [0x73b043]
 2: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x139) [0x752109]
 3: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x74d33b]
 4: (PG::RecoveryState::handle_recovery_complete(PG::RecoveryCtx*)+0x10c) [0x70a7bc]
 5: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x32f) [0x532b0f]
 6: (OSD::do_recovery(PG*)+0x345) [0x584575]
 7: (ThreadPool::worker()+0xa26) [0x6a3706]
 8: (ThreadPool::WorkThread::entry()+0xd) [0x5c806d]
 9: (()+0x7971) [0x5037971]
 10: (clone()+0x6d) [0x679f92d]
 ceph version 0.42.2-203-g8a2b764 (commit:8a2b76411e42f0e6404e79b1a240ce502d44de3a)
 1: (PG::RecoveryState::Active::react(PG::RecoveryState::RecoveryComplete const&)+0x1b3) [0x73b043]
 2: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x139) [0x752109]
 3: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x6b) [0x74d33b]
 4: (PG::RecoveryState::handle_recovery_complete(PG::RecoveryCtx*)+0x10c) [0x70a7bc]
 5: (ReplicatedPG::start_recovery_ops(int, PG::RecoveryCtx*)+0x32f) [0x532b0f]
 6: (OSD::do_recovery(PG*)+0x345) [0x584575]
 7: (ThreadPool::worker()+0xa26) [0x6a3706]
 8: (ThreadPool::WorkThread::entry()+0xd) [0x5c806d]
 9: (()+0x7971) [0x5037971]
 10: (clone()+0x6d) [0x679f92d]

adding some debug output and reproducing.


Related issues 1 (0 open1 closed)

Is duplicate of Ceph - Bug #2133: osd: recovery_completeResolved03/03/2012

Actions
Actions #1

Updated by Sage Weil about 12 years ago

  • Status changed from 12 to Duplicate
Actions

Also available in: Atom PDF