Bug #17039

closed

osd crash when generate past interval

Added by mingxin liu over 7 years ago. Updated almost 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

-43> 2016-08-10 14:00:45.385221 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read(meta/-1/ac952985/osdmap.828/0) open error: (2) No such file or directory
-42> 2016-08-10 14:00:45.385226 7fd6c8fc6700 20 osd.1 830 advance_pg missing map 828
-41> 2016-08-10 14:00:45.385228 7fd6c8fc6700 20 osd.1 830 get_map 829 - loading and decoding 0x24875d40
-40> 2016-08-10 14:00:45.385231 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952955/osdmap.829/0 0~0
-39> 2016-08-10 14:00:45.385251 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) error opening file /var/lib/ceph/osd/ceph-1/current/meta/DIR_5/DIR_5/osdmap.829__0_AC952955__none with flags=2: (2) No such file or directory
-38> 2016-08-10 14:00:45.385256 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read(meta/-1/ac952955/osdmap.829/0) open error: (2) No such file or directory
-37> 2016-08-10 14:00:45.385260 7fd6c8fc6700 20 osd.1 830 advance_pg missing map 829
-36> 2016-08-10 14:00:45.385273 7fd6c8fc6700 10 osd.1 pg_epoch: 718 pg[1.1( v 705'215 (472'207,705'215] local-les=711 n=4 ec=6 les/c/f 711/715/0 710/0/677) [4,5,1] r=2 lpr=718 crt=705'215 lcod 0'0 inactive NOTIFY] handle_advance_map [4,5,3]/[4,5,3] - 4/4
-35> 2016-08-10 14:00:45.385306 7fd6c8fc6700 20 PGPool::update cached_removed_snaps [1~15,17~4,1c~3,20~4,25~3,29~3,2d~3,31~1,33~1,35~b,41~1,44~4,49~1] newly_removed_snaps [e~1,10~1,17~1,1d~1,20~1,2a~1,2d~1,2f~1,31~1,35~b,41~1,44~4,49~1] snapc 4a=[4a,48,43,42,40,34,32,30,2c,28,24,1f,1b,16] (updated)
-34> 2016-08-10 14:00:45.385319 7fd6c8fc6700 10 osd.1 pg_epoch: 830 pg[1.1( v 705'215 (472'207,705'215] local-les=711 n=4 ec=6 les/c/f 711/715/0 710/0/677) [4,5,1] r=2 lpr=718 crt=705'215 lcod 0'0 inactive NOTIFY] state<Reset>: Reset advmap
-33> 2016-08-10 14:00:45.385338 7fd6c8fc6700 10 osd.1 pg_epoch: 830 pg[1.1( v 705'215 (472'207,705'215] local-les=711 n=4 ec=6 les/c/f 711/715/0 710/0/677) [4,5,1] r=2 lpr=718 crt=705'215 lcod 0'0 inactive NOTIFY] generate_past_intervals over epochs 830-838
-32> 2016-08-10 14:00:45.385349 7fd6c8fc6700 20 osd.1 830 get_map 831 - loading and decoding 0x24875d40
-31> 2016-08-10 14:00:45.385353 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952f45/osdmap.831/0 0~0
-30> 2016-08-10 14:00:45.385380 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac952f45/osdmap.831/0 0~5914/5914
-29> 2016-08-10 14:00:45.385386 7fd6c8fc6700 10 osd.1 830 add_map_bl 831 5914 bytes
-28> 2016-08-10 14:00:45.385493 7fd6c8fc6700 20 osd.1 830 get_map 832 - loading and decoding 0x24875200
-27> 2016-08-10 14:00:45.385502 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952c15/osdmap.832/0 0~0
-26> 2016-08-10 14:00:45.385531 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac952c15/osdmap.832/0 0~5852/5852
-25> 2016-08-10 14:00:45.385538 7fd6c8fc6700 10 osd.1 830 add_map_bl 832 5852 bytes
-24> 2016-08-10 14:00:45.385618 7fd6c8fc6700 20 osd.1 830 get_map 833 - loading and decoding 0x23ece400
-23> 2016-08-10 14:00:45.385626 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952da5/osdmap.833/0 0~0
-22> 2016-08-10 14:00:45.385644 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac952da5/osdmap.833/0 0~5790/5790
-21> 2016-08-10 14:00:45.385650 7fd6c8fc6700 10 osd.1 830 add_map_bl 833 5790 bytes
-20> 2016-08-10 14:00:45.385737 7fd6c8fc6700 20 osd.1 830 get_map 834 - loading and decoding 0x24875d40
-19> 2016-08-10 14:00:45.385745 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952d75/osdmap.834/0 0~0
-18> 2016-08-10 14:00:45.385762 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac952d75/osdmap.834/0 0~5852/5852
-17> 2016-08-10 14:00:45.385767 7fd6c8fc6700 10 osd.1 830 add_map_bl 834 5852 bytes
-16> 2016-08-10 14:00:45.385846 7fd6c8fc6700 20 osd.1 830 get_map 835 - loading and decoding 0x24875200
-15> 2016-08-10 14:00:45.385854 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952205/osdmap.835/0 0~0
-14> 2016-08-10 14:00:45.385870 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac952205/osdmap.835/0 0~5790/5790
-13> 2016-08-10 14:00:45.385876 7fd6c8fc6700 10 osd.1 830 add_map_bl 835 5790 bytes
-12> 2016-08-10 14:00:45.385953 7fd6c8fc6700 20 osd.1 830 get_map 836 - loading and decoding 0x23ece400
-11> 2016-08-10 14:00:45.385963 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac9523d5/osdmap.836/0 0~0
-10> 2016-08-10 14:00:45.385979 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac9523d5/osdmap.836/0 0~5728/5728
-9> 2016-08-10 14:00:45.385985 7fd6c8fc6700 10 osd.1 830 add_map_bl 836 5728 bytes
-8> 2016-08-10 14:00:45.386062 7fd6c8fc6700 20 osd.1 830 get_map 837 - loading and decoding 0x24875d40
-7> 2016-08-10 14:00:45.386070 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952365/osdmap.837/0 0~0
-6> 2016-08-10 14:00:45.386090 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac952365/osdmap.837/0 0~5790/5790
-5> 2016-08-10 14:00:45.386096 7fd6c8fc6700 10 osd.1 830 add_map_bl 837 5790 bytes
-4> 2016-08-10 14:00:45.386168 7fd6c8fc6700 20 osd.1 830 get_map 838 - loading and decoding 0x24875200
-3> 2016-08-10 14:00:45.386175 7fd6c8fc6700 15 filestore(/var/lib/ceph/osd/ceph-1) read meta/-1/ac952035/osdmap.838/0 0~0
-2> 2016-08-10 14:00:45.386193 7fd6c8fc6700 10 filestore(/var/lib/ceph/osd/ceph-1) FileStore::read meta/-1/ac952035/osdmap.838/0 0~4871/4871
-1> 2016-08-10 14:00:45.386198 7fd6c8fc6700 10 osd.1 830 add_map_bl 838 4871 bytes
0> 2016-08-10 14:00:45.390822 7fd6c8fc6700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fd6c8fc6700

ceph version .94.7.160704-205-g900a0a7 (900a0a72cccc36f38b76906f33a5a083fbe2c5f8)
1: ceph-osd() [0xb6c4b2]
2: (()+0xf100) [0x7fd6e1c36100]
3: (pg_interval_t::is_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, pg_t)+0x11e) [0x898b6e]
4: (pg_interval_t::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, std::map<unsigned int, pg_interval_t, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, pg_interval_t> > >*, std::ostream*)+0x142) [0x8a1802]
5: (PG::generate_past_intervals()+0x8b6) [0x8c3e66]
6: (PG::RecoveryState::Reset::react(PG::AdvMap const&)+0xa5) [0x8f7e35]
7: (boost::statechart::simple_state<PG::RecoveryState::Reset, PG::RecoveryState::RecoveryMachine, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x21c) [0x935aac]
8: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&)+0x6b) [0x91cb3b]
9: (PG::handle_advance_map(std::tr1::shared_ptr<OSDMap const>, std::tr1::shared_ptr<OSDMap const>, std::vector<int, std::allocator<int> >&, int, std::vector<int, std::allocator<int> >&, int, PG::RecoveryCtx*)+0x4a9) [0x900669]
10: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x2da) [0x6af60a]
11: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x201) [0x6c24e1]
12: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x28) [0x71c808]
13: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa8e) [0xc63f5e]

Actions #1

Updated by mingxin liu over 7 years ago

Sequence of events:
osd.1 was killed at epoch 718.
pg 1.1 was then imported with objectstore-tool.
After that, pool 1 was removed.
When osd.1 was revived, the monitor had already trimmed the osdmaps up to epoch 830, and the newest map was 838.
The direct cause, I think, is an invalid pointer dereference.
The method is_new_interval() gets the pool size using "osdmap->get_pools().find(pgid.pool())->second.size",
but this pool has been deleted, so the lookup has nothing valid to return.
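
To illustrate the suspected failure mode, here is a minimal sketch of an unchecked versus a guarded map lookup. It is not Ceph's code: `PoolInfo`, `FakeOSDMap`, and `pool_size` are hypothetical stand-ins invented for this example, and it assumes a C++17 compiler for `std::optional` (the 0.94 tree itself predates that).

```cpp
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical stand-ins for illustration only; these are not Ceph's real types.
struct PoolInfo {
    unsigned size;  // replica count of the pool
};

struct FakeOSDMap {
    std::map<int64_t, PoolInfo> pools;
    const std::map<int64_t, PoolInfo>& get_pools() const { return pools; }
};

// Unsafe pattern, mirroring the expression quoted in the comment above:
//   unsigned size = osdmap.get_pools().find(pool_id)->second.size;
// If the pool is absent, find() returns end(), and dereferencing end()
// is undefined behavior (a segfault is one possible outcome).

// Guarded lookup: returns an empty optional when the pool is not in the map.
std::optional<unsigned> pool_size(const FakeOSDMap& osdmap, int64_t pool_id) {
    const auto& pools = osdmap.get_pools();
    auto it = pools.find(pool_id);
    if (it == pools.end()) {
        return std::nullopt;  // pool does not exist in this map epoch
    }
    return it->second.size;
}
```

A caller would then check the result before using it and decide how to treat a PG whose pool no longer exists, rather than reading through an end iterator. Whether the real fix takes this shape is not something this report confirms.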

Actions #2

Updated by mingxin liu over 7 years ago

This is on v0.94.7.
Maybe this commit will help? https://github.com/ceph/ceph/commit/65dcc2da76750d0b6dd2cf0031c44f32749f33e5

Actions #3

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Can't reproduce

I think this has been fixed.
