Bug #20785
closed
osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
Description
2017-07-26 15:53:25.247581 7fcbabaa1700 1 -- 172.21.15.184:6808/156553 <== mon.0 172.21.15.148:6789/0 5 ==== osd_pg_create(e363 9.6:0 9.7:0 9.c:0 9.e:0 9.f:0) v3 ==== 381+0+0 (2842883642 0 0) 0x5568d4476800 con 0x5568d938d000
2017-07-26 15:53:25.247589 7fcbabaa1700 20 osd.2 364 OSD::ms_dispatch: osd_pg_create(e363 9.6:0 9.7:0 9.c:0 9.e:0 9.f:0) v3
2017-07-26 15:53:25.247585 7fcb9b402700 20 osd.2 pg_epoch: 364 pg[9.7( v 364'14 (0'0,364'14] local-lis/les=362/364 n=14 ec=362/362 lis/c 362/362 les/c/f 364/364/0 362/362/362) [2,1] r=0 lpr=362 luod=364'13 lua=364'13 crt=364'14 lcod 364'12 mlcod 364'12 active+clean] PrimaryLogPG::check_blacklisted_obc_watchers for obc 9:e5b2df3f:::benchmark_data_smithi184_210998_object209:head
2017-07-26 15:53:25.247592 7fcbabaa1700 10 osd.2 364 do_waiters -- start
2017-07-26 15:53:25.247593 7fcbabaa1700 10 osd.2 364 do_waiters -- finish
2017-07-26 15:53:25.247595 7fcbabaa1700 20 osd.2 364 _dispatch 0x5568d4476800 osd_pg_create(e363 9.6:0 9.7:0 9.c:0 9.e:0 9.f:0) v3
2017-07-26 15:53:25.247600 7fcbabaa1700 10 osd.2 364 handle_pg_create osd_pg_create(e363 9.6:0 9.7:0 9.c:0 9.e:0 9.f:0) v3
2017-07-26 15:53:25.247603 7fcbabaa1700 15 osd.2 364 require_same_or_newer_map 363 (i am 364) 0x5568d4476800
2017-07-26 15:53:25.247593 7fcb9b402700 10 osd.2 pg_epoch: 364 pg[9.7( v 364'14 (0'0,364'14] local-lis/les=362/364 n=14 ec=362/362 lis/c 362/362 les/c/f 364/364/0 362/362/362) [2,1] r=0 lpr=362 luod=364'13 lua=364'13 crt=364'14 lcod 364'12 mlcod 364'12 active+clean] get_object_context: 0x5568d6b8ea80 9:e5b2df3f:::benchmark_data_smithi184_210998_object209:head rwstate(none n=0 w=0) oi: 9:e5b2df3f:::benchmark_data_smithi184_210998_object209:head(0'0 unknown.0.0:0 s 0 uv 0 alloc_hint [0 0 0]) ssc: 0x5568ba51f760 snapset: 0=[]:[]
2017-07-26 15:53:25.247606 7fcbabaa1700 20 osd.2 364 mkpg 9.6 e0@0.000000
...
2017-07-26 15:53:25.565112 7fcbabaa1700 -1 /build/ceph-12.1.1-609-g392d888/src/osd/osd_types.cc: In function 'static bool PastIntervals::check_new_interval(int, int, const std::vector<int>&, const std::vector<int>&, int, int, const std::vector<int>&, const std::vector<int>&, epoch_t, epoch_t, OSDMapRef, OSDMapRef, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)' thread 7fcbabaa1700 time 2017-07-26 15:53:25.265711
/build/ceph-12.1.1-609-g392d888/src/osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool()))
 ceph version 12.1.1-609-g392d888 (392d888dffbe31593a9d9f3d4dbf4f83284fdc58) luminous (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5568aaf58e22]
 2: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x293) [0x5568aabc4493]
 3: (OSD::build_initial_pg_history(spg_t, unsigned int, utime_t, pg_history_t*, PastIntervals*)+0x4c2) [0x5568aa980932]
 4: (OSD::handle_pg_create(boost::intrusive_ptr<OpRequest>)+0xa8e) [0x5568aa9a143e]
 5: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1b1) [0x5568aa9a34a1]
 6: (OSD::_dispatch(Message*)+0x389) [0x5568aa9a3f19]
 7: (OSD::ms_dispatch(Message*)+0x87) [0x5568aa9a4267]
 8: (DispatchQueue::entry()+0xf4a) [0x5568ab1c1dea]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x5568aafeae8d]
 10: (()+0x76ba) [0x7fcbbe7546ba]
 11: (clone()+0x6d) [0x7fcbbd7cb82d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/a/sage-2017-07-26_14:40:34-rados-wip-sage-testing-distro-basic-smithi/1447282
Updated by Sage Weil almost 7 years ago
problem appears to be the message the mon sent,
osd_pg_create(e363 9.6:0 9.7:0 9.c:0 9.e:0 9.f:0)
with epoch 0 for the create:
2017-07-26 15:53:25.247620 7fcbabaa1700 10 osd.2 364 build_initial_pg_history 9.6 created 0
indeed,
2017-07-26 15:53:25.241490 7efec178f700 20 mon.b@0(leader).osd e364 send_pg_creates osd.2 from 0 : epoch 363 5 pgs
2017-07-26 15:53:25.241501 7efec178f700 20 mon.b@0(leader).osd e364 send_pg_creates will create 9.6 at 0
2017-07-26 15:53:25.241506 7efec178f700 20 mon.b@0(leader).osd e364 send_pg_creates will create 9.7 at 0
2017-07-26 15:53:25.241509 7efec178f700 20 mon.b@0(leader).osd e364 send_pg_creates will create 9.c at 0
2017-07-26 15:53:25.241510 7efec178f700 20 mon.b@0(leader).osd e364 send_pg_creates will create 9.e at 0
2017-07-26 15:53:25.241512 7efec178f700 20 mon.b@0(leader).osd e364 send_pg_creates will create 9.f at 0
2017-07-26 15:53:25.241513 7efec178f700 1 -- 172.21.15.148:6789/0 --> 172.21.15.184:6808/156553 -- osd_pg_create(e363 9.6:0 9.7:0 9.c:0 9.e:0 9.f:0) v3 -- ?+0 0x7efed7cc9680 con 0x7efed868c300
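The mon log lines above say "will create 9.6 at 0" for the entry printed as "9.6:0", so the pairs in the message summary appear to be pgid:created-epoch. A minimal sketch, assuming that reading and assuming the summary format seen in these logs, of scanning a log line for osd_pg_create messages and flagging creates stamped with epoch 0. The helper names (parse_pg_create, suspicious_creates) are hypothetical and not part of Ceph.

```python
import re

# Matches the message summary as it is printed in the logs in this ticket,
# e.g. "osd_pg_create(e363 9.6:0 9.7:0)". The format is inferred from those
# logs only; other Ceph versions may print it differently.
MSG_RE = re.compile(r"osd_pg_create\(e(\d+)((?: [0-9]+\.[0-9a-f]+:\d+)+)\)")


def parse_pg_create(line):
    """Return (message_epoch, {pgid: created_epoch}), or None if no match."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    creates = {}
    for item in m.group(2).split():
        pgid, created = item.rsplit(":", 1)
        creates[pgid] = int(created)
    return int(m.group(1)), creates


def suspicious_creates(line):
    """Return the sorted pgids whose created epoch is 0 in this log line."""
    parsed = parse_pg_create(line)
    if parsed is None:
        return []
    _, creates = parsed
    return sorted(pgid for pgid, created in creates.items() if created == 0)


if __name__ == "__main__":
    sample = "handle_pg_create osd_pg_create(e363 9.6:0 9.7:0 9.c:0 9.e:0 9.f:0) v3"
    print(parse_pg_create(sample))
    print(suspicious_creates(sample))
```

Run against the OSD or mon log, this would list every PG in the message from this ticket, since all five carry a created epoch of 0.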
Updated by Kefu Chai over 6 years ago
https://github.com/ceph/ceph/pull/16677 is posted to help debug this issue.
Updated by Sage Weil over 6 years ago
- Has duplicate Bug #20474: osd/osd_types.cc: 3534: FAILED assert(lastmap->get_pools().count(pgid.pool())) added
Updated by Kefu Chai over 6 years ago
- Has duplicate Bug #20987: mon/OSDMonitor.cc: 3284: FAILED assert(create != creating_pgs.pgs.end()) added
Updated by Kefu Chai over 6 years ago
- Category set to Correctness/Safety
- Status changed from Need More Info to Fix Under Review
- Backport set to luminous
- Component(RADOS) Monitor added
Updated by Kefu Chai over 6 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Kefu Chai over 6 years ago
- Copied to Backport #21076: luminous: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool())) added
Updated by Kefu Chai over 6 years ago
- Copied to Backport #21090: osd/osd_types.cc: 3574: FAILED assert(lastmap->get_pools().count(pgid.pool())) added
Updated by Kefu Chai over 6 years ago
- Status changed from Pending Backport to Resolved
Updated by Kefu Chai over 6 years ago
- Status changed from Resolved to Fix Under Review
/a//joshd-2017-08-25_00:03:46-rados-wip-dup-perf-distro-basic-smithi/1560728/ mon.c
Updated by Sage Weil over 6 years ago
- Status changed from Fix Under Review to Resolved
Updated by Joao Eduardo Luis over 6 years ago
I may be wrong, but it looks like the commit fixing this is only present in current master. I was under the impression this affected luminous; does it not? If so, we need to backport it.
Updated by Nathan Cutler over 6 years ago
- Status changed from Resolved to Pending Backport
Updated by Nathan Cutler over 6 years ago
Joao, I changed status to "Pending Backport" but the PR also has the "needs-backport" label, which is perhaps enough to ensure that the cherry-pick gets done in time for v12.2.1.
Updated by Joao Eduardo Luis over 6 years ago
doh. I missed the needs-backport tag on the pr :(
Updated by Kefu Chai over 6 years ago
thanks Joao, i am commenting on https://github.com/ceph/ceph/pull/17191 so it references https://github.com/ceph/ceph/pull/17065.
Updated by Nathan Cutler over 6 years ago
- Status changed from Pending Backport to Resolved