Bug #19909
PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
After updating the OSDs from 12.0.1 to 12.0.1-2248-g745902a, all OSDs fail like this:
    -1> 2017-05-11 14:44:20.024794 7f797335e700 -1 osd.58 1886 failed to load OSD map for epoch 269, got 0 bytes
     0> 2017-05-11 14:44:20.027699 7f797335e700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-2248-g745902a/rpm/el7/BUILD/ceph-12.0.1-2248-g745902a/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f797335e700 time 2017-05-11 14:44:20.024809
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-2248-g745902a/rpm/el7/BUILD/ceph-12.0.1-2248-g745902a/src/osd/OSD.h: 1064: FAILED assert(ret)
 ceph version 12.0.1-2248-g745902a (745902aec6b1a0395ca244ae280abb2e99183189)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f7981b1c0c0]
 2: (OSDService::get_map(unsigned int)+0x3d) [0x7f798164c5bd]
 3: (OSD::build_initial_pg_history(spg_t, unsigned int, utime_t, pg_history_t*, PastIntervals*)+0x186) [0x7f79815efa06]
 4: (OSD::handle_pg_create(boost::intrusive_ptr<OpRequest>)+0x9b4) [0x7f79815faa04]
 5: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1e1) [0x7f79815fcb91]
 6: (OSD::_dispatch(Message*)+0x3a7) [0x7f79815fd5b7]
 7: (OSD::ms_dispatch(Message*)+0x87) [0x7f79815fd9f7]
 8: (DispatchQueue::entry()+0x7a2) [0x7f7981cf6d82]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7981b9a62d]
 10: (()+0x7dc5) [0x7f797e765dc5]
 11: (clone()+0x6d) [0x7f797d65473d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Here's the ceph status:
[14:58][root@p06253939f44928 (ceph_dev:ceph/halpert/mon*2) ~]# ceph status
2017-05-11 14:58:30.902776 7f9be729b700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore
2017-05-11 14:58:30.924144 7f9be729b700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore
    cluster 1d0f18a4-bf2d-43cc-9b91-511570f1a037
     health HEALTH_ERR
            17149 pgs are stuck inactive for more than 300 seconds
            6455 pgs degraded
            250 pgs down
            1871 pgs peering
            8576 pgs stale
            6455 pgs stuck degraded
            8573 pgs stuck inactive
            8576 pgs stuck stale
            8576 pgs stuck unclean
            6455 pgs stuck undersized
            6455 pgs undersized
            recovery 119682/240036 objects degraded (49.860%)
            mds0: 180 slow requests are blocked > 30 sec
            50/83 in osds are down
     monmap e4: 3 mons at {p06253939f44928=128.142.162.29:6789/0,p06253939y22952=128.142.162.19:6789/0,p06253939y31416=128.142.162.17:6789/0}
            election epoch 68, quorum 0,1,2 p06253939y31416,p06253939y22952,p06253939f44928
      fsmap e3081: 3/3/3 up {0=cephhalpert-mds-981001588f=up:active,1=cephhalpert-mds-96d8ad3ea3=up:active,2=cephhalpert-mds-135c39f87d=up:active}, 1 up:standby
        mgr active: p06253939f44928 standbys: p06253939y22952, p06253939y31416
     osdmap e1959: 112 osds: 33 up, 83 in; 22 remapped pgs
      pgmap v1984586: 8576 pgs, 3 pools, 5943 MB data, 80012 objects
            198 GB used, 379 TB / 379 TB avail
            99.965% pgs inactive
            119682/240036 objects degraded (49.860%)
                6451 stale+undersized+degraded+peered
                1861 stale+peering
                 250 stale+down
                  10 stale+remapped+peering
                   3 stale+active+undersized+degraded
                   1 stale+activating+undersized+degraded