Bug #19909
PastIntervals::check_new_interval: assert(lastmap->get_pools().count(pgid.pool()))
Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
After updating the OSDs from 12.0.1 to 12.0.1-2248-g745902a, all OSDs fail like this:
    -1> 2017-05-11 14:44:20.024794 7f797335e700 -1 osd.58 1886 failed to load OSD map for epoch 269, got 0 bytes
     0> 2017-05-11 14:44:20.027699 7f797335e700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-2248-g745902a/rpm/el7/BUILD/ceph-12.0.1-2248-g745902a/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f797335e700 time 2017-05-11 14:44:20.024809
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.0.1-2248-g745902a/rpm/el7/BUILD/ceph-12.0.1-2248-g745902a/src/osd/OSD.h: 1064: FAILED assert(ret)
 ceph version 12.0.1-2248-g745902a (745902aec6b1a0395ca244ae280abb2e99183189)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f7981b1c0c0]
 2: (OSDService::get_map(unsigned int)+0x3d) [0x7f798164c5bd]
 3: (OSD::build_initial_pg_history(spg_t, unsigned int, utime_t, pg_history_t*, PastIntervals*)+0x186) [0x7f79815efa06]
 4: (OSD::handle_pg_create(boost::intrusive_ptr<OpRequest>)+0x9b4) [0x7f79815faa04]
 5: (OSD::dispatch_op(boost::intrusive_ptr<OpRequest>)+0x1e1) [0x7f79815fcb91]
 6: (OSD::_dispatch(Message*)+0x3a7) [0x7f79815fd5b7]
 7: (OSD::ms_dispatch(Message*)+0x87) [0x7f79815fd9f7]
 8: (DispatchQueue::entry()+0x7a2) [0x7f7981cf6d82]
 9: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f7981b9a62d]
 10: (()+0x7dc5) [0x7f797e765dc5]
 11: (clone()+0x6d) [0x7f797d65473d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Here's the ceph status:
[14:58][root@p06253939f44928 (ceph_dev:ceph/halpert/mon*2) ~]# ceph status
2017-05-11 14:58:30.902776 7f9be729b700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore
2017-05-11 14:58:30.924144 7f9be729b700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore
    cluster 1d0f18a4-bf2d-43cc-9b91-511570f1a037
     health HEALTH_ERR
            17149 pgs are stuck inactive for more than 300 seconds
            6455 pgs degraded
            250 pgs down
            1871 pgs peering
            8576 pgs stale
            6455 pgs stuck degraded
            8573 pgs stuck inactive
            8576 pgs stuck stale
            8576 pgs stuck unclean
            6455 pgs stuck undersized
            6455 pgs undersized
            recovery 119682/240036 objects degraded (49.860%)
            mds0: 180 slow requests are blocked > 30 sec
            50/83 in osds are down
     monmap e4: 3 mons at {p06253939f44928=128.142.162.29:6789/0,p06253939y22952=128.142.162.19:6789/0,p06253939y31416=128.142.162.17:6789/0}
            election epoch 68, quorum 0,1,2 p06253939y31416,p06253939y22952,p06253939f44928
      fsmap e3081: 3/3/3 up {0=cephhalpert-mds-981001588f=up:active,1=cephhalpert-mds-96d8ad3ea3=up:active,2=cephhalpert-mds-135c39f87d=up:active}, 1 up:standby
        mgr active: p06253939f44928 standbys: p06253939y22952, p06253939y31416
     osdmap e1959: 112 osds: 33 up, 83 in; 22 remapped pgs
      pgmap v1984586: 8576 pgs, 3 pools, 5943 MB data, 80012 objects
            198 GB used, 379 TB / 379 TB avail
            99.965% pgs inactive
            119682/240036 objects degraded (49.860%)
                6451 stale+undersized+degraded+peered
                1861 stale+peering
                 250 stale+down
                  10 stale+remapped+peering
                   3 stale+active+undersized+degraded
                   1 stale+activating+undersized+degraded