Bug #45353

open

FAILED ceph_assert(pg_upmap.empty())

Added by Brad Hubbard almost 4 years ago. Updated almost 4 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/bhubbard-2020-05-01_01:03:08-rados-wip-yuri-testing-2020-04-24-1941-master-distro-basic-smithi/5003239

    -4> 2020-05-01T01:59:22.051+0000 7fba2db04700 10 mon.c@2(peon).osd e759 check_osdmap_sub 0x55f5e8fa7800 next 675 (onetime)
    -3> 2020-05-01T01:59:22.051+0000 7fba2db04700  5 mon.c@2(peon).osd e759 send_incremental [675..759] to client.24694
    -2> 2020-05-01T01:59:22.051+0000 7fba2db04700 20 mon.c@2(peon).osd e759 reencode_full_map 697 with features 504412504114407940
    -1> 2020-05-01T01:59:22.054+0000 7fba2db04700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-1064-g1674bb8/rpm/el8/BUILD/ceph-16.0.0-1064-g1674bb8/src/osd/OSDMap.cc: In function 'void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7fba2db04700 time 2020-05-01T01:59:22.052335+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.0.0-1064-g1674bb8/rpm/el8/BUILD/ceph-16.0.0-1064-g1674bb8/src/osd/OSDMap.cc: 2968: FAILED ceph_assert(pg_upmap.empty())

 ceph version 16.0.0-1064-g1674bb8 (1674bb8f59f2c0d240ee17d0ef66d5eafc97f716) pacific (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x7fba3b3a43e0]
 2: (()+0x27b5fa) [0x7fba3b3a45fa]
 3: (OSDMap::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0xd42) [0x7fba3b7b2132]
 4: (OSDMonitor::reencode_full_map(ceph::buffer::v15_2_0::list&, unsigned long)+0xe1) [0x55f5e6cb81c1]
 5: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x256) [0x55f5e6cb9b76]
 6: (OSDMonitor::send_incremental(unsigned int, MonSession*, bool, boost::intrusive_ptr<MonOpRequest>)+0x295) [0x55f5e6cbbcb5]
 7: (OSDMonitor::check_osdmap_sub(Subscription*)+0x72) [0x55f5e6cc30a2]
 8: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0x1379) [0x55f5e6b7c019]
 9: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x47d) [0x55f5e6b9ad7d]
 10: (Monitor::_ms_dispatch(Message*)+0x69d) [0x55f5e6b9c4fd]
 11: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5c) [0x55f5e6bcae7c]
 12: (DispatchQueue::entry()+0x126a) [0x7fba3b5bdeaa]
 13: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fba3b660321]
 14: (()+0x82de) [0x7fba390e22de]
 15: (clone()+0x43) [0x7fba37e8c133]

All monitors crashed with this same assert. No coredumps found.
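
For anyone not familiar with this code path, here is a toy model of how a feature-gated encoder can trip an assert of this shape. It is only a sketch and not the Ceph source: `ToyMap`, `FEATURE_UPMAP`, `target_version`, and the version numbers 3 and 4 are all hypothetical names and values made up for illustration. The assumption that the failing re-encode was aimed at a peer whose feature bits predate upmap support is a guess and has not been confirmed from this run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical feature bit, NOT a real Ceph feature value.
constexpr std::uint64_t FEATURE_UPMAP = 1ull << 0;

// Toy stand-in for an OSD map; only the field relevant to the assert.
struct ToyMap {
  // pg id -> explicit OSD list (stand-in for pg_upmap)
  std::map<int, std::vector<int>> pg_upmap;
};

// Pick an encoding version from the peer's feature bits (illustrative values).
inline int target_version(std::uint64_t features) {
  return (features & FEATURE_UPMAP) ? 4 : 3;
}

// Encode the map for a peer with the given features.
// Throws std::logic_error where real code would hit a fatal assert.
inline std::string encode(const ToyMap& m, std::uint64_t features) {
  const int v = target_version(features);
  assert(v == 3 || v == 4);
  std::string out = "v" + std::to_string(v);
  if (v >= 4) {
    // Newer format: upmap entries have a place in the encoding.
    for (const auto& kv : m.pg_upmap) {
      out += " pg" + std::to_string(kv.first) + "=";
      for (std::size_t i = 0; i < kv.second.size(); ++i) {
        if (i) out += ",";
        out += std::to_string(kv.second[i]);
      }
    }
  } else {
    // Older format: no slot for upmap entries, so the encoder insists
    // there are none rather than silently dropping them.
    if (!m.pg_upmap.empty())
      throw std::logic_error("FAILED assert(pg_upmap.empty())");
  }
  return out;
}
```

In this model, `encode(m, 0)` on a map with any `pg_upmap` entry throws, while `encode(m, FEATURE_UPMAP)` on the same map succeeds. If the real failure follows the same pattern, the question is how upmap entries ended up in a map that still had to be re-encoded for a peer with older features.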

Actions #1

Updated by Neha Ojha almost 4 years ago

'rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size.yaml
1-install/jewel.yaml backoff/normal.yaml ceph.yaml
clusters/{openstack.yaml three-plus-one.yaml}
d-balancer/crush-compat.yaml distro$/{centos_7.6.yaml}
msgr-failures/fastclose.yaml rados.yaml thrashers/careful.yaml
thrashosds-health.yaml workloads/cache-snaps.yaml}'

Actions #2

Updated by Neha Ojha almost 4 years ago

We have removed jewel from thrash-old-clients in https://github.com/ceph/ceph/pull/34748. We should check if this failure reproduces in other tests.

Actions #3

Updated by Brad Hubbard almost 4 years ago

Damn, missed that. Thanks, Neha. Let me run this again on current master.

Actions #4

Updated by Neha Ojha almost 4 years ago

  • Priority changed from High to Normal

Actions #5

Updated by Brad Hubbard almost 4 years ago

  • Severity changed from 2 - major to 3 - minor

Haven't been able to reproduce this so far since https://github.com/ceph/ceph/pull/34748.
