Bug #38330

closed

osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg

Added by Sage Weil about 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-02-15 05:38:00.283 7fc005613700 10 osd.4 216 send_incremental_map 215 -> 216 to 0x5604454e6400 v1:172.21.15.41:6806/34481
2019-02-15 05:38:00.283 7fc005613700 15 bluestore(/var/lib/ceph/osd/ceph-4) read meta #-1:437cbe63:::inc_osdmap.216:0# 0x0~0
2019-02-15 05:38:00.283 7fc005613700 20 bluestore(/var/lib/ceph/osd/ceph-4).collection(meta 0x560442061c20) get_onode oid #-1:437cbe63:::inc_osdmap.216:0# key 0x7f7fffffffffffffff437cbe'c!inc_osdmap.216!='0x0000000000000000ffffffffffffffff'o'
2019-02-15 05:38:00.283 7fc005613700 20 bluestore(/var/lib/ceph/osd/ceph-4).collection(meta 0x560442061c20)  r -2 v.len 0
2019-02-15 05:38:00.283 7fc005613700 10 bluestore(/var/lib/ceph/osd/ceph-4) read meta #-1:437cbe63:::inc_osdmap.216:0# 0x0~0 = -2
2019-02-15 05:38:00.283 7fc005613700 -1 osd.4 216 build_incremental_map_msg missing incremental map 216
2019-02-15 05:38:00.283 7fc005613700 15 bluestore(/var/lib/ceph/osd/ceph-4) read meta #-1:437cbe63:::inc_osdmap.216:0# 0x0~0
2019-02-15 05:38:00.283 7fc005613700 20 bluestore(/var/lib/ceph/osd/ceph-4).collection(meta 0x560442061c20) get_onode oid #-1:437cbe63:::inc_osdmap.216:0# key 0x7f7fffffffffffffff437cbe'c!inc_osdmap.216!='0x0000000000000000ffffffffffffffff'o'
2019-02-15 05:38:00.283 7fc005613700 20 bluestore(/var/lib/ceph/osd/ceph-4).collection(meta 0x560442061c20)  r -2 v.len 0
2019-02-15 05:38:00.283 7fc005613700 10 bluestore(/var/lib/ceph/osd/ceph-4) read meta #-1:437cbe63:::inc_osdmap.216:0# 0x0~0 = -2
2019-02-15 05:38:00.283 7fc005613700 -1 osd.4 216 build_incremental_map_msg unable to load latest map 216
2019-02-15 05:38:00.287 7fc005613700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3749-g2aae580/rpm/el7/BUILD/ceph-14.0.1-3749-g2aae580/src/osd/OSD.cc: In function 'MOSDMap* OSDService::build_incremental_map_msg(epoch_t, epoch_t, OSDSuperblock&)' thread 7fc005613700 time 2019-02-15 05:38:00.284113
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3749-g2aae580/rpm/el7/BUILD/ceph-14.0.1-3749-g2aae580/src/osd/OSD.cc: 1515: abort()

 ceph version 14.0.1-3749-g2aae580 (2aae58097fd39ec4bff12ccfd1de93e28cef88fa) nautilus (dev)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0xd8) [0x560436b62450]
 2: (OSDService::build_incremental_map_msg(unsigned int, unsigned int, OSDSuperblock&)+0x99d) [0x560436c74f3d]
 3: (OSDService::send_incremental_map(unsigned int, Connection*, std::shared_ptr<OSDMap const>&)+0x3ac) [0x560436c7544c]
 4: (OSDService::share_map_peer(int, Connection*, std::shared_ptr<OSDMap const>)+0x170) [0x560436c76200]
 5: (OSDService::send_message_osd_cluster(int, Message*, unsigned int)+0x135) [0x560436c764a5]
 6: (ReplicatedBackend::repop_commit(std::shared_ptr<ReplicatedBackend::RepModify>)+0x25c) [0x560436f9a50c]
 7: (ReplicatedBackend::C_OSD_RepModifyCommit::finish(int)+0x48) [0x560436fb4d58]
 8: (Context::complete(int)+0x9) [0x560436cd5179]
 9: (PrimaryLogPG::BlessedContext::finish(int)+0x90) [0x560436e6c260]
 10: (Context::complete(int)+0x9) [0x560436cd5179]
 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x67c) [0x560436cbb09c]
 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x5604372ad693]
 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5604372b0730]
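
A note for readers decoding the log above: the `= -2` on the BlueStore read lines is a negated errno value, and errno 2 is ENOENT, meaning the `inc_osdmap.216` object was not found in the meta collection. The following is a minimal Python sketch that extracts that return code from two of the log lines and maps it to its errno name. The helper name and the regular expression are illustrative assumptions, not part of any Ceph tooling.

```python
import errno
import os
import re

# Two lines copied verbatim from the log above.
LOG_LINES = [
    "2019-02-15 05:38:00.283 7fc005613700 10 bluestore(/var/lib/ceph/osd/ceph-4) "
    "read meta #-1:437cbe63:::inc_osdmap.216:0# 0x0~0 = -2",
    "2019-02-15 05:38:00.283 7fc005613700 -1 osd.4 216 "
    "build_incremental_map_msg missing incremental map 216",
]

# Assumed line shape: "read <collection> <object id> <offset>~<length> = <rc>".
READ_RESULT = re.compile(r"read \S+ (?P<oid>\S+) \S+ = (?P<rc>-?\d+)$")


def decode_read_result(line):
    """Return (object id, errno name, message) for a failed read, else None."""
    match = READ_RESULT.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    if rc >= 0:
        return None  # non-negative return codes are not errors
    code = -rc
    return match.group("oid"), errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


for line in LOG_LINES:
    decoded = decode_read_result(line)
    if decoded is not None:
        print(decoded)
```

Only the first line matches, which is consistent with the sequence in the log: the read fails with ENOENT, and build_incremental_map_msg then reports the incremental map as missing.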

Related issues 4 (0 open, 4 closed)

Related to RADOS - Bug #38282: cephtool/test.sh failure in test_mon_osd_pool_set (Resolved, 02/12/2019)
Related to RADOS - Bug #38040: osd_map_message_max default is too high? (Resolved, Sage Weil)
Related to RADOS - Bug #43106: mimic: crash in build_incremental_map_msg (Resolved)
Copied to RADOS - Backport #43119: mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg (Resolved, Nathan Cutler)

Actions #1

Updated by Sage Weil about 5 years ago

  • Status changed from 12 to Fix Under Review
Actions #2

Updated by Sage Weil about 5 years ago

  • Related to Bug #38282: cephtool/test.sh failure in test_mon_osd_pool_set added
Actions #3

Updated by Sage Weil about 5 years ago

  • Related to Bug #38040: osd_map_message_max default is too high? added
Actions #4

Updated by Sage Weil about 5 years ago

  • Status changed from Fix Under Review to Resolved
Actions #5

Updated by Greg Farnum about 5 years ago

  • Project changed from RADOS to Messengers
Actions #6

Updated by Dan van der Ster over 4 years ago

https://tracker.ceph.com/issues/38282 was backported to mimic in 13.2.7.
Does this also need a backport?

(We are seeing crashes in build_incremental_map_msg running v13.2.7.)

Actions #7

Updated by Neha Ojha over 4 years ago

  • Project changed from Messengers to RADOS
Actions #8

Updated by Neha Ojha over 4 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to mimic
Actions #9

Updated by Neha Ojha over 4 years ago

  • Related to Bug #43106: mimic: crash in build_incremental_map_msg added
Actions #10

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #43119: mimic: osd/OSD.cc: 1515: abort() in Service::build_incremental_map_msg added
Actions #11

Updated by Nathan Cutler over 4 years ago

  • Pull request ID set to 26448
Actions #12

Updated by Nathan Cutler over 4 years ago

@Dan, @Neha: mimic backport staged at https://github.com/ceph/ceph/pull/26448

Actions #13

Updated by Nathan Cutler over 4 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
