Bug #24423

closed

failed to load OSD map for epoch X, got 0 bytes

Added by Sergey Malinin almost 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

After upgrading to Mimic, I deleted a non-LVM OSD and recreated it with 'ceph-volume lvm prepare --bluestore --data /dev/sdX'. The OSD now fails to start with the following error:

2018-06-05 13:27:49.189 7f79e4def240 0 osd.3 0 done with init, starting boot process
2018-06-05 13:27:49.189 7f79e4def240 1 osd.3 0 start_boot
2018-06-05 13:27:49.189 7f79e4def240 10 osd.3 0 start_boot - have maps 0..0
2018-06-05 13:27:49.189 7f79d5d4a700 10 osd.3 0 OSD::ms_get_authorizer type=mgr
2018-06-05 13:27:49.193 7f79c971e700 10 osd.3 0 ms_handle_connect con 0x55e0a0f62300
2018-06-05 13:27:49.193 7f79bfae4700 10 osd.3 0 _preboot _preboot mon has osdmaps 17056..17676
2018-06-05 13:27:49.193 7f79bfae4700 20 osd.3 0 update_osd_stat osd_stat(1.0 GiB used, 99 GiB avail, 100 GiB total, peers [] op hist [])
2018-06-05 13:27:49.193 7f79bfae4700 5 osd.3 0 heartbeat: osd_stat(1.0 GiB used, 99 GiB avail, 100 GiB total, peers [] op hist [])
2018-06-05 13:27:49.193 7f79bfae4700 1 osd.3 0 waiting for initial osdmap
2018-06-05 13:27:49.193 7f79c971e700 20 osd.3 0 OSD::ms_dispatch: osd_map(17056..17056 src has 17056..17676 +gap_removed_snaps) v4
2018-06-05 13:27:49.193 7f79c971e700 10 osd.3 0 do_waiters -- start
2018-06-05 13:27:49.193 7f79c971e700 10 osd.3 0 do_waiters -- finish
2018-06-05 13:27:49.193 7f79c971e700 20 osd.3 0 _dispatch 0x55e0a0b2fe40 osd_map(17056..17056 src has 17056..17676 +gap_removed_snaps) v4
2018-06-05 13:27:49.193 7f79c971e700 3 osd.3 0 handle_osd_map epochs [17056,17056], i have 0, src has [17056,17676]
2018-06-05 13:27:49.193 7f79c971e700 10 osd.3 0 handle_osd_map message skips epochs 1..17055
2018-06-05 13:27:49.193 7f79c971e700 10 osd.3 0 handle_osd_map got full map for epoch 17056
2018-06-05 13:27:49.193 7f79c971e700 20 osd.3 0 got_full_map 17056, nothing requested
2018-06-05 13:27:49.193 7f79c971e700 20 osd.3 0 get_map 17055 - loading and decoding 0x55e0a0fa4480
2018-06-05 13:27:49.193 7f79c971e700 -1 osd.3 0 failed to load OSD map for epoch 17055, got 0 bytes
2018-06-05 13:27:49.197 7f79c971e700 -1 /build/ceph-13.2.0/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f79c971e700 time 2018-06-05 13:27:49.200273
/build/ceph-13.2.0/src/osd/OSD.h: 828: FAILED assert(ret)

ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7f79dc2175e2]
2: (()+0x26b7a7) [0x7f79dc2177a7]
3: (OSDService::get_map(unsigned int)+0x4a) [0x55e09e756e9a]
4: (OSD::handle_osd_map(MOSDMap*)+0xfb1) [0x55e09e6fddc1]
5: (OSD::_dispatch(Message*)+0xa1) [0x55e09e706a21]
6: (OSD::ms_dispatch(Message*)+0x56) [0x55e09e706d76]
7: (DispatchQueue::entry()+0xb92) [0x7f79dc290452]
8: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f79dc32e6cd]
9: (()+0x76db) [0x7f79da9126db]
10: (clone()+0x3f) [0x7f79d98d688f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
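
For triage, the sequence in the log above (the OSD has epoch 0, receives a full map for 17056, then fails to load 17055) can be pulled out of a log file with a small script. This is a minimal sketch, not part of the original report: the regular expressions assume the exact message formats quoted above, and the function name `summarize` and the dictionary keys are hypothetical.

```python
import re

# Patterns match the log line shapes quoted in this report; other Ceph
# versions may format these messages differently (assumption).
HANDLE_RE = re.compile(
    r"handle_osd_map epochs \[(\d+),(\d+)\], i have (\d+), src has \[(\d+),(\d+)\]"
)
FAILED_RE = re.compile(r"failed to load OSD map for epoch (\d+), got (\d+) bytes")


def summarize(log_text):
    """Return a dict describing the map gap, or None if the lines are absent."""
    handle = HANDLE_RE.search(log_text)
    failed = FAILED_RE.search(log_text)
    if not handle or not failed:
        return None
    first, last, have, src_first, src_last = map(int, handle.groups())
    missing = int(failed.group(1))
    return {
        "osd_has_epoch": have,
        "message_epochs": (first, last),
        "mon_epochs": (src_first, src_last),
        "failed_epoch": missing,
        # True when the failed epoch is newer than what the OSD has and
        # older than the first epoch delivered in the message.
        "failed_epoch_never_received": have < missing < first,
    }


sample = """\
osd.3 0 handle_osd_map epochs [17056,17056], i have 0, src has [17056,17676]
osd.3 0 failed to load OSD map for epoch 17055, got 0 bytes
"""
print(summarize(sample))
```

Run against the excerpt above, the `failed_epoch_never_received` flag is true: epoch 17055 lies between the OSD's own epoch (0) and the first epoch in the message (17056), which matches the "message skips epochs 1..17055" line.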

Related issues 3 (0 open, 3 closed)

Has duplicate: Ceph - Bug #24524: Newly added OSDs do not start in Mimic (Duplicate, 06/14/2018)
Has duplicate: RADOS - Bug #24450: OSD Caught signal (Aborted) (Duplicate, 06/07/2018)
Copied to: RADOS - Backport #24599: mimic: failed to load OSD map for epoch X, got 0 bytes (Resolved, Sage Weil)
