Bug #54561

5 out of 6 OSDs crashing after update to 17.1.0-0.2.rc1.fc37.x86_64

Added by Kaleb KEITHLEY about 2 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
High
Assignee:
-
Target version:
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
quincy
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Description of problem:
After upgrading to 17.1.0-0.2.rc1.fc37.x86_64, 5 out of 6 of my OSDs are crashing on start.

2022-03-14T11:20:44.682+0100 7ff5a50d0180 -1 bluestore::NCB::__restore_allocator::Failed open_for_read with error-code -2
2022-03-14T11:20:44.682+0100 7ff5a50d0180 0 bluestore(/var/lib/ceph/osd/ceph-0) _init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while) ...
2022-03-14T11:20:54.767+0100 7ff5a50d0180 -1 /builddir/build/BUILD/ceph-17.1.0/src/os/bluestore/AvlAllocator.cc: In function 'virtual void AvlAllocator::init_add_free(uint64_t, uint64_t)' thread 7ff5a50d0180 time 2022-03-14T11:20:54.766296+0100
/builddir/build/BUILD/ceph-17.1.0/src/os/bluestore/AvlAllocator.cc: 442: FAILED ceph_assert(offset + length <= uint64_t(device_size))

ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quincy (dev)

(full log attached)
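
For context on the assertion: the log shows BlueStore's NCB code failing to open the saved allocation map (error-code -2, i.e. not found), falling back to a full recovery from ONodes, and then aborting because a recovered free extent ends past the reported device size. The toy C++ sketch below is only an illustration of that boundary check, not Ceph source; ToyAllocator and all the sizes in main() are made up for the example.

// toy_init_add_free.cpp -- simplified illustration, not Ceph code.
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>

// Toy stand-in for BlueStore's extent allocator. The real AvlAllocator keeps
// free extents in balanced trees; a std::map keyed by offset is enough here
// to show the invariant the crash log reports.
class ToyAllocator {
public:
  explicit ToyAllocator(uint64_t device_size) : device_size_(device_size) {}

  // Every free extent handed to the allocator must lie inside the device.
  // This mirrors: FAILED ceph_assert(offset + length <= uint64_t(device_size))
  void init_add_free(uint64_t offset, uint64_t length) {
    assert(offset + length <= device_size_);
    free_extents_[offset] = length;
  }

private:
  uint64_t device_size_;
  std::map<uint64_t, uint64_t> free_extents_;  // offset -> length
};

int main() {
  const uint64_t device_size = 4ull << 40;  // hypothetical 4 TiB device
  ToyAllocator alloc(device_size);

  alloc.init_add_free(0, 1ull << 20);             // OK: extent lies inside the device
  alloc.init_add_free(device_size - 4096, 8192);  // aborts: offset + length > device_size
  std::puts("not reached once the out-of-range extent is added");
  return 0;
}

If the recovery scan produces such an out-of-range extent, aborting on the spot (rather than continuing with an inconsistent free-space map) is presumably why the affected OSDs refuse to start.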

Version-Release number of selected component (if applicable):
17.1.0-0.2.rc1.fc37.x86_64

How reproducible:

Steps to Reproduce:
1. Upgrade working cluster to quincy rc1 release.

Actual results:
OSDs crash on start.

Expected results:
OSDs start and run normally.

Additional info:
My cluster has 3 control nodes running Rawhide (mons, mgrs, mds) and 1 physical server with 6 HDDs running 6 OSDs (also Rawhide).
I'm using CephFS and RGW.


Related issues (1 closed)

Related to bluestore - Backport #54523: quincy: default osd_fast_shutdown=true would cause NCB to recover allocation map on each start (Resolved, Gabriel BenHanokh)