Bug #53466 (closed): OSD is unable to allocate free space for BlueFS

Added by Jan Tilsner over 2 years ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a Ceph cluster running with Rook on Kubernetes, and some of its OSDs are stuck in CrashLoopBackOff state.

We suspect this started after the last round of updates (Rook 1.5.12 with Ceph 15.2.11, then Rook 1.6.10 with Ceph 15.2.13, then Rook 1.7.4 with Ceph 16.2.6).
After that update the cluster was stable for several weeks (Oct. 18 until Nov. 15); now every OSD that is restarted crashes.

I will attach a log from a crashing OSD.

We have a second, identical cluster with the same problem (updated on Oct. 18, first crash on Nov. 24). Both of these clusters run as virtual machines on ESX.
These are test clusters and not critical, but we applied the same update and run the identical software on our production clusters (AWS EC2 and bare metal).
We fear that after some time they will be affected too.

We were able to repair one of the two clusters by adding a new OSD and removing the broken one. Unfortunately that did not help with the other cluster (now 6 of 10 OSDs are broken).

Since the error message is "unable to allocate 0x90000 on bdev 1, allocator name block, allocator type hybrid, capacity 0x31ffc00000, block size 0x1000, free 0x5d008c000, fragmentation 0.211058, allocated 0x0", we found the following issue:
https://tracker.ceph.com/issues/47883
We tried to change the allocator to bitmap, but still no luck. Changing the image tag of the OSD to quay.io/ceph/daemon-base:latest-pacific-devel did not help either.
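For reference, this is roughly how we switched the allocator (a sketch only; the rook-ceph namespace and the OSD deployment name are assumptions and need to be adapted to the cluster):

    # switch the BlueStore allocator from hybrid to bitmap for all OSDs
    ceph config set osd bluestore_allocator bitmap
    # confirm the new value
    ceph config get osd bluestore_allocator
    # restart an affected OSD pod so it starts with the new allocator
    kubectl -n rook-ceph rollout restart deployment rook-ceph-osd-2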

rook-version: 1.7.4
ceph-version: 16.2.6


Files

osd-2.log (289 KB), attached by Jan Tilsner, 12/02/2021 03:10 PM

Related issues 7 (2 open, 5 closed)

Related to bluestore - Fix #54299: osd error restart (Need More Info)
Related to bluestore - Bug #53899: bluefs _allocate allocation failed - BlueFS.cc: 2768: ceph_abort_msg("bluefs enospc") (Need More Info)
Related to bluestore - Bug #53814: Pacific cluster crash (Won't Fix)
Is duplicate of bluestore - Bug #57672: SSD OSD won't start after high framentation score! (Duplicate)
Has duplicate bluestore - Bug #62125: bluestore/bluefs: bluefs enospc while osd start (Duplicate)
Copied to bluestore - Backport #58588: quincy: OSD is unable to allocate free space for BlueFS (Resolved, Igor Fedotov)
Copied to bluestore - Backport #58589: pacific: OSD is unable to allocate free space for BlueFS (Resolved, Igor Fedotov)