Bug #46800

Bug #47751 (closed): Hybrid allocator might segfault when fallback allocator is present

Octopus OSD died and fails to start with FAILED ceph_assert(is_valid_io(off, len))

Added by Vitaliy Filippov almost 4 years ago. Updated over 3 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: -
Regression: No
Severity: 1 - critical
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

Hi

One of my OSDs just died trying to write beyond the end of the device. Now it just fails to start with the same assertion during _deferred_replay().

The LVM volume size is 0x37400000000 and the BlueStore device size is also 0x37400000000 according to `ceph-bluestore-tool bluefs-bdev-sizes -h --path /var/lib/ceph/osd/ceph-2/`, but when I attached to the OSD with gdb and looked at the `off` (offset) parameter in BlockDevice::is_valid_io, it was 0x3740003e000.

So the OSD was indeed trying to write beyond the device end.
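In other words, the failing offset lies 0x3740003e000 - 0x37400000000 = 0x3e000 bytes (253,952 bytes, or 62 blocks of 4 KiB) past the device end.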

You can find an excerpt from the OSD log in the attachment. It starts with the initial stack trace from when it crashed, followed by a number of the repeated startup errors.

The assertion message is:

/build/ceph-15.2.4/src/os/bluestore/KernelDevice.cc: 892: FAILED ceph_assert(is_valid_io(off, len))
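For context, this assert in KernelDevice sits in front of the actual write and calls BlockDevice::is_valid_io(). Below is a minimal sketch of that check, assuming the Octopus-era implementation; in the real code `size` and `block_size` are members of BlockDevice, passed as parameters here so the sketch is self-contained:

```cpp
#include <cstdint>

// Sketch of the sanity check behind the assertion (assumed to match the
// Octopus-era BlockDevice::is_valid_io). `size` is the device size and
// `block_size` the minimum I/O granularity (typically 4 KiB).
bool is_valid_io(uint64_t off, uint64_t len,
                 uint64_t size, uint64_t block_size) {
  return off % block_size == 0 &&   // offset is block-aligned
         len % block_size == 0 &&   // length is block-aligned
         len > 0 &&                 // non-empty I/O
         off < size &&              // starts inside the device
         off + len <= size;         // does not run past the device end
}
```

With off = 0x3740003e000 and size = 0x37400000000, the `off < size` condition is already false, so the assert fires regardless of the write length.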

I've backed up the BlueFS of this OSD to a directory with bluefs-export for possible future reference... Now I'll probably recreate the OSD and pray that the others don't die during backfill, because I don't want a Cloudmouse-style incident here... which is especially important because of my EC 2+1. :-)
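(For reference: the bluefs-export mentioned above is a ceph-bluestore-tool subcommand; a typical invocation, with an illustrative output directory, is `ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-2 --out-dir /root/osd2-bluefs-backup`.)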


Files

ceph-osd.2-excerpt.log (452 KB), Vitaliy Filippov, 07/31/2020 01:53 PM
fsck.log (18 KB), Vitaliy Filippov, 07/31/2020 03:32 PM

Related issues: 1 (0 open, 1 closed)

Related to bluestore - Bug #48276: OSD Crash with ceph_assert(is_valid_io(off, len)) (Duplicate)
