Bug #52095

OSD container can't start: _read_bdev_label unable to decode label at offset 102: buffer::malformed_input

Added by Oleg Neumyvakin over 2 years ago. Updated about 1 year ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
container
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ceph version:
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)

OSD container can't start:
/usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --name ceph-c6f330e2-f775-11eb-a326-85f44cce260a-osd.0-activate -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=jailyn-wchwyk -v /var/run/ceph/c6f330e2-f775-11eb-a326-85f44cce260a:/var/run/ceph:z -v /var/log/ceph/c6f330e2-f775-11eb-a326-85f44cce260a:/var/log/ceph:z -v /var/lib/ceph/c6f330e2-f775-11eb-a326-85f44cce260a/crash:/var/lib/ceph/crash:z -v /var/lib/ceph/c6f330e2-f775-11eb-a326-85f44cce260a/osd.0:/var/lib/ceph/osd/ceph-0:z -v /var/lib/ceph/c6f330e2-f775-11eb-a326-85f44cce260a/osd.0/config:/etc/ceph/ceph.conf:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm docker.io/ceph/ceph:v15 lvm activate 0 bf95f45b-0849-46cd-9715-7c27db32b9ff --no-systemd
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff --path /var/lib/ceph/osd/ceph-0 --no-mon-config
stderr: failed to read label for /dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff: (2) No such file or directory
--> RuntimeError: command returned non-zero exit status: 1

Content from /var/log/ceph/ceph-volume.log:

[2021-08-08 04:08:22,902][ceph_volume.process][INFO ] stdout TAGS=:systemd:
[2021-08-08 04:08:22,903][ceph_volume.process][INFO ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S lv_path=/dev/sdb -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2021-08-08 04:08:22,963][ceph_volume.process][INFO ] Running command: /usr/bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /dev/sdb
[2021-08-08 04:08:22,972][ceph_volume.process][INFO ] stdout NAME="sdb" KNAME="sdb" MAJ:MIN="8:16" FSTYPE="LVM2_member" MOUNTPOINT="" LABEL="" UUID="BVsYZe-nnPG-UCo3-D0xy-H5vr-mESK-IrW4IH" RO="0" RM="0" MODEL="QEMU HARDDISK " SIZE="80G" STATE="running" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="disk" DISC-ALN="0" DISC-GRAN="4K" DISC-MAX="1G" DISC-ZERO="0" PKNAME="" PARTLABEL=""
[2021-08-08 04:08:22,973][ceph_volume.process][INFO ] Running command: /usr/sbin/blkid -p /dev/sdb
[2021-08-08 04:08:22,977][ceph_volume.process][INFO ] stdout /dev/sdb: UUID="BVsYZe-nnPG-UCo3-D0xy-H5vr-mESK-IrW4IH" VERSION="LVM2 001" TYPE="LVM2_member" USAGE="raid"
[2021-08-08 04:08:22,978][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size /dev/sdb
[2021-08-08 04:08:23,034][ceph_volume.process][INFO ] stdout ceph-0142ad4b-5029-457d-ae0a-9ed927249e32";"1";"1";"wz--n-";"20479";"0";"4194304
[2021-08-08 04:08:23,034][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size /dev/sdb
[2021-08-08 04:08:23,094][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff,ceph.block_uuid=rL6nxR-ESXS-Xd5B-TwP1-GGNE-MLpa-M4mcHY,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=c6f330e2-f775-11eb-a326-85f44cce260a,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=bf95f45b-0849-46cd-9715-7c27db32b9ff,ceph.osd_id=0,ceph.osdspec_affinity=None,ceph.type=block,ceph.vdo=0";"/dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff";"osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff";"ceph-0142ad4b-5029-457d-ae0a-9ed927249e32";"rL6nxR-ESXS-Xd5B-TwP1-GGNE-MLpa-M4mcHY";"85895151616
[2021-08-08 04:08:23,095][ceph_volume.process][INFO ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /dev/sdb
[2021-08-08 04:08:23,125][ceph_volume.process][INFO ] stderr unable to read label for /dev/sdb: (2) No such file or directory

Run /usr/bin/ceph-bluestore-tool --log-level=30:

/usr/bin/ceph-bluestore-tool show-label --log-level=30 --dev /dev/sdb -l /var/log/ceph/ceph-volume.log
unable to read label for /dev/sdb: (2) No such file or directory

2021-08-08T05:17:45.303+0000 7ff7fc9c0240 10 bluestore(/dev/sdb) _read_bdev_label
2021-08-08T05:17:45.303+0000 7ff7fc9c0240 2 bluestore(/dev/sdb) _read_bdev_label unable to decode label at offset 102: buffer::malformed_input: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding
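
For background: the BlueStore device label lives in the first 4 KiB of the device and (if I read the Octopus on-disk format right) begins with the plain ASCII text "bluestore block device" plus the OSD UUID, followed by the binary struct that _read_bdev_label decodes; the "offset 102" above is where decoding that struct ran past its declared length. A rough Python sketch of the check the hexdump below lets you do by eye (hypothetical helper, not part of Ceph; needs root to read a raw device):

import sys

LABEL_BLOCK_SIZE = 4096                 # _read_bdev_label reads one 4 KiB block at offset 0
MAGIC = b"bluestore block device\n"     # plain-text prefix written ahead of the binary label

def peek_label(dev):
    with open(dev, "rb") as f:
        block = f.read(LABEL_BLOCK_SIZE)
    if len(block) < LABEL_BLOCK_SIZE:
        return "short read (%d bytes): device truncated or unreadable?" % len(block)
    if not block.startswith(MAGIC):
        return "no BlueStore label magic: not a BlueStore device, or the label was clobbered"
    # a 36-character OSD UUID string follows the magic, terminated by '\n'
    uuid = block[len(MAGIC):len(MAGIC) + 36].decode("ascii", "replace")
    return "label prefix present, osd uuid %s" % uuid

if __name__ == "__main__":
    print(sys.argv[1], "-", peek_label(sys.argv[1]))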

Attached is an 8K dump from /dev/sdb:

dd if=/dev/sdb of=/tmp/foo bs=4K count=2
hexdump -C /tmp/foo


Files

hexdump_8k (16.5 KB) hexdump_8k Oleg Neumyvakin, 08/08/2021 05:47 AM
ceph-volume.log.gz (174 KB) ceph-volume.log.gz Richard Hesse, 02/14/2023 09:48 PM
Actions #1

Updated by Igor Fedotov over 2 years ago

  • Tags set to container
Actions #2

Updated by Richard Hesse about 1 year ago

I'm also seeing the same issue on Pacific 16.2.11 (midway through upgrading from 16.2.10). The non-LVM containerized OSDs start fine, but the LVM OSDs take 10-15 minutes to start. Checking ceph-volume.log, it's looping through every block device in the system and failing. I guess that makes sense, as these are LVM OSDs. Any ideas or traction since Igor first reported this a year ago?

Actions #3

Updated by Igor Fedotov about 1 year ago

Richard Hesse wrote:

I'm also seeing the same issue on Pacific 16.2.11 (midway through upgrading from 16.2.10). The non-LVM containerized OSDs start fine, but the LVM OSDs take 10-15 minutes to start. Checking ceph-volume.log, it's looping through every block device in the system and failing. I guess that makes sense, as these are LVM OSDs. Any ideas or traction since Igor first reported this a year ago?

Hi Richard,
could you please share the ceph-volume log?

Additionally, if this log contains ceph device references, like:
[2021-08-08 04:08:23,094][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff

please check whether that device is mapped into the container and try to read the first 4K block from it.
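
A minimal sketch of that check (run it inside the OSD container; the device path is the ceph.block_device value quoted above):

import os

dev = "/dev/ceph-0142ad4b-5029-457d-ae0a-9ed927249e32/osd-block-bf95f45b-0849-46cd-9715-7c27db32b9ff"
if not os.path.exists(dev):
    # a missing node here would match the (2) No such file or directory above
    print(dev, "is not mapped into this container")
else:
    with open(dev, "rb") as f:
        head = f.read(4096)
    # a healthy label block starts with b'bluestore block device\n'
    print(head[:32])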

Actions #4

Updated by Richard Hesse about 1 year ago

Here's a truncated version of the ceph-volume.log. It was 1.7MB compressed so I only included a minute or two of output during an upgrade run.

Actions #5

Updated by Igor Fedotov about 1 year ago

Richard Hesse wrote:

Here's a truncated version of the ceph-volume.log. It was 1.7MB compressed so I only included a minute or two of output during an upgrade run.

Hi Richard,
I went through your logs but have no idea what's happening. However, I don't see any obvious indication that this is a bluestore issue. Nor am I sure it's similar to the original one: e.g., in your log the bluestore labels are parsed properly (at least sometimes), which differs from the behavior Oleg reported...

Hence I'd suggest opening another ticket against the ceph-volume project; I guess the folks there might be able to provide more help.

Actions #6

Updated by Adam Kupczyk about 1 year ago

  • Status changed from New to Need More Info
Actions #7

Updated by Richard Hesse about 1 year ago

This issue was fixed in the most recent Quincy release. ceph-volume was failing when many devices were present.
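
For anyone else hitting the slow-start symptom: the attached ceph-volume.log shows one lsblk/blkid probe pair per block device, so activation cost grows with the device count. An illustrative Python sketch of that scan pattern (not ceph-volume's actual code; blkid -p needs root):

import glob
import subprocess

def probe(dev):
    # mirrors the per-device lsblk/blkid calls visible in ceph-volume.log
    lsblk = subprocess.run(["lsblk", "--nodeps", "-P", "-o", "NAME,FSTYPE", dev],
                           capture_output=True, text=True)
    blkid = subprocess.run(["blkid", "-p", dev], capture_output=True, text=True)
    return lsblk.stdout.strip(), blkid.stdout.strip()

for dev in sorted(glob.glob("/dev/sd[a-z]")):
    print(dev, probe(dev))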
