Bug #21820 (closed): Ceph OSD crash with Segfault

Added by Yves Vogl over 6 years ago. Updated over 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: OSD
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

I've observed that after a while some OSDs crash with a segfault. This has been happening since I switched to BlueStore.
This leads to reduced data redundancy and seems critical to me.

Here is some information:

# ceph --cluster ceph-mirror osd tree
ID CLASS WEIGHT   TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       17.06296 root default
-2        5.82999     host inf-0a38f9
 1   hdd  2.91499         osd.1           up  1.00000 1.00000
 2   hdd  2.91499         osd.2           up  1.00000 1.00000
-3        5.62140     host inf-30d985
 4   hdd  2.81070         osd.4           up  1.00000 1.00000
 5   hdd  2.81070         osd.5         down        0 1.00000
-4        5.61157     host inf-d7a3ca
 0   hdd  2.80579         osd.0         down        0 1.00000
 3   hdd  2.80579         osd.3           up  1.00000 1.00000
# ceph --cluster ceph-mirror -s
  cluster:
    id:     4b3bef10-7a76-491e-bf1a-c6ea4f5705cf
    health: HEALTH_WARN
            622/323253 objects misplaced (0.192%)
            Degraded data redundancy: 9306/323253 objects degraded (2.879%), 11 pgs unclean, 11 pgs degraded, 8 pgs undersized

  services:
    mon:        3 daemons, quorum inf-d7a3ca,inf-30d985,inf-0a38f9
    mgr:        inf-0a38f9(active), standbys: inf-d7a3ca, inf-30d985
    osd:        6 osds: 4 up, 4 in; 8 remapped pgs
    rbd-mirror: 1 daemon active

  data:
    pools:   2 pools, 128 pgs
    objects: 105k objects, 418 GB
    usage:   1765 GB used, 9955 GB / 11721 GB avail
    pgs:     9306/323253 objects degraded (2.879%)
             622/323253 objects misplaced (0.192%)
             117 active+clean
               4 active+recovery_wait+undersized+degraded+remapped
               3 active+recovery_wait+degraded
               3 active+undersized+degraded+remapped+backfill_wait
               1 active+undersized+degraded+remapped+backfilling

  io:
    client:   159 kB/s rd, 2004 kB/s wr, 19 op/s rd, 137 op/s wr
    recovery: 1705 kB/s, 0 objects/s
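
For reference, the placement groups behind the HEALTH_WARN above can be listed with the standard Ceph CLI; a minimal sketch, assuming the same cluster name as in the output (these commands are not part of the original report):

# ceph --cluster ceph-mirror health detail                        (expands the warning into the affected PG IDs)
# ceph --cluster ceph-mirror pg dump_stuck degraded undersized    (lists PGs stuck in the degraded/undersized states)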

Each node has 2x HDD and 2x SSD. Each SSD provides partition number 4 for use as a separate block DB/WAL:

Disk /dev/sda: 234441648 sectors, 111.8 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 1BD0737C-CFB6-4A06-AB2F-3BF150E6CC12
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 234441614
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)  End (sector)  Size        Code  Name
   1            2048      16795647    8.0 GiB     FD00  Linux RAID
   2        16795648      58771455    20.0 GiB    FD00  Linux RAID
   3        58771456      58773503    1024.0 KiB  EF02  BIOS boot partition
   4        58773504     234441614    83.8 GiB    8300  Linux filesystem
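
The report does not say how partition 4 was created; as a hedged example, a partition like it can be added with sgdisk (device name here is illustrative):

# sgdisk --largest-new=4 --typecode=4:8300 /dev/sda    (create partition 4 in the largest remaining free block, GPT type 8300)
# sgdisk --print /dev/sda                              (verify the resulting layout)

Consistent with the note below, the partition is handed to ceph-disk raw, without a filesystem.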

This is how I provisioned the devices for each node:

# ceph-disk prepare --cluster ceph-mirror --bluestore --block.db /dev/sda4 /dev/sdc
# ceph-disk prepare --cluster ceph-mirror --bluestore --block.db /dev/sdb4 /dev/sdd
# ceph-disk activate /dev/sdc1
# ceph-disk activate /dev/sdd1

sdc and sdd are the HDDs; sda4 and sdb4 are the manually created (and not formatted in any way) partitions for WAL/DB usage.
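
Whether an OSD actually picked up the separate DB partition can be double-checked after activation; a minimal sketch, assuming osd.1 and the default data path for a cluster named ceph-mirror (the OSD id and paths are illustrative, not taken from the report):

# ls -l /var/lib/ceph/osd/ceph-mirror-1/block.db            (the symlink should point at the SSD partition, e.g. /dev/sda4)
# ceph --cluster ceph-mirror osd metadata 1 | grep -i blue  (shows the BlueStore/BlueFS device details the OSD registered)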

After this issue occurs I have to completely remove the OSD and recreate it. The next time, another OSD crashes. It's mysterious.
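
For reference, the remove-and-recreate cycle described above roughly corresponds to the standard Luminous procedure; a minimal sketch, assuming osd.5 on /dev/sdd (the exact commands used are not recorded in this report):

# ceph --cluster ceph-mirror osd out 5                             (let data drain off the OSD)
# systemctl stop ceph-osd@5                                        (the unit's CLUSTER environment must point at ceph-mirror)
# ceph --cluster ceph-mirror osd purge 5 --yes-i-really-mean-it    (removes the CRUSH entry, auth key and OSD id)
# ceph-disk zap /dev/sdd                                           (wipe the data disk)

followed by the ceph-disk prepare/activate commands shown above.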

Please see the attached log for details.
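
The interesting part of the attached log is the signal-handler backtrace; it can be extracted with something like the following (a sketch, assuming the default log location for a cluster named ceph-mirror and osd.0):

# grep -A 30 'Segmentation fault' /var/log/ceph/ceph-mirror-osd.0.log    (prints the "Caught signal" line plus the frames that follow)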


Files

ceph-osd.log (70.8 KB), Yves Vogl, 10/17/2017 11:52 AM

Related issues: 1 (0 open, 1 closed)

Related to bluestore - Bug #20557: segmentation fault with rocksdb|BlueStore and jemalloc (Closed, 07/10/2017)
