Support #52700

open

OSDs won't start

Added by Nishith Tiwari over 2 years ago. Updated over 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Tags:
Reviewed:
Affected Versions:
Component(RADOS):
Pull request ID:

Description

We have a three-node Ceph cluster: 3 bare-metal nodes running 4 OSDs each (12 OSDs in total). Recently a colleague enabled CephFS here, and of late we observed that the OSDs on one of the bare-metal nodes are down and won't start. They complain as follows:

$ ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

$ ceph -s
  cluster:
    id:     314aad04-c512-4057-b645-b3d2edb95c70
    health: HEALTH_WARN
            18/1824198 objects misplaced (0.001%)
            Degraded data redundancy: 608048/1824198 objects degraded (33.332%), 197 pgs degraded, 220 pgs undersized

  services:
    mon: 3 daemons, quorum ceph-admin-mon1,ceph-admin-mon2,ceph-admin-mon3
    mgr: ceph-admin-mon1(active), standbys: ceph-admin-mon3
    mds: shared_storage-1/1/1 up {0=ceph-admin-mon2=up:active}, 2 up:standby
    osd: 12 osds: 8 up, 8 in; 100 remapped pgs

  data:
    pools:   5 pools, 320 pgs
    objects: 608.1 k objects, 2.2 TiB
    usage:   4.4 TiB used, 2.9 TiB / 7.3 TiB avail
    pgs:     608048/1824198 objects degraded (33.332%)
             18/1824198 objects misplaced (0.001%)
             197 active+undersized+degraded
             100 active+clean+remapped
             23  active+undersized

  io:
    client: 165 KiB/s rd, 3.1 MiB/s wr, 42 op/s rd, 180 op/s wr

$ ceph osd status
+----+-----------+------+-------+--------+---------+--------+---------+----------------+
| id | host      | used | avail | wr ops | wr data | rd ops | rd data | state          |
+----+-----------+------+-------+--------+---------+--------+---------+----------------+
| 0  | BM-Ceph-1 | 685G | 245G  | 43     | 1869k   | 4      | 0       | exists,up      |
| 1  | BM-Ceph-2 | 579G | 351G  | 19     | 653k    | 5      | 0       | exists,up      |
| 2  |           | 0    | 0     | 0      | 0       | 0      | 0       | autoout,exists |
| 3  | BM-Ceph-1 | 699G | 232G  | 55     | 913k    | 7      | 0       | exists,up      |
| 4  | BM-Ceph-2 | 593G | 337G  | 20     | 324k    | 6      | 0       | exists,up      |
| 5  |           | 0    | 0     | 0      | 0       | 0      | 0       | autoout,exists |
| 6  | BM-Ceph-1 | 493G | 438G  | 17     | 543k    | 3      | 0       | exists,up      |
| 7  | BM-Ceph-2 | 580G | 351G  | 25     | 889k    | 4      | 6       | exists,up      |
| 8  |           | 0    | 0     | 0      | 0       | 0      | 0       | autoout,exists |
| 9  | BM-Ceph-1 | 373G | 558G  | 10     | 90.4k   | 19     | 64.8k   | exists,up      |
| 10 | BM-Ceph-2 | 501G | 429G  | 10     | 306k    | 31     | 116k    | exists,up      |
| 11 |           | 0    | 0     | 0      | 0       | 0      | 0       | autoout,exists |
+----+-----------+------+-------+--------+---------+--------+---------+----------------+

root@BM-Ceph-3:/var/log# systemctl status ceph-osd@2
● ceph-osd@2.service - Ceph object storage daemon osd.2
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2021-09-22 09:10:39 UTC; 726ms ago
  Process: 11920 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 2 (code=exited, status=0/SUCCESS)
  Process: 11931 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 2 --setuser ceph --setgroup ceph (code=exited, status=1>
 Main PID: 11931 (code=exited, status=1/FAILURE)

Sep 22 09:10:39 BM-Ceph-3 systemd[1]: ceph-osd@2.service: Scheduled restart job, restart counter is at 3.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: Stopped Ceph object storage daemon osd.2.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: ceph-osd@2.service: Start request repeated too quickly.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: ceph-osd@2.service: Failed with result 'exit-code'.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: Failed to start Ceph object storage daemon osd.2.

root@BM-Ceph-3:/var/log/ceph# tail -100f ceph-osd.2.log
2021-09-22T09:10:06.913+0000 7eff32ed5d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-09-22T09:10:06.913+0000 7eff32ed5d80 0 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable), process ceph-osd, pid 11803
2021-09-22T09:10:06.913+0000 7eff32ed5d80 0 pidfile_write: ignore empty --pid-file
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc000 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc000 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc700 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc700 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 932 GiB
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc700 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:07.201+0000 7eff32ed5d80 1 bdev(0x559f185bc000 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:07.449+0000 7eff32ed5d80 1 objectstore numa_node 1
2021-09-22T09:10:07.449+0000 7eff32ed5d80 0 starting osd.2 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2021-09-22T09:10:07.453+0000 7eff32ed5d80 -1 unable to find any IPv4 address in networks '172.16.0.0/16' interfaces ''
2021-09-22T09:10:07.453+0000 7eff32ed5d80 -1 unable to find any IPv4 address in networks '192.168.9.0/24' interfaces ''
2021-09-22T09:10:07.457+0000 7eff32ed5d80 -1 expected plugin /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so version 15.2.7 but it claims to be 15.2.3 instead
2021-09-22T09:10:17.705+0000 7fc09aab8d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-09-22T09:10:17.705+0000 7fc09aab8d80 0 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable), process ceph-osd, pid 11860
2021-09-22T09:10:17.705+0000 7fc09aab8d80 0 pidfile_write: ignore empty --pid-file
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938000 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938000 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938700 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938700 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 932 GiB
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938700 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:17.993+0000 7fc09aab8d80 1 bdev(0x55a4a1938000 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:18.241+0000 7fc09aab8d80 1 objectstore numa_node 1
2021-09-22T09:10:18.241+0000 7fc09aab8d80 0 starting osd.2 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2021-09-22T09:10:18.245+0000 7fc09aab8d80 -1 unable to find any IPv4 address in networks '172.16.0.0/16' interfaces ''
2021-09-22T09:10:18.245+0000 7fc09aab8d80 -1 unable to find any IPv4 address in networks '192.168.9.0/24' interfaces ''
2021-09-22T09:10:18.249+0000 7fc09aab8d80 -1 expected plugin /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so version 15.2.7 but it claims to be 15.2.3 instead
2021-09-22T09:10:28.492+0000 7f5b71666d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-09-22T09:10:28.492+0000 7f5b71666d80 0 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable), process ceph-osd, pid 11931
2021-09-22T09:10:28.492+0000 7f5b71666d80 0 pidfile_write: ignore empty --pid-file
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6000 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6000 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6700 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6700 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 932 GiB
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6700 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:28.776+0000 7f5b71666d80 1 bdev(0x5588cadc6000 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:29.032+0000 7f5b71666d80 1 objectstore numa_node 1
2021-09-22T09:10:29.032+0000 7f5b71666d80 0 starting osd.2 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2021-09-22T09:10:29.036+0000 7f5b71666d80 -1 unable to find any IPv4 address in networks '172.16.0.0/16' interfaces ''
2021-09-22T09:10:29.036+0000 7f5b71666d80 -1 unable to find any IPv4 address in networks '192.168.9.0/24' interfaces ''
2021-09-22T09:10:29.040+0000 7f5b71666d80 -1 expected plugin /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so version 15.2.7 but it claims to be 15.2.3 instead
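The last line of each start attempt looks like the fatal one: ceph-osd 15.2.7 refuses an erasure-code plugin that claims to be 15.2.3, which is the classic sign of a partially applied package upgrade on this node. The sketch below reproduces that comparison with the two version strings copied from the log; the dpkg commands in the comments are an assumption (Debian/Ubuntu packaging) for confirming on the node which package actually owns the plugin.

```shell
# Versions copied from the OSD log above. On the node itself they could be
# cross-checked with (Debian/Ubuntu, assumption):
#   dpkg -S /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so
#   /usr/bin/ceph-osd --version
osd_version="15.2.7"      # from the "ceph version 15.2.7 ... octopus" banner
plugin_version="15.2.3"   # from "but it claims to be 15.2.3 instead"

# ceph-osd aborts at startup when these differ; the same comparison here:
if [ "$plugin_version" != "$osd_version" ]; then
  echo "plugin/daemon version skew: $plugin_version vs $osd_version"
fi
```

If the skew is confirmed, reinstalling/upgrading the Ceph packages on this node so the daemon and its plugins come from the same release should clear this particular error.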

I checked that the bare-metal node does have addresses on both the public and private networks, as can be seen below:

$ ip -4 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
14: vlan326: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.117.134.166/28 brd 10.117.134.175 scope global vlan326
       valid_lft forever preferred_lft forever
17: vlan3202: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 172.16.0.12/16 brd 172.16.255.255 scope global vlan3202
       valid_lft forever preferred_lft forever
18: vlan3203: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 192.168.9.12/24 brd 192.168.9.255 scope global vlan3203
       valid_lft forever preferred_lft forever
19: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
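The "unable to find any IPv4 address" errors are odd given that the node clearly holds 172.16.0.12/16 and 192.168.9.12/24. A quick arithmetic check (pure shell, addresses and networks copied from the outputs above) confirms the addresses really are inside the configured networks. If that holds, the empty interfaces '' in the error may simply mean the VLAN interfaces were not yet up when ceph-osd started (a boot-ordering race), or that osd.2 is reading a different ceph.conf, rather than a wrong public_network/cluster_network value.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr ADDR NET/PREFIX -> success if ADDR falls inside NET/PREFIX.
in_cidr() {
  local addr net prefix mask
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Values from "ip -4 addr" vs the networks named in the OSD errors:
in_cidr 172.16.0.12 172.16.0.0/16  && echo "172.16.0.12 is inside 172.16.0.0/16"
in_cidr 192.168.9.12 192.168.9.0/24 && echo "192.168.9.12 is inside 192.168.9.0/24"
```

Since both checks pass, the network configuration itself looks consistent; the next thing to verify would be interface bring-up ordering relative to the ceph-osd service.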

One strange thing I notice now is that ceph --version reports mimic while the OSD log reports octopus.
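That mismatch means the CLI binary and the OSD binary come from entirely different Ceph releases, not just different point versions; on a live cluster, ceph versions also lists which release each running daemon reports. The tiny helper below (release codenames are public Ceph release history; only the two seen here are mapped) makes the observation explicit:

```shell
# Map a Ceph version string to its release codename. Only the releases
# appearing in this ticket are listed; anything else prints "unknown".
release_name() {
  case "${1%%.*}" in
    13) echo mimic ;;
    15) echo octopus ;;
    *)  echo unknown ;;
  esac
}

echo "CLI reports:     $(release_name 13.2.10)"   # from ceph --version
echo "OSD log reports: $(release_name 15.2.7)"    # from the ceph-osd banner
```

A two-major-release jump on some daemons but not others would also explain why only this node's OSDs fail while the rest of the cluster stays up.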

Actions #1

Updated by Greg Farnum over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (OSD)
Actions #2

Updated by Loïc Dachary over 2 years ago

  • Target version deleted (v15.2.15)