Support #52700
OSDs won't start
Description
We have a three-node Ceph cluster with 3 bare-metal nodes running 4 OSDs each (12 OSDs in total). Recently, my colleague enabled CephFS here, and of late we have observed that the OSDs on one of the bare-metal nodes are down and won't start. They complain as follows:
$ ceph --version
ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
$ ceph -s
cluster:
id: 314aad04-c512-4057-b645-b3d2edb95c70
health: HEALTH_WARN
18/1824198 objects misplaced (0.001%)
Degraded data redundancy: 608048/1824198 objects degraded (33.332%), 197 pgs degraded, 220 pgs undersized
services:
mon: 3 daemons, quorum ceph-admin-mon1,ceph-admin-mon2,ceph-admin-mon3
mgr: ceph-admin-mon1(active), standbys: ceph-admin-mon3
mds: shared_storage-1/1/1 up {0=ceph-admin-mon2=up:active}, 2 up:standby
osd: 12 osds: 8 up, 8 in; 100 remapped pgs
data:
pools: 5 pools, 320 pgs
objects: 608.1 k objects, 2.2 TiB
usage: 4.4 TiB used, 2.9 TiB / 7.3 TiB avail
pgs: 608048/1824198 objects degraded (33.332%)
18/1824198 objects misplaced (0.001%)
197 active+undersized+degraded
100 active+clean+remapped
23 active+undersized
io:
client: 165 KiB/s rd, 3.1 MiB/s wr, 42 op/s rd, 180 op/s wr
$ ceph osd status
+----+-----------+-------+-------+--------+---------+--------+---------+----------------+
| id | host      |  used | avail | wr ops | wr data | rd ops | rd data | state          |
+----+-----------+-------+-------+--------+---------+--------+---------+----------------+
| 0  | BM-Ceph-1 |  685G |  245G |   43   |  1869k  |    4   |     0   | exists,up      |
| 1  | BM-Ceph-2 |  579G |  351G |   19   |   653k  |    5   |     0   | exists,up      |
| 2  |           |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
| 3  | BM-Ceph-1 |  699G |  232G |   55   |   913k  |    7   |     0   | exists,up      |
| 4  | BM-Ceph-2 |  593G |  337G |   20   |   324k  |    6   |     0   | exists,up      |
| 5  |           |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
| 6  | BM-Ceph-1 |  493G |  438G |   17   |   543k  |    3   |     0   | exists,up      |
| 7  | BM-Ceph-2 |  580G |  351G |   25   |   889k  |    4   |     6   | exists,up      |
| 8  |           |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
| 9  | BM-Ceph-1 |  373G |  558G |   10   |  90.4k  |   19   |  64.8k  | exists,up      |
| 10 | BM-Ceph-2 |  501G |  429G |   10   |   306k  |   31   |   116k  | exists,up      |
| 11 |           |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
+----+-----------+-------+-------+--------+---------+--------+---------+----------------+
root@BM-Ceph-3:/var/log# systemctl status ceph-osd@2
● ceph-osd@2.service - Ceph object storage daemon osd.2
Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2021-09-22 09:10:39 UTC; 726ms ago
Process: 11920 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 2 (code=exited, status=0/SUCCESS)
Process: 11931 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 2 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 11931 (code=exited, status=1/FAILURE)
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: ceph-osd@2.service: Scheduled restart job, restart counter is at 3.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: Stopped Ceph object storage daemon osd.2.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: ceph-osd@2.service: Start request repeated too quickly.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: ceph-osd@2.service: Failed with result 'exit-code'.
Sep 22 09:10:39 BM-Ceph-3 systemd[1]: Failed to start Ceph object storage daemon osd.2.
root@BM-Ceph-3:/var/log/ceph# tail -100f ceph-osd.2.log
2021-09-22T09:10:06.913+0000 7eff32ed5d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-09-22T09:10:06.913+0000 7eff32ed5d80 0 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable), process ceph-osd, pid 11803
2021-09-22T09:10:06.913+0000 7eff32ed5d80 0 pidfile_write: ignore empty --pid-file
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc000 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc000 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc700 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc700 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 932 GiB
2021-09-22T09:10:06.917+0000 7eff32ed5d80 1 bdev(0x559f185bc700 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:07.201+0000 7eff32ed5d80 1 bdev(0x559f185bc000 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:07.449+0000 7eff32ed5d80 1 objectstore numa_node 1
2021-09-22T09:10:07.449+0000 7eff32ed5d80 0 starting osd.2 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2021-09-22T09:10:07.453+0000 7eff32ed5d80 -1 unable to find any IPv4 address in networks '172.16.0.0/16' interfaces ''
2021-09-22T09:10:07.453+0000 7eff32ed5d80 -1 unable to find any IPv4 address in networks '192.168.9.0/24' interfaces ''
2021-09-22T09:10:07.457+0000 7eff32ed5d80 -1 expected plugin /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so version 15.2.7 but it claims to be 15.2.3 instead
2021-09-22T09:10:17.705+0000 7fc09aab8d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-09-22T09:10:17.705+0000 7fc09aab8d80 0 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable), process ceph-osd, pid 11860
2021-09-22T09:10:17.705+0000 7fc09aab8d80 0 pidfile_write: ignore empty --pid-file
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938000 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938000 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938700 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938700 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 932 GiB
2021-09-22T09:10:17.709+0000 7fc09aab8d80 1 bdev(0x55a4a1938700 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:17.993+0000 7fc09aab8d80 1 bdev(0x55a4a1938000 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:18.241+0000 7fc09aab8d80 1 objectstore numa_node 1
2021-09-22T09:10:18.241+0000 7fc09aab8d80 0 starting osd.2 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2021-09-22T09:10:18.245+0000 7fc09aab8d80 -1 unable to find any IPv4 address in networks '172.16.0.0/16' interfaces ''
2021-09-22T09:10:18.245+0000 7fc09aab8d80 -1 unable to find any IPv4 address in networks '192.168.9.0/24' interfaces ''
2021-09-22T09:10:18.249+0000 7fc09aab8d80 -1 expected plugin /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so version 15.2.7 but it claims to be 15.2.3 instead
2021-09-22T09:10:28.492+0000 7f5b71666d80 0 set uid:gid to 64045:64045 (ceph:ceph)
2021-09-22T09:10:28.492+0000 7f5b71666d80 0 ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable), process ceph-osd, pid 11931
2021-09-22T09:10:28.492+0000 7f5b71666d80 0 pidfile_write: ignore empty --pid-file
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6000 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6000 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6700 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6700 /var/lib/ceph/osd/ceph-2/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 932 GiB
2021-09-22T09:10:28.496+0000 7f5b71666d80 1 bdev(0x5588cadc6700 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:28.776+0000 7f5b71666d80 1 bdev(0x5588cadc6000 /var/lib/ceph/osd/ceph-2/block) close
2021-09-22T09:10:29.032+0000 7f5b71666d80 1 objectstore numa_node 1
2021-09-22T09:10:29.032+0000 7f5b71666d80 0 starting osd.2 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2021-09-22T09:10:29.036+0000 7f5b71666d80 -1 unable to find any IPv4 address in networks '172.16.0.0/16' interfaces ''
2021-09-22T09:10:29.036+0000 7f5b71666d80 -1 unable to find any IPv4 address in networks '192.168.9.0/24' interfaces ''
2021-09-22T09:10:29.040+0000 7f5b71666d80 -1 expected plugin /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so version 15.2.7 but it claims to be 15.2.3 instead
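Filtering the log down to its error lines makes the pattern clearer: every start attempt fails with the same three `-1` errors, and the plugin mismatch is the fatal one. A quick parsing sketch over the excerpt above (hypothetical helper code, not part of Ceph):

```python
import re

# Error lines (level -1) copied from one start attempt in ceph-osd.2.log above.
log = """\
2021-09-22T09:10:29.036+0000 7f5b71666d80 -1 unable to find any IPv4 address in networks '172.16.0.0/16' interfaces ''
2021-09-22T09:10:29.036+0000 7f5b71666d80 -1 unable to find any IPv4 address in networks '192.168.9.0/24' interfaces ''
2021-09-22T09:10:29.040+0000 7f5b71666d80 -1 expected plugin /usr/lib/x86_64-linux-gnu/ceph/erasure-code/libec_jerasure.so version 15.2.7 but it claims to be 15.2.3 instead
"""

# Strip the timestamp/thread-id/level prefix and keep the message.
errors = [re.sub(r"^\S+ \S+ -1 ", "", line) for line in log.splitlines()]
for e in errors:
    print(e)

# The plugin mismatch: the running ceph-osd binary is 15.2.7 (octopus) while
# the erasure-code plugin on disk was left behind by a 15.2.3 install.
m = re.search(r"expected plugin (\S+) version (\S+) but it claims to be (\S+)", log)
plugin, expected, actual = m.groups()
print(plugin, expected, actual)
```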
$ ip -4 addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
14: vlan326: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 10.117.134.166/28 brd 10.117.134.175 scope global vlan326
valid_lft forever preferred_lft forever
17: vlan3202: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 172.16.0.12/16 brd 172.16.255.255 scope global vlan3202
valid_lft forever preferred_lft forever
18: vlan3203: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
inet 192.168.9.12/24 brd 192.168.9.255 scope global vlan3203
valid_lft forever preferred_lft forever
19: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
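The subnet membership itself checks out: the node does hold an address inside each of the configured networks, which can be confirmed with Python's `ipaddress` module (an illustrative sketch, not the OSD's actual lookup code, which also enumerates interfaces):

```python
import ipaddress

# Networks from the OSD log and addresses from `ip -4 addr` above.
networks = ["172.16.0.0/16", "192.168.9.0/24"]
addresses = ["10.117.134.166", "172.16.0.12", "192.168.9.12", "192.168.122.1"]

for net in networks:
    subnet = ipaddress.ip_network(net)
    matches = [a for a in addresses if ipaddress.ip_address(a) in subnet]
    print(f"{net}: {matches}")
# 172.16.0.0/16 matches 172.16.0.12 and 192.168.9.0/24 matches 192.168.9.12,
# so the "unable to find any IPv4 address" errors are not due to a missing address.
```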
One strange thing I observe now is that `ceph --version` reports mimic while the OSD log reports octopus.
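Laying the three version strings from the outputs above side by side highlights the likely root cause on this node, a partially applied package upgrade (sketch only; the component labels are taken from the outputs above):

```python
# Version strings seen in this ticket.
versions = {
    "ceph CLI (ceph --version)": "13.2.10",  # mimic
    "ceph-osd binary (osd log)": "15.2.7",   # octopus
    "libec_jerasure.so plugin":  "15.2.3",   # older octopus point release
}

parsed = {name: tuple(int(x) for x in v.split(".")) for name, v in versions.items()}

# A cleanly upgraded node would report one version everywhere; here three
# different versions coexist, so the OSD binary refuses its own plugin.
assert len(set(parsed.values())) == 3
for name, v in sorted(parsed.items(), key=lambda kv: kv[1]):
    print(f"{name}: {'.'.join(map(str, v))}")
```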
Updated by Greg Farnum over 2 years ago
- Project changed from Ceph to RADOS
- Category deleted (OSD)