Bug #46845
Status: Closed
Newly orchestrated OSD fails with "unable to find any IPv4 address in networks '2001:db8:11d::/120'" when ms_bind_ipv6=true
Description
I just started deploying 60 OSDs to my new 15.2.4 Octopus IPv6 cephadm cluster. I applied the OSD spec and the orchestrator started creating OSDs. Unfortunately, all 60 OSDs failed at startup with the message "unable to find any IPv4 address in networks '2001:db8:11d::/120'", even though ms_bind_ipv6 is set to true.
-- The job identifier is 14258.
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-22
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-24e819f4-9089-48ae-b817-014a29addf23/osd-data-0ccc10ee-018d-43e8-8350-6ea1dd67102e --path /var/lib/ceph/osd/ceph-22 --no-mon-config
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/ln -snf /dev/ceph-24e819f4-9089-48ae-b817-014a29addf23/osd-data-0ccc10ee-018d-43e8-8350-6ea1dd67102e /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--24e819f4--9089--48ae--b817--014a29addf23-osd--data--0ccc10ee--018d--43e8--8350--6ea1dd67102e
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-22
Aug 06 09:21:01 node3.example.net bash[64671]: --> ceph-volume lvm activate successful for osd ID: 22
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.465+0000 7fee3e813f40 0 set uid:gid to 167:167 (ceph:ceph)
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.465+0000 7fee3e813f40 0 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable), process ceph-osd, pid 1
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.465+0000 7fee3e813f40 0 pidfile_write: ignore empty --pid-file
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bdev create path /var/lib/ceph/osd/ceph-22/block type kernel
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bdev(0x562f2f600000 /var/lib/ceph/osd/ceph-22/block) open path /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bdev(0x562f2f600000 /var/lib/ceph/osd/ceph-22/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bluestore(/var/lib/ceph/osd/ceph-22) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bdev create path /var/lib/ceph/osd/ceph-22/block type kernel
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bdev(0x562f2f600700 /var/lib/ceph/osd/ceph-22/block) open path /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bdev(0x562f2f600700 /var/lib/ceph/osd/ceph-22/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-22/block size 932 GiB
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40 1 bdev(0x562f2f600700 /var/lib/ceph/osd/ceph-22/block) close
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.773+0000 7fee3e813f40 1 bdev(0x562f2f600000 /var/lib/ceph/osd/ceph-22/block) close
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 1 objectstore numa_node 0
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 0 starting osd.22 osd_data /var/lib/ceph/osd/ceph-22 /var/lib/ceph/osd/ceph-22/journal
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 -1 unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 -1 unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 -1 Failed to pick public address.
Aug 06 09:21:02 node3.example.net systemd[1]: ceph-d77f7c4a-d656-11ea-95cb-531234b0f844@osd.22.service: Main process exited, code=exited, status=1/FAILURE
I double-checked that ms_bind_ipv6 is set to true, and it is.
While searching for ms_bind options, I noticed that ms_bind_ipv4 also exists and was also set to true (the default). When I set it to false, the OSDs boot. When I switch ms_bind_ipv4 back to the default (true), the OSDs cannot start.
ms_bind_ipv4 set to false (for OSD only):
Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.013+0000 7f54b86daf40 0 starting osd.22 osd_data /var/lib/ceph/osd/ceph-22 /var/lib/ceph/osd/ceph-22/journal
Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.033+0000 7f54b86daf40 0 load: jerasure load: lrc load: isa
Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.033+0000 7f54b86daf40 1 bdev create path /var/lib/ceph/osd/ceph-22/block type kernel
Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.033+0000 7f54b86daf40 1 bdev(0x55c143d6a000 /var/lib/ceph/osd/ceph-22/block) open path /var/lib/ceph/osd/ceph-22/block
...snip...
Aug 06 09:55:25 node3.example.net bash[66959]: debug 2020-08-06T07:55:25.628+0000 7f54a21c7700 1 osd.22 88 state: booting -> active
With ms_bind_ipv4 set back to the default value (true), the OSD fails to start again:
Aug 06 10:10:43 node3.example.net bash[70455]: debug 2020-08-06T08:10:43.617+0000 7f78b53d3f40 0 starting osd.22 osd_data /var/lib/ceph/osd/ceph-22 /var/lib/ceph/osd/ceph-22/journal
Aug 06 10:10:43 node3.example.net bash[70455]: debug 2020-08-06T08:10:43.617+0000 7f78b53d3f40 -1 unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
To be sure this was the only thing in the way, I tried it two more times. I can confirm that with ms_bind_ipv4 set to false, my OSDs boot; with ms_bind_ipv4 at its default (true), they fail to boot.
If you need any more information, I'd be happy to supply it.
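The behavior described above can be modeled with a small Python sketch. It assumes that each enabled address family (ms_bind_ipv4, ms_bind_ipv6) needs its own local address inside the configured public network, so an IPv6-only network such as '2001:db8:11d::/120' can never satisfy the IPv4 requirement. This is a hypothetical model, not Ceph's actual address-picking code: the function `pick_addresses` and its parameters are invented for illustration, and interface filtering is ignored.

```python
import ipaddress


def pick_addresses(networks, local_addrs, bind_ipv4=True, bind_ipv6=False):
    """Hypothetical model of per-family address selection (not Ceph's code).

    For every enabled family, require one local address that falls inside
    one of the configured networks of that same family; otherwise fail.
    """
    nets = [ipaddress.ip_network(n) for n in networks]
    addrs = [ipaddress.ip_address(a) for a in local_addrs]
    picked = []
    for family, enabled in ((4, bind_ipv4), (6, bind_ipv6)):
        if not enabled:
            continue  # this family is not requested, so nothing to find
        match = next(
            (a for a in addrs
             if a.version == family
             and any(net.version == family and a in net for net in nets)),
            None,
        )
        if match is None:
            # Mirrors the wording of the OSD log message, minus the interfaces part
            raise RuntimeError(
                f"unable to find any IPv{family} address in networks "
                f"'{','.join(networks)}'"
            )
        picked.append(match)
    return picked


if __name__ == "__main__":
    net = ["2001:db8:11d::/120"]
    host = ["2001:db8:11d::5"]  # hypothetical OSD host address
    try:
        pick_addresses(net, host, bind_ipv4=True, bind_ipv6=True)
    except RuntimeError as err:
        print("ms_bind_ipv4=true:", err)
    print("ms_bind_ipv4=false:", pick_addresses(net, host, bind_ipv4=False, bind_ipv6=True))
```

In this model, the IPv4 lookup fails first whenever both flags are true, matching the failed boots; with bind_ipv4=False only the IPv6 lookup runs and it returns the host's IPv6 address, matching the successful boot.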