Bug #46845


Newly orchestrated OSD fails with "unable to find any IPv4 address in networks '2001:db8:11d::/120'" with ms_bind_ipv6=true

Added by Daniël Vos over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Matthew Oliver
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just started deploying 60 OSDs to my new 15.2.4 Octopus IPv6 cephadm cluster. I applied the spec for the OSDs and the orchestrator started creating them. Unfortunately, all 60 OSDs crashed at startup with the following message: "unable to find any IPv4 address in networks '2001:db8:11d::/120'"

ms_bind_ipv6 is set to true.

-- The job identifier is 14258.
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-22
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-24e819f4-9089-48ae-b817-014a29addf23/osd-data-0ccc10ee-018d-43e8-8350-6ea1dd67102e --path /var/lib/ceph/osd/ceph-22 --no-mon-config
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/ln -snf /dev/ceph-24e819f4-9089-48ae-b817-014a29addf23/osd-data-0ccc10ee-018d-43e8-8350-6ea1dd67102e /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--24e819f4--9089--48ae--b817--014a29addf23-osd--data--0ccc10ee--018d--43e8--8350--6ea1dd67102e
Aug 06 09:21:01 node3.example.net bash[64671]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-22
Aug 06 09:21:01 node3.example.net bash[64671]: --> ceph-volume lvm activate successful for osd ID: 22
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.465+0000 7fee3e813f40  0 set uid:gid to 167:167 (ceph:ceph)
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.465+0000 7fee3e813f40  0 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable), process ceph-osd, pid 1
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.465+0000 7fee3e813f40  0 pidfile_write: ignore empty --pid-file
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bdev create path /var/lib/ceph/osd/ceph-22/block type kernel
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bdev(0x562f2f600000 /var/lib/ceph/osd/ceph-22/block) open path /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bdev(0x562f2f600000 /var/lib/ceph/osd/ceph-22/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bluestore(/var/lib/ceph/osd/ceph-22) _set_cache_sizes cache_size 3221225472 meta 0.4 kv 0.4 data 0.2
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bdev create path /var/lib/ceph/osd/ceph-22/block type kernel
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bdev(0x562f2f600700 /var/lib/ceph/osd/ceph-22/block) open path /var/lib/ceph/osd/ceph-22/block
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bdev(0x562f2f600700 /var/lib/ceph/osd/ceph-22/block) open size 1000203091968 (0xe8e0c00000, 932 GiB) block_size 4096 (4 KiB) non-rotational discard supported
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-22/block size 932 GiB
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.469+0000 7fee3e813f40  1 bdev(0x562f2f600700 /var/lib/ceph/osd/ceph-22/block) close
Aug 06 09:21:01 node3.example.net bash[64907]: debug 2020-08-06T07:21:01.773+0000 7fee3e813f40  1 bdev(0x562f2f600000 /var/lib/ceph/osd/ceph-22/block) close
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40  1  objectstore numa_node 0
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40  0 starting osd.22 osd_data /var/lib/ceph/osd/ceph-22 /var/lib/ceph/osd/ceph-22/journal
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 -1 unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 -1 unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''
Aug 06 09:21:02 node3.example.net bash[64907]: debug 2020-08-06T07:21:02.037+0000 7fee3e813f40 -1 Failed to pick public address.
Aug 06 09:21:02 node3.example.net systemd[1]: ceph-d77f7c4a-d656-11ea-95cb-531234b0f844@osd.22.service: Main process exited, code=exited, status=1/FAILURE

I double-checked that ms_bind_ipv6 was set to true; it is.

While searching for ms_bind I noticed that ms_bind_ipv4 also exists, and that it was likewise set to true (its default). When I configure it to false, the OSDs boot up. Switching ms_bind_ipv4 back to the default (true), the OSDs again fail to start.
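For reference, the workaround described above can be expressed as a config fragment (a sketch of the settings named in this report, not an official recommendation; on a cephadm cluster the same effect can be had at runtime with `ceph config set osd ms_bind_ipv4 false`):

```ini
# Workaround sketch: bind only IPv6 on OSDs, since the public network
# is IPv6-only. ms_bind_ipv4 defaults to true and has to be disabled
# explicitly -- enabling ms_bind_ipv6 alone does not turn it off.
[osd]
ms_bind_ipv4 = false
ms_bind_ipv6 = true
```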

ms_bind_ipv4 set to false (for OSD only):

Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.013+0000 7f54b86daf40  0 starting osd.22 osd_data /var/lib/ceph/osd/ceph-22 /var/lib/ceph/osd/ceph-22/journal
Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.033+0000 7f54b86daf40  0 load: jerasure load: lrc load: isa
Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.033+0000 7f54b86daf40  1 bdev create path /var/lib/ceph/osd/ceph-22/block type kernel
Aug 06 09:55:22 node3.example.net bash[66959]: debug 2020-08-06T07:55:22.033+0000 7f54b86daf40  1 bdev(0x55c143d6a000 /var/lib/ceph/osd/ceph-22/block) open path /var/lib/ceph/osd/ceph-22/block
...snip...
Aug 06 09:55:25 node3.example.net bash[66959]: debug 2020-08-06T07:55:25.628+0000 7f54a21c7700  1 osd.22 88 state: booting -> active

ms_bind_ipv4 back to the default value (true) and then it fails to start again:

Aug 06 10:10:43 node3.example.net bash[70455]: debug 2020-08-06T08:10:43.617+0000 7f78b53d3f40  0 starting osd.22 osd_data /var/lib/ceph/osd/ceph-22 /var/lib/ceph/osd/ceph-22/journal
Aug 06 10:10:43 node3.example.net bash[70455]: debug 2020-08-06T08:10:43.617+0000 7f78b53d3f40 -1 unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''

To be sure this was the only thing in the way, I tried it two more times. I can confirm that with ms_bind_ipv4 set to false my OSDs boot, and with ms_bind_ipv4 at its default (true) they fail to boot.

If you need any more information, I'd be happy to supply it.


Related issues (1 open, 1 closed)

Related to RADOS - Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3000::/64' interfaces (New)

Has duplicate Messengers - Bug #39711: "unable to find any IPv4 address in networks <ipv6-network>" after upgrade to nautilus on osd and mds (Duplicate)

Actions #1

Updated by Matthew Oliver over 3 years ago

I think this is a duplicate of https://tracker.ceph.com/issues/39711

The workaround was to disable `ms_bind_ipv4`, as it's enabled by default, and setting `ms_bind_ipv6` doesn't disable it.

Your tracker bug has more details than the other one, though, which is nice. I'm no expert in this area of the codebase yet, but I'll use it to try to track down the issue so we don't need a workaround; clearly that network isn't IPv4, so maybe this failure shouldn't be fatal. I'll play with this tomorrow (it's late here in Oz).

Actions #2

Updated by Matthew Oliver over 3 years ago

I've managed to recreate the issue in a vstart env. It happens when IPv4 binding is left at its default but the `public network` is set to an IPv6 network. Now I can debug!

Hopefully have a solution/PR soon :)

Actions #3

Updated by Daniël Vos over 3 years ago

Matthew Oliver wrote:

I've managed to recreate the issue in a vstart env. It happens when IPv4 binding is left at its default but the `public network` is set to an IPv6 network. Now I can debug!

Hopefully have a solution/PR soon :)

That's great! My `public network` and `cluster network` each have their own /120. I `cephadm bootstrap`ped my cluster with a ceph.conf that contained three settings: the public/cluster networks and `ms bind ipv6` = true.
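For illustration, a minimal bootstrap ceph.conf along those lines might look like the sketch below. Only the public network prefix appears in the logs above; the cluster network prefix is a hypothetical placeholder, not taken from the report:

```ini
[global]
# Public network as reported; the cluster network below is an
# invented example prefix, since the real one isn't in the logs.
public network = 2001:db8:11d::/120
cluster network = 2001:db8:aaa::/120
ms bind ipv6 = true
```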

Actions #4

Updated by Matthew Oliver over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Matthew Oliver

Cool, I've tracked down what's happening, and will push the first version of a patch up on Monday. I think that if we get an IP from the configured network, we shouldn't stop the OSD from starting; if there isn't a network for every address family, we should warn and continue.
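The intended behaviour could be sketched roughly like this (a hypothetical Python model of the logic for illustration only, not Ceph's actual pick_address.cc; the function and names are invented):

```python
import ipaddress

def pick_networks(networks, bind_ipv4=True, bind_ipv6=False):
    """Sketch of the proposed behaviour: only fail hard when *no*
    requested address family matches the configured networks; a
    family with no matching network yields a warning instead."""
    nets = [ipaddress.ip_network(n) for n in networks]
    wanted = ([4] if bind_ipv4 else []) + ([6] if bind_ipv6 else [])
    picked, warnings = [], []
    for family in wanted:
        matches = [n for n in nets if n.version == family]
        if matches:
            picked.extend(matches)
        else:
            # Old behaviour errored out here; new behaviour warns.
            warnings.append(f"unable to find any IPv{family} "
                            f"address in networks {networks}")
    if not picked:
        raise RuntimeError("Failed to pick public address.")
    return picked, warnings
```

With both bind options true and an IPv6-only public network (the reporter's situation), this picks the IPv6 network and emits one IPv4 warning instead of refusing to start.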

So it'll be a PR containing a bit of code change, plus documentation making clear how single-stack (IPv4 or IPv6) and dual-stack setups need to be configured. That's the plan, anyway.

Right now, back to my weekend :)

Actions #5

Updated by Neha Ojha over 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 36536
Actions #6

Updated by Kefu Chai over 3 years ago

  • Has duplicate Bug #39711: "unable to find any IPv4 address in networks <ipv6-network>" after upgrade to nautilus on osd and mds added
Actions #7

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Resolved
Actions #8

Updated by Daniel Pivonka over 2 years ago

  • Related to Bug #52867: pick_address.cc prints: unable to find any IPv4 address in networks 'fd00:fd00:fd00:3000::/64' interfaces added
