Bug #10763

open

OSDs get marked as down in docker

Added by Ian Babrou about 9 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044996.html

I can replicate this on DigitalOcean machines and can provide access to help figure out what's wrong.

OSDs and MONs are running with host networking; upgrading to 0.80.7 didn't help.

Actions #1

Updated by Samuel Just about 9 years ago

  • Status changed from New to Closed

This sounds kind of like a docker problem?

Actions #2

Updated by Ian Babrou about 9 years ago

... but that only happens with Ceph. It is Ceph that goes crazy with connections and CPU for no reason, not Docker.

Guess what the Docker developers would say if I created an issue titled "ceph doesn't work in docker"?

Actions #3

Updated by Peter Rosell about 9 years ago

I have been struggling with this problem too. After the discussion on this issue, https://github.com/ceph/ceph-docker/issues/19, I decided to set up Ceph on Docker with IPv6 to get rid of the host network interface configuration in the containers.

I can confirm that running three different OSD containers on the same host works when using IPv6. I removed --net=host and also --privileged. The monitor was also running on the same host in a separate container. I used a /80 network on the host and each container got its own IP address. The host handles the routing, which was enabled with these commands (change the network to match your setup):

$ ip -6 route add 2001:db8:1::/64 dev docker0
$ sysctl net.ipv6.conf.default.forwarding=1
$ sysctl net.ipv6.conf.all.forwarding=1

I did the tests half manually just to get something to verify. I guess there is some work to do if you want to be able to use these containers and scripts with IPv6. Each time you restart a container you will get a new IP address, since it is derived from the MAC address, which increments for each container that is started.
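
For completeness, the daemon-side piece that pairs with the routing commands above is Docker's IPv6 support, added in 1.5. This is only a minimal sketch, assuming the same example prefix and a /80 container subnet; adjust it to match your network:

$ # Docker 1.5-era daemon invocation; the prefix is an example and must be routed to docker0 as above
$ docker -d --ipv6 --fixed-cidr-v6="2001:db8:1::/80"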

I guess that Ceph has some problem when two OSDs bind to the same IP address but can't see each other because they run in separate containers. When I run on IPv4 with your containers and start two OSDs on the same host, with a third OSD also running, the network traffic goes up like crazy. I get over 8000 TCP connections after a few seconds, but almost all of them are in TIME_WAIT state, which means they are already closed. It's as if the third OSD contacts one of the OSDs running on the same host, finds out that it has the wrong OSD_ID, disconnects, and then starts over again. The OSDs then start reporting each other as down to the monitor. I'm not sure if this is the real reason, but it would be interesting to hear what someone from the Ceph community thinks about this theory. Maybe detailed logging of the intercommunication between the OSDs can be activated.
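
Two quick checks that could help confirm or refute this theory (a sketch only: the TIME_WAIT count below is system-wide, and osd.0 stands in for whichever OSD is of interest):

$ # count sockets stuck in TIME_WAIT; with the bug this climbs into the thousands
$ ss -tan state time-wait | wc -l
$ # raise messenger debug logging on one OSD to watch the connect/reset loop in its log
$ ceph tell osd.0 injectargs '--debug-ms 5'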

I hope I will be able to push my setup, but it will have to wait until next week.
My setup is running Vagrant (VirtualBox) and CoreOS (alpha channel, 598.0.0) with Docker 1.5.0.

Actions #4

Updated by Sam Yaple about 9 years ago

I have successfully run multiple OSDs in separate containers with IPv4 networking in Docker.

Using the `--pid=host` flag so that each container shares the host PID namespace has prevented Ceph from going crazy and spawning a flood of connections. Coupled with `--net=host`, multiple OSDs seem fine with IPv4. The current version of Docker, 1.5.0, does not support custom PID namespaces, but I would be willing to bet that if I could share a PID namespace between containers, this would work as well. The feature request for shared PID namespaces is here: https://github.com/docker/docker/issues/10163
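
For illustration, roughly what such a run looks like. This is only a sketch: the image name, mounts, and OSD_ID variable are placeholders, not the exact ceph-docker invocation; the relevant parts are the two namespace flags.

$ # placeholder image and env; --net=host and --pid=host are what the workaround relies on
$ docker run -d --net=host --pid=host \
      -e OSD_ID=0 \
      -v /etc/ceph:/etc/ceph \
      -v /var/lib/ceph:/var/lib/ceph \
      my-ceph-osd-image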

Is there some type of interprocess communication happening between the PIDs? I am not familiar enough with the Ceph codebase to know exactly what is going on, how OSDs on the same host typically communicate, or whether that communication can be changed to use TCP instead of local sockets.

Actions #5

Updated by Sébastien Han about 5 years ago

  • Status changed from Closed to New

Still hitting this on Nautilus dev.

Actions #6

Updated by Sachi King over 4 years ago

I've just run into this, admittedly on Luminous, and was able to reproduce it in my testing environment, so I spent a little bit of time on it.

Looking at the osdmap, I noticed that the addresses were '<addr>:<port>/<nonce>', with the nonce being the PID.
Digging into this a bit more, I see in 'include/msgr.h' that ceph_entity_addr's 'nonce' field carries the comment "unique id for process (e.g. pid)".

Running in a container makes our PID non-unique across OSD containers. Taking a stab in the dark, since I have not yet found where the nonce is used, I thought that IP/nonce might be used somewhere to pool connections, and attempted to make the nonce unique again. To do this I added a shim to the container's startup that runs:
for i in $(seq 0 ${OSD_ID}); do cat /dev/null; done
Each `cat` invocation consumes one PID, so the OSD process gets a different PID, and therefore a different nonce, in each container.

Afterwards the OSDs worked as expected, and did not experience the high CPU load.
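
For clarity, a minimal sketch of how such a shim might be wired into a container entrypoint; the ceph-osd invocation and the OSD_ID environment variable are assumptions for illustration, not necessarily the exact image used above:

#!/bin/sh
# Hypothetical wrapper (illustration only): consume OSD_ID+1 PIDs before starting
# the OSD so that its PID, and hence its nonce, differs between containers.
for i in $(seq 0 "${OSD_ID}"); do cat /dev/null; done
exec ceph-osd -f -i "${OSD_ID}"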
