Bug #13995

ceph-osd "nonce collision" due to unshared pid namespaces

Added by Oskar stenman over 8 years ago. Updated over 7 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I found Issue #13032 yesterday, after having hit a brick wall while trying to deploy Ceph on and off over the last couple of months. Everyone kept saying "network problem", but I never found one.

I added "--pid=host" to the docker containers running ceph-osd, and after that everything was instantly stable and working as expected.

Issue #13032 seems to affect ceph-osd when multiple OSDs run in different pid namespaces (containers): every OSD sees the same pid, so they all end up with the same nonce.
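As far as I can tell the messenger nonce is derived from the daemon's pid at startup, so two OSDs that each run as, say, pid 1 inside their own container advertise identical nonces. A minimal sketch of the collision (illustration only, not Ceph code; the nonce choice here is an assumption):

    #include <stdio.h>
    #include <unistd.h>

    /* Illustration only: in separate pid namespaces (e.g. two containers),
     * two ceph-osd processes can both see the same small pid. If the
     * messenger nonce is taken from that pid, both daemons advertise
     * identical address/nonce pairs and their peers cannot tell the
     * sessions apart, which matches the reconnect storm described below. */
    int main(void)
    {
        pid_t pid = getpid();                      /* e.g. 1 inside each container */
        unsigned long nonce = (unsigned long)pid;  /* assumed, simplified nonce choice */
        printf("pid=%ld nonce=%lu\n", (long)pid, nonce);
        return 0;
    }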

When I launched one OSD in a container, everything was fine.
Two OSDs in two different containers on the same host also seemed to work all right.
But when I launched a third OSD on a second host, it couldn't connect to the OSDs on the host where two were already running.

ceph-osd logged ~100k messages per second (not exactly sure, but it was a lot), roughly 500 MB of stdout/stderr per minute. Each OSD used 250-350% CPU just trying to reconnect, netstat went from 25 TCP sessions to over 100k within a couple of seconds, and the cluster was inherently unstable since most OSDs reported most other OSDs as broken or similar.

I tried logging with debug ms = 10 and got ~600 MB of logs per second; if you want debug logs I can post a second or two of them somewhere.

Maybe these reconnect attempts should be throttled in the daemon as well: if 100k new connections per second don't go through, it's usually not a network or IP-stack problem but an application problem. Throttling them somehow (with a config option to raise the limit, so future deployments that actually need that rate aren't affected?) would also have made troubleshooting much easier, since I could then have matched the logs between the different OSDs and maybe also run with debug ms = 10.
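Even a simple exponential backoff on failed reconnect attempts would have kept this manageable. A rough sketch of the idea (not Ceph's actual reconnect code; the function and parameter names are made up for illustration):

    #include <stdbool.h>
    #include <unistd.h>

    /* Sketch of throttled reconnects: double the wait after every failed
     * attempt, up to a configurable cap, instead of retrying immediately.
     * 'try_connect' stands in for whatever actually opens the session. */
    void reconnect_with_backoff(bool (*try_connect)(void), unsigned max_delay_ms)
    {
        unsigned delay_ms = 10;                  /* initial retry delay */
        while (!try_connect()) {
            usleep((useconds_t)delay_ms * 1000); /* wait before retrying */
            delay_ms *= 2;                       /* exponential backoff */
            if (delay_ms > max_delay_ms)
                delay_ms = max_delay_ms;         /* honour the configured cap */
        }
    }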

These are the log lines from the nodes that get stuck in an infinite reconnect loop; this very rapidly fills the host's disk space:
http://pastebin.com/SU3ed7hG
These are logs from a node stuck in an infinite reconnect-loop:
http://pastebin.com/M63PJVFE
These are logs from a node receiving connections:
http://pastebin.com/atQS7a8n

The problem exists at least on Ubuntu with 9.2.0 and with hammer.

History

#1 Updated by Samuel Just about 8 years ago

  • Assignee set to Loïc Dachary

#2 Updated by Samuel Just about 8 years ago

I don't quite understand the problem. Is it that the pid and hostname were the same?

#3 Updated by Loïc Dachary about 8 years ago

  • Status changed from New to Need More Info

That's an interesting problem :-) It would be great if you could provide steps to reproduce it. I don't see how it can happen if the OSD is running in an unprivileged container with no bind mount of any kind. But maybe you're doing something slightly different?

#4 Updated by Samuel Just over 7 years ago

  • Status changed from Need More Info to Can't reproduce

Feel free to reopen once there is more information.
