Bug #7690
Teuthology: failed to bind the Unix Domain socket
Status: Closed
Description
AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists
Updated by Anonymous about 10 years ago
The error message here was:
2014-03-11T19:37:09.850 DEBUG:teuthology.orchestra.run:Running [10.214.138.66]: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/monmap'
2014-03-11T19:37:09.970 INFO:teuthology.orchestra.run.err:[10.214.138.66]: 2014-03-12 02:37:09.969172 7f43edcc0780 -1 asok(0x22b61c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists
2014-03-11T19:37:09.970 INFO:teuthology.orchestra.run.err:[10.214.138.66]: 2014-03-12 02:37:09.969726 7f43edcc0780 -1 OSD::mkfs: FileStore::mkfs failed with error -16
The yaml file that generated this is:
roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
tasks:
- install:
    branch: dumpling
- ceph:
    fs: xfs
- parallel:
  - workload
  - upgrade-sequence
  - final-work
workload:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test.sh
        - cls
  - install.upgrade:
      all:
        branch: emperor
  - ceph:
      fs: xfs
  - ceph.restart: [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
  - parallel:
    - workload-two
    - upgrade-sequence
workload-two:
  sequential:
  - workunit:
      branch: dumpling
      clients:
        client.0:
        - rados/test.sh
        - cls
upgrade-sequence:
  sequential:
  - install.upgrade:
      all:
        branch: emperor
  - ceph.restart: [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
final-work:
- workunit:
    clients:
      client.1:
      - rados/load-gen-mix.sh
os_type: ubuntu
os_version: "12.04"
targets:
  ubuntu@vpm006.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCm4HgN4bngOSLVSI/i4YdFfxt7zqF2t6mj9nXsg7O/csHTWLf4X3X8XY2XTS/qWQBpf+/QiTgF1fHm3OvHaSd7gy7DH9T1YnMWQDmTU9+D6OL8RfioyfSuxi4TI/9pMLbfbZpoV/gwqtWHLxETVETOYbhO/CY4Pe9lWcG65wD+XwJ6k9Tp3FUYmgMVerHBCYhhNyof6aJjrFK6gL1ucr4s+e0W/eXmqOEYF7WhjwgTcXIYdWE8+iEzxeC9yWIC59jHnq/IK3Qn+1xiV8ImBE7tPZMIEc2x+WZ68T8TWLVlrx0VX48PP+4fn46MhTl9U8TgIdlqj2sKyVfKHDT/nv5L
  ubuntu@vpm007.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDIIV9gZqhkhqkzmUGMbfXRsm9C6b7i5+AgIJYJQkgYgVCVmb5zdobNvF/vv4+1dNoYjfSZH1ufF8IdMENRUTYTginH5oj5J98+PGG9TeC3hIsaeLpWXsUJXs54JZGvEOPBY3YRQV5pP33hS4zpyCdSYmmnmw65Oh0zf/9hp2Hyu9xasWv9HepTItbOamOc6gcc68AEQvylibmlAkZgnfGMetgTIFN/p/DmMYDM29wFK/DZh0z41fwZ7u1e1zB+wqeowf1PqVPNxRFgIbgZtUNM5ZxGTXNgrhDdN6qNlb2wMLJhgaeAB2ZruCIFt34CriGN1uSwLmCWZYgc8/YgU5aD
  ubuntu@vpm008.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOTisb6GMCnmZKqAxPmaKoiOqbKJ1kjEmszr8bYI4ql4RKGbeFWlze+6NbUBKJXSfDzBwtblQprxdpbma4jDBrk/bJl5xjbTnGQuACZDAsYGwaBu/RAvtUpXOb8xvBVvk3943nPSpgGL0PbtQF2nrUbwOM89z5NKQobTiORVBhZ8dRylzUnAMF2Aj/0IfhTkgEg1wUaSJlZaV1YveuWO9KEDgxU3iN53/V2OI6CZlfsjIm5xJZdcowkwn9SCbxMBe20fb+AAr5NxhDd8fVMDx0FTGXSbNvKTW0AYu3Yk8YAJq5OaS65FVybArU5gXfa5uTOTsETLaUO8zuiMDs9OD5
Updated by Loïc Dachary about 10 years ago
This is because a daemon is still answering on the socket, so the socket file won't be removed; the daemon needs to be killed first. This is new behavior introduced by the latest backport to dumpling, which avoids destroying admin sockets that are still in use. It reveals a problem with the dumpling-to-emperor upgrade: a new OSD apparently starts before the existing one has finished.
Updated by Anonymous about 10 years ago
- Assignee set to Ian Colle
We should probably get a developer to look at this.
Updated by Anonymous about 10 years ago
- Priority changed from Normal to High
This issue needs to be elevated. It is stopping some of the upgrade tests from running.
I have an experimental fix that may work around this problem (I am testing it right now): stopping all the osds before the mkfs commands that fail. I am not sure this is the right thing to do, however.
Updated by Anonymous about 10 years ago
I tried ceph killall osd, but the problem still happened. I then tried ceph killall on the mds and mon as well, and got umount failures.
Am I approaching this correctly? I ran these ceph killall commands before the mkfs that originally failed.
Updated by Ian Colle about 10 years ago
- Assignee changed from Ian Colle to Josh Durgin
Josh - please help Warren with this.
Updated by Anonymous about 10 years ago
- Assignee changed from Josh Durgin to Anonymous
Josh noted that ceph was being installed twice. That seems to be the problem.
Updated by Anonymous about 10 years ago
- Status changed from New to Closed
Correcting the yaml file fixed this problem. I am closing this because the rest of the testing is being tracked in #7606.
Updated by Shambhu Rajak almost 10 years ago
Hi Warren, can you please tell me what change you made in the yaml file?
I am facing the same issue.
This is what my yaml file looks like:
check-locks: false
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd sloppy crc: true
    fs: xfs
roles:
- - mon.0
  - osd.0
  - osd.1
  - osd.2
- - mon.1
  - osd.3
  - osd.4
  - osd.5
- - mon.2
  - osd.6
  - osd.7
  - osd.8
  - client.0
targets:
  ubuntu@ip-10-15-16-12: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/Nexgf0K/z6EZgGgDuZF02EJdDiiGP9lI4W1VIGxxwmNMylhuA65lRjBcdwWf/Cr5KtNirLoTPgE35FGStmSYflyH5Eu1zCtrsam+fhoVsKHxsHF4pLInZ6o5yaWP6tNi90Khp4tT1PXCVkSCyLJ8zIe16afetNbPxnxZHbvd67aAdT8Msm6A5k0FMjVzCXSZavA9QQ9bY7GlAxJrVyzU0fBnZD8OCOFegzsygVHa8xUvDsn3b8t+XsPRd/LjoXual4UjDW6Km5DhPCcxxUwqEY6znCzGIx6YnmCxSyc/U9YkcDCufBjalMH0NITwO/CwHAN1tBUBTLmwBWL1JSkV
  ubuntu@ip-10-15-16-13: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDL3XS5AL8mg+lys+xsd8RU0yZxRuqz7tEEWDRlFR3bxfBs3UCcTO0iXXbiJDzoesjn3rXymIS9BfPKhhF9U9LLuwdlUYF1WvcO+ulhHxfs2aHRZDBERvLdC+qqYFUQ19Ko9MoagVjpzicAqcKfDafnptgGLJXKkdcUlhFZdtxvxod/92aftL1r1IQZBxOh/dLTvZokKQ8R9hALYxfeeR2HYo/dZNdLa0D21HcDaemdzCqd9k/zkBcYlqbFgQYVgzn/4hwBKGWJ1HHImAdpumS2/vuaWKgNVeFo2PNT/bwxkAQTshd+2BcUDTd5HR5Kiev+Jdi1nmuO/sHTLfcrOesH
  ubuntu@ip-10-15-16-26: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCreyg7S8zf5wts+zo7746Vl5+rHEsH8DSX3rerxxFs+YedSm/HfFPUUuSFhnNMw0Cn/JtH0bQRHFo04nKcA2ePlcLgQey7FToeXWHM0/dsMQgUTXzTE8U7dhNRDGI9/rS78SZ/e7ePrwp94puIBjvDI2ZEr3ZLSFHxrDArsaHQNrwGfwVb2RM2uruePZb3Z2JOesYYpeAfRVjc0DiRu0HA5MG1BQeEZxU4/QvPZXrtskvUmrqLCGN0VsNV+f7CnFVVm4G1OpRT5zV6GWhuoLbzn/j+lvEXoTBsq6M8MRvNHBtMio3B3HmpkFrXw1Elcx5mk+ClsyWnlzaAVslU4mD5
tasks:
- install: null
- ceph: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 2
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
    verbose: true
Updated by Anonymous over 7 years ago
. . .
- ceph: null
- ceph:
    log-whitelist:
. . .
should probably be:
. . .
- ceph:
    log-whitelist:
. . .
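This is the same mistake that closed the original ticket: the tasks list starts Ceph twice, and the second start's mkfs collides with the daemons left running by the first. A duplicate like this is easy to catch mechanically; the following is a hypothetical lint sketch (the function name is ours, not part of teuthology), treating each task as a single-key dict, as in the yaml above.

```python
from collections import Counter

def duplicate_tasks(tasks):
    """Return names of tasks that appear more than once in a teuthology
    task list.  Each task is a single-key dict, e.g. {"ceph": None}.
    This lint is an illustration, not an actual teuthology feature."""
    counts = Counter(next(iter(t)) for t in tasks)
    return sorted(name for name, n in counts.items() if n > 1)

# The task list from the yaml above, as Python structures:
tasks = [
    {"install": None},
    {"ceph": None},                                      # first cluster start
    {"ceph": {"log-whitelist": ["wrongly marked me down",
                                "objects unfound and apparently lost"]}},  # second: the bug
    {"thrashosds": {"chance_pgnum_grow": 2, "timeout": 1200}},
    {"rados": {"clients": ["client.0"]}},
]
```

Running `duplicate_tasks(tasks)` on this list flags `ceph`; dropping the bare `- ceph: null` entry, as suggested above, makes the list clean.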