Bug #7690

Teuthology: failed to bind the Unix Domain socket

Added by Anonymous about 10 years ago. Updated over 7 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists

Actions #1

Updated by Anonymous about 10 years ago

The error message here was:

2014-03-11T19:37:09.850 DEBUG:teuthology.orchestra.run:Running [10.214.138.66]: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/monmap'
2014-03-11T19:37:09.970 INFO:teuthology.orchestra.run.err:[10.214.138.66]: 2014-03-12 02:37:09.969172 7f43edcc0780 -1 asok(0x22b61c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists
2014-03-11T19:37:09.970 INFO:teuthology.orchestra.run.err:[10.214.138.66]: 2014-03-12 02:37:09.969726 7f43edcc0780 -1 OSD::mkfs: FileStore::mkfs failed with error -16

The yaml file that generated this is:

roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
tasks:
- install:
    branch: dumpling
- ceph:
    fs: xfs
- parallel:
    - workload
    - upgrade-sequence
- final-work
workload:
   sequential:
     - workunit:
         branch: dumpling
         clients:
           client.0:
             - rados/test.sh
             - cls
     - install.upgrade:
         all:
           branch: emperor
     - ceph:
         fs: xfs
     - ceph.restart: [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
     - parallel:
         - workload-two
         - upgrade-sequence
workload-two:
   sequential:
     - workunit:
         branch: dumpling
         clients:
           client.0:
             - rados/test.sh
             - cls

upgrade-sequence:
   sequential:
   - install.upgrade:
       all:
         branch: emperor
   - ceph.restart: [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
final-work:
  - workunit:
      clients:
        client.1:
          - rados/load-gen-mix.sh
os_type: ubuntu
os_version: "12.04"
targets:
  ubuntu@vpm006.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCm4HgN4bngOSLVSI/i4YdFfxt7zqF2t6mj9nXsg7O/csHTWLf4X3X8XY2XTS/qWQBpf+/QiTgF1fHm3OvHaSd7gy7DH9T1YnMWQDmTU9+D6OL8RfioyfSuxi4TI/9pMLbfbZpoV/gwqtWHLxETVETOYbhO/CY4Pe9lWcG65wD+XwJ6k9Tp3FUYmgMVerHBCYhhNyof6aJjrFK6gL1ucr4s+e0W/eXmqOEYF7WhjwgTcXIYdWE8+iEzxeC9yWIC59jHnq/IK3Qn+1xiV8ImBE7tPZMIEc2x+WZ68T8TWLVlrx0VX48PP+4fn46MhTl9U8TgIdlqj2sKyVfKHDT/nv5L
  ubuntu@vpm007.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDIIV9gZqhkhqkzmUGMbfXRsm9C6b7i5+AgIJYJQkgYgVCVmb5zdobNvF/vv4+1dNoYjfSZH1ufF8IdMENRUTYTginH5oj5J98+PGG9TeC3hIsaeLpWXsUJXs54JZGvEOPBY3YRQV5pP33hS4zpyCdSYmmnmw65Oh0zf/9hp2Hyu9xasWv9HepTItbOamOc6gcc68AEQvylibmlAkZgnfGMetgTIFN/p/DmMYDM29wFK/DZh0z41fwZ7u1e1zB+wqeowf1PqVPNxRFgIbgZtUNM5ZxGTXNgrhDdN6qNlb2wMLJhgaeAB2ZruCIFt34CriGN1uSwLmCWZYgc8/YgU5aD
  ubuntu@vpm008.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOTisb6GMCnmZKqAxPmaKoiOqbKJ1kjEmszr8bYI4ql4RKGbeFWlze+6NbUBKJXSfDzBwtblQprxdpbma4jDBrk/bJl5xjbTnGQuACZDAsYGwaBu/RAvtUpXOb8xvBVvk3943nPSpgGL0PbtQF2nrUbwOM89z5NKQobTiORVBhZ8dRylzUnAMF2Aj/0IfhTkgEg1wUaSJlZaV1YveuWO9KEDgxU3iN53/V2OI6CZlfsjIm5xJZdcowkwn9SCbxMBe20fb+AAr5NxhDd8fVMDx0FTGXSbNvKTW0AYu3Yk8YAJq5OaS65FVybArU5gXfa5uTOTsETLaUO8zuiMDs9OD5

Actions #2

Updated by Loïc Dachary about 10 years ago

This happens because a daemon is still answering on the socket, so the socket file is not removed. The daemon needs to be killed. This is new behavior, introduced by the last backport to dumpling, to avoid destroying admin sockets that are in use. It reveals a problem when upgrading from dumpling to emperor: an OSD apparently starts before the existing one finishes.
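
To illustrate the behavior described above, here is a minimal Python sketch of a "do not destroy a socket that is in use" check. It is not the Ceph implementation (which is C++ in AdminSocket::bind_and_listen); the function name bind_admin_socket and the probe-by-connecting approach are assumptions made for illustration only.

```python
import errno
import os
import socket


def bind_admin_socket(path):
    """Bind a listening UNIX domain socket at `path` without stealing a live one.

    Hypothetical helper: if something already answers on `path`, fail with
    EEXIST (17); if the file is stale, remove it and bind.
    """
    if os.path.exists(path):
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)
        except OSError:
            # Nobody is listening: the file is stale, so it is safe to remove.
            os.unlink(path)
        else:
            # A daemon answered: leave its socket alone and report EEXIST.
            raise FileExistsError(errno.EEXIST, "socket is in use", path)
        finally:
            probe.close()

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server
```

Under this scheme, a second daemon started with the same id while the first is still running gets "(17) File exists", which matches the symptom in the log; once the first daemon has exited, the leftover file is treated as stale and replaced.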

Actions #3

Updated by Anonymous about 10 years ago

  • Assignee set to Ian Colle

We should probably get a developer to look at this.

Actions #4

Updated by Anonymous about 10 years ago

  • Priority changed from Normal to High

This issue needs to be elevated. It is stopping some of the upgrade tests from running.

I have an experimental fix that may work around this problem (I am testing it right now). I am stopping all the osds before the mkfs commands that fail. I am not sure if this is the right thing to do, however.

Actions #5

Updated by Anonymous about 10 years ago

I tried ceph killall osd but the problem still happened. I then tried ceph killall on mds and mon as well and I got umount failures.

Am I approaching this correctly? I did these ceph killall commands before the mkfs that failed originally.

Actions #6

Updated by Ian Colle about 10 years ago

  • Assignee changed from Ian Colle to Josh Durgin

Josh - please help Warren with this.

Actions #7

Updated by Anonymous about 10 years ago

  • Assignee changed from Josh Durgin to Anonymous

Josh noted that ceph was being installed twice. That seems to be the problem.

Actions #8

Updated by Anonymous about 10 years ago

  • Status changed from New to Closed

Correcting the yaml file fixed this problem. I am closing this because the rest of the testing is being tracked by 7606.

Actions #9

Updated by Shambhu Rajak almost 10 years ago

Hi Warren, can you please tell me what change you made to the yaml file?

I am facing the same issue.

This is what my yaml file looks like:

check-locks: false
overrides:
ceph:
conf:
global:
ms inject socket failures: 5000
osd:
osd sloppy crc: true
fs: xfs
roles:
- - mon.0
- osd.0
- osd.1
- osd.2
- - mon.1
- osd.3
- osd.4
- osd.5
- - mon.2
- osd.6
- osd.7
- osd.8
- client.0
targets:
ubuntu@ip-10-15-16-12: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/Nexgf0K/z6EZgGgDuZF02EJdDiiGP9lI4W1VIGxxwmNMylhuA65lRjBcdwWf/Cr5KtNirLoTPgE35FGStmSYflyH5Eu1zCtrsam+fhoVsKHxsHF4pLInZ6o5yaWP6tNi90Khp4tT1PXCVkSCyLJ8zIe16afetNbPxnxZHbvd67aAdT8Msm6A5k0FMjVzCXSZavA9QQ9bY7GlAxJrVyzU0fBnZD8OCOFegzsygVHa8xUvDsn3b8t+XsPRd/LjoXual4UjDW6Km5DhPCcxxUwqEY6znCzGIx6YnmCxSyc/U9YkcDCufBjalMH0NITwO/CwHAN1tBUBTLmwBWL1JSkV
ubuntu@ip-10-15-16-13: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDL3XS5AL8mg+lys+xsd8RU0yZxRuqz7tEEWDRlFR3bxfBs3UCcTO0iXXbiJDzoesjn3rXymIS9BfPKhhF9U9LLuwdlUYF1WvcO+ulhHxfs2aHRZDBERvLdC+qqYFUQ19Ko9MoagVjpzicAqcKfDafnptgGLJXKkdcUlhFZdtxvxod/92aftL1r1IQZBxOh/dLTvZokKQ8R9hALYxfeeR2HYo/dZNdLa0D21HcDaemdzCqd9k/zkBcYlqbFgQYVgzn/4hwBKGWJ1HHImAdpumS2/vuaWKgNVeFo2PNT/bwxkAQTshd+2BcUDTd5HR5Kiev+Jdi1nmuO/sHTLfcrOesH
ubuntu@ip-10-15-16-26: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCreyg7S8zf5wts+zo7746Vl5+rHEsH8DSX3rerxxFs+YedSm/HfFPUUuSFhnNMw0Cn/JtH0bQRHFo04nKcA2ePlcLgQey7FToeXWHM0/dsMQgUTXzTE8U7dhNRDGI9/rS78SZ/e7ePrwp94puIBjvDI2ZEr3ZLSFHxrDArsaHQNrwGfwVb2RM2uruePZb3Z2JOesYYpeAfRVjc0DiRu0HA5MG1BQeEZxU4/QvPZXrtskvUmrqLCGN0VsNV+f7CnFVVm4G1OpRT5zV6GWhuoLbzn/j+lvEXoTBsq6M8MRvNHBtMio3B3HmpkFrXw1Elcx5mk+ClsyWnlzaAVslU4mD5
tasks:
- install: null
- ceph: null
- ceph:
log-whitelist:
- wrongly marked me down
- objects unfound and apparently lost
- thrashosds:
chance_pgnum_grow: 2
chance_pgpnum_fix: 1
timeout: 1200
- rados:
clients:
- client.0
ec_pool: true
objects: 50
op_weights:
append: 100
copy_from: 50
delete: 50
read: 100
rollback: 50
snap_create: 50
snap_remove: 50
write: 0
ops: 4000
verbose: true

Actions #10

Updated by Shambhu Rajak almost 10 years ago

The formatting of my yaml file was lost because I am using Redmine for the first time. Here is the yaml file again:

check-locks: false
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd sloppy crc: true
    fs: xfs
roles:
- - mon.0
  - osd.0
  - osd.1
  - osd.2
- - mon.1
  - osd.3
  - osd.4
  - osd.5
- - mon.2
  - osd.6
  - osd.7
  - osd.8
  - client.0
targets:
  ubuntu@ip-10-15-16-12: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/Nexgf0K/z6EZgGgDuZF02EJdDiiGP9lI4W1VIGxxwmNMylhuA65lRjBcdwWf/Cr5KtNirLoTPgE35FGStmSYflyH5Eu1zCtrsam+fhoVsKHxsHF4pLInZ6o5yaWP6tNi90Khp4tT1PXCVkSCyLJ8zIe16afetNbPxnxZHbvd67aAdT8Msm6A5k0FMjVzCXSZavA9QQ9bY7GlAxJrVyzU0fBnZD8OCOFegzsygVHa8xUvDsn3b8t+XsPRd/LjoXual4UjDW6Km5DhPCcxxUwqEY6znCzGIx6YnmCxSyc/U9YkcDCufBjalMH0NITwO/CwHAN1tBUBTLmwBWL1JSkV
  ubuntu@ip-10-15-16-13: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDL3XS5AL8mg+lys+xsd8RU0yZxRuqz7tEEWDRlFR3bxfBs3UCcTO0iXXbiJDzoesjn3rXymIS9BfPKhhF9U9LLuwdlUYF1WvcO+ulhHxfs2aHRZDBERvLdC+qqYFUQ19Ko9MoagVjpzicAqcKfDafnptgGLJXKkdcUlhFZdtxvxod/92aftL1r1IQZBxOh/dLTvZokKQ8R9hALYxfeeR2HYo/dZNdLa0D21HcDaemdzCqd9k/zkBcYlqbFgQYVgzn/4hwBKGWJ1HHImAdpumS2/vuaWKgNVeFo2PNT/bwxkAQTshd+2BcUDTd5HR5Kiev+Jdi1nmuO/sHTLfcrOesH
  ubuntu@ip-10-15-16-26: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCreyg7S8zf5wts+zo7746Vl5+rHEsH8DSX3rerxxFs+YedSm/HfFPUUuSFhnNMw0Cn/JtH0bQRHFo04nKcA2ePlcLgQey7FToeXWHM0/dsMQgUTXzTE8U7dhNRDGI9/rS78SZ/e7ePrwp94puIBjvDI2ZEr3ZLSFHxrDArsaHQNrwGfwVb2RM2uruePZb3Z2JOesYYpeAfRVjc0DiRu0HA5MG1BQeEZxU4/QvPZXrtskvUmrqLCGN0VsNV+f7CnFVVm4G1OpRT5zV6GWhuoLbzn/j+lvEXoTBsq6M8MRvNHBtMio3B3HmpkFrXw1Elcx5mk+ClsyWnlzaAVslU4mD5
tasks:
- install: null
- ceph: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 2
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
verbose: true

Actions #11

Updated by Anonymous over 7 years ago

.
.
.
- ceph: null
- ceph:
    log-whitelist:
.
.
.

should probably be:

.
.
.
- ceph:
    log-whitelist:
.
.
.
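
A quick way to spot this kind of mistake before scheduling a job is to look for task names that occur more than once in the same task list. The sketch below is a hypothetical helper, not part of teuthology; it works on the task list as it would look after the YAML has been parsed into Python objects, and the rados options are abbreviated. Repeating a task is not always an error (the first job in this ticket runs ceph.restart in two places on purpose), so treat the result as a hint to review, not as a verdict.

```python
from collections import Counter


def repeated_tasks(tasks):
    """Return the task names that occur more than once in one task list."""
    names = []
    for entry in tasks:
        # A task entry is either a one-key mapping ({"ceph": None}) or a bare name.
        if isinstance(entry, dict):
            names.extend(entry.keys())
        else:
            names.append(str(entry))
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# The task list from the job in comment #10 (rados options abbreviated).
tasks = [
    {"install": None},
    {"ceph": None},
    {"ceph": {"log-whitelist": ["wrongly marked me down",
                                "objects unfound and apparently lost"]}},
    {"thrashosds": {"chance_pgnum_grow": 2,
                    "chance_pgpnum_fix": 1,
                    "timeout": 1200}},
    {"rados": {"clients": ["client.0"], "ec_pool": True}},
]

print(repeated_tasks(tasks))  # prints ['ceph']
```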
