Bug #7690: Teuthology: failed to bind the Unix Domain socket - teuthology - Ceph

Bug #7690

closed

Teuthology: failed to bind the Unix Domain socket

Added by Anonymous about 10 years ago. Updated over 7 years ago.

Status: Closed
Priority: High
Source: other
Severity: 3 - minor
Description

AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists

Updated by Anonymous about 10 years ago

The error message here was:

2014-03-11T19:37:09.850 DEBUG:teuthology.orchestra.run:Running [10.214.138.66]: 'sudo MALLOC_CHECK_=3 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph-osd --mkfs --mkkey -i 1 --monmap /home/ubuntu/cephtest/monmap'
2014-03-11T19:37:09.970 INFO:teuthology.orchestra.run.err:[10.214.138.66]: 2014-03-12 02:37:09.969172 7f43edcc0780 -1 asok(0x22b61c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-osd.1.asok': (17) File exists
2014-03-11T19:37:09.970 INFO:teuthology.orchestra.run.err:[10.214.138.66]: 2014-03-12 02:37:09.969726 7f43edcc0780 -1 OSD::mkfs: FileStore::mkfs failed with error -16
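
For readers decoding the two numbers in the log above, here is a minimal sketch using only the Python standard library. It assumes the `-16` reported by `FileStore::mkfs` is a negated errno value, which the log itself does not confirm; the helper name `describe` is made up for illustration.

```python
import errno
import os


def describe(code: int) -> str:
    """Return 'NAME: message' for an errno value; the sign is ignored."""
    code = abs(code)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


print(describe(17))   # the "(17)" in the bind failure
print(describe(-16))  # the "-16" from FileStore::mkfs (assumed to be -errno)
```

On Linux these map to `EEXIST` and `EBUSY`; the exact message text after the name depends on the platform's C library.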

The yaml file that generated this is:

roles:
- - mon.a
  - mds.a
  - osd.0
  - osd.1
- - mon.b
  - mon.c
  - osd.2
  - osd.3
- - client.0
  - client.1
tasks:
- install:
    branch: dumpling
- ceph:
    fs: xfs
- parallel:
    - workload
    - upgrade-sequence
- final-work
workload:
   sequential:
     - workunit:
         branch: dumpling
         clients:
           client.0:
             - rados/test.sh
             - cls
     - install.upgrade:
         all:
           branch: emperor
     - ceph:
         fs: xfs
     - ceph.restart: [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
     - parallel:
         - workload-two
         - upgrade-sequence
workload-two:
   sequential:
     - workunit:
         branch: dumpling
         clients:
           client.0:
             - rados/test.sh
             - cls

upgrade-sequence:
   sequential:
   - install.upgrade:
       all:
         branch: emperor
   - ceph.restart: [mon.a, mon.b, mon.c, mds.a, osd.0, osd.1, osd.2, osd.3]
final-work:
  - workunit:
      clients:
        client.1:
          - rados/load-gen-mix.sh
os_type: ubuntu
os_version: "12.04" 
targets:
  ubuntu@vpm006.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCm4HgN4bngOSLVSI/i4YdFfxt7zqF2t6mj9nXsg7O/csHTWLf4X3X8XY2XTS/qWQBpf+/QiTgF1fHm3OvHaSd7gy7DH9T1YnMWQDmTU9+D6OL8RfioyfSuxi4TI/9pMLbfbZpoV/gwqtWHLxETVETOYbhO/CY4Pe9lWcG65wD+XwJ6k9Tp3FUYmgMVerHBCYhhNyof6aJjrFK6gL1ucr4s+e0W/eXmqOEYF7WhjwgTcXIYdWE8+iEzxeC9yWIC59jHnq/IK3Qn+1xiV8ImBE7tPZMIEc2x+WZ68T8TWLVlrx0VX48PP+4fn46MhTl9U8TgIdlqj2sKyVfKHDT/nv5L
  ubuntu@vpm007.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDIIV9gZqhkhqkzmUGMbfXRsm9C6b7i5+AgIJYJQkgYgVCVmb5zdobNvF/vv4+1dNoYjfSZH1ufF8IdMENRUTYTginH5oj5J98+PGG9TeC3hIsaeLpWXsUJXs54JZGvEOPBY3YRQV5pP33hS4zpyCdSYmmnmw65Oh0zf/9hp2Hyu9xasWv9HepTItbOamOc6gcc68AEQvylibmlAkZgnfGMetgTIFN/p/DmMYDM29wFK/DZh0z41fwZ7u1e1zB+wqeowf1PqVPNxRFgIbgZtUNM5ZxGTXNgrhDdN6qNlb2wMLJhgaeAB2ZruCIFt34CriGN1uSwLmCWZYgc8/YgU5aD
  ubuntu@vpm008.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOTisb6GMCnmZKqAxPmaKoiOqbKJ1kjEmszr8bYI4ql4RKGbeFWlze+6NbUBKJXSfDzBwtblQprxdpbma4jDBrk/bJl5xjbTnGQuACZDAsYGwaBu/RAvtUpXOb8xvBVvk3943nPSpgGL0PbtQF2nrUbwOM89z5NKQobTiORVBhZ8dRylzUnAMF2Aj/0IfhTkgEg1wUaSJlZaV1YveuWO9KEDgxU3iN53/V2OI6CZlfsjIm5xJZdcowkwn9SCbxMBe20fb+AAr5NxhDd8fVMDx0FTGXSbNvKTW0AYu3Yk8YAJq5OaS65FVybArU5gXfa5uTOTsETLaUO8zuiMDs9OD5

Updated by Loïc Dachary about 10 years ago

This happens because a daemon is still answering on the socket, so the socket file is not removed; the daemon needs to be killed. This is new behavior, introduced by the last backport to dumpling to avoid destroying admin sockets that are in use. It reveals a problem when upgrading from dumpling to emperor: an OSD apparently starts before the existing one finishes.
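
The behavior described above can be sketched roughly as follows. This is not Ceph's `AdminSocket::bind_and_listen` code, only an illustration of the idea in Python: probe the existing socket file, refuse to touch it if something answers, and replace it only if it is stale. The function name `bind_admin_socket` and the demo path are hypothetical.

```python
import errno
import os
import socket
import tempfile


def bind_admin_socket(path: str) -> socket.socket:
    """Bind a listening Unix socket at `path`, refusing to replace a live one."""
    if os.path.exists(path):
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)
        except OSError:
            # Nobody is listening: the file is stale, so it is safe to remove.
            os.unlink(path)
        else:
            # A daemon answered: keep its socket and report EEXIST.
            raise FileExistsError(errno.EEXIST, os.strerror(errno.EEXIST), path)
        finally:
            probe.close()
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.asok")
    first = bind_admin_socket(path)
    try:
        bind_admin_socket(path)  # a second "daemon" starts while the first runs
    except FileExistsError as exc:
        print("second bind refused:", exc)
    first.close()
    second = bind_admin_socket(path)  # the first is gone, so the stale file is replaced
    second.close()
```

Under this scheme a second OSD that starts while the first is still alive fails in the same way as the log in this ticket, while a leftover socket file from a dead daemon is cleaned up silently.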

Updated by Anonymous about 10 years ago

Assignee set to Ian Colle

We should probably get a developer to look at this.

Updated by Anonymous about 10 years ago

Priority changed from Normal to High

This issue needs to be elevated. It is stopping some of the upgrade tests from running.

I have an experimental fix that may work around this problem (I am testing it right now): I stop all the OSDs before the mkfs commands that fail. I am not sure if this is the right thing to do, however.

Updated by Anonymous about 10 years ago

I tried ceph killall osd but the problem still happened. I then tried ceph killall on mds and mon as well and I got umount failures.

Am I approaching this correctly? I did these ceph killall commands before the mkfs that failed originally.

Updated by Ian Colle about 10 years ago

Assignee changed from Ian Colle to Josh Durgin

Josh - please help Warren with this.

Updated by Anonymous about 10 years ago

Assignee changed from Josh Durgin to Anonymous

Josh noted that ceph was being installed twice. That seems to be the problem.

Updated by Anonymous about 10 years ago

Status changed from New to Closed

Correcting the yaml file fixed this problem. I am closing this because the rest of the testing is being tracked by 7606.

Updated by Shambhu Rajak almost 10 years ago

Hi Warren, can you please tell me what change you made to the yaml file?

I am facing the same issue.

This is what my yaml file looks like:

check-locks: false
overrides:
ceph:
conf:
global:
ms inject socket failures: 5000
osd:
osd sloppy crc: true
fs: xfs
roles:
- - mon.0
- osd.0
- osd.1
- osd.2
- - mon.1
- osd.3
- osd.4
- osd.5
- - mon.2
- osd.6
- osd.7
- osd.8
- client.0
targets:
ubuntu@ip-10-15-16-12: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/Nexgf0K/z6EZgGgDuZF02EJdDiiGP9lI4W1VIGxxwmNMylhuA65lRjBcdwWf/Cr5KtNirLoTPgE35FGStmSYflyH5Eu1zCtrsam+fhoVsKHxsHF4pLInZ6o5yaWP6tNi90Khp4tT1PXCVkSCyLJ8zIe16afetNbPxnxZHbvd67aAdT8Msm6A5k0FMjVzCXSZavA9QQ9bY7GlAxJrVyzU0fBnZD8OCOFegzsygVHa8xUvDsn3b8t+XsPRd/LjoXual4UjDW6Km5DhPCcxxUwqEY6znCzGIx6YnmCxSyc/U9YkcDCufBjalMH0NITwO/CwHAN1tBUBTLmwBWL1JSkV
ubuntu@ip-10-15-16-13: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDL3XS5AL8mg+lys+xsd8RU0yZxRuqz7tEEWDRlFR3bxfBs3UCcTO0iXXbiJDzoesjn3rXymIS9BfPKhhF9U9LLuwdlUYF1WvcO+ulhHxfs2aHRZDBERvLdC+qqYFUQ19Ko9MoagVjpzicAqcKfDafnptgGLJXKkdcUlhFZdtxvxod/92aftL1r1IQZBxOh/dLTvZokKQ8R9hALYxfeeR2HYo/dZNdLa0D21HcDaemdzCqd9k/zkBcYlqbFgQYVgzn/4hwBKGWJ1HHImAdpumS2/vuaWKgNVeFo2PNT/bwxkAQTshd+2BcUDTd5HR5Kiev+Jdi1nmuO/sHTLfcrOesH
ubuntu@ip-10-15-16-26: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCreyg7S8zf5wts+zo7746Vl5+rHEsH8DSX3rerxxFs+YedSm/HfFPUUuSFhnNMw0Cn/JtH0bQRHFo04nKcA2ePlcLgQey7FToeXWHM0/dsMQgUTXzTE8U7dhNRDGI9/rS78SZ/e7ePrwp94puIBjvDI2ZEr3ZLSFHxrDArsaHQNrwGfwVb2RM2uruePZb3Z2JOesYYpeAfRVjc0DiRu0HA5MG1BQeEZxU4/QvPZXrtskvUmrqLCGN0VsNV+f7CnFVVm4G1OpRT5zV6GWhuoLbzn/j+lvEXoTBsq6M8MRvNHBtMio3B3HmpkFrXw1Elcx5mk+ClsyWnlzaAVslU4mD5
tasks:
- install: null
- ceph: null
- ceph:
log-whitelist:
- wrongly marked me down
- objects unfound and apparently lost
- thrashosds:
chance_pgnum_grow: 2
chance_pgpnum_fix: 1
timeout: 1200
- rados:
clients:
- client.0
ec_pool: true
objects: 50
op_weights:
append: 100
copy_from: 50
delete: 50
read: 100
rollback: 50
snap_create: 50
snap_remove: 50
write: 0
ops: 4000
verbose: true

#10

Updated by Shambhu Rajak almost 10 years ago

The formatting of my yaml file was lost because I am using Redmine for the first time. Here is the yaml file again:

check-locks: false
overrides:
  ceph:
    conf:
      global:
        ms inject socket failures: 5000
      osd:
        osd sloppy crc: true
    fs: xfs
roles:
- - mon.0
  - osd.0
  - osd.1
  - osd.2
- - mon.1
  - osd.3
  - osd.4
  - osd.5
- - mon.2
  - osd.6
  - osd.7
  - osd.8
  - client.0
targets:
  ubuntu@ip-10-15-16-12: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/Nexgf0K/z6EZgGgDuZF02EJdDiiGP9lI4W1VIGxxwmNMylhuA65lRjBcdwWf/Cr5KtNirLoTPgE35FGStmSYflyH5Eu1zCtrsam+fhoVsKHxsHF4pLInZ6o5yaWP6tNi90Khp4tT1PXCVkSCyLJ8zIe16afetNbPxnxZHbvd67aAdT8Msm6A5k0FMjVzCXSZavA9QQ9bY7GlAxJrVyzU0fBnZD8OCOFegzsygVHa8xUvDsn3b8t+XsPRd/LjoXual4UjDW6Km5DhPCcxxUwqEY6znCzGIx6YnmCxSyc/U9YkcDCufBjalMH0NITwO/CwHAN1tBUBTLmwBWL1JSkV
  ubuntu@ip-10-15-16-13: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDL3XS5AL8mg+lys+xsd8RU0yZxRuqz7tEEWDRlFR3bxfBs3UCcTO0iXXbiJDzoesjn3rXymIS9BfPKhhF9U9LLuwdlUYF1WvcO+ulhHxfs2aHRZDBERvLdC+qqYFUQ19Ko9MoagVjpzicAqcKfDafnptgGLJXKkdcUlhFZdtxvxod/92aftL1r1IQZBxOh/dLTvZokKQ8R9hALYxfeeR2HYo/dZNdLa0D21HcDaemdzCqd9k/zkBcYlqbFgQYVgzn/4hwBKGWJ1HHImAdpumS2/vuaWKgNVeFo2PNT/bwxkAQTshd+2BcUDTd5HR5Kiev+Jdi1nmuO/sHTLfcrOesH
  ubuntu@ip-10-15-16-26: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCreyg7S8zf5wts+zo7746Vl5+rHEsH8DSX3rerxxFs+YedSm/HfFPUUuSFhnNMw0Cn/JtH0bQRHFo04nKcA2ePlcLgQey7FToeXWHM0/dsMQgUTXzTE8U7dhNRDGI9/rS78SZ/e7ePrwp94puIBjvDI2ZEr3ZLSFHxrDArsaHQNrwGfwVb2RM2uruePZb3Z2JOesYYpeAfRVjc0DiRu0HA5MG1BQeEZxU4/QvPZXrtskvUmrqLCGN0VsNV+f7CnFVVm4G1OpRT5zV6GWhuoLbzn/j+lvEXoTBsq6M8MRvNHBtMio3B3HmpkFrXw1Elcx5mk+ClsyWnlzaAVslU4mD5
tasks:
- install: null
- ceph: null
- ceph:
    log-whitelist:
    - wrongly marked me down
    - objects unfound and apparently lost
- thrashosds:
    chance_pgnum_grow: 2
    chance_pgpnum_fix: 1
    timeout: 1200
- rados:
    clients:
    - client.0
    ec_pool: true
    objects: 50
    op_weights:
      append: 100
      copy_from: 50
      delete: 50
      read: 100
      rollback: 50
      snap_create: 50
      snap_remove: 50
      write: 0
    ops: 4000
verbose: true

#11

Updated by Anonymous over 7 years ago

.
.
.
- ceph: null
- ceph:
    log-whitelist:
.
.
.

should probably be:

.
.
.
- ceph:
    log-whitelist:
.
.
.
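
A repeated top-level task like the `ceph` entry above can be spotted mechanically. The following is a minimal sketch that works on an already-parsed task list (plain Python lists and dicts, as a YAML loader would produce); the helper `duplicate_tasks` is hypothetical and not part of teuthology, and the task options are abbreviated from the job posted above.

```python
from collections import Counter


def duplicate_tasks(tasks):
    """Return names of tasks that appear more than once in a top-level task list."""
    # Entries that are not mappings (for example bare string references) are skipped.
    names = [name for entry in tasks if isinstance(entry, dict) for name in entry]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Task list from the job above, with options abbreviated.
tasks = [
    {"install": None},
    {"ceph": None},
    {"ceph": {"log-whitelist": ["wrongly marked me down",
                                "objects unfound and apparently lost"]}},
    {"thrashosds": {"chance_pgnum_grow": 2, "chance_pgpnum_fix": 1, "timeout": 1200}},
    {"rados": {"clients": ["client.0"], "ops": 4000}},
]

print(duplicate_tasks(tasks))  # ['ceph']
```

A name flagged here is only a candidate for review: some jobs may repeat a task on purpose, and this sketch does not look inside nested `parallel` or `sequential` sections.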
