Bug #45421

closed

cephadm: MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it"

Added by Brad Hubbard almost 4 years ago. Updated over 3 years ago.

Status:
Duplicate
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
Q/A
Tags:
Backport:
octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014152
/a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014151

2020-05-02T23:04:24.917 INFO:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid cfcaf4e8-8cc8-11ea-a068-001a4aab830c -- ceph mon dump -f json
2020-05-02T23:04:24.973 INFO:ceph.mon.c.smithi043.stdout:-- Logs begin at Sat 2020-05-02 22:48:38 UTC. --
2020-05-02T23:04:24.974 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:24 smithi043 podman[10108]: 2020-05-02 23:04:24.506439991 +0000 UTC m=+0.549861856 container create c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e (image=quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6, name=ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c)
2020-05-02T23:04:26.463 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:26 smithi129 bash[10331]: cluster 2020-05-02T23:04:25.450994+0000 mgr.y (mgr.14142) 61 : cluster [DBG] pgmap v52: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:26.463 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:26 smithi136 bash[10198]: cluster 2020-05-02T23:04:25.450994+0000 mgr.y (mgr.14142) 61 : cluster [DBG] pgmap v52: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:28.467 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:28 smithi129 bash[10331]: cluster 2020-05-02T23:04:27.451441+0000 mgr.y (mgr.14142) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:28.468 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:28 smithi136 bash[10198]: cluster 2020-05-02T23:04:27.451441+0000 mgr.y (mgr.14142) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:30.472 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:30 smithi129 bash[10331]: cluster 2020-05-02T23:04:29.451910+0000 mgr.y (mgr.14142) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:30.473 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:30 smithi136 bash[10198]: cluster 2020-05-02T23:04:29.451910+0000 mgr.y (mgr.14142) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:32.476 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:32 smithi129 bash[10331]: cluster 2020-05-02T23:04:31.452355+0000 mgr.y (mgr.14142) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:32.477 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:32 smithi136 bash[10198]: cluster 2020-05-02T23:04:31.452355+0000 mgr.y (mgr.14142) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:32.497 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 podman[10108]: 2020-05-02 23:04:32.500299107 +0000 UTC m=+8.543721099 container remove c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e (image=quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6, name=ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c)
2020-05-02T23:04:32.500 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 bash[10103]: time="2020-05-02T23:04:32Z" level=error msg="unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it" 
2020-05-02T23:04:32.558 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 bash[10103]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-05-02T23:04:32.559 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 bash[10103]: : OCI runtime error
2020-05-02T23:04:32.579 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service: main process exited, code=exited, status=127/n/a
2020-05-02T23:04:32.792 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 podman[10373]: Error: no container with name or ID ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c found: no such container
2020-05-02T23:04:32.815 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 systemd[1]: Unit ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service entered failed state.
2020-05-02T23:04:32.816 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service failed.
2020-05-02T23:04:42.923 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:42 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service holdoff time over, scheduling restart.
2020-05-02T23:04:42.923 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:42 smithi043 systemd[1]: Stopped Ceph mon.c for cfcaf4e8-8cc8-11ea-a068-001a4aab830c.
2020-05-02T23:04:42.925 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:42 smithi043 systemd[1]: Starting Ceph mon.c for cfcaf4e8-8cc8-11ea-a068-001a4aab830c...
2020-05-02T23:04:43.039 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 podman[10992]: Error: no container with name or ID ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c found: no such container
2020-05-02T23:04:43.047 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: Started Ceph mon.c for cfcaf4e8-8cc8-11ea-a068-001a4aab830c.
2020-05-02T23:04:43.516 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:43 smithi129 bash[10331]: audit 2020-05-02T23:04:42.890594+0000 mon.a (mon.0) 168 : audit [DBG] from='client.? 172.21.15.43:0/3120862389' entity='client.admin' cmd=[{"prefix": "mon dump", "format": "json"}]: dispatch
2020-05-02T23:04:43.517 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:43 smithi136 bash[10198]: audit 2020-05-02T23:04:42.890594+0000 mon.a (mon.0) 168 : audit [DBG] from='client.? 172.21.15.43:0/3120862389' entity='client.admin' cmd=[{"prefix": "mon dump", "format": "json"}]: dispatch
2020-05-02T23:04:43.690 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 bash[11026]: Error: error creating container storage: the container name "ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c" is already in use by "c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e". You have to remove that container to be able to reuse that name.: that name is already in use
2020-05-02T23:04:43.696 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service: main process exited, code=exited, status=125/n/a
2020-05-02T23:04:43.805 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 podman[11053]: Error: no container with name or ID ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c found: no such container
2020-05-02T23:04:43.818 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: Unit ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service entered failed state.
2020-05-02T23:04:43.819 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service failed.
2020-05-02T23:04:44.518 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:44 smithi129 bash[10331]: cluster 2020-05-02T23:04:43.455211+0000 mgr.y (mgr.14142) 70 : cluster [DBG] pgmap v61: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:44.519 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:44 smithi136 bash[10198]: cluster 2020-05-02T23:04:43.455211+0000 mgr.y (mgr.14142) 70 : cluster [DBG] pgmap v61: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:44.694 INFO:tasks.cephadm:Waiting for 3 mons in monmap...
2020-05-02T23:16:38.351 INFO:tasks.cephadm:Waiting for 3 mons in monmap...
2020-05-02T23:16:38.351 INFO:teuthology.orchestra.run.smithi043:> true
2020-05-02T23:16:38.467 INFO:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid cfcaf4e8-8cc8-11ea-a068-001a4aab830c -- ceph mon dump -f json
2020-05-02T23:16:38.680 INFO:ceph.mon.a.smithi129.stdout:May 02 23:16:38 smithi129 bash[10331]: cluster 2020-05-02T23:16:37.628693+0000 mgr.y (mgr.14142) 427 : cluster [DBG] pgmap v418: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:38.682 INFO:ceph.mon.b.smithi136.stdout:May 02 23:16:38 smithi136 bash[10198]: cluster 2020-05-02T23:16:37.628693+0000 mgr.y (mgr.14142) 427 : cluster [DBG] pgmap v418: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:40.685 INFO:ceph.mon.a.smithi129.stdout:May 02 23:16:40 smithi129 bash[10331]: cluster 2020-05-02T23:16:39.629217+0000 mgr.y (mgr.14142) 428 : cluster [DBG] pgmap v419: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:40.686 INFO:ceph.mon.b.smithi136.stdout:May 02 23:16:40 smithi136 bash[10198]: cluster 2020-05-02T23:16:39.629217+0000 mgr.y (mgr.14142) 428 : cluster [DBG] pgmap v419: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:40.742 INFO:teuthology.orchestra.run.smithi043.stdout:
2020-05-02T23:16:40.742 INFO:teuthology.orchestra.run.smithi043.stdout:{"epoch":2,"fsid":"cfcaf4e8-8cc8-11ea-a068-001a4aab830c","modified":"2020-05-02T23:04:12.125642Z","created":"2020-05-02T23:01:50.425360Z","min_mon_release":16,"min_mon_release_name":"pacific","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.129:3300","nonce":0},{"type":"v1","addr":"172.21.15.129:6789","nonce":0}]},"addr":"172.21.15.129:6789/0","public_addr":"172.21.15.129:6789/0","priority":0,"weight":0},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.136:3300","nonce":0},{"type":"v1","addr":"172.21.15.136:6789","nonce":0}]},"addr":"172.21.15.136:6789/0","public_addr":"172.21.15.136:6789/0","priority":0,"weight":0}],"quorum":[0,1]}
2020-05-02T23:16:40.744 INFO:teuthology.orchestra.run.smithi043.stderr:dumped monmap epoch 2
2020-05-02T23:16:41.378 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 32, in nested
    vars.append(enter())
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/cephadm.py", line 495, in ceph_mons
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (180) after waiting for 180 seconds
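
For context on the traceback: the ceph_mons task is a polling loop that repeatedly runs `ceph mon dump -f json` and waits until the monmap lists all three mons. Because mon.c's container never starts (the exec of /usr/bin/ceph-mon fails), the monmap stays at two mons and the loop exhausts its retry budget. Below is a minimal standalone sketch of that kind of wait loop; the function name, retry parameters, and the MaxWhileTries stand-in are illustrative assumptions, not the actual teuthology/cephadm code.

# Hypothetical standalone sketch of the "wait for N mons" loop; names and
# retry parameters are assumptions, not the real qa/tasks/cephadm.py code.
import json
import subprocess
import time

class MaxWhileTries(Exception):
    """Raised when the retry budget is exhausted, mirroring teuthology's exception."""

def wait_for_mons(expected=3, tries=180, sleep=1):
    """Poll 'ceph mon dump -f json' until the monmap lists `expected` mons."""
    for _ in range(tries):
        out = subprocess.run(
            ['sudo', 'ceph', 'mon', 'dump', '-f', 'json'],
            capture_output=True, text=True, check=True,
        ).stdout
        monmap = json.loads(out)
        if len(monmap.get('mons', [])) >= expected:
            return monmap
        time.sleep(sleep)
    raise MaxWhileTries(
        'reached maximum tries (%d) after waiting for %d seconds'
        % (tries, tries * sleep))

In the failing run the monmap (epoch 2 in the JSON above) only ever contains mons a and b, so a loop like this times out exactly as the exception reports.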

Related issues: 1 (0 open, 1 closed)

Is duplicate of Orchestrator - Bug #44990: cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory (status: Can't reproduce, assignee: Sebastian Wagner)

Actions #1

Updated by Brad Hubbard almost 4 years ago

  • Description updated (diff)
Actions #2

Updated by Brad Hubbard almost 4 years ago

  • Subject changed from MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it" to cephadm: MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it"

/a/yuriw-2020-05-05_15:20:13-rados-wip-yuri8-testing-2020-05-04-2117-octopus-distro-basic-smithi/5024853

Actions #3

Updated by Sebastian Wagner almost 4 years ago

  • Is duplicate of Bug #44990: cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory added
Actions #4

Updated by Sebastian Wagner almost 4 years ago

  • Status changed from New to Duplicate
Actions #5

Updated by Kefu Chai almost 4 years ago

/a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014074

2020-05-02T21:20:34.536 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi084.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power status
2020-05-02T21:20:34.649 WARNING:teuthology.contextutil:'wait for power on' reached maximum tries (5) after waiting for 20.0 seconds
2020-05-02T21:20:34.751 ERROR:teuthology.orchestra.console:Failed to power on smithi084
2020-05-02T21:20:34.853 INFO:teuthology.provision.fog.smithi084:Waiting for deploy to finish
...
2020-05-02T21:35:24.974 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 90, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/internal/lock_machines.py", line 78, in lock_machines
    os_version, arch)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/lock/ops.py", line 145, in lock_many
    reimaged[machine] = machines[machine]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
    reraise(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/__init__.py", line 39, in reimage
    return obj.create()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 85, in create
    self.wait_for_deploy_task(task_id)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 243, in wait_for_deploy_task
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (60) after waiting for 900 seconds
Actions #6

Updated by Brad Hubbard almost 4 years ago

  • Description updated (diff)
Actions #7

Updated by Brad Hubbard almost 4 years ago

Removed /a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014074 since it's a different failure.

Actions #8

Updated by Sebastian Wagner almost 4 years ago

I've been trying for ages to reproduce this one, but have failed so far. See my attempts at https://github.com/ceph/ceph/pull/35018

Brad, how soon after shaman has completed the builds do you schedule the teuthology runs?

Actions #9

Updated by Brad Hubbard almost 4 years ago

I don't know the exact timing of this run. As a general rule I wait until the shaman build page is all green and may schedule the teuthology run any time after that. I would guess that in this case it was not long after the shaman page went all green.
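
The timing question above is relevant because one hypothesis for "stat /usr/bin/ceph-mon: no such file or directory" is that the run pulled an image that had not finished building or pushing. Below is a hedged sketch of a pre-flight check one could run before scheduling; this is not an existing teuthology/cephadm step, it just assumes podman is installed and reuses the image tag from the log above.

# Hypothetical pre-flight check (not part of cephadm or teuthology): pull the
# CI image and confirm it actually ships /usr/bin/ceph-mon before scheduling.
import subprocess
import sys

IMAGE = 'quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6'

def image_has_ceph_mon(image=IMAGE):
    subprocess.run(['podman', 'pull', image], check=True)
    # Override the entrypoint so the container just stats the binary and exits.
    result = subprocess.run(
        ['podman', 'run', '--rm', '--entrypoint', 'stat', image,
         '/usr/bin/ceph-mon'],
        capture_output=True, text=True,
    )
    return result.returncode == 0

if __name__ == '__main__':
    ok = image_has_ceph_mon()
    print('ceph-mon present' if ok else 'ceph-mon missing: stale or partial image?')
    sys.exit(0 if ok else 1)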

Actions #10

Updated by Brad Hubbard almost 4 years ago

  • Backport set to octopus, nautilus
Actions #11

Updated by Brad Hubbard almost 4 years ago

  • Backport deleted (octopus, nautilus)
Actions #13

Updated by Brad Hubbard over 3 years ago

  • Backport set to octopus
Actions #14

Updated by Brad Hubbard over 3 years ago

Sebastian, can you confirm whether you still think this duplicates Bug #44990?

Actions #15

Updated by Deepika Upadhyay over 3 years ago

Seen this in recent runs on octopus:

/a/yuriw-2020-09-23_15:16:58-rados-wip-yuri-testing-2020-09-22-1332-octopus-distro-basic-smithi/5462982/teuthology.log
/a/yuriw-2020-09-23_15:16:58-rados-wip-yuri-testing-2020-09-22-1332-octopus-distro-basic-smithi/5463009/
/a/yuriw-2020-09-23_15:16:58-rados-wip-yuri-testing-2020-09-22-1332-octopus-distro-basic-smithi/5463036/teuthology.log


96, name=ceph-e6d91c94-fdb8-11ea-a2a9-001a4aab830c-mon.c)
2020-09-23T16:25:23.926 INFO:journalctl@ceph.mon.c.smithi169.stdout:Sep 23 16:25:23 smithi169 bash[8472]: time="2020-09-23T16:25:23Z" level=error msg="unable to remove container 493506a780c0cc778087467a93b3fb5e27ed2d392f56ca504b93ed73ef5a4363 after failing to start and attach to it" 
2020-09-23T16:25:23.981 INFO:journalctl@ceph.mon.c.smithi169.stdout:Sep 23 16:25:24 smithi169 bash[8472]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-09-23T16:25:23.981 INFO:journalctl@ceph.mon.c.smithi169.stdout:Sep 23 16:25:24 smithi169 bash[8472]: : OCI runtime error
Actions #17

Updated by Deepika Upadhyay over 3 years ago

/a/yuriw-2020-10-05_22:17:06-rados-wip-yuri7-testing-2020-10-05-1338-octopus-distro-basic-smithi/5500303/teuthology.log
