Bug #45421
cephadm: MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it"
Closed
Description
/a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014152
/a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014151
2020-05-02T23:04:24.917 INFO:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid cfcaf4e8-8cc8-11ea-a068-001a4aab830c -- ceph mon dump -f json
2020-05-02T23:04:24.973 INFO:ceph.mon.c.smithi043.stdout:-- Logs begin at Sat 2020-05-02 22:48:38 UTC. --
2020-05-02T23:04:24.974 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:24 smithi043 podman[10108]: 2020-05-02 23:04:24.506439991 +0000 UTC m=+0.549861856 container create c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e (image=quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6, name=ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c)
2020-05-02T23:04:26.463 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:26 smithi129 bash[10331]: cluster 2020-05-02T23:04:25.450994+0000 mgr.y (mgr.14142) 61 : cluster [DBG] pgmap v52: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:26.463 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:26 smithi136 bash[10198]: cluster 2020-05-02T23:04:25.450994+0000 mgr.y (mgr.14142) 61 : cluster [DBG] pgmap v52: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:28.467 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:28 smithi129 bash[10331]: cluster 2020-05-02T23:04:27.451441+0000 mgr.y (mgr.14142) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:28.468 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:28 smithi136 bash[10198]: cluster 2020-05-02T23:04:27.451441+0000 mgr.y (mgr.14142) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:30.472 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:30 smithi129 bash[10331]: cluster 2020-05-02T23:04:29.451910+0000 mgr.y (mgr.14142) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:30.473 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:30 smithi136 bash[10198]: cluster 2020-05-02T23:04:29.451910+0000 mgr.y (mgr.14142) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:32.476 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:32 smithi129 bash[10331]: cluster 2020-05-02T23:04:31.452355+0000 mgr.y (mgr.14142) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:32.477 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:32 smithi136 bash[10198]: cluster 2020-05-02T23:04:31.452355+0000 mgr.y (mgr.14142) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:32.497 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 podman[10108]: 2020-05-02 23:04:32.500299107 +0000 UTC m=+8.543721099 container remove c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e (image=quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6, name=ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c)
2020-05-02T23:04:32.500 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 bash[10103]: time="2020-05-02T23:04:32Z" level=error msg="unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it"
2020-05-02T23:04:32.558 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 bash[10103]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory"
2020-05-02T23:04:32.559 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 bash[10103]: : OCI runtime error
2020-05-02T23:04:32.579 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service: main process exited, code=exited, status=127/n/a
2020-05-02T23:04:32.792 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 podman[10373]: Error: no container with name or ID ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c found: no such container
2020-05-02T23:04:32.815 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 systemd[1]: Unit ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service entered failed state.
2020-05-02T23:04:32.816 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:32 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service failed.
2020-05-02T23:04:42.923 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:42 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service holdoff time over, scheduling restart.
2020-05-02T23:04:42.923 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:42 smithi043 systemd[1]: Stopped Ceph mon.c for cfcaf4e8-8cc8-11ea-a068-001a4aab830c.
2020-05-02T23:04:42.925 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:42 smithi043 systemd[1]: Starting Ceph mon.c for cfcaf4e8-8cc8-11ea-a068-001a4aab830c...
2020-05-02T23:04:43.039 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 podman[10992]: Error: no container with name or ID ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c found: no such container
2020-05-02T23:04:43.047 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: Started Ceph mon.c for cfcaf4e8-8cc8-11ea-a068-001a4aab830c.
2020-05-02T23:04:43.516 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:43 smithi129 bash[10331]: audit 2020-05-02T23:04:42.890594+0000 mon.a (mon.0) 168 : audit [DBG] from='client.? 172.21.15.43:0/3120862389' entity='client.admin' cmd=[{"prefix": "mon dump", "format": "json"}]: dispatch
2020-05-02T23:04:43.517 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:43 smithi136 bash[10198]: audit 2020-05-02T23:04:42.890594+0000 mon.a (mon.0) 168 : audit [DBG] from='client.? 172.21.15.43:0/3120862389' entity='client.admin' cmd=[{"prefix": "mon dump", "format": "json"}]: dispatch
2020-05-02T23:04:43.690 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 bash[11026]: Error: error creating container storage: the container name "ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c" is already in use by "c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e". You have to remove that container to be able to reuse that name.: that name is already in use
2020-05-02T23:04:43.696 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service: main process exited, code=exited, status=125/n/a
2020-05-02T23:04:43.805 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 podman[11053]: Error: no container with name or ID ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c-mon.c found: no such container
2020-05-02T23:04:43.818 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: Unit ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service entered failed state.
2020-05-02T23:04:43.819 INFO:ceph.mon.c.smithi043.stdout:May 02 23:04:43 smithi043 systemd[1]: ceph-cfcaf4e8-8cc8-11ea-a068-001a4aab830c@mon.c.service failed.
2020-05-02T23:04:44.518 INFO:ceph.mon.a.smithi129.stdout:May 02 23:04:44 smithi129 bash[10331]: cluster 2020-05-02T23:04:43.455211+0000 mgr.y (mgr.14142) 70 : cluster [DBG] pgmap v61: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:44.519 INFO:ceph.mon.b.smithi136.stdout:May 02 23:04:44 smithi136 bash[10198]: cluster 2020-05-02T23:04:43.455211+0000 mgr.y (mgr.14142) 70 : cluster [DBG] pgmap v61: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:04:44.694 INFO:tasks.cephadm:Waiting for 3 mons in monmap...
2020-05-02T23:16:38.351 INFO:tasks.cephadm:Waiting for 3 mons in monmap...
2020-05-02T23:16:38.351 INFO:teuthology.orchestra.run.smithi043:> true
2020-05-02T23:16:38.467 INFO:teuthology.orchestra.run.smithi043:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:beaa4b04bc57ed43e98602e493e8a787a014b4e6 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid cfcaf4e8-8cc8-11ea-a068-001a4aab830c -- ceph mon dump -f json
2020-05-02T23:16:38.680 INFO:ceph.mon.a.smithi129.stdout:May 02 23:16:38 smithi129 bash[10331]: cluster 2020-05-02T23:16:37.628693+0000 mgr.y (mgr.14142) 427 : cluster [DBG] pgmap v418: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:38.682 INFO:ceph.mon.b.smithi136.stdout:May 02 23:16:38 smithi136 bash[10198]: cluster 2020-05-02T23:16:37.628693+0000 mgr.y (mgr.14142) 427 : cluster [DBG] pgmap v418: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:40.685 INFO:ceph.mon.a.smithi129.stdout:May 02 23:16:40 smithi129 bash[10331]: cluster 2020-05-02T23:16:39.629217+0000 mgr.y (mgr.14142) 428 : cluster [DBG] pgmap v419: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:40.686 INFO:ceph.mon.b.smithi136.stdout:May 02 23:16:40 smithi136 bash[10198]: cluster 2020-05-02T23:16:39.629217+0000 mgr.y (mgr.14142) 428 : cluster [DBG] pgmap v419: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-02T23:16:40.742 INFO:teuthology.orchestra.run.smithi043.stdout:
2020-05-02T23:16:40.742 INFO:teuthology.orchestra.run.smithi043.stdout:{"epoch":2,"fsid":"cfcaf4e8-8cc8-11ea-a068-001a4aab830c","modified":"2020-05-02T23:04:12.125642Z","created":"2020-05-02T23:01:50.425360Z","min_mon_release":16,"min_mon_release_name":"pacific","features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","octopus","pacific"],"optional":[]},"mons":[{"rank":0,"name":"a","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.129:3300","nonce":0},{"type":"v1","addr":"172.21.15.129:6789","nonce":0}]},"addr":"172.21.15.129:6789/0","public_addr":"172.21.15.129:6789/0","priority":0,"weight":0},{"rank":1,"name":"b","public_addrs":{"addrvec":[{"type":"v2","addr":"172.21.15.136:3300","nonce":0},{"type":"v1","addr":"172.21.15.136:6789","nonce":0}]},"addr":"172.21.15.136:6789/0","public_addr":"172.21.15.136:6789/0","priority":0,"weight":0}],"quorum":[0,1]}
2020-05-02T23:16:40.744 INFO:teuthology.orchestra.run.smithi043.stderr:dumped monmap epoch 2
2020-05-02T23:16:41.378 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 32, in nested
    vars.append(enter())
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/github.com_ceph_ceph_master/qa/tasks/cephadm.py", line 495, in ceph_mons
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (180) after waiting for 180 seconds
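The MaxWhileTries at the end of the trace comes from teuthology's bounded-retry helper: qa/tasks/cephadm.py polls `ceph mon dump` inside a `while proceed():` loop until all three mons show up in the monmap, and the helper raises once the retry budget is spent. A minimal Python sketch of that pattern (simplified and illustrative; the function and parameter names here are not teuthology's actual API):

```python
import time


class MaxWhileTries(Exception):
    """Raised when a bounded retry loop exhausts its attempts."""


def wait_until(condition, tries=180, sleep=1.0):
    # Simplified version of the safe_while/proceed() pattern: call
    # `condition` up to `tries` times, sleeping between attempts, and
    # raise MaxWhileTries once the budget is exhausted.
    for attempt in range(tries):
        if condition():
            return attempt + 1  # number of attempts it took
        time.sleep(sleep)
    raise MaxWhileTries(
        "reached maximum tries (%d) after waiting for %d seconds"
        % (tries, int(tries * sleep)))
```

In this run the condition (three mons in the monmap) could never become true because mon.c's container never started, so the loop ran out its 180 tries and raised.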
Updated by Brad Hubbard almost 4 years ago
- Subject changed from MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it" to cephadm: MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it"
/a/yuriw-2020-05-05_15:20:13-rados-wip-yuri8-testing-2020-05-04-2117-octopus-distro-basic-smithi/5024853
Updated by Sebastian Wagner almost 4 years ago
- Is duplicate of Bug #44990: cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory added
Updated by Sebastian Wagner almost 4 years ago
- Status changed from New to Duplicate
Updated by Kefu Chai almost 4 years ago
/a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014074
2020-05-02T21:20:34.536 DEBUG:teuthology.orchestra.console:pexpect command: ipmitool -H smithi084.ipmi.sepia.ceph.com -I lanplus -U inktank -P ApGNXcA7 power status
2020-05-02T21:20:34.649 WARNING:teuthology.contextutil:'wait for power on' reached maximum tries (5) after waiting for 20.0 seconds
2020-05-02T21:20:34.751 ERROR:teuthology.orchestra.console:Failed to power on smithi084
2020-05-02T21:20:34.853 INFO:teuthology.provision.fog.smithi084:Waiting for deploy to finish ...
2020-05-02T21:35:24.974 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 90, in run_tasks
    manager.__enter__()
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/internal/lock_machines.py", line 78, in lock_machines
    os_version, arch)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/lock/ops.py", line 145, in lock_many
    reimaged[machine] = machines[machine]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 87, in __exit__
    for result in self:
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 101, in __next__
    resurrect_traceback(result)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 37, in resurrect_traceback
    reraise(*exc_info)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/parallel.py", line 24, in capture_traceback
    return func(*args, **kwargs)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/__init__.py", line 39, in reimage
    return obj.create()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 85, in create
    self.wait_for_deploy_task(task_id)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/provision/fog.py", line 243, in wait_for_deploy_task
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (60) after waiting for 900 seconds
Updated by Brad Hubbard almost 4 years ago
Removed /a/bhubbard-2020-05-01_23:30:27-rados:thrash-old-clients-master-distro-basic-smithi/5014074 since it's a different failure.
Updated by Sebastian Wagner almost 4 years ago
I've been trying for ages to reproduce this one, but have failed so far. See my attempts at https://github.com/ceph/ceph/pull/35018
Brad, how soon do you schedule Teuthology runs after shaman has completed the builds?
Updated by Brad Hubbard almost 4 years ago
I have no idea about the timing of this run other than to say that, as a general rule, I wait until the shaman build page is all green and may schedule the teuthology run any time after that. I would guess that in this case it was not long after the shaman page turned all green.
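The failure mode in this ticket (an image whose /usr/bin/ceph-mon is missing, plausibly because the image was pulled while it was still being pushed) could in principle be caught with a pre-flight check before scheduling. A minimal sketch, assuming podman is available on the scheduling host; `image_has_binary` and `build_check_cmd` are hypothetical helpers, not part of cephadm or teuthology:

```python
import subprocess


def build_check_cmd(image, path):
    # Override the image's entrypoint so the container just stats the file;
    # exit status 0 means the binary is present in the image.
    return ['podman', 'run', '--rm', '--entrypoint', 'stat', image, path]


def image_has_binary(image, path='/usr/bin/ceph-mon'):
    """Best-effort check that a CI image actually contains the given
    binary before a teuthology run is scheduled (hypothetical helper)."""
    result = subprocess.run(build_check_cmd(image, path),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Running `image_has_binary('quay.io/ceph-ci/ceph:<sha1>')` right before scheduling would distinguish "image incomplete" from genuine mon startup failures.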
Updated by Brad Hubbard almost 4 years ago
- Backport deleted (octopus, nautilus)
Updated by Brad Hubbard over 3 years ago
Sebastian, can you confirm whether you still think this duplicates Bug #44990?
Updated by Deepika Upadhyay over 3 years ago
Seen this in a recent run on octopus:
/a/yuriw-2020-09-23_15:16:58-rados-wip-yuri-testing-2020-09-22-1332-octopus-distro-basic-smithi/5462982/teuthology.log
/a/yuriw-2020-09-23_15:16:58-rados-wip-yuri-testing-2020-09-22-1332-octopus-distro-basic-smithi/5463009/
/a/yuriw-2020-09-23_15:16:58-rados-wip-yuri-testing-2020-09-22-1332-octopus-distro-basic-smithi/5463036/teuthology.log
96, name=ceph-e6d91c94-fdb8-11ea-a2a9-001a4aab830c-mon.c)
2020-09-23T16:25:23.926 INFO:journalctl@ceph.mon.c.smithi169.stdout:Sep 23 16:25:23 smithi169 bash[8472]: time="2020-09-23T16:25:23Z" level=error msg="unable to remove container 493506a780c0cc778087467a93b3fb5e27ed2d392f56ca504b93ed73ef5a4363 after failing to start and attach to it"
2020-09-23T16:25:23.981 INFO:journalctl@ceph.mon.c.smithi169.stdout:Sep 23 16:25:24 smithi169 bash[8472]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory"
2020-09-23T16:25:23.981 INFO:journalctl@ceph.mon.c.smithi169.stdout:Sep 23 16:25:24 smithi169 bash[8472]: : OCI runtime error
Updated by Deepika Upadhyay over 3 years ago
Seeing this in a recent octopus run that includes the fix for the related issue:
http://qa-proxy.ceph.com/teuthology/yuriw-2020-09-28_18:47:33-rados-wip-yuri-testing-2020-09-28-1007-octopus-distro-basic-smithi/5478283/teuthology.log
Updated by Deepika Upadhyay over 3 years ago
/a/yuriw-2020-10-05_22:17:06-rados-wip-yuri7-testing-2020-10-05-1338-octopus-distro-basic-smithi/5500303/teuthology.log