
Bug #44990

cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory

Added by Sebastian Wagner almost 4 years ago. Updated about 3 years ago.

Status: Can't reproduce
Priority: Normal
Category: cephadm
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://pulpito.ceph.com/yuriw-2020-04-07_17:39:28-rados-wip-octopus-rgw-msg-fixes-distro-basic-smithi/4931485/

2020-04-07T20:30:18.064 INFO:teuthology.orchestra.run.smithi163:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:99c8109c540eb4adfdfd778d8f345bafcf2366e7 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 25ae9672-790e-11ea-924d-001a4aab830c -- ceph orch daemon add mon smithi163:172.21.15.163=b
2020-04-07T20:30:19.340 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:19 smithi115 bash[8474]: cluster 2020-04-07T20:30:17.918224+0000 mgr.y (mgr.14144) 56 : cluster [DBG] pgmap v49: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:19.341 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:19 smithi002 bash[8292]: cluster 2020-04-07T20:30:17.918224+0000 mgr.y (mgr.14144) 56 : cluster [DBG] pgmap v49: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:21.340 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:21 smithi002 bash[8292]: cluster 2020-04-07T20:30:19.918699+0000 mgr.y (mgr.14144) 57 : cluster [DBG] pgmap v50: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:21.341 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:21 smithi002 bash[8292]: audit 2020-04-07T20:30:20.398410+0000 mgr.y (mgr.14144) 58 : audit [DBG] from='client.14182 -' entity='client.admin' cmd=[{"prefix": "orch daemon add", "daem
on_type": "mon", "placement": "smithi163:172.21.15.163=b", "target": ["mon-mgr", ""]}]: dispatch
2020-04-07T20:30:21.341 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:21 smithi002 bash[8292]: audit 2020-04-07T20:30:20.400823+0000 mon.a (mon.0) 152 : audit [INF] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "auth get", "e
ntity": "mon."}]: dispatch
2020-04-07T20:30:21.343 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:21 smithi115 bash[8474]: cluster 2020-04-07T20:30:19.918699+0000 mgr.y (mgr.14144) 57 : cluster [DBG] pgmap v50: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:21.343 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:21 smithi115 bash[8474]: audit 2020-04-07T20:30:20.398410+0000 mgr.y (mgr.14144) 58 : audit [DBG] from='client.14182 -' entity='client.admin' cmd=[{"prefix": "orch daemon add", "daem
on_type": "mon", "placement": "smithi163:172.21.15.163=b", "target": ["mon-mgr", ""]}]: dispatch
2020-04-07T20:30:21.343 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:21 smithi115 bash[8474]: audit 2020-04-07T20:30:20.400823+0000 mon.a (mon.0) 152 : audit [INF] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "auth get", "e
ntity": "mon."}]: dispatch
2020-04-07T20:30:21.345 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:21 smithi115 bash[8474]: audit 2020-04-07T20:30:20.402607+0000 mon.a (mon.0) 153 : audit [DBG] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "config genera
te-minimal-conf"}]: dispatch
2020-04-07T20:30:21.345 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:21 smithi115 bash[8474]: cephadm 2020-04-07T20:30:20.404042+0000 mgr.y (mgr.14144) 59 : cephadm [INF] Deploying daemon mon.b on smithi163
2020-04-07T20:30:21.345 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:21 smithi115 bash[8474]: audit 2020-04-07T20:30:20.404724+0000 mon.a (mon.0) 154 : audit [DBG] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "config get", 
"who": "mon.b", "key": "container_image"}]: dispatch
2020-04-07T20:30:21.346 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:21 smithi002 bash[8292]: audit 2020-04-07T20:30:20.402607+0000 mon.a (mon.0) 153 : audit [DBG] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "config genera
te-minimal-conf"}]: dispatch
2020-04-07T20:30:21.346 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:21 smithi002 bash[8292]: cephadm 2020-04-07T20:30:20.404042+0000 mgr.y (mgr.14144) 59 : cephadm [INF] Deploying daemon mon.b on smithi163
2020-04-07T20:30:21.347 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:21 smithi002 bash[8292]: audit 2020-04-07T20:30:20.404724+0000 mon.a (mon.0) 154 : audit [DBG] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "config get", 
"who": "mon.b", "key": "container_image"}]: dispatch
2020-04-07T20:30:23.361 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:23 smithi115 bash[8474]: cluster 2020-04-07T20:30:21.919192+0000 mgr.y (mgr.14144) 60 : cluster [DBG] pgmap v51: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:23.361 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:23 smithi002 bash[8292]: cluster 2020-04-07T20:30:21.919192+0000 mgr.y (mgr.14144) 60 : cluster [DBG] pgmap v51: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:23.405 INFO:teuthology.orchestra.run.smithi163.stdout:Deployed mon.b on host 'smithi163'
2020-04-07T20:30:24.216 INFO:teuthology.orchestra.run.smithi163:> true
2020-04-07T20:30:24.325 INFO:teuthology.orchestra.run.smithi163:mon.b> sudo journalctl -f -n 0 -u ceph-25ae9672-790e-11ea-924d-001a4aab830c@mon.b.service
2020-04-07T20:30:24.329 INFO:tasks.cephadm:Waiting for 3 mons in monmap...
2020-04-07T20:30:24.329 INFO:teuthology.orchestra.run.smithi163:> true
2020-04-07T20:30:24.350 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:24 smithi002 bash[8292]: audit 2020-04-07T20:30:23.387535+0000 mon.a (mon.0) 155 : audit [INF] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix":"config-key set
","key":"mgr/cephadm/host.smithi163","val":"{\"daemons\": {\"mon.b\": {\"hostname\": \"smithi163\", \"daemon_id\": \"b\", \"daemon_type\": \"mon\", \"status\": 1, \"status_desc\": \"starting\"}}, \"devices\": [{\"rejected_reasons\": [\"LVM detected\", 
\"locked\", \"Insufficient space (<5GB) on vgs\"], \"available\": false, \"path\": \"/dev/nvme0n1\", \"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"\", \"model\": \"INTEL SSDPEDMD400G4\", \"rev\": \"\", \"sas_address\": \"\", \"sas_de
vice_handle\": \"\", \"support_discard\": \"512\", \"rotational\": \"0\", \"nr_requests\": \"1023\", \"scheduler_mode\": \"none\", \"partitions\": {}, \"sectors\": 0, \"sectorsize\": \"512\", \"size\": 400088457216.0, \"human_readable_size\": \"372.61 
GB\", \"path\": \"/dev/nvme0n1\", \"locked\": 1}, \"lvs\": [{\"name\": \"lv_5\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_4\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_3\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_2\", 
\"comment\": \"not used by ceph\"}, {\"name\": \"lv_1\", \"comment\": \"not used by ceph\"}], \"human_readable_type\": \"ssd\", \"device_id\": \"_CVFT623300MD400BGN\"}, {\"rejected_reasons\": [\"locked\"], \"available\": false, \"path\": \"/dev/sda\", 
\"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"ATA\", \"model\": \"ST1000NM0033-9ZM\", \"rev\": \"SN06\", \"sas_address\": \"\", \"sas_device_handle\": \"\", \"support_discard\": \"0\", \"rotational\": \"1\", \"nr_requests\": \"128\",
 \"scheduler_mode\": \"deadline\", \"partitions\": {\"sda1\": {\"start\": \"2048\", \"sectors\": \"1953522688\", \"sectorsize\": 512, \"size\": 1000203616256.0, \"human_readable_size\": \"931.51 GB\", \"holders\": []}}, \"sectors\": 0, \"sectorsize\": 
\"512\", \"size\": 1000204886016.0, \"human_readable_size\": \"931.51 GB\", \"path\": \"/dev/sda\", \"locked\": 1}, \"lvs\": [], \"human_readable_type\": \"hdd\", \"device_id\": \"ST1000NM0033-9ZM173_Z1W5XFMX\"}], \"daemon_config_deps\": {\"mon.b\": {\
"deps\": [], \"last_config\": \"2020-04-07T20:30:20.402140\"}}, \"last_device_update\": \"2020-04-07T20:30:01.505668\", \"networks\": {\"172.21.0.0/20\": [\"172.21.15.163\"]}, \"last_host_check\": \"2020-04-07T20:29:57.183966\"}"}]: dispatch
2020-04-07T20:30:24.350 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:24 smithi002 bash[8292]: audit 2020-04-07T20:30:23.388406+0000 mon.a (mon.0) 156 : audit [DBG] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "config get", 
"who": "mon", "key": "container_image"}]: dispatch
2020-04-07T20:30:24.352 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:24 smithi115 bash[8474]: audit 2020-04-07T20:30:23.387535+0000 mon.a (mon.0) 155 : audit [INF] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix":"config-key set
","key":"mgr/cephadm/host.smithi163","val":"{\"daemons\": {\"mon.b\": {\"hostname\": \"smithi163\", \"daemon_id\": \"b\", \"daemon_type\": \"mon\", \"status\": 1, \"status_desc\": \"starting\"}}, \"devices\": [{\"rejected_reasons\": [\"LVM detected\", 
\"locked\", \"Insufficient space (<5GB) on vgs\"], \"available\": false, \"path\": \"/dev/nvme0n1\", \"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"\", \"model\": \"INTEL SSDPEDMD400G4\", \"rev\": \"\", \"sas_address\": \"\", \"sas_de
vice_handle\": \"\", \"support_discard\": \"512\", \"rotational\": \"0\", \"nr_requests\": \"1023\", \"scheduler_mode\": \"none\", \"partitions\": {}, \"sectors\": 0, \"sectorsize\": \"512\", \"size\": 400088457216.0, \"human_readable_size\": \"372.61 
GB\", \"path\": \"/dev/nvme0n1\", \"locked\": 1}, \"lvs\": [{\"name\": \"lv_5\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_4\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_3\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_2\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_1\", \"comment\": \"not used by ceph\"}], \"human_readable_type\": \"ssd\", \"device_id\": \"_CVFT623300MD400BGN\"}, {\"rejected_reasons\": [\"locked\"], \"available\": false, \"path\": \"/dev/sda\", \"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"ATA\", \"model\": \"ST1000NM0033-9ZM\", \"rev\": \"SN06\", \"sas_address\": \"\", \"sas_device_handle\": \"\", \"support_discard\": \"0\", \"rotational\": \"1\", \"nr_requests\": \"128\", \"scheduler_mode\": \"deadline\", \"partitions\": {\"sda1\": {\"start\": \"2048\", \"sectors\": \"1953522688\", \"sectorsize\": 512, \"size\": 1000203616256.0, \"human_readable_size\": \"931.51 GB\", \"holders\": []}}, \"sectors\": 0, \"sectorsize\": \"512\", \"size\": 1000204886016.0, \"human_readable_size\": \"931.51 GB\", \"path\": \"/dev/sda\", \"locked\": 1}, \"lvs\": [], \"human_readable_type\": \"hdd\", \"device_id\": \"ST1000NM0033-9ZM173_Z1W5XFMX\"}], \"daemon_config_deps\": {\"mon.b\": {\"deps\": [], \"last_config\": \"2020-04-07T20:30:20.402140\"}}, \"last_device_update\": \"2020-04-07T20:30:01.505668\", \"networks\": {\"172.21.0.0/20\": [\"172.21.15.163\"]}, \"last_host_check\": \"2020-04-07T20:29:57.183966\"}"}]: dispatch
2020-04-07T20:30:24.352 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:24 smithi115 bash[8474]: audit 2020-04-07T20:30:23.388406+0000 mon.a (mon.0) 156 : audit [DBG] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd=[{"prefix": "config get", "who": "mon", "key": "container_image"}]: dispatch
2020-04-07T20:30:24.353 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:24 smithi002 bash[8292]: audit 2020-04-07T20:30:23.392468+0000 mon.a (mon.0) 157 : audit [INF] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd='[{"prefix":"config-key set","key":"mgr/cephadm/host.smithi163","val":"{\"daemons\": {\"mon.b\": {\"hostname\": \"smithi163\", \"daemon_id\": \"b\", \"daemon_type\": \"mon\", \"status\": 1, \"status_desc\": \"starting\"}}, \"devices\": [{\"rejected_reasons\": [\"LVM detected\", \"locked\", \"Insufficient space (<5GB) on vgs\"], \"available\": false, \"path\": \"/dev/nvme0n1\", \"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"\", \"model\": \"INTEL SSDPEDMD400G4\", \"rev\": \"\", \"sas_address\": \"\", \"sas_device_handle\": \"\", \"support_discard\": \"512\", \"rotational\": \"0\", \"nr_requests\": \"1023\", \"scheduler_mode\": \"none\", \"partitions\": {}, \"sectors\": 0, \"sectorsize\": \"512\", \"size\": 400088457216.0, \"human_readable_size\": \"372.61 GB\", \"path\": \"/dev/nvme0n1\", \"locked\": 1}, \"lvs\": [{\"name\": \"lv_5\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_4\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_3\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_2\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_1\", \"comment\": \"not used by ceph\"}], \"human_readable_type\": \"ssd\", \"device_id\": \"_CVFT623300MD400BGN\"}, {\"rejected_reasons\": [\"locked\"], \"available\": false, \"path\": \"/dev/sda\", \"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"ATA\", \"model\": \"ST1000NM0033-9ZM\", \"rev\": \"SN06\", \"sas_address\": \"\", \"sas_device_handle\": \"\", \"support_discard\": \"0\", \"rotational\": \"1\", \"nr_requests\": \"128\", \"scheduler_mode\": \"deadline\", \"partitions\": {\"sda1\": {\"start\": \"2048\", \"sectors\": \"1953522688\", \"sectorsize\": 512, \"size\": 1000203616256.0, \"human_readable_size\": \"931.51 GB\", \"holders\": []}}, \"sectors\": 0, \"sectorsize\": \"512\", \"size\": 1000204886016.0, \"human_readable_size\": \"931.51 GB\", \"path\": \"/dev/sda\", \"locked\": 1}, \"lvs\": [], \"human_readable_type\": \"hdd\", \"device_id\": \"ST1000NM0033-9ZM173_Z1W5XFMX\"}], \"daemon_config_deps\": {\"mon.b\": {\"deps\": [], \"last_config\": \"2020-04-07T20:30:20.402140\"}}, \"last_device_update\": \"2020-04-07T20:30:01.505668\", \"networks\": {\"172.21.0.0/20\": [\"172.21.15.163\"]}, \"last_host_check\": \"2020-04-07T20:29:57.183966\"}"}]': finished
2020-04-07T20:30:24.355 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:24 smithi115 bash[8474]: audit 2020-04-07T20:30:23.392468+0000 mon.a (mon.0) 157 : audit [INF] from='mgr.14144 172.21.15.2:0/3757663963' entity='mgr.y' cmd='[{"prefix":"config-key set","key":"mgr/cephadm/host.smithi163","val":"{\"daemons\": {\"mon.b\": {\"hostname\": \"smithi163\", \"daemon_id\": \"b\", \"daemon_type\": \"mon\", \"status\": 1, \"status_desc\": \"starting\"}}, \"devices\": [{\"rejected_reasons\": [\"LVM detected\", \"locked\", \"Insufficient space (<5GB) on vgs\"], \"available\": false, \"path\": \"/dev/nvme0n1\", \"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"\", \"model\": \"INTEL SSDPEDMD400G4\", \"rev\": \"\", \"sas_address\": \"\", \"sas_device_handle\": \"\", \"support_discard\": \"512\", \"rotational\": \"0\", \"nr_requests\": \"1023\", \"scheduler_mode\": \"none\", \"partitions\": {}, \"sectors\": 0, \"sectorsize\": \"512\", \"size\": 400088457216.0, \"human_readable_size\": \"372.61 GB\", \"path\": \"/dev/nvme0n1\", \"locked\": 1}, \"lvs\": [{\"name\": \"lv_5\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_4\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_3\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_2\", \"comment\": \"not used by ceph\"}, {\"name\": \"lv_1\", \"comment\": \"not used by ceph\"}], \"human_readable_type\": \"ssd\", \"device_id\": \"_CVFT623300MD400BGN\"}, {\"rejected_reasons\": [\"locked\"], \"available\": false, \"path\": \"/dev/sda\", \"sys_api\": {\"removable\": \"0\", \"ro\": \"0\", \"vendor\": \"ATA\", \"model\": \"ST1000NM0033-9ZM\", \"rev\": \"SN06\", \"sas_address\": \"\", \"sas_device_handle\": \"\", \"support_discard\": \"0\", \"rotational\": \"1\", \"nr_requests\": \"128\", \"scheduler_mode\": \"deadline\", \"partitions\": {\"sda1\": {\"start\": \"2048\", \"sectors\": \"1953522688\", \"sectorsize\": 512, \"size\": 1000203616256.0, \"human_readable_size\": \"931.51 GB\", \"holders\": []}}, \"sectors\": 0, \"sectorsize\": \"512\", \"size\": 1000204886016.0, \"human_readable_size\": \"931.51 GB\", \"path\": \"/dev/sda\", \"locked\": 1}, \"lvs\": [], \"human_readable_type\": \"hdd\", \"device_id\": \"ST1000NM0033-9ZM173_Z1W5XFMX\"}], \"daemon_config_deps\": {\"mon.b\": {\"deps\": [], \"last_config\": \"2020-04-07T20:30:20.402140\"}}, \"last_device_update\": \"2020-04-07T20:30:01.505668\", \"networks\": {\"172.21.0.0/20\": [\"172.21.15.163\"]}, \"last_host_check\": \"2020-04-07T20:29:57.183966\"}"}]': finished
2020-04-07T20:30:24.420 INFO:teuthology.orchestra.run.smithi163:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:99c8109c540eb4adfdfd778d8f345bafcf2366e7 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 25ae9672-790e-11ea-924d-001a4aab830c -- ceph mon dump -f json
2020-04-07T20:30:24.482 INFO:ceph.mon.b.smithi163.stdout:-- Logs begin at Tue 2020-04-07 20:19:06 UTC. --
2020-04-07T20:30:24.482 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:23 smithi163 podman[8468]: 2020-04-07 20:30:23.964016659 +0000 UTC m=+0.572856243 container create 5e38b7398dd4098834c5fc1b074f530940cbfdb5f6ac8ffaf41d021be4014597 (image=quay.io/ceph-ci/ceph:99c8109c540eb4adfdfd778d8f345bafcf2366e7, name=ceph-25ae9672-790e-11ea-924d-001a4aab830c-mon.b)
2020-04-07T20:30:25.391 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:25 smithi002 bash[8292]: cluster 2020-04-07T20:30:23.919686+0000 mgr.y (mgr.14144) 61 : cluster [DBG] pgmap v52: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:25.392 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:25 smithi115 bash[8474]: cluster 2020-04-07T20:30:23.919686+0000 mgr.y (mgr.14144) 61 : cluster [DBG] pgmap v52: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:27.396 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:27 smithi002 bash[8292]: cluster 2020-04-07T20:30:25.920162+0000 mgr.y (mgr.14144) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:27.407 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:27 smithi115 bash[8474]: cluster 2020-04-07T20:30:25.920162+0000 mgr.y (mgr.14144) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:29.400 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:29 smithi002 bash[8292]: cluster 2020-04-07T20:30:27.920656+0000 mgr.y (mgr.14144) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:29.401 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:29 smithi115 bash[8474]: cluster 2020-04-07T20:30:27.920656+0000 mgr.y (mgr.14144) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:31.405 INFO:ceph.mon.a.smithi002.stdout:Apr 07 20:30:31 smithi002 bash[8292]: cluster 2020-04-07T20:30:29.921145+0000 mgr.y (mgr.14144) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:31.406 INFO:ceph.mon.c.smithi115.stdout:Apr 07 20:30:31 smithi115 bash[8474]: cluster 2020-04-07T20:30:29.921145+0000 mgr.y (mgr.14144) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-07T20:30:32.079 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 podman[8468]: 2020-04-07 20:30:32.092425133 +0000 UTC m=+8.701264795 container remove 5e38b7398dd4098834c5fc1b074f530940cbfdb5f6ac8ffaf41d021be4014597 (image=quay.io/ceph-ci/ceph:99c8109c540eb4adfdfd778d8f345bafcf2366e7, name=ceph-25ae9672-790e-11ea-924d-001a4aab830c-mon.b)
2020-04-07T20:30:32.082 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 bash[8465]: time="2020-04-07T20:30:32Z" level=error msg="unable to remove container 5e38b7398dd4098834c5fc1b074f530940cbfdb5f6ac8ffaf41d021be4014597 after failing to start and attach to it" 
2020-04-07T20:30:32.138 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 bash[8465]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-04-07T20:30:32.139 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 bash[8465]: : OCI runtime error
2020-04-07T20:30:32.163 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 systemd[1]: ceph-25ae9672-790e-11ea-924d-001a4aab830c@mon.b.service: main process exited, code=exited, status=127/n/a
2020-04-07T20:30:32.356 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 podman[8760]: Error: no container with name or ID ceph-25ae9672-790e-11ea-924d-001a4aab830c-mon.b found: no such container
2020-04-07T20:30:32.373 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 systemd[1]: Unit ceph-25ae9672-790e-11ea-924d-001a4aab830c@mon.b.service entered failed state.
2020-04-07T20:30:32.374 INFO:ceph.mon.b.smithi163.stdout:Apr 07 20:30:32 smithi163 systemd[1]: ceph-25ae9672-790e-11ea-924d-001a4aab830c@mon.b.service failed.
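
The failure above is podman reporting that /usr/bin/ceph-mon does not exist inside the container it just created from the CI image. A quick way one might check by hand whether the binary is actually present in a given image (the tag below is the one from the log above; podman is assumed to be available on the host):

    # Run the exact image the failing unit used and stat the binary inside it.
    image=quay.io/ceph-ci/ceph:99c8109c540eb4adfdfd778d8f345bafcf2366e7
    sudo podman run --rm --entrypoint stat "$image" /usr/bin/ceph-mon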


Related issues

Related to Orchestrator - Bug #44777: podman: stat /usr/bin/ceph-mon: no such file or directory, then unable to remove container Resolved
Related to Orchestrator - Bug #46036: cephadm: killmode=none: systemd units failed, but containers still running Resolved
Related to Orchestrator - Bug #46529: cephadm: error removing storage for container "...-mon": remove /var/lib/containers/storage/overlay/.../merged: device or resource busy Resolved
Related to Orchestrator - Bug #53175: podman: failed to exec pid1: Exec format error: wrongly using the amd64-only digest New
Duplicated by Orchestrator - Bug #45421: cephadm: MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it" Duplicate

History

#1 Updated by Sebastian Wagner almost 4 years ago

  • Related to Bug #44777: podman: stat /usr/bin/ceph-mon: no such file or directory, then unable to remove container added

#2 Updated by Neha Ojha almost 4 years ago

/a/yuriw-2020-04-25_15:46:30-rados-wip-yuri4-testing-2020-04-25-0009-master-distro-basic-smithi/4984285

#3 Updated by Josh Durgin almost 4 years ago

Could this be caused by the container image not being built yet, or would that present as a different error? For any cephadm jobs, teuthology-suite could check that the container image is present, just as it does for packages, to avoid that problem.
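
A hedged sketch of such a pre-flight check, using skopeo to query the registry without pulling the image (the repository and sha1 are taken from this run as examples; skopeo is assumed to be installed, and this is not an existing teuthology-suite option):

    # Refuse to schedule if the CI container image for this sha1 is not in the registry yet.
    sha1=99c8109c540eb4adfdfd778d8f345bafcf2366e7
    if ! skopeo inspect "docker://quay.io/ceph-ci/ceph:${sha1}" > /dev/null; then
        echo "container image for sha1 ${sha1} not found; refusing to schedule" >&2
        exit 1
    fi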

This occurred again in a suite scheduled shortly after package builds finished:

/a/joshd-2020-05-06_08:23:52-rados-wip-joshd-fix-octopus-distro-basic-smithi/5028167/

#4 Updated by Sebastian Wagner almost 4 years ago

  • Duplicated by Bug #45421: cephadm: MaxWhileTries: Waiting for 3 mons in monmap: "unable to remove container c3ed65093dd89d593e40d2d1bbfa03c8dcb5f53ba7bdda77eacde8d9f1a9c28e after failing to start and attach to it" added

#5 Updated by Sebastian Wagner almost 4 years ago

  • Status changed from New to In Progress
  • Assignee set to Sebastian Wagner

#6 Updated by Sebastian Wagner almost 4 years ago

In the past, https://github.com/ceph/ceph/pull/34091 was able to reproduce this bug consistently. I'll look into resurrecting it; let's see if it provides a way to find the cause.

#7 Updated by Sebastian Wagner almost 4 years ago

  • Subject changed from octopus: cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory to cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory
  • Priority changed from Normal to High

#8 Updated by Sebastian Wagner almost 4 years ago

2020-05-06T10:25:58.126 INFO:teuthology.orchestra.run.smithi053:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:0b54056fb60dceb5086e11269bac7b044d365904 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 8e6313c4-8f83-11
2020-05-06T10:25:58.182 INFO:ceph.mon.a.smithi131.stdout:May 06 10:25:58 smithi131 bash[10137]: cluster 2020-05-06T10:25:57.866815+0000 mgr.y (mgr.14141) 54 : cluster [DBG] pgmap v45: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-06T10:25:58.184 INFO:ceph.mon.b.smithi053.stdout:-- Logs begin at Wed 2020-05-06 10:14:19 UTC. --
2020-05-06T10:25:58.184 INFO:ceph.mon.b.smithi053.stdout:May 06 10:25:57 smithi053 podman[10178]: 2020-05-06 10:25:57.675609957 +0000 UTC m=+0.551746053 container create 3e6b87c9529fcad64f1dd6c148c7f8944a4b52d8f2e63d8ce2b331ef06d177a0 (image=quay.io/ceph-ci/ceph
2020-05-06T10:25:59.992 INFO:ceph.mon.a.smithi131.stdout:May 06 10:25:59 smithi131 bash[10137]: cluster 2020-05-06T10:25:59.867300+0000 mgr.y (mgr.14141) 55 : cluster [DBG] pgmap v46: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-06T10:26:01.991 INFO:ceph.mon.a.smithi131.stdout:May 06 10:26:01 smithi131 bash[10137]: cluster 2020-05-06T10:26:01.867801+0000 mgr.y (mgr.14141) 56 : cluster [DBG] pgmap v47: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-06T10:26:03.991 INFO:ceph.mon.a.smithi131.stdout:May 06 10:26:03 smithi131 bash[10137]: cluster 2020-05-06T10:26:03.868280+0000 mgr.y (mgr.14141) 57 : cluster [DBG] pgmap v48: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-06T10:26:05.722 INFO:ceph.mon.b.smithi053.stdout:May 06 10:26:05 smithi053 podman[10178]: 2020-05-06 10:26:05.72028671 +0000 UTC m=+8.596422879 container remove 3e6b87c9529fcad64f1dd6c148c7f8944a4b52d8f2e63d8ce2b331ef06d177a0 (image=quay.io/ceph-ci/ceph:
2020-05-06T10:26:05.725 INFO:ceph.mon.b.smithi053.stdout:May 06 10:26:05 smithi053 bash[10174]: time="2020-05-06T10:26:05Z" level=error msg="unable to remove container 3e6b87c9529fcad64f1dd6c148c7f8944a4b52d8f2e63d8ce2b331ef06d177a0 after failing to start and at
2020-05-06T10:26:05.812 INFO:ceph.mon.b.smithi053.stdout:May 06 10:26:05 smithi053 bash[10174]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-05-06T10:26:05.812 INFO:ceph.mon.b.smithi053.stdout:May 06 10:26:05 smithi053 bash[10174]: : OCI runtime error
2020-05-06T10:26:05.829 INFO:ceph.mon.b.smithi053.stdout:May 06 10:26:05 smithi053 systemd[1]: ceph-8e6313c4-8f83-11ea-a068-001a4aab830c@mon.b.service: main process exited, code=exited, status=127/n/a
2020-05-06T10:26:05.988 INFO:ceph.mon.b.smithi053.stdout:May 06 10:26:05 smithi053 podman[10437]: Error: no container with name or ID ceph-8e6313c4-8f83-11ea-a068-001a4aab830c-mon.b found: no such container

#9 Updated by Sebastian Wagner almost 4 years ago

2020-04-25T21:57:55.424 INFO:teuthology.orchestra.run.smithi033:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:fec0296dffa2b7ab61f520f54a835a822a2b2fa4 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 5a63e2d4-873f-11ea-a068-001a4aab830c -- ceph mon dump -f json
2020-04-25T21:57:55.481 INFO:ceph.mon.b.smithi033.stdout:-- Logs begin at Sat 2020-04-25 21:46:39 UTC. --
2020-04-25T21:57:55.481 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:57:55 smithi033 podman[8415]: 2020-04-25 21:57:55.001506419 +0000 UTC m=+0.656050814 container create 94571bdfe3f9b8076522bfa1d95d5424f080af4ddd876fc8d0d41677ba25b638 (image=quay.io/ceph-ci/ceph:fec0296dffa2b7ab61f520f54a835a822a2b2fa4, name=ceph-5a63e2d4-873f-11ea-a068-001a4aab830c-mon.b)
2020-04-25T21:57:55.619 INFO:ceph.mon.a.smithi114.stdout:Apr 25 21:57:55 smithi114 bash[8807]: cluster 2020-04-25T21:57:54.376174+0000 mgr.y (mgr.14143) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:57:55.623 INFO:ceph.mon.c.smithi042.stdout:Apr 25 21:57:55 smithi042 bash[8897]: cluster 2020-04-25T21:57:54.376174+0000 mgr.y (mgr.14143) 62 : cluster [DBG] pgmap v53: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:57:56.627 INFO:ceph.mon.a.smithi114.stdout:Apr 25 21:57:56 smithi114 bash[8807]: cluster 2020-04-25T21:57:56.376626+0000 mgr.y (mgr.14143) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:57:56.628 INFO:ceph.mon.c.smithi042.stdout:Apr 25 21:57:56 smithi042 bash[8897]: cluster 2020-04-25T21:57:56.376626+0000 mgr.y (mgr.14143) 63 : cluster [DBG] pgmap v54: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:57:59.454 INFO:ceph.mon.a.smithi114.stdout:Apr 25 21:57:59 smithi114 bash[8807]: cluster 2020-04-25T21:57:58.377092+0000 mgr.y (mgr.14143) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:57:59.455 INFO:ceph.mon.c.smithi042.stdout:Apr 25 21:57:59 smithi042 bash[8897]: cluster 2020-04-25T21:57:58.377092+0000 mgr.y (mgr.14143) 64 : cluster [DBG] pgmap v55: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:58:01.237 INFO:ceph.mon.a.smithi114.stdout:Apr 25 21:58:01 smithi114 bash[8807]: cluster 2020-04-25T21:58:00.377556+0000 mgr.y (mgr.14143) 65 : cluster [DBG] pgmap v56: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:58:01.238 INFO:ceph.mon.c.smithi042.stdout:Apr 25 21:58:01 smithi042 bash[8897]: cluster 2020-04-25T21:58:00.377556+0000 mgr.y (mgr.14143) 65 : cluster [DBG] pgmap v56: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-04-25T21:58:03.023 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 podman[8415]: 2020-04-25 21:58:03.023594912 +0000 UTC m=+8.678139524 container remove 94571bdfe3f9b8076522bfa1d95d5424f080af4ddd876fc8d0d41677ba25b638 (image=quay.io/ceph-ci/ceph:fec0296dffa2b7ab61f520f54a835a822a2b2fa4, name=ceph-5a63e2d4-873f-11ea-a068-001a4aab830c-mon.b)
2020-04-25T21:58:03.026 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 bash[8411]: time="2020-04-25T21:58:03Z" level=error msg="unable to remove container 94571bdfe3f9b8076522bfa1d95d5424f080af4ddd876fc8d0d41677ba25b638 after failing to start and attach to it" 
2020-04-25T21:58:03.094 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 bash[8411]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-04-25T21:58:03.094 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 bash[8411]: : OCI runtime error
2020-04-25T21:58:03.113 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 systemd[1]: ceph-5a63e2d4-873f-11ea-a068-001a4aab830c@mon.b.service: main process exited, code=exited, status=127/n/a
2020-04-25T21:58:03.293 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 podman[8685]: Error: no container with name or ID ceph-5a63e2d4-873f-11ea-a068-001a4aab830c-mon.b found: no such container
2020-04-25T21:58:03.314 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 systemd[1]: Unit ceph-5a63e2d4-873f-11ea-a068-001a4aab830c@mon.b.service entered failed state.
2020-04-25T21:58:03.315 INFO:ceph.mon.b.smithi033.stdout:Apr 25 21:58:03 smithi033 systemd[1]: ceph-5a63e2d4-873f-11ea-a068-001a4aab830c@mon.b.service failed.

#10 Updated by Sebastian Wagner almost 4 years ago

2020-05-08T15:44:31.009 INFO:tasks.cephadm:Waiting for 3 mons in monmap...
2020-05-08T15:44:31.010 INFO:teuthology.orchestra.run.smithi096:> true
2020-05-08T15:44:31.094 INFO:teuthology.orchestra.run.smithi096:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:2e1f29558886bd90d7f04e6fced16b4a9464840b shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 7792d16e-9142-11ea-a068-001a4aab830c -- ceph mon dump -f json
2020-05-08T15:44:31.148 INFO:ceph.mon.b.smithi096.stdout:-- Logs begin at Fri 2020-05-08 15:36:32 UTC. --
2020-05-08T15:44:31.148 INFO:ceph.mon.b.smithi096.stdout:May 08 15:44:30 smithi096 podman[6877]: 2020-05-08 15:44:30.680006536 +0000 UTC m=+0.442275939 container create cb703f4fa7c417c57ecaa89052f59755a6115b0b29dc5c70b703789fefc6f612 (image=quay.io/ceph-ci/ceph:2e1f29558886bd90d7f04e6fced16b4a9464840b, name=ceph-7792d16e-9142-11ea-a068-001a4aab830c-mon.b)
2020-05-08T15:44:32.240 INFO:ceph.mon.a.smithi196.stdout:May 08 15:44:32 smithi196 bash[6946]: cluster 2020-05-08T15:44:30.628739+0000 mgr.y (mgr.14142) 35 : cluster [DBG] pgmap v29: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:32.241 INFO:ceph.mon.c.smithi196.stdout:May 08 15:44:32 smithi196 bash[11439]: cluster 2020-05-08T15:44:30.628739+0000 mgr.y (mgr.14142) 35 : cluster [DBG] pgmap v29: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:34.244 INFO:ceph.mon.a.smithi196.stdout:May 08 15:44:34 smithi196 bash[6946]: cluster 2020-05-08T15:44:32.629261+0000 mgr.y (mgr.14142) 36 : cluster [DBG] pgmap v30: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:34.245 INFO:ceph.mon.c.smithi196.stdout:May 08 15:44:34 smithi196 bash[11439]: cluster 2020-05-08T15:44:32.629261+0000 mgr.y (mgr.14142) 36 : cluster [DBG] pgmap v30: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:36.248 INFO:ceph.mon.a.smithi196.stdout:May 08 15:44:36 smithi196 bash[6946]: cluster 2020-05-08T15:44:34.629756+0000 mgr.y (mgr.14142) 37 : cluster [DBG] pgmap v31: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:36.249 INFO:ceph.mon.c.smithi196.stdout:May 08 15:44:36 smithi196 bash[11439]: cluster 2020-05-08T15:44:34.629756+0000 mgr.y (mgr.14142) 37 : cluster [DBG] pgmap v31: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:38.253 INFO:ceph.mon.a.smithi196.stdout:May 08 15:44:38 smithi196 bash[6946]: cluster 2020-05-08T15:44:36.630260+0000 mgr.y (mgr.14142) 38 : cluster [DBG] pgmap v32: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:38.253 INFO:ceph.mon.c.smithi196.stdout:May 08 15:44:38 smithi196 bash[11439]: cluster 2020-05-08T15:44:36.630260+0000 mgr.y (mgr.14142) 38 : cluster [DBG] pgmap v32: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-08T15:44:38.615 INFO:ceph.mon.b.smithi096.stdout:May 08 15:44:38 smithi096 podman[6877]: 2020-05-08 15:44:38.61445606 +0000 UTC m=+8.376725570 container remove cb703f4fa7c417c57ecaa89052f59755a6115b0b29dc5c70b703789fefc6f612 (image=quay.io/ceph-ci/ceph:2e1f29558886bd90d7f04e6fced16b4a9464840b, name=ceph-7792d16e-9142-11ea-a068-001a4aab830c-mon.b)
2020-05-08T15:44:38.618 INFO:ceph.mon.b.smithi096.stdout:May 08 15:44:38 smithi096 bash[6873]: time="2020-05-08T15:44:38Z" level=error msg="unable to remove container cb703f4fa7c417c57ecaa89052f59755a6115b0b29dc5c70b703789fefc6f612 after failing to start and attach to it" 
2020-05-08T15:44:38.674 INFO:ceph.mon.b.smithi096.stdout:May 08 15:44:38 smithi096 bash[6873]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-05-08T15:44:38.675 INFO:ceph.mon.b.smithi096.stdout:May 08 15:44:38 smithi096 bash[6873]: : OCI runtime error
2020-05-08T15:44:38.690 INFO:ceph.mon.b.smithi096.stdout:May 08 15:44:38 smithi096 systemd[1]: ceph-7792d16e-9142-11ea-a068-001a4aab830c@mon.b.service: main process exited, code=exited, status=127/n/a
2020-05-08T15:44:38.816 INFO:ceph.mon.b.smithi096.stdout:May 08 15:44:38 smithi096 podman[7136]: Error: no container with name or ID ceph-7792d16e-9142-11ea-a068-001a4aab830c-mon.b found: no such container

#11 Updated by Sebastian Wagner almost 4 years ago

https://github.com/ceph/ceph/pull/35018 might make this thing go away, without fixing the underlying issue.

#12 Updated by Sebastian Wagner almost 4 years ago

Seems that I'm all but unable to reproduce this reliably.

#13 Updated by Neha Ojha almost 4 years ago

Looks similar

2020-05-20T21:17:30.996 INFO:teuthology.orchestra.run.smithi038:mon.b> sudo journalctl -f -n 0 -u ceph-e9ddf488-9ade-11ea-a06a-001a4aab830c@mon.b.service
2020-05-20T21:17:31.002 INFO:tasks.cephadm:Waiting for 2 mons in monmap...
2020-05-20T21:17:31.002 INFO:teuthology.orchestra.run.smithi038:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:2e8572496bd5b6aa1de9ccadd3e0d171c60aee81 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid e9ddf488-9ade-11ea-a06a-001a4aab830c -- ceph mon dump -f json
2020-05-20T21:17:31.138 INFO:ceph.mon.b.smithi038.stdout:-- Logs begin at Wed 2020-05-20 21:06:47 UTC. --
2020-05-20T21:17:31.138 INFO:ceph.mon.b.smithi038.stdout:May 20 21:17:30 smithi038 podman[7373]: 2020-05-20 21:17:30.741649334 +0000 UTC m=+0.506565325 container create ec7f880ef9761d95f33f95bba292813602748e57661d56298f70f5a349ed5e86 (image=quay.io/ceph-ci/ceph:2e8572496bd5b6aa1de9ccadd3e0d171c60aee81, name=ceph-e9ddf488-9ade-11ea-a06a-001a4aab830c-mon.b)
2020-05-20T21:17:31.871 INFO:ceph.mon.a.smithi185.stdout:May 20 21:17:31 smithi185 bash[7587]: cluster 2020-05-20T21:17:31.778601+0000 mgr.y (mgr.14140) 57 : cluster [DBG] pgmap v44: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-20T21:17:33.863 INFO:ceph.mon.a.smithi185.stdout:May 20 21:17:33 smithi185 bash[7587]: cluster 2020-05-20T21:17:33.779080+0000 mgr.y (mgr.14140) 58 : cluster [DBG] pgmap v45: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-20T21:17:36.638 INFO:ceph.mon.a.smithi185.stdout:May 20 21:17:36 smithi185 bash[7587]: cluster 2020-05-20T21:17:35.779516+0000 mgr.y (mgr.14140) 59 : cluster [DBG] pgmap v46: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-20T21:17:37.866 INFO:ceph.mon.a.smithi185.stdout:May 20 21:17:37 smithi185 bash[7587]: cluster 2020-05-20T21:17:37.780155+0000 mgr.y (mgr.14140) 60 : cluster [DBG] pgmap v47: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-20T21:17:38.831 INFO:ceph.mon.b.smithi038.stdout:May 20 21:17:38 smithi038 podman[7373]: 2020-05-20 21:17:38.829952419 +0000 UTC m=+8.594868476 container remove ec7f880ef9761d95f33f95bba292813602748e57661d56298f70f5a349ed5e86 (image=quay.io/ceph-ci/ceph:2e8572496bd5b6aa1de9ccadd3e0d171c60aee81, name=ceph-e9ddf488-9ade-11ea-a06a-001a4aab830c-mon.b)
2020-05-20T21:17:38.834 INFO:ceph.mon.b.smithi038.stdout:May 20 21:17:38 smithi038 bash[7369]: time="2020-05-20T21:17:38Z" level=error msg="unable to remove container ec7f880ef9761d95f33f95bba292813602748e57661d56298f70f5a349ed5e86 after failing to start and attach to it" 
2020-05-20T21:17:38.952 INFO:ceph.mon.b.smithi038.stdout:May 20 21:17:38 smithi038 bash[7369]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 

/a/nojha-2020-05-20_19:45:21-rados-master-distro-basic-smithi/5073466 - this is on latest master

#14 Updated by Kefu Chai almost 4 years ago

2020-05-21T11:22:56.079 INFO:teuthology.orchestra.run.smithi084:> sudo /home/ubuntu/cephtest/cephadm --image quay.io/ceph-ci/ceph:62b4f5047ac335093fb47a2897524ad3f1e6aa9d shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 04030f80-9b55-11ea-a06a-001a4aab830c -- ceph mon dump -f json
2020-05-21T11:22:57.724 INFO:ceph.mon.a.smithi185.stdout:May 21 11:22:57 smithi185 bash[8583]: cluster 2020-05-21T11:22:56.149439+0000 mgr.y (mgr.14141) 56 : cluster [DBG] pgmap v43: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-21T11:22:59.726 INFO:ceph.mon.a.smithi185.stdout:May 21 11:22:59 smithi185 bash[8583]: cluster 2020-05-21T11:22:58.149920+0000 mgr.y (mgr.14141) 57 : cluster [DBG] pgmap v44: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-21T11:23:01.729 INFO:ceph.mon.a.smithi185.stdout:May 21 11:23:01 smithi185 bash[8583]: cluster 2020-05-21T11:23:00.150408+0000 mgr.y (mgr.14141) 58 : cluster [DBG] pgmap v45: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-21T11:23:03.730 INFO:ceph.mon.a.smithi185.stdout:May 21 11:23:03 smithi185 bash[8583]: cluster 2020-05-21T11:23:02.150880+0000 mgr.y (mgr.14141) 59 : cluster [DBG] pgmap v46: 1 pgs: 1 unknown; 0 B data, 0 B used, 0 B / 0 B avail
2020-05-21T11:23:03.877 INFO:ceph.mon.b.smithi084.stdout:-- Logs begin at Thu 2020-05-21 11:11:16 UTC. --
2020-05-21T11:23:03.877 INFO:ceph.mon.b.smithi084.stdout:May 21 11:22:55 smithi084 podman[8564]: 2020-05-21 11:22:55.830665394 +0000 UTC m=+0.467398441 container create 0f4c2860764cf01c0ba1fd566a971919b3c32d31edd9810cb52021623717cf10 (image=quay.io/ceph-ci/ceph:62b4f5047ac335093fb47a2897524ad3f1e6aa9d, name=ceph-04030f80-9b55-11ea-a06a-001a4aab830c-mon.b)
2020-05-21T11:23:03.878 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:03 smithi084 podman[8564]: 2020-05-21 11:23:03.874235726 +0000 UTC m=+8.510968806 container remove 0f4c2860764cf01c0ba1fd566a971919b3c32d31edd9810cb52021623717cf10 (image=quay.io/ceph-ci/ceph:62b4f5047ac335093fb47a2897524ad3f1e6aa9d, name=ceph-04030f80-9b55-11ea-a06a-001a4aab830c-mon.b)
2020-05-21T11:23:03.879 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:03 smithi084 bash[8559]: time="2020-05-21T11:23:03Z" level=error msg="unable to remove container 0f4c2860764cf01c0ba1fd566a971919b3c32d31edd9810cb52021623717cf10 after failing to start and attach to it" 
2020-05-21T11:23:03.931 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:03 smithi084 bash[8559]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-05-21T11:23:03.931 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:03 smithi084 bash[8559]: : OCI runtime error
2020-05-21T11:23:03.944 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:03 smithi084 systemd[1]: ceph-04030f80-9b55-11ea-a06a-001a4aab830c@mon.b.service: main process exited, code=exited, status=127/n/a
2020-05-21T11:23:04.115 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:04 smithi084 podman[8769]: Error: no container with name or ID ceph-04030f80-9b55-11ea-a06a-001a4aab830c-mon.b found: no such container
2020-05-21T11:23:04.130 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:04 smithi084 systemd[1]: Unit ceph-04030f80-9b55-11ea-a06a-001a4aab830c@mon.b.service entered failed state.
2020-05-21T11:23:04.131 INFO:ceph.mon.b.smithi084.stdout:May 21 11:23:04 smithi084 systemd[1]: ceph-04030f80-9b55-11ea-a06a-001a4aab830c@mon.b.service failed.

Failed to add the second monitor:

/a/kchai-2020-05-21_10:34:02-rados-wip-kefu-testing-2020-05-21-1652-distro-basic-smithi/5076314

Please note, this issue also impacts the upgrade test:

rados/thrash-old-clients/{0-size-min-size-overrides/2-size-2-min-size.yaml 1-install/nautilus-v2only.yaml backoff/peering.yaml ceph.yaml clusters/{openstack.yaml three-plus-one.yaml} d-balancer/crush-compat.yaml distro$/{centos_7.6.yaml} msgr-failures/fastclose.yaml rados.yaml thrashers/pggrow.yaml thrashosds-health.yaml workloads/snaps-few-objects.yaml}

where we are using cephadm to deploy the Ceph monitors, while the error message looks like

reached maximum tries (180) after waiting for 180 seconds

see http://pulpito.ceph.com/kchai-2020-05-21_10:34:02-rados-wip-kefu-testing-2020-05-21-1652-distro-basic-smithi/5076314/

#15 Updated by Kefu Chai almost 4 years ago

  • Priority changed from High to Urgent

#16 Updated by Brad Hubbard almost 4 years ago

/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083157
/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083387
/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083421
/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083521
/a/yuriw-2020-05-22_19:55:53-rados-wip-yuri-master_5.22.20-distro-basic-smithi/5083323

#17 Updated by Brad Hubbard almost 4 years ago

/a/yuriw-2020-05-28_02:23:45-rados-wip-yuri-master_5.27.20-distro-basic-smithi/5098059
/a/yuriw-2020-05-28_02:23:45-rados-wip-yuri-master_5.27.20-distro-basic-smithi/5097857
/a/yuriw-2020-05-28_02:23:45-rados-wip-yuri-master_5.27.20-distro-basic-smithi/5097925
/a/yuriw-2020-05-28_02:23:45-rados-wip-yuri-master_5.27.20-distro-basic-smithi/5097958
/a/yuriw-2020-05-28_02:23:45-rados-wip-yuri-master_5.27.20-distro-basic-smithi/5097827
/a/yuriw-2020-05-28_02:23:45-rados-wip-yuri-master_5.27.20-distro-basic-smithi/5097692

#18 Updated by Sebastian Wagner almost 4 years ago

Maybe we can actually fix this by moving to our internal registry.

#19 Updated by Kefu Chai almost 4 years ago

/a/kchai-2020-06-08_10:56:36-rados-wip-kefu-testing-2020-06-08-1713-distro-basic-smithi/5128793/

#20 Updated by Sebastian Wagner almost 4 years ago

fascinating:

systemd[1]: ceph-36889d04-a982-11ea-a06d-001a4aab830c@mon.b.service holdoff time over, scheduling restart.
systemd[1]: Stopped Ceph mon.b for 36889d04-a982-11ea-a06d-001a4aab830c.
systemd[1]: Starting Ceph mon.b for 36889d04-a982-11ea-a06d-001a4aab830c...
podman[9425]: Error: no container with name or ID ceph-36889d04-a982-11ea-a06d-001a4aab830c-mon.b found: no such container
systemd[1]: Started Ceph mon.b for 36889d04-a982-11ea-a06d-001a4aab830c.
bash[9449]: Error: no container with name or ID ceph-36889d04-a982-11ea-a06d-001a4aab830c-mon.b found: no such container
bash[9449]: Error: error creating container storage: the container name "ceph-36889d04-a982-11ea-a06d-001a4aab830c-mon.b" is already in use by "f265ae83cb9ed32b7ed6fd6e62a2e764549
systemd[1]: ceph-36889d04-a982-11ea-a06d-001a4aab830c@mon.b.service: main process exited, code=exited, status=125/n/a
podman[9489]: Error: no container with name or ID ceph-36889d04-a982-11ea-a06d-001a4aab830c-mon.b found: no such container
systemd[1]: Unit ceph-36889d04-a982-11ea-a06d-001a4aab830c@mon.b.service entered failed state.
systemd[1]: ceph-36889d04-a982-11ea-a06d-001a4aab830c@mon.b.service failed.

#22 Updated by Sebastian Wagner almost 4 years ago

  • Related to Bug #46036: cephadm: killmode=none: systemd units failed, but containers still running added

#23 Updated by Neha Ojha over 3 years ago

/a/yuriw-2020-06-29_16:59:21-rados-octopus-distro-basic-smithi/5189862

#24 Updated by Sebastian Wagner over 3 years ago

  • Status changed from In Progress to Pending Backport

This was fixed in master.

#25 Updated by Sebastian Wagner over 3 years ago

  • Pull request ID set to 35524

#26 Updated by Neha Ojha over 3 years ago

Sebastian, I am seeing similar failures in rados/thrash-old-clients on recent master; can you please confirm whether they need a different tracker issue?

/a/teuthology-2020-07-12_07:01:02-rados-master-distro-basic-smithi/5217488
/a/teuthology-2020-07-12_07:01:02-rados-master-distro-basic-smithi/5217586

#27 Updated by Sebastian Wagner over 3 years ago

yep, that's a different issue:

2020-07-12T15:39:57.736 INFO:journalctl@ceph.mon.c.smithi145.stdout:Jul 12 15:39:57 smithi145 bash[9707]: Error: error removing storage for container "ceph-567f836e-c455-11ea-a06e-001a4aab830c-mon.c": remove /var/lib/containers/storage/overlay/3e200c9a65162f9c55a682cb3ea6b559b07a6569a910eb1057ea4f995067f9eb/merged: device or resource busy
2020-07-12T15:39:58.066 INFO:journalctl@ceph.mon.c.smithi145.stdout:Jul 12 15:39:58 smithi145 bash[9707]: Error: error creating container storage: the container name "ceph-567f836e-c455-11ea-a06e-001a4aab830c-mon.c" is already in use by "4588abec9d4e9d178e083064e81a5766f0e2d800170592bc51a2d31f247e09b5". You have to remove that container to be able to reuse that name.: that name is already in use

https://tracker.ceph.com/issues/46529

#28 Updated by Sebastian Wagner over 3 years ago

  • Related to Bug #46529: cephadm: error removing storage for container "...-mon": remove /var/lib/containers/storage/overlay/.../merged: device or resource busy added

#29 Updated by Sebastian Wagner over 3 years ago

  • Status changed from Pending Backport to Resolved

#30 Updated by Kefu Chai over 3 years ago

2020-07-18T17:09:58.444 INFO:teuthology.orchestra.run.smithi070:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:a2848ece1471a0772679dea9aa70bd344a7f2a0b shell -c /etc/ceph/ceph.conf
 -k /etc/ceph/ceph.client.admin.keyring --fsid f930da78-c918-11ea-a06f-001a4aab830c -- ceph mon dump -f json
2020-07-18T17:09:58.576 INFO:journalctl@ceph.mon.b.smithi070.stdout:-- Logs begin at Sat 2020-07-18 16:57:47 UTC. --
2020-07-18T17:09:58.579 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:09:58 smithi070 podman[9257]: 2020-07-18 17:09:58.220642722 +0000 UTC m=+0.474545854 container create f49dfa4f6185b3edbc4b028e286efb13e317e9349413c6a335fd2af75118815f (image=quay.ceph.io/ceph-ci/ceph:a2848ece1471a0772679dea9aa70bd344a7f2a0b, name=ceph-f930da78-c918-11ea-a06f-001a4aab830c-mon.b)
...
2020-07-18T17:10:06.155 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 podman[9257]: 2020-07-18 17:10:06.159800433 +0000 UTC m=+8.413703562 container remove f49dfa4f6185b3edbc4b028e286efb13e317e9349413c6a335fd2af75118815f (image=quay.ceph.io/ceph-ci/ceph:a2848ece1471a0772679dea9aa70bd344a7f2a0b, name=ceph-f930da78-c918-11ea-a06f-001a4aab830c-mon.b)
2020-07-18T17:10:06.157 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 bash[9216]: time="2020-07-18T17:10:06Z" level=error msg="unable to remove container f49dfa4f6185b3edbc4b028e286efb13e317e9349413c6a335fd2af75118815f after failing to start and attach to it" 
2020-07-18T17:10:06.281 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 bash[9216]: Error: container_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 
2020-07-18T17:10:06.282 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 bash[9216]: : OCI runtime error
2020-07-18T17:10:06.302 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 systemd[1]: ceph-f930da78-c918-11ea-a06f-001a4aab830c@mon.b.service: main process exited, code=exited, status=127/n/a
2020-07-18T17:10:06.533 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 podman[9457]: Error: no container with name or ID ceph-f930da78-c918-11ea-a06f-001a4aab830c-mon.b found: no such container
2020-07-18T17:10:06.555 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 systemd[1]: Unit ceph-f930da78-c918-11ea-a06f-001a4aab830c@mon.b.service entered failed state.
2020-07-18T17:10:06.555 INFO:journalctl@ceph.mon.b.smithi070.stdout:Jul 18 17:10:06 smithi070 systemd[1]: ceph-f930da78-c918-11ea-a06f-001a4aab830c@mon.b.service failed.
...
2020-07-18T17:10:10.026 INFO:tasks.cephadm:Waiting for 2 mons in monmap...
/a/kchai-2020-07-18_13:35:09-rados-wip-kefu-testing-2020-07-18-1927-distro-basic-smithi/5237560

#31 Updated by Kefu Chai over 3 years ago

  • Status changed from Resolved to New

#32 Updated by Sebastian Wagner over 3 years ago

  • Status changed from New to Resolved

/a/kchai-2020-07-18_13:35:09-rados-wip-kefu-testing-2020-07-18-1927-distro-basic-smithi/5237560 is actually #46529

#33 Updated by Deepika Upadhyay over 3 years ago

  • Status changed from Resolved to New

Seems like this issue still exists, even though I see the fix PR was backported to octopus; seen this in:

/a/yuriw-2020-10-20_15:30:01-rados-wip-yuri5-testing-2020-10-07-1021-octopus-distro-basic-smithi/5542357/teuthology.log

msg="unable to remove container c7e4fd03b994ae13ba91fefb5d6955680ef4d535583dc08a5416267dd7c76a9c after failing to start and attach to it" 
r_linux.go:345: starting container process caused "exec: \"/usr/bin/ceph-mon\": stat /usr/bin/ceph-mon: no such file or directory" 

#34 Updated by Dan Mick about 3 years ago

I've been staring at that last failure for a few days, and I can't figure out what could possibly have caused it. The same image worked fine on smithi173, but on smithi157 it reports "/usr/bin/ceph-mon not found", with no apparent errors pulling the image. In fact, I can't see from the log where it was pulled at all.
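
For what it's worth, a couple of commands that might help confirm on the affected host whether the image was ever pulled there (podman is assumed; the repository matches the later runs in this ticket, and the tag placeholder stands for the sha1 tag of that failed job):

    # List locally stored ceph-ci images with digests and creation times.
    sudo podman images --digests quay.ceph.io/ceph-ci/ceph
    # Inspect the specific tag, if present, for its ID and creation timestamp.
    sudo podman image inspect quay.ceph.io/ceph-ci/ceph:<sha1-tag> --format '{{.Id}} {{.Created}}'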

#35 Updated by Sebastian Wagner about 3 years ago

It has nothing to do with the image. The issue went away with https://github.com/ceph/ceph/pull/35524, indicating that something must have gone wrong cleaning up the previous container with the same name.
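
For illustration only, a minimal sketch of the kind of defensive pre-start cleanup that avoids this class of name collision, reusing the fsid and daemon name from the log in comment #20 and the image tag from the description (this sketches the idea; it is not necessarily what the PR actually implements):

    # unit.run-style wrapper: force-remove any stale container left over from a
    # previous failed start, so its leftover storage entry cannot block the new one.
    fsid=36889d04-a982-11ea-a06d-001a4aab830c
    name=ceph-${fsid}-mon.b
    /usr/bin/podman rm -f "${name}" 2> /dev/null || true
    # (the real unit also bind-mounts the mon's config and keyring; omitted here)
    exec /usr/bin/podman run --rm --net=host --name "${name}" \
        -v /var/lib/ceph/${fsid}/mon.b:/var/lib/ceph/mon/ceph-b:z \
        --entrypoint /usr/bin/ceph-mon \
        quay.io/ceph-ci/ceph:99c8109c540eb4adfdfd778d8f345bafcf2366e7 \
        -n mon.b -f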

#36 Updated by Sebastian Wagner about 3 years ago

  • Priority changed from Urgent to Normal

This is no longer critical, as it only affects octopus at this point.

#37 Updated by Sebastian Wagner about 3 years ago

  • Status changed from New to Can't reproduce

#38 Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #53175: podman: failed to exec pid1: Exec format error: wrongly using the amd64-only digest added
