Bug #53807
Dead jobs in rados/cephadm/smoke-roleless{...}: ingress jobs stuck
Description
Description: rados/cephadm/smoke-roleless/{0-distro/centos_8.3_container_tools_3.0 0-nvme-loop 1-start 2-services/nfs-ingress 3-final}
Failure Reason: hit max job timeout
Jobs:
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6598774
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6598785
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599316
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599350
Earlier in the log:
2022-01-06T16:33:28.615 INFO:teuthology.task.ansible.out:
TASK [common : Check firewalld status] *****************************************
2022-01-06T16:33:28.617 INFO:teuthology.task.ansible.out:fatal: [smithi107.front.sepia.ceph.com]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
...ignoring
2022-01-06T16:33:28.638 INFO:teuthology.task.ansible.out:Thursday 06 January 2022 16:33:28 +0000 (0:00:00.260) 0:02:03.410 ******
Later in the log:
2022-01-06T16:44:13.943 INFO:journalctl@ceph.mon.smithi107.smithi107.stdout:Jan 06 16:44:13 smithi107 ceph-mon[30485]: from='mgr.14216 172.21.15.107:0/4007838189' entity='mgr.smithi107.tttsho' cmd=[{"prefix": "osd pool create", "pool": "cephfs.foofs.data"}]: dispatch
2022-01-06T16:44:13.943 INFO:journalctl@ceph.mon.smithi107.smithi107.stdout:Jan 06 16:44:13 smithi107 ceph-mon[30485]: pgmap v133: 33 pgs: 32 unknown, 1 active+clean; 577 KiB data, 47 MiB used, 715 GiB / 715 GiB avail
2022-01-06T16:44:13.943 INFO:journalctl@ceph.mon.smithi107.smithi107.stdout:Jan 06 16:44:13 smithi107 ceph-mon[30485]: from='mgr.14216 172.21.15.107:0/4007838189' entity='mgr.smithi107.tttsho'
2022-01-06T16:44:13.943 INFO:journalctl@ceph.mon.smithi107.smithi107.stdout:Jan 06 16:44:13 smithi107 conmon[30462]: 2022-01-06T16:44:13.732+0000 7fa865de5700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2022-01-06T16:44:14.150 INFO:journalctl@ceph.mon.smithi150.smithi150.stdout:Jan 06 16:44:13 smithi150 ceph-mon[37940]: from='mgr.14216 172.21.15.107:0/4007838189' entity='mgr.smithi107.tttsho' cmd='[{"prefix": "osd pool create", "pool": "cephfs.foofs.meta"}]': finished
2022-01-06T16:44:14.151 INFO:journalctl@ceph.mon.smithi150.smithi150.stdout:Jan 06 16:44:13 smithi150 ceph-mon[37940]: osdmap e43: 8 total, 8 up, 8 in
2022-01-06T16:44:14.151 INFO:journalctl@ceph.mon.smithi150.smithi150.stdout:Jan 06 16:44:13 smithi150 ceph-mon[37940]: from='mgr.14216 172.21.15.107:0/4007838189' entity='mgr.smithi107.tttsho' cmd=[{"prefix": "osd pool create", "pool": "cephfs.foofs.data"}]: dispatch
2022-01-06T16:44:14.151 INFO:journalctl@ceph.mon.smithi150.smithi150.stdout:Jan 06 16:44:13 smithi150 ceph-mon[37940]: pgmap v133: 33 pgs: 32 unknown, 1 active+clean; 577 KiB data, 47 MiB used, 715 GiB / 715 GiB avail
2022-01-06T16:44:14.151 INFO:journalctl@ceph.mon.smithi150.smithi150.stdout:Jan 06 16:44:13 smithi150 ceph-mon[37940]: from='mgr.14216 172.21.15.107:0/4007838189' entity='mgr.smithi107.tttsho'
2022-01-06T16:44:14.535 INFO:teuthology.run_tasks:Running task cephadm.apply...
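For anyone retracing a stuck run like this against a live cluster, the stock orchestrator CLI is enough to see which service never converged (generic commands, nothing specific to this job):

ceph orch ls            # per-service running/size counts
ceph orch ps            # per-daemon status, including any in error state
ceph health detail      # expands warnings such as CEPHADM_FAILED_DAEMON and MDS_ALL_DOWN
sudo cephadm ls         # on the affected host: daemon state as systemd sees it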
Related issues
Duplicated by Bug #53904: cephadm: ingress jobs stuck
History
#1 Updated by Laura Flores almost 2 years ago
Another similar scenario, which does not involve offline filesystems:
Description: rados/cephadm/smoke-roleless/{0-distro/rhel_8.4_container_tools_rhel8 0-nvme-loop 1-start 2-services/rgw-ingress 3-final}
Failure reason: hit max job timeout
Jobs:
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6598830
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599155
TASK [common : Check firewalld status] *****************************************
2022-01-06T17:12:46.159 INFO:teuthology.task.ansible.out:fatal: [smithi158.front.sepia.ceph.com]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
...ignoring
2022-01-06T17:12:46.180 INFO:teuthology.task.ansible.out:Thursday 06 January 2022 17:12:46 +0000 (0:00:00.236) 0:03:15.016 ******
2022-01-06T17:12:46.208 INFO:teuthology.task.ansible.out:
TASK [common : Open nrpe port if firewalld enabled] ****************************
TASK [testnode : Stop and disable iptables] ************************************
2022-01-06T17:16:52.036 INFO:teuthology.task.ansible.out:fatal: [smithi179.front.sepia.ceph.com]: FAILED! => {"changed": false, "msg": "Could not find the requested service iptables: host"}
...ignoring
2022-01-06T17:16:52.056 INFO:teuthology.task.ansible.out:Thursday 06 January 2022 17:16:52 +0000 (0:00:00.307) 0:07:20.893 ******
2022-01-06T17:16:52.698 INFO:teuthology.task.ansible.out:
TASK [testnode : Enable SELinux] ***********************************************
#2 Updated by Laura Flores almost 2 years ago
And a third similar scenario, where an offline filesystem leads to failed cephadm daemons (CEPHADM_FAILED_DAEMON):
Description: rados/cephadm/smoke-roleless/{0-distro/ubuntu_20.04 0-nvme-loop 1-start 2-services/nfs-ingress 3-final}
Failure reason: hit max job timeout
Jobs:
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599055
/a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599082
2022-01-06T21:57:30.302 INFO:journalctl@ceph.mon.smithi005.smithi005.stdout:Jan 06 21:57:30 smithi005 bash[11180]: audit 2022-01-06T21:57:29.036464+0000 mon.smithi005 (mon.0) 651 : audit [INF] from='mgr.14206 172.21.15.5:0/2764321477' entity='mgr.smithi005.fpqapy' cmd=[{"prefix": "osd pool create", "pool": "cephfs.foofs.meta"}]: dispatch
2022-01-06T21:57:30.303 INFO:journalctl@ceph.mon.smithi005.smithi005.stdout:Jan 06 21:57:30 smithi005 bash[11180]: cluster 2022-01-06T21:57:29.125324+0000 mgr.smithi005.fpqapy (mgr.14206) 195 : cluster [DBG] pgmap v177: 1 pgs: 1 active+clean; 577 KiB data, 46 MiB used, 715 GiB / 715 GiB avail
2022-01-06T21:57:30.390 INFO:journalctl@ceph.mon.smithi031.smithi031.stdout:Jan 06 21:57:30 smithi031 bash[13845]: audit 2022-01-06T21:57:29.035515+0000 mgr.smithi005.fpqapy (mgr.14206) 194 : audit [DBG] from='client.14540 -' entity='client.admin' cmd=[{"prefix": "fs volume create", "name": "foofs", "target": ["mon-mgr", ""]}]: dispatch
2022-01-06T21:57:30.390 INFO:journalctl@ceph.mon.smithi031.smithi031.stdout:Jan 06 21:57:30 smithi031 bash[13845]: audit 2022-01-06T21:57:29.036464+0000 mon.smithi005 (mon.0) 651 : audit [INF] from='mgr.14206 172.21.15.5:0/2764321477' entity='mgr.smithi005.fpqapy' cmd=[{"prefix": "osd pool create", "pool": "cephfs.foofs.meta"}]: dispatch
2022-01-06T21:57:30.390 INFO:journalctl@ceph.mon.smithi031.smithi031.stdout:Jan 06 21:57:30 smithi031 bash[13845]: cluster 2022-01-06T21:57:29.125324+0000 mgr.smithi005.fpqapy (mgr.14206) 195 : cluster [DBG] pgmap v177: 1 pgs: 1 active+clean; 577 KiB data, 46 MiB used, 715 GiB / 715 GiB avail
2022-01-06T21:57:31.136 INFO:journalctl@ceph.mon.smithi005.smithi005.stdout:Jan 06 21:57:31 smithi005 bash[11180]: debug 2022-01-06T21:57:31.045+0000 7f17651e8700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2022-01-06T21:57:31.136 INFO:journalctl@ceph.mon.smithi005.smithi005.stdout:Jan 06 21:57:31 smithi005 bash[11180]: audit 2022-01-06T21:57:30.036464+0000 mon.smithi005 (mon.0) 652 : audit [INF] from='mgr.14206 172.21.15.5:0/2764321477' entity='mgr.smithi005.fpqapy' cmd='[{"prefix": "osd pool create", "pool": "cephfs.foofs.meta"}]': finished
2022-01-06T21:58:40.526 INFO:teuthology.orchestra.run.smithi005.stdout:[{"placement": {"count": 1}, "service_name": "alertmanager", "service_type": "alertmanager", "status": {"created": "2022-01-06T21:51:00.275167Z", "last_refresh": "2022-01-06T21:58:38.559512Z", "ports": [9093, 9094], "running": 1, "size": 1}}, {"placement": {"host_pattern": "*"}, "service_name": "crash", "service_type": "crash", "status": {"created": "2022-01-06T21:50:50.887947Z", "last_refresh": "2022-01-06T21:58:37.072722Z", "running": 2, "size": 2}}, {"placement": {"count": 1}, "service_name": "grafana", "service_type": "grafana", "status": {"created": "2022-01-06T21:50:55.649076Z", "last_refresh": "2022-01-06T21:58:38.559719Z", "ports": [3000], "running": 1, "size": 1}}, {"events": ["2022-01-06T21:57:37.915361Z service:ingress.nfs.foo [INFO] \"service was created\""], "placement": {"count": 2}, "service_id": "nfs.foo", "service_name": "ingress.nfs.foo", "service_type": "ingress", "spec": {"backend_service": "nfs.foo", "frontend_port": 2049, "monitor_port": 9002, "virtual_ip": "10.0.31.5/16"}, "status": {"created": "2022-01-06T21:57:37.910852Z", "last_refresh": "2022-01-06T21:58:37.075244Z", "ports": [2049, 9002], "running": 3, "size": 4, "virtual_ip": "10.0.31.5/16"}}, {"events": ["2022-01-06T21:57:31.078479Z service:mds.foofs [INFO] \"service was created\""], "placement": {"count": 2}, "service_id": "foofs", "service_name": "mds.foofs", "service_type": "mds", "status": {"created": "2022-01-06T21:57:31.074689Z", "last_refresh": "2022-01-06T21:58:37.074755Z", "running": 2, "size": 2}}, {"placement": {"count": 2}, "service_name": "mgr", "service_type": "mgr", "status": {"created": "2022-01-06T21:50:48.757876Z", "last_refresh": "2022-01-06T21:58:37.073083Z", "running": 2, "size": 2}}, {"placement": {"count": 2, "hosts": ["smithi005:172.21.15.5=smithi005", "smithi031:172.21.15.31=smithi031"]}, "service_name": "mon", "service_type": "mon", "status": {"created": "2022-01-06T21:51:55.847078Z", "last_refresh": "2022-01-06T21:58:37.073332Z", "running": 2, "size": 2}}, {"events": ["2022-01-06T21:57:37.910505Z service:nfs.foo [INFO] \"service was created\""], "placement": {"count": 2}, "service_id": "foo", "service_name": "nfs.foo", "service_type": "nfs", "spec": {"port": 12049}, "status": {"created": "2022-01-06T21:57:37.905948Z", "last_refresh": "2022-01-06T21:58:37.075048Z", "ports": [12049], "running": 2, "size": 2}}, {"placement": {"host_pattern": "*"}, "service_name": "node-exporter", "service_type": "node-exporter", "status": {"created": "2022-01-06T21:50:58.088177Z", "last_refresh": "2022-01-06T21:58:37.073559Z", "ports": [9100], "running": 2, "size": 2}}, {"events": ["2022-01-06T21:53:27.529310Z service:osd.all-available-devices [INFO] \"service was created\""], "placement": {"host_pattern": "*"}, "service_id": "all-available-devices", "service_name": "osd.all-available-devices", "service_type": "osd", "spec": {"data_devices": {"all": true}, "filter_logic": "AND", "objectstore": "bluestore"}, "status": {"created": "2022-01-06T21:53:27.522984Z", "last_refresh": "2022-01-06T21:58:37.073802Z", "running": 8, "size": 8}}, {"placement": {"count": 1}, "service_name": "prometheus", "service_type": "prometheus", "status": {"created": "2022-01-06T21:50:53.020786Z", "last_refresh": "2022-01-06T21:58:38.559921Z", "ports": [9095], "running": 1, "size": 1}}]
2022-01-06T21:58:40.982 INFO:journalctl@ceph.mon.smithi031.smithi031.stdout:Jan 06 21:58:40 smithi031 bash[13845]: cluster 2022-01-06T21:58:39.567636+0000 mon.smithi005 (mon.0) 747 : cluster [WRN] Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)
2022-01-06T21:58:41.053 INFO:journalctl@ceph.mon.smithi005.smithi005.stdout:Jan 06 21:58:40 smithi005 bash[11180]: cluster 2022-01-06T21:58:39.567636+0000 mon.smithi005 (mon.0) 747 : cluster [WRN] Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)
2022-01-06T21:58:41.698 INFO:tasks.cephadm:nfs.foo has 2/2
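Note the ingress.nfs.foo entry in the dump above: "running": 3 against "size": 4, even though nfs.foo itself reports 2/2. That matches the CEPHADM_FAILED_DAEMON warning and points at one of the ingress daemons (haproxy or keepalived) being down. A quick way to pull underfull services out of that JSON (a sketch; assumes jq is available):

ceph orch ls --format json | jq '.[] | select(.status.running < .status.size) | {service: .service_name, running: .status.running, size: .status.size}'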
#3 Updated by Jeff Layton almost 2 years ago
- Assignee set to Venky Shankar
Looking at /a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599082/remote/smithi137/log/6ffb065c-6f3e-11ec-8c32-001a4aab830c
2022-01-06T22:36:04.416+0000 7ff834993900  0 set uid:gid to 167:167 (ceph:ceph)
2022-01-06T22:36:04.416+0000 7ff834993900  0 ceph version 17.0.0-9958-g09cb93b0 (09cb93b02b9e3e136a791fb4aa165fc1d446de8c) quincy (dev), process ceph-mds, pid 7
2022-01-06T22:36:04.416+0000 7ff834993900  1 main not setting numa affinity
2022-01-06T22:36:04.416+0000 7ff834993900  0 pidfile_write: ignore empty --pid-file
2022-01-06T22:36:04.419+0000 7ff82abbc700  1 mds.foofs.smithi137.gkwdeh Updating MDS map to version 2 from mon.0
2022-01-06T22:36:04.557+0000 7ff82abbc700  1 mds.foofs.smithi137.gkwdeh Updating MDS map to version 3 from mon.0
2022-01-06T22:36:04.557+0000 7ff82abbc700  1 mds.foofs.smithi137.gkwdeh Monitors have assigned me to become a standby.
2022-01-06T22:36:04.561+0000 7ff82abbc700  1 mds.foofs.smithi137.gkwdeh Updating MDS map to version 4 from mon.0
2022-01-06T22:36:04.562+0000 7ff82abbc700  1 mds.0.4 handle_mds_map i am now mds.0.4
2022-01-06T22:36:04.562+0000 7ff82abbc700  1 mds.0.4 handle_mds_map state change up:boot --> up:creating
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x1
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x100
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x600
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x601
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x602
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x603
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x604
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x605
2022-01-06T22:36:04.562+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x606
2022-01-06T22:36:04.563+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x607
2022-01-06T22:36:04.563+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x608
2022-01-06T22:36:04.563+0000 7ff82abbc700  0 mds.0.cache creating system inode with ino:0x609
2022-01-06T22:36:04.572+0000 7ff824bb0700  1 mds.0.4 creating_done
2022-01-06T22:36:05.565+0000 7ff82abbc700  1 mds.foofs.smithi137.gkwdeh Updating MDS map to version 5 from mon.0
2022-01-06T22:36:05.565+0000 7ff82abbc700  1 mds.0.4 handle_mds_map i am now mds.0.4
2022-01-06T22:36:05.565+0000 7ff82abbc700  1 mds.0.4 handle_mds_map state change up:creating --> up:active
2022-01-06T22:36:05.565+0000 7ff82abbc700  1 mds.0.4 recovery_done -- successful recovery!
2022-01-06T22:36:05.565+0000 7ff82abbc700  1 mds.0.4 active_start
2022-01-06T22:36:10.561+0000 7ff8283b7700 -1 mds.pinger is_rank_lagging: rank=0 was never sent ping request.
2022-01-07T03:51:01.262+0000 7ff82c3bf700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
There's a more recent log too, but it's scrambled. The is_rank_lagging message may be significant here, but it looks like the MDS started up and was running until it was shut down (probably via systemd).
The other job (6599055) looks similar; there is just no log message about a SIGHUP.
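For reference, the MDS progression seen in that log (up:boot --> up:creating --> up:active) can be cross-checked from the mon side with the standard status commands, without digging into per-daemon logs:

ceph fs status foofs    # ranks, states, and standbys for the filesystem
ceph mds stat           # one-line MDS map summary
ceph health detail      # shows whether MDS_ALL_DOWN is still asserted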
#4 Updated by Venky Shankar almost 2 years ago
Jeff Layton wrote:
Looking at /a/yuriw-2022-01-06_15:50:38-rados-wip-yuri8-testing-2022-01-05-1411-distro-default-smithi/6599082/remote/smithi137/log/6ffb065c-6f3e-11ec-8c32-001a4aab830c
[...]
There's a more recent log too, but it's scrambled. The is_rank_lagging message may be significant here, but it looks like it started up, and was running until it was shut down (probably via systemd).
is_rank_lagging should be harmless - it's part of the metrics machinery in the MDS and does not affect the MDS boot procedure.
#5 Updated by Venky Shankar almost 2 years ago
Is this related to CephFS? Comment https://tracker.ceph.com/issues/53807#note-1 indicates this is being hit with rados jobs too.
#6 Updated by Laura Flores almost 2 years ago
- Project changed from CephFS to Ceph
#7 Updated by Laura Flores almost 2 years ago
Moved this tracker out of CephFS, since the offline-filesystem warning in this particular test appears even in successful runs.
Example successful run: /a/yuriw-2022-01-04_18:45:05-rados-wip-yuriw-master-1.1.22-distro-default-smithi/6595039
2022-01-04T21:05:24.444 INFO:journalctl@ceph.mon.smithi149.smithi149.stdout:Jan 04 21:05:23 smithi149 ceph-mon[29224]: from='mgr.14214 172.21.15.149:0/4025601137' entity='mgr.smithi149.axflmp' cmd=[{"prefix": "osd pool create", "pool": "cephfs.foofs.data"}]: dispatch
2022-01-04T21:05:24.445 INFO:journalctl@ceph.mon.smithi149.smithi149.stdout:Jan 04 21:05:23 smithi149 conmon[29200]: 2022-01-04T21:05:23.994+0000 7f2cefb7c700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 filesystem is offline (MDS_ALL_DOWN)
2022-01-04T21:05:25.034 DEBUG:teuthology.orchestra.run.smithi149:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:7b5bbfea3dc99d59b2173c093177ae92f881f823 shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 15c89698-6da1-11ec-8c32-001a4aab830c -- bash -c 'ceph nfs cluster create foo --ingress --virtual-ip 10.0.31.149/16 --port 2999'
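That "ceph nfs cluster create foo --ingress --virtual-ip ..." call on the last line is what deploys the haproxy/keepalived pair. For readers unfamiliar with it, it is roughly equivalent to applying an ingress spec like the following (a sketch assembled from the spec fields visible in the service dumps in this ticket, not the literal spec the test generates):

service_type: ingress
service_id: nfs.foo
placement:
  count: 2
spec:
  backend_service: nfs.foo
  frontend_port: 2999
  monitor_port: 9999
  virtual_ip: 10.0.31.149/16

which would be applied with "ceph orch apply -i ingress.yaml".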
Hidden Ansible output is also normal; the root cause must be something else:
TASK [common : Check firewalld status] *****************************************
2022-01-04T20:51:41.958 INFO:teuthology.task.ansible.out:fatal: [smithi149.front.sepia.ceph.com]: FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
...ignoring
2022-01-04T20:51:41.979 INFO:teuthology.task.ansible.out:Tuesday 04 January 2022 20:51:41 +0000 (0:00:00.081) 0:03:15.747 *******
2022-01-04T20:51:42.016 INFO:teuthology.task.ansible.out:
TASK [common : Open nrpe port if firewalld enabled] ****************************
#8 Updated by Laura Flores almost 2 years ago
- Subject changed from Hidden ansible output and offline filesystem failures lead to dead jobs to Dead jobs in rados/cephadm/smoke-roleless{...}
#9 Updated by Laura Flores almost 2 years ago
- Project changed from Ceph to Orchestrator
#10 Updated by Aishwarya Mathuria almost 2 years ago
/a/yuriw-2022-01-13_18:06:52-rados-wip-yuri3-testing-2022-01-13-0809-distro-default-smithi/6614725
/a/yuriw-2022-01-13_18:06:52-rados-wip-yuri3-testing-2022-01-13-0809-distro-default-smithi/6614681
/a/yuriw-2022-01-13_18:06:52-rados-wip-yuri3-testing-2022-01-13-0809-distro-default-smithi/6614665
#11 Updated by Sebastian Wagner almost 2 years ago
- Duplicated by Bug #53904: cephadm: ingress jobs stuck added
#12 Updated by Sebastian Wagner almost 2 years ago
- Subject changed from Dead jobs in rados/cephadm/smoke-roleless{...} to Dead jobs in rados/cephadm/smoke-roleless{...}: ingress jobs stuck
#13 Updated by Sebastian Wagner almost 2 years ago
- Priority changed from Normal to Immediate
#14 Updated by Venky Shankar almost 2 years ago
- Assignee changed from Venky Shankar to Sebastian Wagner
Reassigning to cephadm lead.
#15 Updated by Melissa Li almost 2 years ago
- Assignee changed from Sebastian Wagner to Melissa Li
On a teuthology node with the stuck job:
{ "style": "cephadm:v1", "name": "haproxy.nfs.foo.smithi086.rilsmn", "fsid": "677afccc-7d61-11ec-8c35-001a4aab830c", "systemd_unit": "ceph-677afccc-7d61-11ec-8c35-001a4aab830c@haproxy.nfs.foo.smithi086.rilsmn", "enabled": true, "state": "stopped", "service_name": "ingress.nfs.foo", "ports": [ 2999, 9999 ], "ip": null, "deployed_by": [ "quay.ceph.io/ceph-ci/ceph@sha256:4f125c7c6b9f2347c45fc02cd9dac333ee5730d930fbbe70f27ae87ecb849842" ], "rank": null, "rank_generation": null, "extra_container_args": null, "memory_request": null, "memory_limit": null, "container_id": null, "container_image_name": "docker.io/library/haproxy:2.3", "container_image_id": null, "container_image_digests": null, "version": null, "started": null, "created": "2022-01-24T22:10:42.670979Z", "deployed": "2022-01-24T22:10:41.647001Z", "configured": "2022-01-24T22:10:42.670979Z" },
the haproxy logs:
[root@smithi086 cephtest]# ./cephadm logs --name haproxy.nfs.foo.smithi086.rilsmn | tee haproxy.log
Inferring fsid 677afccc-7d61-11ec-8c35-001a4aab830c
-- Logs begin at Mon 2022-01-24 21:56:24 UTC, end at Wed 2022-01-26 16:47:54 UTC. --
Jan 24 22:10:41 smithi086 systemd[1]: Starting Ceph haproxy.nfs.foo.smithi086.rilsmn for 677afccc-7d61-11ec-8c35-001a4aab830c...
Jan 24 22:10:42 smithi086 conmon[57221]: [NOTICE] 023/221042 (7) : haproxy version is 2.3.17-d1c9119
Jan 24 22:10:42 smithi086 conmon[57221]: [NOTICE] 023/221042 (7) : path to executable is /usr/local/sbin/haproxy
Jan 24 22:10:42 smithi086 conmon[57221]: [ALERT] 023/221042 (7) : Starting frontend stats: cannot bind socket (Cannot assign requested address) [10.0.31.35:9999]
Jan 24 22:10:42 smithi086 conmon[57221]: [ALERT] 023/221042 (7) : Starting frontend frontend: cannot bind socket (Cannot assign requested address) [10.0.31.35:2999]
Jan 24 22:10:42 smithi086 conmon[57221]: [ALERT] 023/221042 (7) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.
Jan 24 22:10:42 smithi086 bash[57031]: 0dbc37f5c5cf924909892a20c3aa791436cae779913cbac45662cd51ffa60327
Jan 24 22:10:42 smithi086 systemd[1]: Started Ceph haproxy.nfs.foo.smithi086.rilsmn for 677afccc-7d61-11ec-8c35-001a4aab830c.
Jan 24 22:10:43 smithi086 systemd[1]: ceph-677afccc-7d61-11ec-8c35-001a4aab830c@haproxy.nfs.foo.smithi086.rilsmn.service: Main process exited, code=exited, status=1/FAILURE
Jan 24 22:10:43 smithi086 systemd[1]: ceph-677afccc-7d61-11ec-8c35-001a4aab830c@haproxy.nfs.foo.smithi086.rilsmn.service: Failed with result 'exit-code'.
Jan 24 22:10:53 smithi086 systemd[1]: ceph-677afccc-7d61-11ec-8c35-001a4aab830c@haproxy.nfs.foo.smithi086.rilsmn.service: Service RestartSec=10s expired, scheduling restart.
Jan 24 22:10:53 smithi086 systemd[1]: ceph-677afccc-7d61-11ec-8c35-001a4aab830c@haproxy.nfs.foo.smithi086.rilsmn.service: Scheduled restart job, restart counter is at 1.
Jan 24 22:10:53 smithi086 systemd[1]: Stopped Ceph haproxy.nfs.foo.smithi086.rilsmn for 677afccc-7d61-11ec-8c35-001a4aab830c.
Jan 24 22:10:53 smithi086 systemd[1]: Starting Ceph haproxy.nfs.foo.smithi086.rilsmn for 677afccc-7d61-11ec-8c35-001a4aab830c...
Jan 24 22:10:54 smithi086 conmon[58096]: [NOTICE] 023/221054 (7) : New worker #1 (9) forked
Jan 24 22:10:54 smithi086 bash[57906]: 97bdb5da26b18f8ea05c497035a2c72d051cc831a3b0656b5446e439f1971ea7
Jan 24 22:10:54 smithi086 systemd[1]: Started Ceph haproxy.nfs.foo.smithi086.rilsmn for 677afccc-7d61-11ec-8c35-001a4aab830c.
Jan 24 22:11:53 smithi086 systemd[1]: Stopping Ceph haproxy.nfs.foo.smithi086.rilsmn for 677afccc-7d61-11ec-8c35-001a4aab830c...
Jan 24 22:11:54 smithi086 bash[65491]: Error: no container with name or ID ceph-677afccc-7d61-11ec-8c35-001a4aab830c-haproxy.nfs.foo.smithi086.rilsmn found: no such container
Jan 24 22:11:54 smithi086 conmon[58096]: [WARNING] 023/221154 (7) : Exiting Master process...
Jan 24 22:11:54 smithi086 conmon[58096]: [NOTICE] 023/221154 (7) : haproxy version is 2.3.17-d1c9119
Jan 24 22:11:54 smithi086 conmon[58096]: [NOTICE] 023/221154 (7) : path to executable is /usr/local/sbin/haproxy
Jan 24 22:11:54 smithi086 conmon[58096]: [ALERT] 023/221154 (7) : Current worker #1 (9) exited with code 143 (Terminated)
Jan 24 22:11:54 smithi086 conmon[58096]: [WARNING] 023/221154 (7) : All workers exited. Exiting... (0)
Jan 24 22:11:54 smithi086 bash[65491]: 97bdb5da26b18f8ea05c497035a2c72d051cc831a3b0656b5446e439f1971ea7
Jan 24 22:11:54 smithi086 bash[65491]: Error: no container with name or ID ceph-677afccc-7d61-11ec-8c35-001a4aab830c-haproxy.nfs.foo.smithi086.rilsmn found: no such container
Jan 24 22:11:54 smithi086 systemd[1]: Stopped Ceph haproxy.nfs.foo.smithi086.rilsmn for 677afccc-7d61-11ec-8c35-001a4aab830c.
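The two ALERT lines are the interesting part: on the first start, haproxy cannot bind the virtual IP 10.0.31.35 because it is not yet assigned to any interface on the host (keepalived is what assigns it in an ingress deployment), so bind() fails with EADDRNOTAVAIL ("Cannot assign requested address") and the unit exits; the systemd restart ten seconds later succeeds, and the daemon is later stopped again, which is why cephadm ls above shows it as "stopped". As an illustration of the generic mechanism only (not a claim about what the eventual fix actually does), a frontend like

frontend frontend
    bind 10.0.31.35:2999
    default_backend backend

can only bind an address that is not yet local if non-local binding is allowed on the host, e.g.:

# allow binding addresses not (yet) assigned to this host
sysctl -w net.ipv4.ip_nonlocal_bind=1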
#16 Updated by Guillaume Abrioux almost 2 years ago
- Status changed from New to In Progress
- Assignee changed from Melissa Li to Guillaume Abrioux
#17 Updated by Guillaume Abrioux almost 2 years ago
- Pull request ID set to 45014
#18 Updated by Guillaume Abrioux almost 2 years ago
- Status changed from In Progress to Fix Under Review
#19 Updated by Guillaume Abrioux almost 2 years ago
- Backport set to quincy,pacific,octopus
#20 Updated by Laura Flores almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
#21 Updated by Adam King almost 2 years ago
The fix is in pacific now via https://github.com/ceph/ceph/pull/44628. The quincy backport is in testing: https://github.com/ceph/ceph/pull/45038
#22 Updated by Yuri Weinstein almost 2 years ago
#23 Updated by Laura Flores almost 2 years ago
- Status changed from Pending Backport to Resolved