Bug #63244

NVMeOF Gateway service fails to deploy at nvmeof v0.0.5

Added by Rahul Lepakshi 4 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With nvmeof v0.0.5, the nvmeof service fails to deploy with a discovery service error

Image - quay.io/ceph/nvmeof:v0.0.5

# journalctl -u ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service
Oct 19 10:32:49 argo027 systemd[1]: Starting Ceph nvmeof.rbd.argo027.jmrhgu for 27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534...
Oct 19 10:32:50 argo027 bash[120312]: Trying to pull quay.io/ceph/nvmeof:v0.0.5...
Oct 19 10:32:50 argo027 bash[120312]: Getting image source signatures
Oct 19 10:32:50 argo027 bash[120312]: Copying blob sha256:392941712c4e0e0d0301201608406ffbd557f30c0cf0a9ded82fd857ca5689e4
Oct 19 10:32:50 argo027 bash[120312]: Copying blob sha256:2f74143aa775446822cc3c6381b789651a2e2791c7abb6d70ec288b5c6e4b012
Oct 19 10:32:50 argo027 bash[120312]: Copying blob sha256:054724d404ce76e502951af08cc80f2e0f9d7aa1ed6f3982ad19b8f1dcbf0d6c
Oct 19 10:32:50 argo027 bash[120312]: Copying blob sha256:024c6c27de7c43beddde60d83fc17aaa858056d7bd16cffda12e1cd7a6a3d97f
Oct 19 10:32:50 argo027 bash[120312]: Copying blob sha256:85effdcd1eaa839302024df008d3cc25221a85cd9a6740e2d1ff805521d0cc03
Oct 19 10:32:50 argo027 bash[120312]: Copying blob sha256:4516bb144672a8d95c7d27d821f2c2e38c5e09ae58798a64ed48fa97f671204b
Oct 19 10:32:51 argo027 bash[120312]: Copying blob sha256:7bdc22374973ada0ec630f7b0dfbea54b7de578745d94cc70b1f0cf9503eb462
Oct 19 10:32:58 argo027 bash[120312]: Copying config sha256:11098cc7689314c6d78fe07b41b9d2d2d0cbb7226c25112602f8b8979bd0efc8
Oct 19 10:32:58 argo027 bash[120312]: Writing manifest to image destination
Oct 19 10:32:58 argo027 bash[120312]: Storing signatures
Oct 19 10:32:58 argo027 podman[120312]:
Oct 19 10:32:58 argo027 podman[120312]: 2023-10-19 10:32:58.147415064 +0000 UTC m=+8.205160074 container create a84e824795699abf927e55abc25584b0cb92d6ef78f003f072f06159873a3721 (image=quay.io/ceph/nvmeof:v0.0>
Oct 19 10:32:58 argo027 podman[120312]: 2023-10-19 10:32:58.18147582 +0000 UTC m=+8.239220834 container init a84e824795699abf927e55abc25584b0cb92d6ef78f003f072f06159873a3721 (image=quay.io/ceph/nvmeof:v0.0.5,>
Oct 19 10:32:58 argo027 podman[120312]: 2023-10-19 10:32:58.185788398 +0000 UTC m=+8.243533399 container start a84e824795699abf927e55abc25584b0cb92d6ef78f003f072f06159873a3721 (image=quay.io/ceph/nvmeof:v0.0.>
Oct 19 10:32:58 argo027 podman[120312]: 2023-10-19 10:32:49.968661104 +0000 UTC m=+0.026406104 image pull  quay.io/ceph/nvmeof:v0.0.5
Oct 19 10:32:58 argo027 bash[120312]: a84e824795699abf927e55abc25584b0cb92d6ef78f003f072f06159873a3721
Oct 19 10:32:58 argo027 systemd[1]: Started Ceph nvmeof.rbd.argo027.jmrhgu for 27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534.
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO:control.server:Starting gateway client.nvmeof.rbd.argo027.jmrhgu
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: DEBUG:control.server:Starting serve
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: DEBUG:control.server:Configuring server client.nvmeof.rbd.argo027.jmrhgu
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO:control.server:SPDK Target Path: /usr/local/bin/nvmf_tgt
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO:control.server:SPDK Socket: /var/tmp/spdk.sock
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO:control.server:Starting /usr/local/bin/nvmf_tgt -u -r /var/tmp/spdk.sock
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO:control.server:Attempting to initialize SPDK: rpc_socket: /var/tmp/spdk.sock, conn_retries: 300, timeo>
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO: Setting log level to WARN
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO:JSONRPCClient(/var/tmp/spdk.sock):Setting log level to WARN
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: [2023-10-19 10:32:58.430567] Starting SPDK v23.01.1 / DPDK 22.11.1 initialization...
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: [2023-10-19 10:32:58.430643] [ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --no-pci --huge-unlink --log-le>
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: TELEMETRY: No legacy callbacks, legacy socket not created
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: [2023-10-19 10:32:58.545273] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 1
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: [2023-10-19 10:32:58.632573] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: [2023-10-19 10:32:58.680929] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software modul>
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: DEBUG:control.server:create_transport: tcp options: {"in_capsule_data_size": 8192, "max_io_qpairs_per_ctrlr>
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: [2023-10-19 10:32:58.823734] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: INFO:control.server:Starting ceph nvmeof discovery service
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: ERROR:control.server:GatewayServer exception occurred:
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]: Traceback (most recent call last):
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:   File "/src/control/__main__.py", line 35, in <module>
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:     gateway.serve()
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:   File "/src/control/server.py", line 108, in serve
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:     self._start_discovery_service()
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:   File "/src/control/server.py", line 148, in _start_discovery_service
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:     DiscoveryService(self.config).start_service()
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:   File "/src/control/discovery.py", line 319, in __init__
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:     log_level = self.config.get("discovery", "debug")
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:   File "/src/control/config.py", line 26, in get
Oct 19 10:32:58 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[120511]:     return self.config.get(section, param)
...skipping...
Oct 19 10:33:46 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     self.ioctx.operate_write_op(write_op, self.omap_name)
Oct 19 10:33:46 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/server.py", line 46, in sigchld_handler
Oct 19 10:33:46 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     raise SystemExit(f"Gateway subprocess terminated {pid=} {exit_code=}")
Oct 19 10:33:46 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: SystemExit: Gateway subprocess terminated pid=6 exit_code=1
Oct 19 10:33:46 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: INFO:control.server:Aborting SPDK(client.nvmeof.rbd.argo027.jmrhgu) pid 3...
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: INFO:control.server:Terminating discovery service...
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: Traceback (most recent call last):
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/state.py", line 187, in __init__
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     self.ioctx.operate_write_op(write_op, self.omap_name)
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "rados.pyx", line 3654, in rados.Ioctx.operate_write_op
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: rados.ObjectExists: [errno 17] RADOS object exists (Failed to operate write op for oid nvmeof.state)
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: During handling of the above exception, another exception occurred:
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: Traceback (most recent call last):
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/__main__.py", line 35, in <module>
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     gateway.serve()
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/server.py", line 111, in serve
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     omap_state = OmapGatewayState(self.config)
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/state.py", line 187, in __init__
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     self.ioctx.operate_write_op(write_op, self.omap_name)
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/server.py", line 46, in sigchld_handler
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     raise SystemExit(f"Gateway subprocess terminated {pid=} {exit_code=}")
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: SystemExit: Gateway subprocess terminated pid=6 exit_code=1
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: During handling of the above exception, another exception occurred:
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: Traceback (most recent call last):
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     return _run_code(code, main_globals, None,
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     exec(code, run_globals)
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/__main__.py", line 36, in <module>
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     gateway.keep_alive()
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/server.py", line 93, in __exit__
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     self._stop_discovery()
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:   File "/src/control/server.py", line 278, in _stop_discovery
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]:     os.kill(self.discovery_pid, signal.SIGINT)
Oct 19 10:33:47 argo027 ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534-nvmeof-rbd-argo027-jmrhgu[121950]: ProcessLookupError: [Errno 3] No such process
Oct 19 10:33:47 argo027 podman[121981]: 2023-10-19 10:33:47.148258938 +0000 UTC m=+0.030710761 container died 8a38502f5a7d5393a028d8e93eecd39d97c047deb0050dd9961b0eb7c3b0653d (image=quay.io/ceph/nvmeof:v0.0.5>
Oct 19 10:33:47 argo027 podman[121981]: 2023-10-19 10:33:47.164905518 +0000 UTC m=+0.047357337 container remove 8a38502f5a7d5393a028d8e93eecd39d97c047deb0050dd9961b0eb7c3b0653d (image=quay.io/ceph/nvmeof:v0.0>
Oct 19 10:33:47 argo027 systemd[1]: ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service: Main process exited, code=exited, status=1/FAILURE
Oct 19 10:33:47 argo027 systemd[1]: ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service: Failed with result 'exit-code'.
Oct 19 10:33:47 argo027 systemd[1]: ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service: Consumed 1.566s CPU time.
Oct 19 10:33:57 argo027 systemd[1]: ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service: Scheduled restart job, restart counter is at 5.
Oct 19 10:33:57 argo027 systemd[1]: Stopped Ceph nvmeof.rbd.argo027.jmrhgu for 27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534.
Oct 19 10:33:57 argo027 systemd[1]: ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service: Consumed 1.566s CPU time.
Oct 19 10:33:57 argo027 systemd[1]: ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service: Start request repeated too quickly.
Oct 19 10:33:57 argo027 systemd[1]: ceph-27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534@nvmeof.rbd.argo027.jmrhgu.service: Failed with result 'exit-code'.
Oct 19 10:33:57 argo027 systemd[1]: Failed to start Ceph nvmeof.rbd.argo027.jmrhgu for 27ca60e2-6e35-11ee-a2a5-ac1f6b0a1534.
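The first traceback above fails inside `config.py`'s `get()`, which simply delegates to `ConfigParser.get(section, param)` for `("discovery", "debug")`. The log line is truncated before the exception type is printed, but the likely cause is that the gateway's config file has no `[discovery]` section, in which case `configparser` raises `NoSectionError`. A minimal sketch reproducing that behavior (the config contents below are an assumption for illustration; the real `ceph-nvmeof.conf` is not shown in this report):

```python
import configparser

# Hypothetical minimal config lacking a [discovery] section,
# mimicking what the deployed gateway config may have looked like.
conf = configparser.ConfigParser()
conf.read_string("""
[gateway]
name = client.nvmeof.rbd.argo027.jmrhgu
""")

# control/config.py's get() delegates straight to ConfigParser.get();
# with the section missing this raises, matching the failure at
# discovery.py line 319 (log_level = self.config.get("discovery", "debug")).
try:
    conf.get("discovery", "debug")
except configparser.NoSectionError as e:
    print(f"NoSectionError: {e}")
```

The later `ProcessLookupError` in `_stop_discovery` is consistent with this: the discovery subprocess died at startup, so `os.kill(self.discovery_pid, signal.SIGINT)` during cleanup targets a PID that no longer exists.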