Bug #47787
mgr/nfs: exercise host-level HA of NFS-Ganesha by killing the process
Description
In my own testing, the process is not respawned and the NFS client hangs. I suspect some changes to cephadm are necessary to make it restart.
History
#1 Updated by Patrick Donnelly almost 3 years ago
- Category set to Correctness/Safety
- Status changed from New to Triaged
- Assignee set to Varsha Rao
- Component(FS) qa-suite added
- Labels (FS) crash, qa added
#2 Updated by Varsha Rao almost 3 years ago
Patrick Donnelly wrote:
In my own testing, the process is not respawned and the NFS client hangs. I suspect some changes to cephadm are necessary to make it restart.
What commands did you use for testing? Please share the logs too.
#3 Updated by Patrick Donnelly almost 3 years ago
Varsha Rao wrote:
Patrick Donnelly wrote:
In my own testing, the process is not respawned and the NFS client hangs. I suspect some changes to cephadm are necessary to make it restart.
What commands did you use for testing? Please share the logs too.
I set up a Ceph cluster and an NFS cluster in Linode. To check host-level HA, I logged into the machine running the NFS-Ganesha server and killed it (with SIGTERM or SIGKILL, I can't remember which).
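For reference, a minimal manual check along those lines might look like the following sketch (the signal choice and the sleep interval are illustrative; the ticket does not record the exact commands used):

# on the host running the NFS-Ganesha server
pgrep -a ganesha.nfsd                           # note the ganesha PID
kill -TERM <pid>                                # or kill -KILL <pid>, per the description above
sleep 30                                        # give the orchestrator time to react
pgrep ganesha.nfsd || echo "not respawned"      # a new PID here would mean HA worked

If host-level HA worked, the last command would report a new ganesha.nfsd PID and an NFS client doing I/O against the export would stall only briefly instead of hanging.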
#4 Updated by Varsha Rao almost 3 years ago
I can reproduce it with SIGTERM:
# ../src/cephadm/cephadm ls
[
  {
    "style": "cephadm:v1",
    "name": "nfs.vstart.varsha",
    "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
    "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha",
    "enabled": true,
    "state": "running",
    "container_id": "5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b",
    "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
    "container_image_id": "d3ab710713d40333ae47b6aebbc081b88e3dcd4df745a26d496b5ba6a491d159",
    "version": "3.3",
    "started": "2020-12-16T09:37:45.017297",
    "created": "2020-12-16T09:37:45.084521",
    "deployed": "2020-12-16T09:37:43.791427",
    "configured": "2020-12-16T09:37:45.084521"
  },
  {
    "style": "cephadm:v1",
    "name": "crash.varsha",
    "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
    "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@crash.varsha",
    "enabled": true,
    "state": "running",
    "container_id": "654111b86be25f3ec6c9f4b8b07a4df999c5517094e8ad0a799e77ea568453c2",
    "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
    "container_image_id": "d3ab710713d40333ae47b6aebbc081b88e3dcd4df745a26d496b5ba6a491d159",
    "version": "16.0.0-8346-g8a74a937",
    "started": "2020-12-16T09:37:21.671885",
    "created": "2020-12-16T09:37:21.783835",
    "deployed": "2020-12-16T09:37:20.388734",
    "configured": "2020-12-16T09:37:21.783835"
  }
]
# ps aux | grep nfs
root 16819 0.0 0.0 80492 1916 ? Ssl 15:07 0:00 /usr/bin/conmon --api-version 1 -c 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b -u 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata -p /var/run/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata/pidfile -n ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket -s -l k8s-file:/var/lib/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata/oci-log --conmon-pidfile /run/ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha.service-pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b
root 16822 0.8 0.1 2538936 58812 ? Ssl 15:07 0:00 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT
root 17103 0.0 0.0 216088 648 pts/0 S+ 15:07 0:00 grep --color=auto nfs
# kill -15 16819
# ps aux | grep nfs
root 17608 0.0 0.0 216088 648 pts/0 S+ 15:10 0:00 grep --color=auto nfs
# ../src/cephadm/cephadm ls
[
  {
    "style": "cephadm:v1",
    "name": "nfs.vstart.varsha",
    "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
    "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha",
    "enabled": true,
    "state": "stopped",
    "container_id": null,
    "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
    "container_image_id": null,
    "version": null,
    "started": null,
    "created": "2020-12-16T09:37:45.084521",
    "deployed": "2020-12-16T09:37:43.791427",
    "configured": "2020-12-16T09:37:45.084521"
  },
  {
    "style": "cephadm:v1",
    "name": "crash.varsha",
    "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
    "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@crash.varsha",
    "enabled": true,
    "state": "running",
    "container_id": "654111b86be25f3ec6c9f4b8b07a4df999c5517094e8ad0a799e77ea568453c2",
    "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
    "container_image_id": "d3ab710713d40333ae47b6aebbc081b88e3dcd4df745a26d496b5ba6a491d159",
    "version": "16.0.0-8346-g8a74a937",
    "started": "2020-12-16T09:37:21.671885",
    "created": "2020-12-16T09:37:21.783835",
    "deployed": "2020-12-16T09:37:20.388734",
    "configured": "2020-12-16T09:37:21.783835"
  }
]
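Note that the PID killed above (16819) is podman's conmon monitor process rather than ganesha.nfsd itself (PID 16822); the --exit-command arguments visible in its command line show that when conmon exits, podman runs `container cleanup --rm`, which removes the container, so systemd sees the whole service stop. To signal the daemon rather than its monitor, one could first resolve the container's main PID, for example (container name taken from the `cephadm ls` output above):

# print the PID of the containerized ganesha.nfsd as seen from the host
podman inspect --format '{{.State.Pid}}' \
    ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha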
Cephadm NFS service logs:
# journalctl -u ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha
-- Logs begin at Thu 2020-04-09 10:39:57 IST, end at Wed 2020-12-16 15:10:40 IST. --
Dec 16 15:07:44 varsha systemd[1]: Starting Ceph nfs.vstart.varsha for 7bb24e16-dc1b-405b-a820-a37038d9fde5...
Dec 16 15:07:44 varsha podman[16502]: Error: no container with name or ID ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha found: no such container
Dec 16 15:07:44 varsha bash[16538]: Error: Failed to evict container: "": Failed to find container "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-add" in state: no container with name or ID ceph-7bb24e16-dc1b-405b-a82>
Dec 16 15:07:44 varsha bash[16574]: Error: no container with ID or name "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-add" found: no such container
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.685252731 +0530 IST m=+0.062146930 container create f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:44 varsha oci-umount[16652]: umounthook <debug>: prestart container_id:f09230047f8e rootfs:/var/lib/containers/storage/overlay/576b93dfa86c4c9caeed59ab08637f10d18adf7dddebdb88eaa7b9dab33d747f/merged
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.736042311 +0530 IST m=+0.112936523 container init f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.740910942 +0530 IST m=+0.117805140 container start f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.741011447 +0530 IST m=+0.117905677 container attach f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.820068306 +0530 IST m=+0.196962531 container died f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.846952238 +0530 IST m=+0.223846458 container remove f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:44 varsha bash[16708]: Error: Failed to evict container: "": Failed to find container "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha" in state: no container with name or ID ceph-7bb24e16-dc1b-405b-a820-a37038d9>
Dec 16 15:07:44 varsha bash[16745]: Error: no container with ID or name "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha" found: no such container
Dec 16 15:07:45 varsha podman[16781]: 2020-12-16 15:07:45.027528239 +0530 IST m=+0.058077583 container create 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:45 varsha oci-umount[16823]: umounthook <debug>: prestart container_id:5353f7212cbd rootfs:/var/lib/containers/storage/overlay/7115f6b012aa6df8cc3986578ab4fee406deb27a3ddd9603d96bcc1b02f10a4e/merged
Dec 16 15:07:45 varsha podman[16781]: 2020-12-16 15:07:45.073029763 +0530 IST m=+0.103579117 container init 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:07:45 varsha podman[16781]: 2020-12-16 15:07:45.077517381 +0530 IST m=+0.108066734 container start 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:07:45 varsha bash[16781]: 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b
Dec 16 15:07:45 varsha systemd[1]: Started Ceph nfs.vstart.varsha for 7bb24e16-dc1b-405b-a820-a37038d9fde5.
Dec 16 15:10:33 varsha podman[17353]: 2020-12-16 15:10:33.161061738 +0530 IST m=+0.049255818 container died 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:10:33 varsha podman[17353]: 2020-12-16 15:10:33.183976792 +0530 IST m=+0.072170853 container remove 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha podman[17392]: Error: no container with name or ID ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha found: no such container
Dec 16 15:10:33 varsha bash[17430]: Error: Failed to evict container: "": Failed to find container "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-remove" in state: no container with name or ID ceph-7bb24e16-dc1b-405b->
Dec 16 15:10:33 varsha bash[17467]: Error: no container with ID or name "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-remove" found: no such container
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.458966195 +0530 IST m=+0.056219420 container create 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha oci-umount[17545]: umounthook <debug>: prestart container_id:195321435e69 rootfs:/var/lib/containers/storage/overlay/fce9b03e35c2602bc5fa145895939879cff3023e9505aec10d334dcd15ab3351/merged
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.548892251 +0530 IST m=+0.146145505 container init 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.554883947 +0530 IST m=+0.152137178 container start 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.555018467 +0530 IST m=+0.152271725 container attach 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.627236134 +0530 IST m=+0.224489369 container died 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.653443804 +0530 IST m=+0.250697040 container remove 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha systemd[1]: ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha.service: Succeeded.
Dec 16 15:10:33 varsha systemd[1]: ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha.service: Consumed 1.315s CPU time.
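The last two lines are the telling part: systemd recorded the unit as "Succeeded", i.e. a clean exit. If the generated unit uses Restart=on-failure (which I believe the cephadm unit template does, though that should be verified), a clean exit will never trigger a respawn. A quick way to check, plus the kind of directives that would be needed to respawn on any exit (the [Service] settings below are hypothetical, not the actual template):

# show the unit file cephadm generated for this daemon
systemctl cat ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha
# hypothetical [Service] settings that would respawn on any exit:
#   Restart=always
#   RestartSec=10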
If the ganesha daemon is killed directly:
# journalctl -u ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448@nfs.vstart.varsha
-- Logs begin at Thu 2020-04-09 10:39:57 IST, end at Wed 2020-12-16 15:14:08 IST. --
Dec 16 15:13:19 varsha systemd[1]: Starting Ceph nfs.vstart.varsha for 2b607da9-5bd2-4a99-bcb5-6eece472f448...
Dec 16 15:13:20 varsha podman[21994]: Error: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha found: no such container
Dec 16 15:13:20 varsha bash[22031]: Error: Failed to evict container: "": Failed to find container "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-add" in state: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb>
Dec 16 15:13:20 varsha bash[22067]: Error: no container with ID or name "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-add" found: no such container
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.283098258 +0530 IST m=+0.058217748 container create 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha oci-umount[22147]: umounthook <debug>: prestart container_id:585eb2115112 rootfs:/var/lib/containers/storage/overlay/8c3c66e6ac785933a2791dc3644a97a39932be6074a1ae6322c2968fcee7dab4/merged
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.333440932 +0530 IST m=+0.108560425 container init 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.338166369 +0530 IST m=+0.113285857 container start 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.338222409 +0530 IST m=+0.113341921 container attach 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.411280307 +0530 IST m=+0.186399818 container died 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.500731722 +0530 IST m=+0.275851220 container remove 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha bash[22209]: Error: Failed to evict container: "": Failed to find container "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha" in state: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb5-6eece472>
Dec 16 15:13:20 varsha bash[22250]: Error: no container with ID or name "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha" found: no such container
Dec 16 15:13:20 varsha podman[22307]: 2020-12-16 15:13:20.693977322 +0530 IST m=+0.063773732 container create 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha oci-umount[22355]: umounthook <debug>: prestart container_id:3418fa440a58 rootfs:/var/lib/containers/storage/overlay/248c3d063b83f3cef8cf28b2bd20df395496082ddd06d05cd60d02b6426fd19f/merged
Dec 16 15:13:20 varsha podman[22307]: 2020-12-16 15:13:20.739816224 +0530 IST m=+0.109612624 container init 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:20 varsha podman[22307]: 2020-12-16 15:13:20.744074812 +0530 IST m=+0.113871207 container start 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:20 varsha bash[22307]: 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178
Dec 16 15:13:20 varsha systemd[1]: Started Ceph nfs.vstart.varsha for 2b607da9-5bd2-4a99-bcb5-6eece472f448.
Dec 16 15:13:49 varsha podman[22659]: 2020-12-16 15:13:49.465978196 +0530 IST m=+0.055481569 container died 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:49 varsha podman[22659]: 2020-12-16 15:13:49.503018646 +0530 IST m=+0.092522030 container remove 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:49 varsha podman[22700]: Error: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha found: no such container
Dec 16 15:13:49 varsha bash[22737]: Error: Failed to evict container: "": Failed to find container "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-remove" in state: no container with name or ID ceph-2b607da9-5bd2-4a99->
Dec 16 15:13:49 varsha bash[22773]: Error: no container with ID or name "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-remove" found: no such container
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.77164686 +0530 IST m=+0.056738600 container create a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:49 varsha oci-umount[22851]: umounthook <debug>: prestart container_id:a241934ff765 rootfs:/var/lib/containers/storage/overlay/f943150891ae57d3fd1ac983645ab5e9a2fcb4c6c4bc60f45402e0fb3a5db2b1/merged
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.852848781 +0530 IST m=+0.137940527 container init a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.858332773 +0530 IST m=+0.143424523 container start a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.858388873 +0530 IST m=+0.143480610 container attach a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.900274609 +0530 IST m=+0.185366368 container died a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.927105413 +0530 IST m=+0.212197164 container remove a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:49 varsha systemd[1]: ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448@nfs.vstart.varsha.service: Succeeded.
Dec 16 15:13:49 varsha systemd[1]: ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448@nfs.vstart.varsha.service: Consumed 1.386s CPU time.
#5 Updated by Tim Serong over 2 years ago
I tried reproducing this (admittedly with a downstream SUSE build). TERM seemed to have no effect, while KILL did stop ganesha, but it was restarted automatically after a couple of seconds. One possible difference is that the downstream SUSE deployment passes --container-init to `cephadm bootstrap`; I'm not sure whether that matters here.
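For anyone trying to replicate the downstream setup: my understanding (worth double-checking) is that --container-init adds --init to the podman/docker run invocation, so a minimal init process becomes PID 1 inside the container and handles signal forwarding and child reaping, which could plausibly change how TERM/KILL are delivered to ganesha. Roughly:

# bootstrap with an init process inside each container (mon IP is a placeholder)
cephadm bootstrap --mon-ip <ip> --container-init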
The two sets of logs in the previous comment look pretty much identical to me, aside from the various IDs.
How was your test cluster deployed (vstart/cstart/something else)? I'd like to see whether I can reproduce this myself with the latest upstream.
#6 Updated by Varsha Rao over 2 years ago
Tim Serong wrote:
How was your test cluster deployed (vstart/cstart/something else)? I'd like to see whether I can reproduce this myself with the latest upstream.
Thanks, Tim, for looking into it. I tested using a vstart cluster.
#7 Updated by Patrick Donnelly over 2 years ago
- Target version changed from v16.0.0 to v17.0.0
- Backport set to pacific,octopus,nautilus
#8 Updated by Varsha Rao over 2 years ago
- Backport changed from pacific,octopus,nautilus to pacific,octopus
#9 Updated by Patrick Donnelly about 2 years ago
Sebastian, do we have documentation somewhere about the situations in which cephadm will restart services? I think there needs to be a clear outline of what triggers a restart of a service.
(I think this is a somewhat unfortunate situation, as cephadm wasn't necessarily intended to provide HA by respinning services; services like the MDS/OSDs already provide HA as part of their protocols. Still, something needs to be done for NFS.)
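Short of such documentation, the restart behavior a given daemon actually has can at least be queried from systemd on the host, e.g. (unit name pattern as in the logs above):

# print the effective restart policy for the NFS daemon's unit
systemctl show -p Restart,RestartUSec \
    ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha.service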
#10 Updated by Varsha Rao about 2 years ago
- Assignee changed from Varsha Rao to Sage Weil
#11 Updated by Patrick Donnelly about 1 year ago
- Target version deleted (v17.0.0)