Bug #47787

mgr/nfs: exercise host-level HA of NFS-Ganesha by killing the process

Added by Patrick Donnelly over 3 years ago. Updated almost 2 years ago.

Status:
Triaged
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Development
Tags:
Backport:
pacific,octopus
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
mgr/nfs, qa-suite
Labels (FS):
crash, qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In my own testing, the process is not respawned and the NFS client hangs. I suspect some changes to cephadm are necessary to make it restart.
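
A minimal sketch of the check (the PID is hypothetical; use whatever `ps` reports for ganesha.nfsd on the NFS host):

# ps aux | grep ganesha.nfsd
# kill -9 <pid>
# ps aux | grep ganesha.nfsd

If host-level HA worked, the daemon would be respawned within a few seconds; instead it stays down and client I/O against the export hangs.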

Actions #1

Updated by Patrick Donnelly over 3 years ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Assignee set to Varsha Rao
  • Component(FS) qa-suite added
  • Labels (FS) crash, qa added
Actions #2

Updated by Varsha Rao over 3 years ago

Patrick Donnelly wrote:

In my own testing, the process is not respawned and the NFS client hangs. I suspect some changes to cephadm are necessary to make it restart.

What commands did you use for testing? Please share the logs too.

Actions #3

Updated by Patrick Donnelly over 3 years ago

Varsha Rao wrote:

Patrick Donnelly wrote:

In my own testing, the process is not respawned and the NFS client hangs. I suspect some changes to cephadm are necessary to make it restart.

What commands did you use for testing? Please share the logs too.

I set up a Ceph cluster and an NFS cluster in Linode. To check host-level HA, I logged into the machine running the NFS-Ganesha server and killed it (SIGTERM or SIGKILL, I can't remember which).

Actions #4

Updated by Varsha Rao over 3 years ago

I can reproduce it with SIGTERM:

# ../src/cephadm/cephadm ls
[
    {
        "style": "cephadm:v1",
        "name": "nfs.vstart.varsha",
        "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
        "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha",
        "enabled": true,
        "state": "running",
        "container_id": "5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b",
        "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
        "container_image_id": "d3ab710713d40333ae47b6aebbc081b88e3dcd4df745a26d496b5ba6a491d159",
        "version": "3.3",
        "started": "2020-12-16T09:37:45.017297",
        "created": "2020-12-16T09:37:45.084521",
        "deployed": "2020-12-16T09:37:43.791427",
        "configured": "2020-12-16T09:37:45.084521" 
    },
    {
        "style": "cephadm:v1",
        "name": "crash.varsha",
        "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
        "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@crash.varsha",
        "enabled": true,
        "state": "running",
        "container_id": "654111b86be25f3ec6c9f4b8b07a4df999c5517094e8ad0a799e77ea568453c2",
        "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
        "container_image_id": "d3ab710713d40333ae47b6aebbc081b88e3dcd4df745a26d496b5ba6a491d159",
        "version": "16.0.0-8346-g8a74a937",
        "started": "2020-12-16T09:37:21.671885",
        "created": "2020-12-16T09:37:21.783835",
        "deployed": "2020-12-16T09:37:20.388734",
        "configured": "2020-12-16T09:37:21.783835" 
    }
]

# ps aux | grep nfs
root       16819  0.0  0.0  80492  1916 ?        Ssl  15:07   0:00 /usr/bin/conmon --api-version 1 -c 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b -u 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata -p /var/run/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata/pidfile -n ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha --exit-dir /var/run/libpod/exits --socket-dir-path /var/run/libpod/socket -s -l k8s-file:/var/lib/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata/ctr.log --log-level error --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/var/run/containers/storage/overlay-containers/5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b/userdata/oci-log --conmon-pidfile /run/ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha.service-pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/containers/storage --exit-command-arg --runroot --exit-command-arg /var/run/containers/storage --exit-command-arg --log-level --exit-command-arg error --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mountopt=nodev,metacopy=on --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b
root       16822  0.8  0.1 2538936 58812 ?       Ssl  15:07   0:00 /usr/bin/ganesha.nfsd -F -L STDERR -N NIV_EVENT
root       17103  0.0  0.0 216088   648 pts/0    S+   15:07   0:00 grep --color=auto nfs

# kill -15 16819
# ps aux | grep nfs
root       17608  0.0  0.0 216088   648 pts/0    S+   15:10   0:00 grep --color=auto nfs
# ../src/cephadm/cephadm ls
[
    {
        "style": "cephadm:v1",
        "name": "nfs.vstart.varsha",
        "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
        "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha",
        "enabled": true,
        "state": "stopped",
        "container_id": null,
        "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
        "container_image_id": null,
        "version": null,
        "started": null,
        "created": "2020-12-16T09:37:45.084521",
        "deployed": "2020-12-16T09:37:43.791427",
        "configured": "2020-12-16T09:37:45.084521" 
    },
    {
        "style": "cephadm:v1",
        "name": "crash.varsha",
        "fsid": "7bb24e16-dc1b-405b-a820-a37038d9fde5",
        "systemd_unit": "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@crash.varsha",
        "enabled": true,
        "state": "running",
        "container_id": "654111b86be25f3ec6c9f4b8b07a4df999c5517094e8ad0a799e77ea568453c2",
        "container_image_name": "docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8578c047a8b5dfe544c372b654d6878f8ddba2fff86",
        "container_image_id": "d3ab710713d40333ae47b6aebbc081b88e3dcd4df745a26d496b5ba6a491d159",
        "version": "16.0.0-8346-g8a74a937",
        "started": "2020-12-16T09:37:21.671885",
        "created": "2020-12-16T09:37:21.783835",
        "deployed": "2020-12-16T09:37:20.388734",
        "configured": "2020-12-16T09:37:21.783835" 
    }
]

Cephadm NFS service logs:
# journalctl -u ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha
-- Logs begin at Thu 2020-04-09 10:39:57 IST, end at Wed 2020-12-16 15:10:40 IST. --
Dec 16 15:07:44 varsha systemd[1]: Starting Ceph nfs.vstart.varsha for 7bb24e16-dc1b-405b-a820-a37038d9fde5...
Dec 16 15:07:44 varsha podman[16502]: Error: no container with name or ID ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha found: no such container
Dec 16 15:07:44 varsha bash[16538]: Error: Failed to evict container: "": Failed to find container "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-add" in state: no container with name or ID ceph-7bb24e16-dc1b-405b-a82>
Dec 16 15:07:44 varsha bash[16574]: Error: no container with ID or name "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-add" found: no such container
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.685252731 +0530 IST m=+0.062146930 container create f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:44 varsha oci-umount[16652]: umounthook <debug>: prestart container_id:f09230047f8e rootfs:/var/lib/containers/storage/overlay/576b93dfa86c4c9caeed59ab08637f10d18adf7dddebdb88eaa7b9dab33d747f/merged
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.736042311 +0530 IST m=+0.112936523 container init f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.740910942 +0530 IST m=+0.117805140 container start f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.741011447 +0530 IST m=+0.117905677 container attach f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.820068306 +0530 IST m=+0.196962531 container died f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:07:44 varsha podman[16610]: 2020-12-16 15:07:44.846952238 +0530 IST m=+0.223846458 container remove f09230047f8e5515a4a51d29bee6d0a4ea0d2898d72677cd8b58ce988ec28683 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:44 varsha bash[16708]: Error: Failed to evict container: "": Failed to find container "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha" in state: no container with name or ID ceph-7bb24e16-dc1b-405b-a820-a37038d9>
Dec 16 15:07:44 varsha bash[16745]: Error: no container with ID or name "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha" found: no such container
Dec 16 15:07:45 varsha podman[16781]: 2020-12-16 15:07:45.027528239 +0530 IST m=+0.058077583 container create 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:07:45 varsha oci-umount[16823]: umounthook <debug>: prestart container_id:5353f7212cbd rootfs:/var/lib/containers/storage/overlay/7115f6b012aa6df8cc3986578ab4fee406deb27a3ddd9603d96bcc1b02f10a4e/merged
Dec 16 15:07:45 varsha podman[16781]: 2020-12-16 15:07:45.073029763 +0530 IST m=+0.103579117 container init 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:07:45 varsha podman[16781]: 2020-12-16 15:07:45.077517381 +0530 IST m=+0.108066734 container start 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:07:45 varsha bash[16781]: 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b
Dec 16 15:07:45 varsha systemd[1]: Started Ceph nfs.vstart.varsha for 7bb24e16-dc1b-405b-a820-a37038d9fde5.
Dec 16 15:10:33 varsha podman[17353]: 2020-12-16 15:10:33.161061738 +0530 IST m=+0.049255818 container died 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:10:33 varsha podman[17353]: 2020-12-16 15:10:33.183976792 +0530 IST m=+0.072170853 container remove 5353f7212cbdf8cc0bbc2638d8b3276a5c7e12c5a868d922ac40aeb2ff4ec85b (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha podman[17392]: Error: no container with name or ID ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha found: no such container
Dec 16 15:10:33 varsha bash[17430]: Error: Failed to evict container: "": Failed to find container "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-remove" in state: no container with name or ID ceph-7bb24e16-dc1b-405b->
Dec 16 15:10:33 varsha bash[17467]: Error: no container with ID or name "ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5-nfs.vstart.varsha-grace-remove" found: no such container
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.458966195 +0530 IST m=+0.056219420 container create 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha oci-umount[17545]: umounthook <debug>: prestart container_id:195321435e69 rootfs:/var/lib/containers/storage/overlay/fce9b03e35c2602bc5fa145895939879cff3023e9505aec10d334dcd15ab3351/merged
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.548892251 +0530 IST m=+0.146145505 container init 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.554883947 +0530 IST m=+0.152137178 container start 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.555018467 +0530 IST m=+0.152271725 container attach 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.627236134 +0530 IST m=+0.224489369 container died 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:10:33 varsha podman[17503]: 2020-12-16 15:10:33.653443804 +0530 IST m=+0.250697040 container remove 195321435e69e8a0305a3dc14cd8c4fe72de5c517f6bf1d0c02864444727c187 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:10:33 varsha systemd[1]: ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha.service: Succeeded.
Dec 16 15:10:33 varsha systemd[1]: ceph-7bb24e16-dc1b-405b-a820-a37038d9fde5@nfs.vstart.varsha.service: Consumed 1.315s CPU time.

If the ganesha daemon is killed directly:

# journalctl -u ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448@nfs.vstart.varsha
-- Logs begin at Thu 2020-04-09 10:39:57 IST, end at Wed 2020-12-16 15:14:08 IST. --
Dec 16 15:13:19 varsha systemd[1]: Starting Ceph nfs.vstart.varsha for 2b607da9-5bd2-4a99-bcb5-6eece472f448...
Dec 16 15:13:20 varsha podman[21994]: Error: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha found: no such container
Dec 16 15:13:20 varsha bash[22031]: Error: Failed to evict container: "": Failed to find container "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-add" in state: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb>
Dec 16 15:13:20 varsha bash[22067]: Error: no container with ID or name "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-add" found: no such container
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.283098258 +0530 IST m=+0.058217748 container create 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha oci-umount[22147]: umounthook <debug>: prestart container_id:585eb2115112 rootfs:/var/lib/containers/storage/overlay/8c3c66e6ac785933a2791dc3644a97a39932be6074a1ae6322c2968fcee7dab4/merged
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.333440932 +0530 IST m=+0.108560425 container init 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.338166369 +0530 IST m=+0.113285857 container start 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.338222409 +0530 IST m=+0.113341921 container attach 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.411280307 +0530 IST m=+0.186399818 container died 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:20 varsha podman[22103]: 2020-12-16 15:13:20.500731722 +0530 IST m=+0.275851220 container remove 585eb21151124fb2d646ae5ff98d0ee6a1b04b0f6ad617a3e35b35194c6ccc08 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha bash[22209]: Error: Failed to evict container: "": Failed to find container "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha" in state: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb5-6eece472>
Dec 16 15:13:20 varsha bash[22250]: Error: no container with ID or name "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha" found: no such container
Dec 16 15:13:20 varsha podman[22307]: 2020-12-16 15:13:20.693977322 +0530 IST m=+0.063773732 container create 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:20 varsha oci-umount[22355]: umounthook <debug>: prestart container_id:3418fa440a58 rootfs:/var/lib/containers/storage/overlay/248c3d063b83f3cef8cf28b2bd20df395496082ddd06d05cd60d02b6426fd19f/merged
Dec 16 15:13:20 varsha podman[22307]: 2020-12-16 15:13:20.739816224 +0530 IST m=+0.109612624 container init 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:20 varsha podman[22307]: 2020-12-16 15:13:20.744074812 +0530 IST m=+0.113871207 container start 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:20 varsha bash[22307]: 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178
Dec 16 15:13:20 varsha systemd[1]: Started Ceph nfs.vstart.varsha for 2b607da9-5bd2-4a99-bcb5-6eece472f448.
Dec 16 15:13:49 varsha podman[22659]: 2020-12-16 15:13:49.465978196 +0530 IST m=+0.055481569 container died 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:49 varsha podman[22659]: 2020-12-16 15:13:49.503018646 +0530 IST m=+0.092522030 container remove 3418fa440a58bfffb242c9308fe9480e80b87a2730574cdfeaa23edbc3599178 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:49 varsha podman[22700]: Error: no container with name or ID ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha found: no such container
Dec 16 15:13:49 varsha bash[22737]: Error: Failed to evict container: "": Failed to find container "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-remove" in state: no container with name or ID ceph-2b607da9-5bd2-4a99->
Dec 16 15:13:49 varsha bash[22773]: Error: no container with ID or name "ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448-nfs.vstart.varsha-grace-remove" found: no such container
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.77164686 +0530 IST m=+0.056738600 container create a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:49 varsha oci-umount[22851]: umounthook <debug>: prestart container_id:a241934ff765 rootfs:/var/lib/containers/storage/overlay/f943150891ae57d3fd1ac983645ab5e9a2fcb4c6c4bc60f45402e0fb3a5db2b1/merged
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.852848781 +0530 IST m=+0.137940527 container init a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.858332773 +0530 IST m=+0.143424523 container start a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f8>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.858388873 +0530 IST m=+0.143480610 container attach a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.900274609 +0530 IST m=+0.185366368 container died a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f85>
Dec 16 15:13:49 varsha podman[22808]: 2020-12-16 15:13:49.927105413 +0530 IST m=+0.212197164 container remove a241934ff765e432910d20d0f018692fd501ae215ddaa16012e117ea7b0846a4 (image=docker.io/ceph/daemon-base@sha256:619a7a460abdf90543c4f>
Dec 16 15:13:49 varsha systemd[1]: ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448@nfs.vstart.varsha.service: Succeeded.
Dec 16 15:13:49 varsha systemd[1]: ceph-2b607da9-5bd2-4a99-bcb5-6eece472f448@nfs.vstart.varsha.service: Consumed 1.386s CPU time.
Actions #5

Updated by Tim Serong over 3 years ago

I tried reproducing this (admittedly with a downstream SUSE build): TERM seemed to have no effect, while KILL did stop ganesha, but it was restarted automatically after a couple of seconds. One possible difference is that the downstream SUSE deployment passes --container-init to `cephadm bootstrap`. Not sure if that makes a difference here.
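
For reference, such a bootstrap would look something like this (a sketch; the monitor IP is illustrative):

# cephadm bootstrap --mon-ip 10.0.0.1 --container-init

With --container-init, podman runs an init process as PID 1 inside the container and ganesha.nfsd as its child, which changes how signals reach the daemon, so it could plausibly explain the different TERM/KILL behavior.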

The two sets of logs in the previous comment look pretty much identical to me, aside from the various IDs.

How was your test cluster deployed (vstart/cstart/something else)? I'd like to see if I can reproduce it myself with the latest upstream.

Actions #6

Updated by Varsha Rao over 3 years ago

Tim Serong wrote:

How was your test cluster deployed (vstart/cstart/something else)? I'd like to see if I'm able to reproduce myself with the latest upstream.

Thanks Tim for looking into it. I tested using a vstart cluster.
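
For anyone trying to reproduce, one possible vstart-based setup is along these lines (a sketch from a build directory; the vstart flags and the `ceph nfs cluster create` signature are assumptions and vary by version):

# MDS=1 MON=1 OSD=3 ../src/vstart.sh -n -d --cephadm
# ./bin/ceph nfs cluster create cephfs vstart

A cluster id of "vstart" would match the nfs.vstart.varsha daemon name in the logs above.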

Actions #7

Updated by Patrick Donnelly over 3 years ago

  • Target version changed from v16.0.0 to v17.0.0
  • Backport set to pacific,octopus,nautilus
Actions #8

Updated by Varsha Rao over 3 years ago

  • Backport changed from pacific,octopus,nautilus to pacific,octopus
Actions #9

Updated by Patrick Donnelly over 2 years ago

Sebastian, do we have documentation somewhere about the situations in which cephadm will restart services? I think there needs to be a clear outline of what triggers a restart of a service.

(I think this is a somewhat unfortunate situation, as cephadm wasn't necessarily intended to provide HA by respawning services; services like the MDS and OSDs already provide HA as part of their protocols. Still, something needs to be done for NFS.)
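
For context, whether systemd retries after the kill is governed by the unit's Restart= policy; a minimal sketch of the relevant directives (the actual ceph-<fsid>@.service template may differ):

[Service]
Restart=on-failure
RestartSec=10s

The journals above end with "Succeeded.", i.e. a clean (zero) exit, and on-failure only retries failed exits, so the unit is never restarted. Restart=always would respawn the daemon even after a clean stop.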

Actions #10

Updated by Varsha Rao over 2 years ago

  • Assignee changed from Varsha Rao to Sage Weil
Actions #11

Updated by Patrick Donnelly almost 2 years ago

  • Target version deleted (v17.0.0)