Bug #44272
closed
on SUSE, crash daemon starts but then always stops a couple minutes later
Added by Nathan Cutler about 4 years ago.
Updated about 4 years ago.
Description
Recently, cephadm/orchestrator started deploying the crash daemon on all cluster nodes.
On SUSE (at least), the crash daemon does not stay up for long. After some minutes, it always stops. Journalctl has this to say about it:
# journalctl -u "ceph-899b6a04-5715-11ea-9d8c-525400f299cb@crash.admin.service" | head
-- Logs begin at Mon 2020-02-24 15:47:30 CET, end at Mon 2020-02-24 16:18:38 CET. --
Feb 24 15:54:29 admin systemd[1]: Starting Ceph crash.admin for 899b6a04-5715-11ea-9d8c-525400f299cb...
Feb 24 15:54:29 admin podman[15929]: Error: no container with name or ID ceph-899b6a04-5715-11ea-9d8c-525400f299cb-crash.admin found: no such container
Feb 24 15:54:29 admin systemd[1]: Started Ceph crash.admin for 899b6a04-5715-11ea-9d8c-525400f299cb.
Feb 24 15:54:30 admin bash[15941]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s
Feb 24 16:00:16 admin systemd[1]: Stopping Ceph crash.admin for 899b6a04-5715-11ea-9d8c-525400f299cb...
Feb 24 16:00:16 admin podman[20703]: time="2020-02-24T16:00:16+01:00" level=error msg="container_linux.go:389: signaling init process caused \"permission denied\""
Feb 24 16:00:16 admin podman[20703]: container_linux.go:389: signaling init process caused "permission denied"
Feb 24 16:00:16 admin podman[20703]: Error: permission denied
Feb 24 16:00:31 admin systemd[1]: ceph-899b6a04-5715-11ea-9d8c-525400f299cb@crash.admin.service: State 'stop-post' timed out. Terminating.
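For reference, the time the unit stayed up can be worked out from the "Started" and "Stopping" lines of the journal excerpt above. This is only a minimal Python sketch: the helper name `journal_time` is hypothetical, and the year 2020 is assumed from the "Logs begin" line because the short journal format omits it.

```python
from datetime import datetime

# Two lines copied verbatim from the journal excerpt above.
started = ("Feb 24 15:54:29 admin systemd[1]: Started Ceph crash.admin "
           "for 899b6a04-5715-11ea-9d8c-525400f299cb.")
stopping = ("Feb 24 16:00:16 admin systemd[1]: Stopping Ceph crash.admin "
            "for 899b6a04-5715-11ea-9d8c-525400f299cb...")


def journal_time(line: str, year: int = 2020) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a journalctl line.

    The year is an assumption (taken from the 'Logs begin' line), and
    '%b' expects English month abbreviations as in the excerpt.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


uptime = journal_time(stopping) - journal_time(started)
print(uptime)                  # 0:05:47
print(uptime.total_seconds())  # 347.0
```

In this excerpt the unit was stopped 5 minutes 47 seconds after it started, which is before the first 600 s delay reported by ceph-crash had elapsed.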
- Subject changed from "crash daemon not managed by cephadm on SUSE" to "on SUSE, crash daemon starts but then always stops a couple minutes later"
On reflection, I think this is an AppArmor problem. Adding the output of dmesg would be helpful.
- Status changed from New to Triaged
OK, I will reproduce, obtain dmesg output, and post here.
One thing I did notice is that, with the upstream container, "crash" is not listed in "ceph orch ps". With the downstream container, it is listed.
OK, some more information:
admin:~ # ceph orch ps
NAME              HOST   STATUS   REFRESHED  VERSION      IMAGE NAME                                                            IMAGE ID      CONTAINER ID
crash.admin       admin  error    3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  853ab695fdb4
mgr.admin.xdltoy  admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  cb3b6e3ada75
mon.admin         admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  b59f5006b9c0
osd.0             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  0ca3d9ea7824
osd.1             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  0272b2aceb59
osd.2             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  7a04ebd44a49
osd.3             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  4e9a833bba0c
admin:~ # cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"
admin:~ # ceph --version
ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)
admin:~ # ceph versions
{
    "mon": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 1
    },
    "mgr": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 1
    },
    "osd": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 4
    },
    "mds": {},
    "overall": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 6
    }
}
And dmesg output is attached!
- Assignee set to Sebastian Wagner
From dmesg:
[ 525.062394] audit: type=1400 audit(1583421345.488:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libpod-default-1.4.4" pid=14802 comm="apparmor_parser"
[ 529.245377] audit: type=1400 audit(1583421349.672:3): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 529.246334] audit: type=1400 audit(1583421349.672:4): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 529.248060] audit: type=1400 audit(1583421349.676:5): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 529.249204] audit: type=1400 audit(1583421349.676:6): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 535.358734] audit: type=1400 audit(1583421355.788:7): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 535.359912] audit: type=1400 audit(1583421355.788:8): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 535.361136] audit: type=1400 audit(1583421355.788:9): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 535.362404] audit: type=1400 audit(1583421355.788:10): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 594.823456] audit: type=1400 audit(1583421415.251:11): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21699 comm="podman" requested_mask="receive" denied_mask="receive" signal=exists peer="unconfined"
[ 594.832888] audit: type=1400 audit(1583421415.259:12): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21709 comm="runc" requested_mask="receive" denied_mask="receive" signal=term peer="unconfined"
[ 594.836021] audit: type=1400 audit(1583421415.263:13): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21699 comm="podman" requested_mask="receive" denied_mask="receive" signal=exists peer="unconfined"
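To make the pattern in these audit records easier to see, the denials can be grouped by process, signal, and peer. The following is a rough Python sketch, not a tool from Ceph or AppArmor: the names `parse_audit` and `summarize` are hypothetical, and the `key=value` regex is an assumption based only on the lines quoted above.

```python
import re
from collections import Counter

# Four records copied from the dmesg excerpt above (one STATUS, three DENIED).
SAMPLE = '''[ 525.062394] audit: type=1400 audit(1583421345.488:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libpod-default-1.4.4" pid=14802 comm="apparmor_parser"
[ 529.245377] audit: type=1400 audit(1583421349.672:3): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4"
[ 594.823456] audit: type=1400 audit(1583421415.251:11): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21699 comm="podman" requested_mask="receive" denied_mask="receive" signal=exists peer="unconfined"
[ 594.832888] audit: type=1400 audit(1583421415.259:12): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21709 comm="runc" requested_mask="receive" denied_mask="receive" signal=term peer="unconfined"'''

# key=value pairs, where the value is either double-quoted or a bare token.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit(line):
    """Return the key=value fields of one audit record as a dict."""
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


def summarize(lines):
    """Count AppArmor signal denials per (comm, signal, peer) triple."""
    counts = Counter()
    for line in lines:
        fields = parse_audit(line)
        if fields.get("apparmor") == "DENIED" and fields.get("operation") == "signal":
            counts[(fields["comm"], fields["signal"], fields["peer"])] += 1
    return counts


summary = summarize(SAMPLE.splitlines())
for (comm, signal, peer), n in sorted(summary.items()):
    print(f"{n}x comm={comm} signal={signal} peer={peer}")
```

On a live system the same functions could be fed the lines of real dmesg output instead of `SAMPLE`. In the records quoted here, the `term` denial with peer "unconfined" is the one involving runc, at roughly the time the unit was being stopped.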
- Status changed from Triaged to Fix Under Review
- Pull request ID set to 33850
- Status changed from Fix Under Review to Resolved