Bug #44272

closed

on SUSE, crash daemon starts but then always stops a couple minutes later

Added by Nathan Cutler about 4 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Recently, cephadm/orchestrator started deploying the crash daemon on all cluster nodes.

On SUSE (at least), the crash daemon does not stay up for long: it always stops a few minutes after it starts. journalctl has this to say about it:

# journalctl -u "ceph-899b6a04-5715-11ea-9d8c-525400f299cb@crash.admin.service" | head
-- Logs begin at Mon 2020-02-24 15:47:30 CET, end at Mon 2020-02-24 16:18:38 CET. --
Feb 24 15:54:29 admin systemd[1]: Starting Ceph crash.admin for 899b6a04-5715-11ea-9d8c-525400f299cb...
Feb 24 15:54:29 admin podman[15929]: Error: no container with name or ID ceph-899b6a04-5715-11ea-9d8c-525400f299cb-crash.admin found: no such container
Feb 24 15:54:29 admin systemd[1]: Started Ceph crash.admin for 899b6a04-5715-11ea-9d8c-525400f299cb.
Feb 24 15:54:30 admin bash[15941]: INFO:ceph-crash:monitoring path /var/lib/ceph/crash, delay 600s
Feb 24 16:00:16 admin systemd[1]: Stopping Ceph crash.admin for 899b6a04-5715-11ea-9d8c-525400f299cb...
Feb 24 16:00:16 admin podman[20703]: time="2020-02-24T16:00:16+01:00" level=error msg="container_linux.go:389: signaling init process caused \"permission denied\"" 
Feb 24 16:00:16 admin podman[20703]: container_linux.go:389: signaling init process caused "permission denied" 
Feb 24 16:00:16 admin podman[20703]: Error: permission denied
Feb 24 16:00:31 admin systemd[1]: ceph-899b6a04-5715-11ea-9d8c-525400f299cb@crash.admin.service: State 'stop-post' timed out. Terminating.

Files

dmesg.out (35.9 KB), Nathan Cutler, 03/05/2020 03:43 PM
Actions #1

Updated by Nathan Cutler about 4 years ago

  • Subject changed from crash daemon not managed by cephadm on SUSE to on SUSE, crash daemon starts but then always stops a couple minutes later
Actions #2

Updated by Sebastian Wagner about 4 years ago

related: https://github.com/opencontainers/runc/issues/2236

After reading the code at container_linux.go:389, the podman error does not seem to be the cause of this. The systemd message seems to be the first real indication of the crash daemon shutting down.

Actions #3

Updated by Sebastian Wagner about 4 years ago

On second thought, I think this is an AppArmor problem. Adding the output of dmesg would be helpful.

Actions #4

Updated by Sebastian Wagner about 4 years ago

  • Status changed from New to Triaged
Actions #5

Updated by Nathan Cutler about 4 years ago

OK, I will reproduce, obtain dmesg output, and post here.

One thing I did notice is that, with the upstream container, "crash" is not listed in "ceph orch ps". With the downstream container, it is listed.

Actions #6

Updated by Nathan Cutler about 4 years ago

OK, some more information:

admin:~ # ceph orch ps
NAME              HOST   STATUS   REFRESHED  VERSION      IMAGE NAME                                                            IMAGE ID      CONTAINER ID  
crash.admin       admin  error    3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  853ab695fdb4  
mgr.admin.xdltoy  admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  cb3b6e3ada75  
mon.admin         admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  b59f5006b9c0  
osd.0             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  0ca3d9ea7824  
osd.1             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  0272b2aceb59  
osd.2             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  7a04ebd44a49  
osd.3             admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  4e9a833bba0c  
admin:~ # cat /etc/os-release
NAME="SLES" 
VERSION="15-SP2" 
VERSION_ID="15.2" 
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2" 
ID="sles" 
ID_LIKE="suse" 
ANSI_COLOR="0;32" 
CPE_NAME="cpe:/o:suse:sles:15:sp2" 
admin:~ # ceph --version
ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)
admin:~ # ceph versions
{
    "mon": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 1
    },
    "mgr": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 1
    },
    "osd": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 4
    },
    "mds": {},
    "overall": {
        "ceph version 15.1.0-1521-gcdf35413a0 (cdf35413a036bd1aa59a8c718bb177839c45cab1) octopus (rc)": 6
    }
}
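
For anyone checking several nodes, the plain-text "ceph orch ps" table above can be filtered for daemons that are not running. This is a minimal sketch that assumes the column layout shown in this comment (cells separated by two or more spaces); the function name parse_orch_ps is hypothetical, and the sample contains only the header and two rows copied from the output above.

```python
import re

def parse_orch_ps(text):
    """Turn the plain-text 'ceph orch ps' table into a list of dicts.

    Assumes cells are separated by at least two spaces, which holds
    for the output shown in this ticket ("3m ago" and "IMAGE NAME"
    contain only single spaces).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0])
    return [dict(zip(header, re.split(r"\s{2,}", ln))) for ln in lines[1:]]

SAMPLE = """\
NAME              HOST   STATUS   REFRESHED  VERSION      IMAGE NAME                                                            IMAGE ID      CONTAINER ID
crash.admin       admin  error    3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  853ab695fdb4
mon.admin         admin  running  3m ago     15.1.0.1521  registry.suse.de/devel/storage/7.0/containers/ses/7/ceph/ceph:latest  09e408f3e7f6  b59f5006b9c0
"""

rows = parse_orch_ps(SAMPLE)
# Names of all daemons whose STATUS column is anything but "running".
failed = [row["NAME"] for row in rows if row["STATUS"] != "running"]
print(failed)
```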

And dmesg output is attached!

Actions #7

Updated by Nathan Cutler about 4 years ago

  • Assignee set to Sebastian Wagner
Actions #8

Updated by Sebastian Wagner about 4 years ago

From dmesg:

[  525.062394] audit: type=1400 audit(1583421345.488:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libpod-default-1.4.4" pid=14802 comm="apparmor_parser" 
[  529.245377] audit: type=1400 audit(1583421349.672:3): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  529.246334] audit: type=1400 audit(1583421349.672:4): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  529.248060] audit: type=1400 audit(1583421349.676:5): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  529.249204] audit: type=1400 audit(1583421349.676:6): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  535.358734] audit: type=1400 audit(1583421355.788:7): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  535.359912] audit: type=1400 audit(1583421355.788:8): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  535.361136] audit: type=1400 audit(1583421355.788:9): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  535.362404] audit: type=1400 audit(1583421355.788:10): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=16794 comm="ceph-mgr" requested_mask="receive" denied_mask="receive" signal=rtmin+1 peer="libpod-default-1.4.4" 
[  594.823456] audit: type=1400 audit(1583421415.251:11): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21699 comm="podman" requested_mask="receive" denied_mask="receive" signal=exists peer="unconfined" 
[  594.832888] audit: type=1400 audit(1583421415.259:12): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21709 comm="runc" requested_mask="receive" denied_mask="receive" signal=term peer="unconfined" 
[  594.836021] audit: type=1400 audit(1583421415.263:13): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21699 comm="podman" requested_mask="receive" denied_mask="receive" signal=exists peer="unconfined" 
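
The denials are easier to compare when grouped by process, signal, and peer. The sketch below parses the key=value fields of audit lines like the ones above and counts the AppArmor DENIED records; the function names are made up for illustration, and the two sample lines are copied from the dmesg excerpt in this comment.

```python
import re
from collections import Counter

# Matches key="quoted value" or key=bare_value pairs in an audit record.
FIELD_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')

def parse_audit_line(line):
    """Return the key/value fields of one audit line as a dict."""
    fields = {}
    for key, raw, quoted in FIELD_RE.findall(line):
        fields[key] = quoted if raw.startswith('"') else raw
    return fields

def summarize_denials(lines):
    """Count AppArmor DENIED records by (comm, signal, peer)."""
    counts = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("apparmor") == "DENIED":
            key = (fields.get("comm"), fields.get("signal"), fields.get("peer"))
            counts[key] += 1
    return counts

SAMPLE = '''\
[  529.245377] audit: type=1400 audit(1583421349.672:3): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=15648 comm="ceph-mon" requested_mask="send" denied_mask="send" signal=rtmin+1 peer="libpod-default-1.4.4"
[  594.832888] audit: type=1400 audit(1583421415.259:12): apparmor="DENIED" operation="signal" profile="libpod-default-1.4.4" pid=21709 comm="runc" requested_mask="receive" denied_mask="receive" signal=term peer="unconfined"
'''

counts = summarize_denials(SAMPLE.splitlines())
for (comm, signal, peer), n in counts.items():
    print(f"{n} x {comm}: signal={signal} peer={peer}")
```

In the excerpt above, the record that lines up with the failed stop is the one where runc (peer "unconfined") sends signal=term and the container profile libpod-default-1.4.4 denies the receive.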

Actions #9

Updated by Sebastian Wagner about 4 years ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 33850
Actions #10

Updated by Sage Weil about 4 years ago

  • Status changed from Fix Under Review to Resolved