Bug #54313
Open
device health scraping triggers monitor elections
Description
One of our customers has 5 monitors, and each night one or more of them is briefly marked down, at roughly the same time.
After a bit of digging through the logs I noticed that each time this happens, I see this:
mon.mon0.log:2022-02-17T04:23:58.487+0100 7f51ed252700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[json]: dispatch
mon.mon0.log:2022-02-17T04:24:24.365+0100 7f51ed252700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[json]: finished
mon.mon0.log-2022-02-17T04:24:24.375+0100 7f51e59c4700 0 log_channel(cluster) log [INF] : mon.mon0 calling monitor election
I can reproduce this locally on Ceph master on a vagrant-libvirt cluster by running ceph device scrape-daemon-health-metrics mon.mon0.
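For reference, a minimal way to reproduce and check the correlation, assuming the mon name and log path from this report (adjust for your cluster):

```shell
# Trigger the device health scrape against one mon; when affected,
# this blocks the mon for ~25 seconds
ceph device scrape-daemon-health-metrics mon.mon0

# Then look for an election right after the smart probe finishes
grep -E "cmd='?smart'?|calling monitor election" /var/log/ceph/mon.mon0.log | tail
```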
Updated by Dan van der Ster about 2 years ago
Thanks! We've noticed this exact issue on all our clusters -- after upgrading to octopus they elect the mons once a day. We hadn't yet pinned down what was causing it, but indeed I checked the logs and found the same correlation with the smart probe:
2022-02-17T09:18:26.857+0100 7f20d7c44700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[json]: dispatch
2022-02-17T09:18:52.188+0100 7f20d7c44700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[json]: finished
2022-02-17T09:18:52.208+0100 7f20d03b7700 0 log_channel(cluster) log [INF] : mon.cephdata20-4675e5a59e calling monitor election
And on a different cluster:
2022-02-17T01:17:02.062+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[json]: dispatch
2022-02-17T01:17:27.482+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[json]: finished
2022-02-17T01:17:27.489+0100 7ff55c4cb700 0 log_channel(cluster) log [INF] : mon.cephdwight-mon-1633994557 calling monitor election
In your example, the smart probe took ~26 seconds. In ours it's roughly the same. Probably some mon heartbeat times out during this window.
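The ~26 second figure can be read straight off the audit lines; a small sketch (timestamps copied from the log excerpt above) that computes the probe duration:

```python
from datetime import datetime

# dispatch/finished timestamps taken from the audit log excerpt above
dispatch = "2022-02-17T09:18:26.857+0100"
finished = "2022-02-17T09:18:52.188+0100"

fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
duration = datetime.strptime(finished, fmt) - datetime.strptime(dispatch, fmt)
print(duration.total_seconds())  # 25.331 -- long enough to miss lease renewals
```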
Here is some debug_mon=20:
2022-02-17T14:19:44.857+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[]: dispatch
2022-02-17T14:20:10.258+0100 7ff563d4b700 10 mon.cephdwight-mon-1633994557@0(leader) e63 block_device_get_metrics failed for /dev/
2022-02-17T14:20:10.258+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[]: finished
"failed for /dev/" clearly wrong
Updated by Dan van der Ster about 2 years ago
This is related to https://tracker.ceph.com/issues/52416
That hasn't been backported to octopus yet.
But this isn't the entire explanation: I used the "PrivateDevices=no" workaround on two clusters.
Initially both had devices = "" in `ceph mon metadata`.
On one cluster, with VM mons using /dev/vda1, the devices metadata is still "".
On the other cluster, with NVMe mons, the metadata is now correct ("devices": "nvme0n1"), but there is still a timeout and an election when I run `ceph daemon mon.mon0 smart`.
Updated by Ruben Kerkhof about 2 years ago
I also see:
2022-02-17T04:24:53.803+0100 7f39d87f3700 1 mon.mon0@1(peon) e11 get_health_metrics reporting 17 slow ops, oldest is log(1 entries from seq 1595060 at 2022-02-17T04:24:12.874770+0100) calling new election
2022-02-17T04:24:55.533+0100 7f39d87f3700 1 mon.mon0@1(peon).paxos(paxos updating c 269455780..269456528) lease_timeout -
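The lease_timeout fits the timing: the mon lease interval is only a few seconds by default, far shorter than the ~25 second smart probe. One way to check the value on a running cluster (standard `ceph config` CLI; option name as used in current Ceph):

```shell
# Show the monitor lease interval; if the smart probe blocks the mon
# longer than this, peons will time out and call an election
ceph config get mon mon_lease
```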
Updated by Ruben Kerkhof about 2 years ago
I'm running master and still seeing this, so https://tracker.ceph.com/issues/52416 doesn't seem to fix this (completely).
[root@osd0 ~]# ceph version
ceph version 17.0.0-10583-g80ae7c30 (80ae7c307e92763f14f573a8882df41a5c84a21b) quincy (dev)
I'll see if I can find the root cause.
Updated by Dan van der Ster about 2 years ago
Which OS do you run?
We have CentOS Stream 8 -- so selinux could be involved.
Indeed `setenforce 0` fixes it for me on the nvme box:
[14:49][root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) audit]# getenforce
Enforcing
[14:49][root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) audit]# setenforce 0
[14:50][root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) audit]# ceph daemon mon.cephdata21b-226814ead6 smart
{
  "SAMSUNG_MZ1LB960HAJQ-00007_S435NA0N817447" : {
    "device" : {
      "info_name" : "/dev/nvme0n1",
      "name" : "/dev/nvme0n1",
      "protocol" : "NVMe",
      "type" : "nvme"
And on the VM, it also now responds quickly, though empty (which is not a problem).
[14:51][root@cephdwight-mon-1633994557 (production:ceph/dwight/mon*2:leader) ~]# getenforce
Enforcing
[14:51][root@cephdwight-mon-1633994557 (production:ceph/dwight/mon*2:leader) ~]# setenforce 0
[14:51][root@cephdwight-mon-1633994557 (production:ceph/dwight/mon*2:leader) ~]# ceph daemon mon.`hostname -s` smart
{ }
Checking the selinux profiles for OSD vs MON...
Updated by Dan van der Ster about 2 years ago
With audit2allow I get:
#============= ceph_t ==============
allow ceph_t afs3_callback_port_t:tcp_socket name_connect;
allow ceph_t afs_pt_port_t:tcp_socket name_connect;
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t chkpwd_exec_t:file map;
allow ceph_t chkpwd_exec_t:file { execute execute_no_trans open read };
allow ceph_t couchdb_port_t:tcp_socket name_connect;
allow ceph_t gatekeeper_port_t:tcp_socket name_connect;
allow ceph_t initrc_var_run_t:file { lock open read };
allow ceph_t self:capability { audit_write sys_resource };
allow ceph_t self:netlink_audit_socket { create nlmsg_relay };
allow ceph_t self:process setrlimit;
allow ceph_t shadow_t:file { getattr open read };
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t sudo_exec_t:file map;
allow ceph_t sudo_exec_t:file { execute execute_no_trans open read };
allow ceph_t tor_port_t:tcp_socket name_connect;
allow ceph_t xfs_port_t:tcp_socket name_connect;

#============= systemd_logind_t ==============
allow systemd_logind_t ceph_t:dbus send_msg;
Maybe the two lines about sudo are relevant?
allow ceph_t sudo_exec_t:file map;
allow ceph_t sudo_exec_t:file { execute execute_no_trans open read };
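For reference, output like the above is typically produced by feeding recent AVC denials into audit2allow on the mon host (standard SELinux tooling, run as root):

```shell
# Collect today's AVC denials from the audit log and translate them
# into candidate allow rules
ausearch -m avc -ts today | audit2allow
```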
Updated by Ruben Kerkhof about 2 years ago
Ah yes, running CentOS Stream 8 here too, and setenforce 0 fixes this.
Updated by Dan van der Ster about 2 years ago
I followed the recipe in https://tracker.ceph.com/issues/44944 and created an selinux module that fixes it even with selinux enabled:
module my-sudo 1.0;

require {
    type ceph_t;
    type systemd_logind_t;
    type sudo_exec_t;
    type shadow_t;
    type chkpwd_exec_t;
    class file { execute execute_no_trans getattr map open read };
    class capability { audit_write sys_resource };
    class process setrlimit;
    class netlink_audit_socket { create nlmsg_relay };
    class dbus send_msg;
}

#============= ceph_t ==============
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t chkpwd_exec_t:file map;
allow ceph_t chkpwd_exec_t:file { execute execute_no_trans open read };
allow ceph_t self:capability { audit_write sys_resource };
allow ceph_t self:netlink_audit_socket { create nlmsg_relay };
allow ceph_t self:process setrlimit;
allow ceph_t shadow_t:file { getattr open read };
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t sudo_exec_t:file map;
allow ceph_t sudo_exec_t:file { execute execute_no_trans open read };
allow ceph_t systemd_logind_t:dbus send_msg;

#============= systemd_logind_t ==============
allow systemd_logind_t ceph_t:dbus send_msg;
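For anyone wanting to apply the module above, the usual compile-and-load sequence with standard SELinux tooling (the filename my-sudo.te is an assumption; run as root):

```shell
checkmodule -M -m -o my-sudo.mod my-sudo.te   # compile the type-enforcement source
semodule_package -o my-sudo.pp -m my-sudo.mod # package the compiled module
semodule -i my-sudo.pp                        # install and load it
```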
I still don't understand why this only impacts mons -- osds somehow are able to detect their devices and smart probe them, even with selinux enabled and without this policy.
Updated by Ruben Kerkhof about 2 years ago
https://tracker.ceph.com/issues/44943 also seems to have hit the same denial.
There's no policy for sudo_exec_t in ceph.
Updated by Ruben Kerkhof about 2 years ago
Sorry, we crossed streams, I was testing the same things and came to the same conclusion.
Updated by Ruben Kerkhof about 2 years ago
I do get the same denials on OSDs too however.
Updated by Dan van der Ster about 2 years ago
- Status changed from New to Fix Under Review
- Assignee set to Dan van der Ster
- Pull request ID set to 45125
Updated by Dan van der Ster about 2 years ago
- Related to Bug #54385: better test mon and osd smart command added