Bug #54313
Open
device health scraping triggers monitor elections
Description
One of our customers has 5 monitors, and each night one or more of them is briefly marked down, at roughly the same time.
After a bit of digging through the logs I noticed that each time this happens, I see this:
mon.mon0.log:2022-02-17T04:23:58.487+0100 7f51ed252700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[json]: dispatch
mon.mon0.log:2022-02-17T04:24:24.365+0100 7f51ed252700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[json]: finished
mon.mon0.log-2022-02-17T04:24:24.375+0100 7f51e59c4700 0 log_channel(cluster) log [INF] : mon.mon0 calling monitor election
I can reproduce this locally on Ceph master on a vagrant-libvirt cluster by running ceph device scrape-daemon-health-metrics mon.mon0.
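For reference, a minimal way to reproduce and check the correlation, assuming the mon name and log path from this report (adjust for your cluster):

```shell
# Trigger the device health scrape against one mon; when affected,
# this blocks the mon for ~25 seconds
ceph device scrape-daemon-health-metrics mon.mon0

# Then look for an election right after the smart probe finishes
grep -E "cmd='?smart'?|calling monitor election" /var/log/ceph/mon.mon0.log | tail
```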
Updated by Dan van der Ster about 2 years ago
Thanks! We've noticed this exact issue on all our clusters -- after upgrading to octopus they elect the mons once a day. We hadn't yet pinned down what was causing it, but indeed I checked the logs and found the same correlation with the smart probe:
2022-02-17T09:18:26.857+0100 7f20d7c44700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[json]: dispatch
2022-02-17T09:18:52.188+0100 7f20d7c44700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[json]: finished
2022-02-17T09:18:52.208+0100 7f20d03b7700 0 log_channel(cluster) log [INF] : mon.cephdata20-4675e5a59e calling monitor election
And on a different cluster:
2022-02-17T01:17:02.062+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[json]: dispatch
2022-02-17T01:17:27.482+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[json]: finished
2022-02-17T01:17:27.489+0100 7ff55c4cb700 0 log_channel(cluster) log [INF] : mon.cephdwight-mon-1633994557 calling monitor election
In your example, the smart probe took ~26 seconds. In ours it's roughly the same. Probably some mon heartbeat times out during this window.
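The ~26 second figure can be read straight off the audit lines; a small sketch (timestamps copied from the log excerpt above) that computes the probe duration:

```python
from datetime import datetime

# dispatch/finished timestamps taken from the audit log excerpt above
dispatch = "2022-02-17T09:18:26.857+0100"
finished = "2022-02-17T09:18:52.188+0100"

fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
duration = datetime.strptime(finished, fmt) - datetime.strptime(dispatch, fmt)
print(duration.total_seconds())  # 25.331 -- long enough to miss lease renewals
```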
Here is some debug_mon=20:
2022-02-17T14:19:44.857+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd='smart' args=[]: dispatch
2022-02-17T14:20:10.258+0100 7ff563d4b700 10 mon.cephdwight-mon-1633994557@0(leader) e63 block_device_get_metrics failed for /dev/
2022-02-17T14:20:10.258+0100 7ff563d4b700 0 log_channel(audit) log [INF] : from='admin socket' entity='admin socket' cmd=smart args=[]: finished
"failed for /dev/" clearly wrong
Updated by Dan van der Ster about 2 years ago
This is related to https://tracker.ceph.com/issues/52416
That hasn't been backported to octopus yet.
But this isn't the entire explanation: I used the "PrivateDevices=no" workaround on two clusters.
Initially both had devices = "" in `ceph mon metadata`.
On one cluster, with VM mons using /dev/vda1, the devices metadata is still "".
On the other cluster, with NVMe mons, the metadata is now correct ("devices": "nvme0n1"), but there is still a timeout and an election when I run `ceph daemon mon.mon0 smart`.
Updated by Ruben Kerkhof about 2 years ago
I also see:
2022-02-17T04:24:53.803+0100 7f39d87f3700 1 mon.mon0@1(peon) e11 get_health_metrics reporting 17 slow ops, oldest is log(1 entries from seq 1595060 at 2022-02-17T04:24:12.874770+0100) calling new election
2022-02-17T04:24:55.533+0100 7f39d87f3700 1 mon.mon0@1(peon).paxos(paxos updating c 269455780..269456528) lease_timeout -
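The lease_timeout fits the timing: the mon lease interval is only a few seconds by default, far shorter than the ~25 second smart probe. One way to check the value on a running cluster (standard `ceph config` CLI; option name as used in current Ceph):

```shell
# Show the monitor lease interval; if the smart probe blocks the mon
# longer than this, peons will time out and call an election
ceph config get mon mon_lease
```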
Updated by Ruben Kerkhof about 2 years ago
I'm running master and still seeing this, so https://tracker.ceph.com/issues/52416 doesn't seem to fix this (completely).
[root@osd0 ~]# ceph version
ceph version 17.0.0-10583-g80ae7c30 (80ae7c307e92763f14f573a8882df41a5c84a21b) quincy (dev)
I'll see if I can find the root cause.
Updated by Dan van der Ster about 2 years ago
Which OS do you run?
We have CentOS Stream 8 -- so selinux could be involved.
Indeed `setenforce 0` fixes it for me on the nvme box:
[14:49][root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) audit]# getenforce
Enforcing
[14:49][root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) audit]# setenforce 0
[14:50][root@cephdata21b-226814ead6 (qa:ceph/beesly/mon*50:peon) audit]# ceph daemon mon.cephdata21b-226814ead6 smart
{
  "SAMSUNG_MZ1LB960HAJQ-00007_S435NA0N817447" : {
    "device" : {
      "info_name" : "/dev/nvme0n1",
      "name" : "/dev/nvme0n1",
      "protocol" : "NVMe",
      "type" : "nvme"
And on the VM, it also now responds quickly, though empty (which is not a problem).
[14:51][root@cephdwight-mon-1633994557 (production:ceph/dwight/mon*2:leader) ~]# getenforce
Enforcing
[14:51][root@cephdwight-mon-1633994557 (production:ceph/dwight/mon*2:leader) ~]# setenforce 0
[14:51][root@cephdwight-mon-1633994557 (production:ceph/dwight/mon*2:leader) ~]# ceph daemon mon.`hostname -s` smart
{ }
Checking the selinux profiles for OSD vs MON...
Updated by Dan van der Ster about 2 years ago
With audit2allow I get:
#============= ceph_t ==============
allow ceph_t afs3_callback_port_t:tcp_socket name_connect;
allow ceph_t afs_pt_port_t:tcp_socket name_connect;
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t chkpwd_exec_t:file map;
allow ceph_t chkpwd_exec_t:file { execute execute_no_trans open read };
allow ceph_t couchdb_port_t:tcp_socket name_connect;
allow ceph_t gatekeeper_port_t:tcp_socket name_connect;
allow ceph_t initrc_var_run_t:file { lock open read };
allow ceph_t self:capability { audit_write sys_resource };
allow ceph_t self:netlink_audit_socket { create nlmsg_relay };
allow ceph_t self:process setrlimit;
allow ceph_t shadow_t:file { getattr open read };
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t sudo_exec_t:file map;
allow ceph_t sudo_exec_t:file { execute execute_no_trans open read };
allow ceph_t tor_port_t:tcp_socket name_connect;
allow ceph_t xfs_port_t:tcp_socket name_connect;

#============= systemd_logind_t ==============
allow systemd_logind_t ceph_t:dbus send_msg;
Maybe the two lines about sudo are relevant?
allow ceph_t sudo_exec_t:file map;
allow ceph_t sudo_exec_t:file { execute execute_no_trans open read };
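For reference, output like the above is typically produced by feeding recent AVC denials into audit2allow on the mon host (standard SELinux tooling, run as root):

```shell
# Collect today's AVC denials from the audit log and translate them
# into candidate allow rules
ausearch -m avc -ts today | audit2allow
```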
Updated by Ruben Kerkhof about 2 years ago
Ah yes, running CentOS Stream 8 here too, and setenforce 0 fixes this.
Updated by Dan van der Ster about 2 years ago
I followed the recipe in https://tracker.ceph.com/issues/44944 and created an selinux module that fixes it even with selinux enabled:
module my-sudo 1.0;

require {
    type ceph_t;
    type systemd_logind_t;
    type sudo_exec_t;
    type shadow_t;
    type chkpwd_exec_t;
    class file { execute execute_no_trans getattr map open read };
    class capability { audit_write sys_resource };
    class process setrlimit;
    class netlink_audit_socket { create nlmsg_relay };
    class dbus send_msg;
}

#============= ceph_t ==============
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t chkpwd_exec_t:file map;
allow ceph_t chkpwd_exec_t:file { execute execute_no_trans open read };
allow ceph_t self:capability { audit_write sys_resource };
allow ceph_t self:netlink_audit_socket { create nlmsg_relay };
allow ceph_t self:process setrlimit;
allow ceph_t shadow_t:file { getattr open read };
#!!!! This avc can be allowed using the boolean 'domain_can_mmap_files'
allow ceph_t sudo_exec_t:file map;
allow ceph_t sudo_exec_t:file { execute execute_no_trans open read };
allow ceph_t systemd_logind_t:dbus send_msg;

#============= systemd_logind_t ==============
allow systemd_logind_t ceph_t:dbus send_msg;
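For anyone wanting to apply the module above, the usual compile-and-load sequence with standard SELinux tooling (the filename my-sudo.te is an assumption; run as root):

```shell
checkmodule -M -m -o my-sudo.mod my-sudo.te   # compile the type-enforcement source
semodule_package -o my-sudo.pp -m my-sudo.mod # package the compiled module
semodule -i my-sudo.pp                        # install and load it
```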
I still don't understand why this only impacts mons -- osds somehow are able to detect their devices and smart probe them, even with selinux enabled and without this policy.
Updated by Ruben Kerkhof about 2 years ago
https://tracker.ceph.com/issues/44943 also seems to have hit the same denial.
There's no policy for sudo_exec_t in ceph.
Updated by Ruben Kerkhof about 2 years ago
Sorry, we crossed streams, I was testing the same things and came to the same conclusion.
Updated by Ruben Kerkhof about 2 years ago
I do get the same denials on OSDs too however.
Updated by Dan van der Ster about 2 years ago
- Status changed from New to Fix Under Review
- Assignee set to Dan van der Ster
- Pull request ID set to 45125
Updated by Dan van der Ster about 2 years ago
- Related to Bug #54385: better test mon and osd smart command added