Bug #50657
smart query on monitors
Description
Since the upgrade to Pacific, our manager queries each daemon for smart statistics.
This is fine on the OSDs (at least once they are updated, since they don't have the appropriate sudoers file otherwise), but on the monitors this causes these mails:
ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/
There are two problems here:
- The sudoers file is contained in the (deb) package ceph-osd, which isn't installed on our monitors - hence the "user NOT in sudoers" message
- The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.
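The second point could be handled by a guard that skips the call when there is no device name. The following is a minimal sketch of that idea, not Ceph's actual code: `build_smartctl_command` is a hypothetical helper, and the command line is copied from the sudo message above.

```python
from typing import List, Optional


def build_smartctl_command(device: str) -> Optional[List[str]]:
    """Return the smartctl argv for a device name, or None if there is none.

    Hypothetical helper for illustration; not taken from the Ceph source.
    """
    name = device.strip()
    if not name:
        # No device known for this daemon: do not invoke sudo at all,
        # so no "/dev/" call and no sudo mail is produced.
        return None
    return ["sudo", "/usr/sbin/smartctl", "-x", "--json=o", "/dev/" + name]
```

With such a guard, a caller would treat `None` as "nothing to scrape" instead of running a command that cannot succeed.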
Updated by Neha Ojha almost 3 years ago
- Assignee set to Yaarit Hatuka
Yaarit, can you help take a look at this?
Updated by Yaarit Hatuka almost 3 years ago
- Status changed from New to In Progress
Hi Jan-Philipp,
Thanks for reporting this.
Can you please provide the output of `df` on a host where a monitor is running?
Updated by Jan-Philipp Litza almost 3 years ago
Sure:
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             4053336       0   4053336   0% /dev
tmpfs             815284   12688    802596   2% /run
/dev/sda6      243114388 5416164 225279000   3% /
tmpfs            4076404       0   4076404   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            4076404       0   4076404   0% /sys/fs/cgroup
/dev/sda1         967320  114496    786472  13% /boot
/dev/sda5         967320    2492    898476   1% /var/tmp
tmpfs             815280       0    815280   0% /run/user/0
Updated by Yaarit Hatuka almost 3 years ago
Thanks, Jan-Philipp.
I tried to reproduce this issue, that is, to get the empty device name while lacking sudoers permissions.
I used the 16.0.0 and 16.2.1 tags, and while the sudoers issue was trivial to reproduce, I could not get an empty device name. I'm also wondering which build exactly you used (the 16.0.1 tag does not exist).
Can you please:
- send the output of `ceph device ls`
- run `ceph device scrape-daemon-health-metrics <mon.id>` and share both the mgr and mon log entries of this command?
> The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.
We wish to monitor the health of the OS device as well, which the mon is running on.
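As an illustration of how the OS device behind a monitor could be identified from `df` output, here is a rough sketch. It is not how Ceph resolves devices; the helper names are hypothetical, the sample is the `df` output posted earlier in this ticket, and `/var/lib/ceph/mon` is assumed to be where the mon data lives on that host.

```python
from typing import Dict, Optional

# Sample taken from the `df` output posted earlier in this ticket.
SAMPLE_DF = """\
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             4053336       0   4053336   0% /dev
tmpfs             815284   12688    802596   2% /run
/dev/sda6      243114388 5416164 225279000   3% /
tmpfs            4076404       0   4076404   0% /dev/shm
/dev/sda1         967320  114496    786472  13% /boot
/dev/sda5         967320    2492    898476   1% /var/tmp
"""


def parse_df(output: str) -> Dict[str, str]:
    """Map each mount point to its filesystem from plain `df` output."""
    mounts = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 5)  # keep the mount point in one piece
        if len(fields) == 6:
            mounts[fields[5]] = fields[0]
    return mounts


def backing_filesystem(path: str, mounts: Dict[str, str]) -> Optional[str]:
    """Return the filesystem whose mount point is the longest prefix of path."""
    best = None
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    return mounts[best] if best is not None else None


if __name__ == "__main__":
    # Assumed mon data path; adjust for the actual deployment.
    print(backing_filesystem("/var/lib/ceph/mon", parse_df(SAMPLE_DF)))
```

The result is a partition name; mapping it to the underlying disk that smartctl should query is a separate step not covered here.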
Updated by Jan-Philipp Litza almost 3 years ago
Sorry, I meant version 16.2.1 (Ubuntu packages); by now it is 16.2.4, of course.
`ceph device ls` doesn't list any devices for the monitors, only for the OSDs.
And `ceph device scrape-daemon-health-metrics mon.mon04` says:
Error ENOENT: device mon.mon04 not found
I think that command requires a device ID (like WDC_WD40EFRX-...), doesn't it?
Updated by Hannes von Haugwitz over 2 years ago
I also see this on the mon/mgr hosts of a Ceph Octopus cluster:
ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/
ceph --version
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
`ceph device ls` shows only the OSD devices located on the OSD nodes.
Please let me know if I can provide any further information.
Updated by Yaarit Hatuka over 2 years ago
- Status changed from In Progress to Fix Under Review
- Backport set to pacific, octopus
- Pull request ID set to 42913
This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913
We'll address the fix for the empty device name of mon nodes in another ticket.
Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
> I think that command requires a device ID (like WDC_WD40EFRX-...), doesn't it?
The `ceph device scrape-daemon-health-metrics` command expects a daemon id, see: https://docs.ceph.com/en/latest/rados/operations/devices/#scraping
Updated by Hannes von Haugwitz over 2 years ago
Yaarit Hatuka wrote:
> This fixes the missing sudoers file in mon nodes:
> https://github.com/ceph/ceph/pull/42913
Thanks.
> We'll address the fix for the empty device name of mon nodes in another ticket.
Do you have a bug number for the other ticket?
> Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
Yes, bare metal deployment on Ubuntu bionic (18.04).
Updated by Yaarit Hatuka over 2 years ago
- Related to Bug #52416: devices: mon devices appear empty when scraping SMART metrics added
Updated by Jan-Philipp Litza over 2 years ago
>> Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
> Yes, bare metal deployment on Ubuntu bionic (18.04).
Same.
Updated by Yaarit Hatuka over 2 years ago
Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
> Do you have a bug number for the other ticket?
Updated by Deepika Upadhyay over 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot over 2 years ago
- Copied to Backport #52450: pacific: smart query on monitors added
Updated by Backport Bot over 2 years ago
- Copied to Backport #52451: octopus: smart query on monitors added
Updated by Hannes von Haugwitz over 2 years ago
Yaarit Hatuka wrote:
> Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
We have three dedicated monitor nodes in the cluster.
Updated by Matthew Darwin over 2 years ago
Just wanted to add that we have a similar situation: 3 dedicated mon nodes, each running in its own container. smartctl is not installed in these containers.
Every day we get an e-mail from each of these 3 containers with the error message:
ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=smartctl -x --json=o /dev/
We are using the ceph-mon Debian package, version 16.2.6-1~bpo10+1.
The deployment was done manually.
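For anyone who wants to filter or count these mails until the fix is deployed, the sudo messages quoted in this ticket all share one shape. Below is a small sketch of a parser for that shape; the function names are hypothetical, and the pattern is derived only from the lines quoted here, so other sudo message variants may not match.

```python
import re
from typing import Dict, Optional

# Pattern derived from the messages quoted in this ticket only.
SUDO_DENIED = re.compile(
    r"^\s*(?P<user>\S+) : user NOT in sudoers ; (?P<fields>.*)$"
)


def parse_sudo_denial(line: str) -> Optional[Dict[str, str]]:
    """Split a "user NOT in sudoers" message into its KEY=value fields."""
    match = SUDO_DENIED.match(line)
    if match is None:
        return None
    result = {"invoking_user": match.group("user")}
    for field in match.group("fields").split(" ; "):
        # Split on the first "=" only, so "--json=o" stays in the command.
        key, sep, value = field.partition("=")
        if sep:
            result[key.strip()] = value.strip()
    return result


def has_empty_device(command: str) -> bool:
    """True if the command's last argument is the bare "/dev/" prefix."""
    args = command.split()
    return bool(args) and args[-1] == "/dev/"
```

A message for which `has_empty_device(info["COMMAND"])` is true corresponds to the empty device name discussed above, as opposed to a sudoers problem on a host that does have devices.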
Updated by Loïc Dachary over 2 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".