Bug #50657
closed
Added by Jan-Philipp Litza almost 3 years ago.
Updated over 2 years ago.
Backport:
pacific, octopus
Description
Since the upgrade to Pacific, our manager queries each daemon for SMART statistics.
This is fine on the OSDs (at least once they are updated, since they don't have the appropriate sudoers file otherwise), but on the monitors this causes these mails:
ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/
There are two problems here:
- The sudoers file is contained in the (deb) package ceph-osd, which isn't installed on our monitors - hence the "user NOT in sudoers" message
- The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.
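To illustrate the second problem, here is a minimal Python sketch of the kind of guard that would avoid the call shown in the mail above. It is not Ceph's actual code: the function name `build_smartctl_cmd` is hypothetical, and only the `smartctl -x --json=o /dev/<name>` shape is taken from the logged command.

```python
from typing import List, Optional


def build_smartctl_cmd(devname: str) -> Optional[List[str]]:
    """Return an argv for smartctl, or None when no device name is known.

    Hypothetical helper: mirrors the command seen in the sudo mail,
    but refuses to build a call that would end in a bare "/dev/".
    """
    name = devname.strip()
    if not name:
        # A daemon without a known device (e.g. a monitor) gets no command.
        return None
    return ["sudo", "/usr/sbin/smartctl", "-x", "--json=o", "/dev/" + name]


if __name__ == "__main__":
    print(build_smartctl_cmd("sda"))
    print(build_smartctl_cmd(""))
```

With a guard like this, a daemon that reports no device would simply be skipped instead of triggering a sudo attempt.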
- Project changed from Ceph to RADOS
- Assignee set to Yaarit Hatuka
Yaarit, can you help take a look at this?
- Status changed from New to In Progress
Hi Jan-Philipp,
Thanks for reporting this.
Can you please provide the output of `df` on a host where a monitor is running?
Sure:
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             4053336       0   4053336   0% /dev
tmpfs             815284   12688    802596   2% /run
/dev/sda6      243114388 5416164 225279000   3% /
tmpfs            4076404       0   4076404   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            4076404       0   4076404   0% /sys/fs/cgroup
/dev/sda1         967320  114496    786472  13% /boot
/dev/sda5         967320    2492    898476   1% /var/tmp
tmpfs             815280       0    815280   0% /run/user/0
Thanks, Jan-Philipp.
I tried to reproduce this issue and get the empty device name while not having sudoers permissions.
I used the 16.0.0 and 16.2.1 tags, and while the sudoers issue was trivial to reproduce, I could not get an empty device name. I'm also wondering which build exactly you used (the 16.0.1 tag does not exist).
Can you please:
- send the output of `ceph device ls`
- run `ceph device scrape-daemon-health-metrics <mon.id>` and share both the mgr and mon log entries of this command?
The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.
We also wish to monitor the health of the OS device that the mon is running on.
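As a rough illustration of what "the OS device" means for the `df` output posted earlier, the following Python sketch picks the filesystem source that backs a given mount point. The function name `device_for_mount` is hypothetical, and the sketch assumes plain `df` output with no spaces in mount points. It returns the partition (e.g. `/dev/sda6`); mapping that to the whole disk is left out.

```python
from typing import Optional

# A few rows copied from the df output posted in this thread.
SAMPLE_DF = """Filesystem 1K-blocks Used Available Use% Mounted on
udev 4053336 0 4053336 0% /dev
/dev/sda6 243114388 5416164 225279000 3% /
/dev/sda1 967320 114496 786472 13% /boot
"""


def device_for_mount(df_output: str, mount_point: str = "/") -> Optional[str]:
    """Return the first df column for the row whose mount point matches."""
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[5] == mount_point:
            return fields[0]
    return None


if __name__ == "__main__":
    print(device_for_mount(SAMPLE_DF))
```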
Sorry, I meant version 16.2.1 (Ubuntu packages); by now it is 16.2.4, of course.
`ceph device ls` doesn't list any devices for the monitors, only for the OSDs.
And `ceph device scrape-daemon-health-metrics mon.mon04` says:
Error ENOENT: device mon.mon04 not found
I think that command requires a device ID (like WDC_WD40EFRX-...), doesn't it?
I also see this on mon/mgr hosts of a ceph octopus cluster:
ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/
ceph --version
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
`ceph device ls` shows only the OSD devices located on the OSD nodes.
Please let me know if I can provide any further information.
- Status changed from In Progress to Fix Under Review
- Backport set to pacific, octopus
- Pull request ID set to 42913
Yaarit Hatuka wrote:
This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913
Thanks.
We'll address the fix for the empty device name of mon nodes in another ticket.
Do you have a bug number for the other ticket?
Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
Yes, bare metal deployment on Ubuntu bionic (18.04).
- Related to Bug #52416: devices: mon devices appear empty when scraping SMART metrics added
Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
Yes, bare metal deployment on Ubuntu bionic (18.04).
Same.
- Status changed from Fix Under Review to Pending Backport
Yaarit Hatuka wrote:
Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?
We have three dedicated monitor nodes in the cluster.
Just wanted to add that we have a similar situation: 3 dedicated mon nodes, each running in its own container. smartctl is not installed in these containers.
Every day we get e-mails from these 3 containers with the error message:
ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=smartctl -x --json=o /dev/
Using the ceph-mon debian package version 16.2.6-1~bpo10+1
Deployment was done manually
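For anyone who wants to check their own sudo mails or logs for this symptom, here is a small Python sketch that flags entries whose command ends in a bare `/dev/`. The helper names are hypothetical, and the sketch assumes the `COMMAND=` field is the last field on the line, as in the messages quoted in this ticket.

```python
import re
from typing import Optional

_COMMAND_RE = re.compile(r"COMMAND=(.*)$")


def sudo_command(log_line: str) -> Optional[str]:
    """Extract the text after COMMAND= from a sudo log line, if present."""
    match = _COMMAND_RE.search(log_line)
    return match.group(1).strip() if match else None


def has_empty_device(log_line: str) -> bool:
    """True when the logged command's last argument is exactly "/dev/"."""
    cmd = sudo_command(log_line)
    return cmd is not None and cmd.split()[-1:] == ["/dev/"]


if __name__ == "__main__":
    line = ("ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; "
            "USER=root ; COMMAND=smartctl -x --json=o /dev/")
    print(has_empty_device(line))
```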
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".