Bug #50657

smart query on monitors

Added by Jan-Philipp Litza 5 months ago. Updated 26 days ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since the upgrade to Pacific, our manager queries each daemon for smart statistics.

This is fine on the OSDs (at least once they are updated, since they don't have the appropriate sudoers file otherwise), but on the monitors this causes these mails:

ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/

There are two problems here:

  1. The sudoers file is contained in the (deb) package ceph-osd, which isn't installed on our monitors - hence the "user NOT in sudoers" message
  2. The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.
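
The second problem can be illustrated with a minimal Python sketch. It is not taken from the Ceph source: `build_smartctl_cmd` is a hypothetical helper, and only the `smartctl` path and flags are copied from the sudo mail above. The point is simply that a caller with no device name should skip the call rather than run `smartctl` against a bare `/dev/`.

```python
from typing import List, Optional

SMARTCTL = "/usr/sbin/smartctl"  # path as it appears in the sudo mail


def build_smartctl_cmd(devname: str) -> Optional[List[str]]:
    """Return the sudo/smartctl argv for devname, or None if there is no device.

    Hypothetical helper for illustration; not the actual Ceph implementation.
    """
    devname = devname.strip()
    if not devname:
        # Without a device name the command would end in a bare "/dev/",
        # which is what the monitors were trying to run.
        return None
    return ["sudo", SMARTCTL, "-x", "--json=o", "/dev/" + devname]


print(build_smartctl_cmd("sda"))
print(build_smartctl_cmd(""))
```

With a guard like this, a daemon that reports no device would produce no `sudo` call at all, and therefore no mail, regardless of whether the sudoers file is installed.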

Related issues

Related to RADOS - Bug #52416: devices: mon devices appear empty when scraping SMART metrics (New)
Copied to RADOS - Backport #52450: pacific: smart query on monitors (New)
Copied to RADOS - Backport #52451: octopus: smart query on monitors (New)

History

#1 Updated by Sage Weil 5 months ago

  • Project changed from Ceph to RADOS

#2 Updated by Neha Ojha 5 months ago

  • Assignee set to Yaarit Hatuka

Yaarit, can you help take a look at this?

#3 Updated by Yaarit Hatuka 4 months ago

  • Status changed from New to In Progress

Hi Jan-Philipp,

Thanks for reporting this.

Can you please provide the output of `df` on a host where a monitor is running?

#4 Updated by Jan-Philipp Litza 4 months ago

Sure:

Filesystem     1K-blocks    Used Available Use% Mounted on
udev             4053336       0   4053336   0% /dev
tmpfs             815284   12688    802596   2% /run
/dev/sda6      243114388 5416164 225279000   3% /
tmpfs            4076404       0   4076404   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            4076404       0   4076404   0% /sys/fs/cgroup
/dev/sda1         967320  114496    786472  13% /boot
/dev/sda5         967320    2492    898476   1% /var/tmp
tmpfs             815280       0    815280   0% /run/user/0
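
For reference, the block-device-backed mounts in output like this can be picked out with a short Python sketch. `block_device_mounts` is a hypothetical helper written for this illustration, and the sample text is an excerpt of the `df` output above.

```python
from typing import List, Tuple

# Excerpt of the df output above (header plus four of its rows).
DF_OUTPUT = """\
Filesystem     1K-blocks    Used Available Use% Mounted on
udev             4053336       0   4053336   0% /dev
/dev/sda6      243114388 5416164 225279000   3% /
/dev/sda1         967320  114496    786472  13% /boot
/dev/sda5         967320    2492    898476   1% /var/tmp
"""


def block_device_mounts(df_text: str) -> List[Tuple[str, str]]:
    """Return (filesystem, mount point) pairs whose source is a /dev/ node."""
    mounts = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("/dev/"):
            # Re-join the tail in case the mount point contains spaces.
            mounts.append((fields[0], " ".join(fields[5:])))
    return mounts


for filesystem, mount_point in block_device_mounts(DF_OUTPUT):
    print(filesystem, mount_point)
```

On this host every such filesystem is a partition of `sda`, so there is a single OS disk that a SMART query could target.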

#5 Updated by Yaarit Hatuka 4 months ago

Thanks, Jan-Philipp.

I tried to reproduce this issue, i.e. to get the empty device name while not having sudoers permissions.
I used the 16.0.0 and 16.2.1 tags. The sudoers issue was trivial to reproduce, but I could not get an empty device name. I'm also wondering which build exactly you used (the 16.0.1 tag does not exist).

Can you please:
- send the output of `ceph device ls`
- run `ceph device scrape-daemon-health-metrics <mon.id>` and share both the mgr and mon log entries of this command?

> The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.

We wish to monitor the health of the OS device as well, which the mon is running on.

#6 Updated by Jan-Philipp Litza 4 months ago

Sorry, I meant version 16.2.1 (Ubuntu packages), by now 16.2.4 of course

`ceph device ls` doesn't list any devices for the monitors, only for the osds.

And `ceph device scrape-daemon-health-metrics mon.mon04` says:

Error ENOENT: device mon.mon04 not found

I think that command requires a device ID (like WDC_WD40EFRX-...), doesn't it?

#7 Updated by Hannes von Haugwitz about 2 months ago

I also see this on mon/mgr hosts of a ceph octopus cluster:

ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/

`ceph --version`:
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)

`ceph device ls` solely shows the OSD devices located on the OSD nodes.

Please let me know if I can provide any further information.

#8 Updated by Yaarit Hatuka about 1 month ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to pacific, octopus
  • Pull request ID set to 42913

This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913

We'll address the fix for the empty device name of mon nodes in another ticket.

Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?

> I think that command requires a device ID (like WDC_WD40EFRX-...), doesn't it?

The `ceph device scrape-daemon-health-metrics` command expects a daemon id, see: https://docs.ceph.com/en/latest/rados/operations/devices/#scraping
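
The distinction between the two kinds of argument can be sketched in Python as follows. This is only an illustration based on the linked documentation, which lists `ceph device scrape-health-metrics <device-id>` alongside `ceph device scrape-daemon-health-metrics <who>`. The helper `scrape_argv`, the prefix list, and the placeholder device ID are assumptions made for this example, and the sketch only builds the command line without running it.

```python
from typing import List

# Assumed daemon-name prefixes; daemon names have the form "<type>.<id>".
DAEMON_PREFIXES = ("mon.", "osd.", "mgr.", "mds.")


def scrape_argv(target: str) -> List[str]:
    """Build the ceph CLI argv for scraping a daemon or a single device.

    Hypothetical helper: names such as "mon.mon04" are treated as daemon
    names, anything else as a device ID.
    """
    if target.startswith(DAEMON_PREFIXES):
        return ["ceph", "device", "scrape-daemon-health-metrics", target]
    return ["ceph", "device", "scrape-health-metrics", target]


print(scrape_argv("mon.mon04"))
print(scrape_argv("VENDOR_MODEL_SERIAL"))  # placeholder, not a real device ID
```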

#9 Updated by Hannes von Haugwitz about 1 month ago

Yaarit Hatuka wrote:

> This fixes the missing sudoers file in mon nodes:
> https://github.com/ceph/ceph/pull/42913

Thanks.

> We'll address the fix for the empty device name of mon nodes in another ticket.

Do you have a bug number for the other ticket?

> Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?

Yes, bare metal deployment on Ubuntu bionic (18.04).

#10 Updated by Yaarit Hatuka about 1 month ago

  • Related to Bug #52416: devices: mon devices appear empty when scraping SMART metrics added

#11 Updated by Jan-Philipp Litza about 1 month ago

> > Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?
>
> Yes, bare metal deployment on Ubuntu bionic (18.04).

Same.

#12 Updated by Yaarit Hatuka about 1 month ago

Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?

> Do you have a bug number for the other ticket?

https://tracker.ceph.com/issues/52416

#13 Updated by Deepika Upadhyay 27 days ago

  • Status changed from Fix Under Review to Pending Backport

#14 Updated by Backport Bot 27 days ago

#15 Updated by Backport Bot 27 days ago

#16 Updated by Hannes von Haugwitz 26 days ago

Yaarit Hatuka wrote:

> Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?

We have three dedicated monitor nodes in the cluster.
