Bug #50657

closed

smart query on monitors

Added by Jan-Philipp Litza almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific, octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since the upgrade to Pacific, our manager queries each daemon for SMART statistics.

This is fine on the OSDs (at least once they are updated, since they don't have the appropriate sudoers file otherwise), but on the monitors this causes these mails:

ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/

There are two problems here:

  1. The sudoers file is contained in the (deb) package ceph-osd, which isn't installed on our monitors - hence the "user NOT in sudoers" message
  2. The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.
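
A minimal sketch of the guard that problem 2 calls for, in Python. It is not Ceph's actual code: the function name `build_smartctl_command` is hypothetical, and the path and flags are simply copied from the sudo mail above. The idea is to skip the call entirely when no device name is known, instead of invoking smartctl on a bare `/dev/`.

```python
from typing import List, Optional

# Path and flags as they appear in the sudo mail quoted above.
SMARTCTL = "/usr/sbin/smartctl"


def build_smartctl_command(device_name: Optional[str]) -> Optional[List[str]]:
    """Return the sudo argv for one device, or None when there is no device.

    Hypothetical helper for illustration only.
    """
    name = (device_name or "").strip()
    if not name:
        # A daemon without a known device (e.g. a monitor in this report)
        # should not trigger a smartctl call at all.
        return None
    return ["sudo", SMARTCTL, "-x", "--json=o", f"/dev/{name}"]
```

A caller would treat `None` as "nothing to scrape" rather than as an error.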

Related issues 3 (0 open, 3 closed)

Related to RADOS - Bug #52416: devices: mon devices appear empty when scraping SMART metrics (Resolved)
Copied to RADOS - Backport #52450: pacific: smart query on monitors (Resolved, Cory Snyder)
Copied to RADOS - Backport #52451: octopus: smart query on monitors (Resolved, Cory Snyder)
Actions #1

Updated by Sage Weil almost 3 years ago

  • Project changed from Ceph to RADOS
Actions #2

Updated by Neha Ojha almost 3 years ago

  • Assignee set to Yaarit Hatuka

Yaarit, can you help take a look at this?

Actions #3

Updated by Yaarit Hatuka almost 3 years ago

  • Status changed from New to In Progress

Hi Jan-Philipp,

Thanks for reporting this.

Can you please provide the output of `df` on the host where a monitor is running?

Actions #4

Updated by Jan-Philipp Litza almost 3 years ago

Sure:

Filesystem     1K-blocks    Used Available Use% Mounted on
udev             4053336       0   4053336   0% /dev
tmpfs             815284   12688    802596   2% /run
/dev/sda6      243114388 5416164 225279000   3% /
tmpfs            4076404       0   4076404   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs            4076404       0   4076404   0% /sys/fs/cgroup
/dev/sda1         967320  114496    786472  13% /boot
/dev/sda5         967320    2492    898476   1% /var/tmp
tmpfs             815280       0    815280   0% /run/user/0
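
The device-backed filesystems in output like this can be picked out mechanically. The following Python sketch does so under the assumption that the text is plain `df` output with a single header line; the function name `dev_filesystems` is made up for illustration and is not part of Ceph.

```python
from typing import List


def dev_filesystems(df_output: str) -> List[str]:
    """List the /dev/... sources from plain `df` output, in order of appearance."""
    sources: List[str] = []
    for line in df_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # Keep only sources that are device nodes; skip udev, tmpfs, etc.
        if fields and fields[0].startswith("/dev/") and fields[0] not in sources:
            sources.append(fields[0])
    return sources
```

For the output above, this yields the three partitions on `sda`, which is the only disk on this monitor host.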

Actions #5

Updated by Yaarit Hatuka almost 3 years ago

Thanks, Jan-Philipp.

I tried to reproduce this issue, that is, to get the empty device name while lacking sudoers permissions.
I used the 16.0.0 and 16.2.1 tags, and while the sudoers issue was trivial to reproduce, I could not get an empty device name. I'm also wondering what build you used exactly (the 16.0.1 tag does not exist).

Can you please:
- send the output of `ceph device ls`
- run `ceph device scrape-daemon-health-metrics <mon.id>` and share both the mgr and mon log entries of this command?

The command doesn't contain a device name at the end, since the monitor doesn't have a device. So this call doesn't make any sense even if the sudoers file was in place.

We also wish to monitor the health of the OS device that the mon is running on.

Actions #6

Updated by Jan-Philipp Litza almost 3 years ago

Sorry, I meant version 16.2.1 (Ubuntu packages), by now 16.2.4 of course

`ceph device ls` doesn't list any devices for the monitors, only for the OSDs.

And `ceph device scrape-daemon-health-metrics mon.mon04` says:

Error ENOENT: device mon.mon04 not found

I think that command requires a device ID (like WDC_WD40EFRX-...), doesn't it?

Actions #7

Updated by Hannes von Haugwitz over 2 years ago

I also see this on mon/mgr hosts of a ceph octopus cluster:

ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/

ceph --version
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)

`ceph device ls` shows only the OSD devices located on the OSD nodes.

Please let me know if I can provide any further information.

Actions #8

Updated by Yaarit Hatuka over 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Backport set to pacific, octopus
  • Pull request ID set to 42913

This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913

We'll address the fix for the empty device name of mon nodes in another ticket.

Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?

I think that command requires a device ID (like WDC_WD40EFRX-...), doesn't it?

The `ceph device scrape-daemon-health-metrics` command expects a daemon id, see: https://docs.ceph.com/en/latest/rados/operations/devices/#scraping
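
To make the daemon id versus device id distinction concrete, here is a small Python sketch that picks the matching scrape subcommand. The helper name `scrape_command` and the "starts with `osd.` or `mon.`" test are assumptions made for illustration; the sketch only builds the argument list and does not run anything.

```python
import re
from typing import List

# Assumption: a daemon id looks like "<type>.<name>", e.g. "osd.0" or "mon.mon04".
DAEMON_ID = re.compile(r"^(osd|mon)\.\S+$")


def scrape_command(target: str) -> List[str]:
    """Build the `ceph device` scrape argv for a daemon id or a device id."""
    if DAEMON_ID.match(target):
        # Scrape every device associated with the given daemon.
        return ["ceph", "device", "scrape-daemon-health-metrics", target]
    # Anything else is treated as a device id.
    return ["ceph", "device", "scrape-health-metrics", target]
```

Passing `mon.mon04` selects the per-daemon subcommand, while a device id as printed by `ceph device ls` selects the per-device one.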

Actions #9

Updated by Hannes von Haugwitz over 2 years ago

Yaarit Hatuka wrote:

This fixes the missing sudoers file in mon nodes:
https://github.com/ceph/ceph/pull/42913

Thanks.

We'll address the fix for the empty device name of mon nodes in another ticket.

Do you have a bug number for the other ticket?

Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?

Yes, bare metal deployment on Ubuntu bionic (18.04).

Actions #10

Updated by Yaarit Hatuka over 2 years ago

  • Related to Bug #52416: devices: mon devices appear empty when scraping SMART metrics added
Actions #11

Updated by Jan-Philipp Litza over 2 years ago

Jan-Philipp, Hannes, is this a bare metal deployment (what OS?), or did you use cephadm?

Yes, bare metal deployment on Ubuntu bionic (18.04).

Same.

Actions #12

Updated by Yaarit Hatuka over 2 years ago

Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?

Do you have a bug number for the other ticket?

https://tracker.ceph.com/issues/52416

Actions #13

Updated by Deepika Upadhyay over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #14

Updated by Backport Bot over 2 years ago

Actions #15

Updated by Backport Bot over 2 years ago

Actions #16

Updated by Hannes von Haugwitz over 2 years ago

Yaarit Hatuka wrote:

Thanks. Are there mons on dedicated nodes or devices in your cluster configuration?

We have three dedicated monitor nodes in the cluster.

Actions #17

Updated by Matthew Darwin over 2 years ago

Just wanted to add that we have a similar situation: 3 dedicated mon nodes, each running in its own container. smartctl is not installed in these containers.

Every day we get e-mails from these 3 containers with this error message:

ceph : user NOT in sudoers ; TTY=unknown ; PWD=/ ; USER=root ; COMMAND=smartctl -x --json=o /dev/

Using the ceph-mon debian package version 16.2.6-1~bpo10+1
Deployment was done manually
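
Every report in this thread quotes the same kind of sudo message, so affected hosts can be spotted by parsing it. The Python sketch below assumes the ` ; `-separated layout shown in the quoted mails; the names `parse_sudo_denial` and `device_is_missing` are hypothetical helpers, not part of sudo or Ceph.

```python
from typing import Dict


def parse_sudo_denial(line: str) -> Dict[str, str]:
    """Split a sudo denial message into its user, reason and KEY=value fields."""
    head, *pairs = [part.strip() for part in line.split(" ; ")]
    user, _, reason = head.partition(" : ")
    fields = {"user": user.strip(), "reason": reason.strip()}
    for pair in pairs:
        # Split on the first "=" only: the command itself contains "--json=o".
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


def device_is_missing(command: str) -> bool:
    """True when the command's last argument is a bare "/dev/" with no device name."""
    tokens = command.split()
    return bool(tokens) and tokens[-1] == "/dev/"
```

Applied to the message above, `parse_sudo_denial(...)["COMMAND"]` is the smartctl invocation, and `device_is_missing` on it returns `True`.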

Actions #18

Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
