Bug #47188

mgr/dashboard: ceph dashboard does not display device health data, with error message "No SMART data available"

Added by joel waddell 3 months ago. Updated about 2 months ago.

Status: In Progress
Priority: Normal
Category: dashboard/backend
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Deployed Ceph on bare-metal Ubuntu 20.04 servers using ceph-ansible. When going to dashboard > hosts > host-list > ddc-ceph-1 > Device Health tab, the error message "No SMART data available" appears. I have confirmed that smartctl version 7.1 and nvme-cli are installed and working on the system. I tested this by running `/usr/sbin/smartctl -a --json=o /dev/sdd` on the OSD nodes, which produces good output. When I run a device scan, I can see by tailing /var/log/auth.log that Ceph is indeed running the commands on the OSD nodes for each OSD. Screenshot of the error message: https://postimg.cc/YhdJM7Mb. By extension, the OSDs also show in an unknown state in the dashboard.

cephSMARTerror.png - ceph dashboard smart error (98.5 KB) joel waddell, 08/28/2020 02:54 PM

ddc-ceph-2-DASHbug.png - ceph 2 (114 KB) joel waddell, 09/01/2020 08:05 PM

ddc-ceph-1-DASHbug.png - ceph 1 (103 KB) joel waddell, 09/01/2020 08:05 PM

ddc-ceph-3-DASHbug.png - ceph 3 (113 KB) joel waddell, 09/01/2020 08:05 PM


Related issues

Related to mgr - Bug #47494: mgr/dashboard: Dashboard becomes unresponsive when SMART data not available Resolved
Blocked by mgr - Feature #47834: mgr/dashboard: additional logging for SMART data retrieval Pending Backport

History

#1 Updated by Lenz Grimmer 3 months ago

  • Project changed from Ceph to mgr
  • Subject changed from ceph dashboard does not display device health data, with error message "No SMART data available" to mgr/dashboard: ceph dashboard does not display device health data, with error message "No SMART data available"
  • Category changed from common to dashboard/backend
  • Affected Versions v15.2.4 added

#2 Updated by Lenz Grimmer 3 months ago

Thanks for the report! Would you mind posting the output of the smartctl command as well? This might help us to figure out why the output isn't accepted and forwarded to the dashboard.

#3 Updated by joel waddell 3 months ago

Here is the output of smartctl. I have also attached the image I posted above in case the link goes dead.

root@ddc-ceph-1:~# /usr/sbin/smartctl -a --json=o /dev/sdd
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      1
    ],
    "svn_revision": "5022",
    "platform_info": "x86_64-linux-5.4.0-42-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "-a",
      "--json=o",
      "/dev/sdd" 
    ],
    "output": [
      "smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)",
      "Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org",
      "",
      "=== START OF INFORMATION SECTION ===",
      "Vendor:               SEAGATE",
      "Product:              XS3840SE70004",
      "Revision:             0004",
      "Compliance:           SPC-5",
      "User Capacity:        3,840,755,982,336 bytes [3.84 TB]",
      "Logical block size:   512 bytes",
      "Physical block size:  4096 bytes",
      "LU is resource provisioned, LBPRZ=1",
      "Rotation Rate:        Solid State Device",
      "Form Factor:          2.5 inches",
      "Logical Unit id:      0x5000c500a183ec8b",
      "Serial number:        HLL0497S0000822150Z3",
      "Device type:          disk",
      "Transport protocol:   SAS (SPL-3)",
      "Local Time is:        Fri Aug 28 14:53:30 2020 UTC",
      "SMART support is:     Available - device has SMART capability.",
      "SMART support is:     Enabled",
      "Temperature Warning:  Enabled",
      "",
      "=== START OF READ SMART DATA SECTION ===",
      "SMART Health Status: OK",
      "",
      "Percentage used endurance indicator: 0%",
      "Current Drive Temperature:     42 C",
      "Drive Trip Temperature:        70 C",
      "",
      "Manufactured in week 27 of year 2020",
      "Specified cycle count over device lifetime:  10000",
      "Accumulated start-stop cycles:  124",
      "Elements in grown defect list: 0",
      "",
      "Vendor (Seagate Cache) information",
      "  Blocks sent to initiator = 4543041",
      "  Blocks received from initiator = 748300",
      "  Blocks read from cache and sent to initiator = 0",
      "  Number of read and write commands whose size <= segment size = 0",
      "  Number of read and write commands whose size > segment size = 0",
      "",
      "Vendor (Seagate/Hitachi) factory information",
      "  number of hours powered up = 540.80",
      "  number of minutes until next internal SMART test = 4",
      "",
      "Error counter log:",
      "           Errors Corrected by           Total   Correction     Gigabytes    Total",
      "               ECC          rereads/    errors   algorithm      processed    uncorrected",
      "           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors",
      "read:          0        0         0         0          0          2.326           0",
      "write:         0        0         0         0          0          1.052           0",
      "",
      "Non-medium error count:        0",
      "",
      "SMART Self-test log",
      "Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]",
      "     Description                              number   (hours)",
      "# 1  Background short  Completed                   -       0                 - [-   -    -]",
      "",
      "Long (extended) Self-test duration: 1800 seconds [30.0 minutes]",
      "" 
    ],
    "exit_status": 0
  },
  "device": {
    "name": "/dev/sdd",
    "info_name": "/dev/sdd",
    "type": "scsi",
    "protocol": "SCSI" 
  },
  "vendor": "SEAGATE",
  "product": "XS3840SE70004",
  "model_name": "SEAGATE XS3840SE70004",
  "revision": "0004",
  "scsi_version": "SPC-5",
  "user_capacity": {
    "blocks": 7501476528,
    "bytes": 3840755982336
  },
  "logical_block_size": 512,
  "physical_block_size": 4096,
  "rotation_rate": 0,
  "form_factor": {
    "scsi_value": 3,
    "name": "2.5 inches" 
  },
  "serial_number": "HLL0497S0000822150Z3",
  "device_type": {
    "scsi_value": 0,
    "name": "disk" 
  },
  "local_time": {
    "time_t": 1598626410,
    "asctime": "Fri Aug 28 14:53:30 2020 UTC" 
  },
  "smart_status": {
    "passed": true
  },
  "scsi_percentage_used_endurance_indicator": 0,
  "temperature": {
    "current": 42,
    "drive_trip": 70
  },
  "scsi_grown_defect_list": 0,
  "power_on_time": {
    "hours": 540,
    "minutes": 48
  },
  "scsi_error_counter_log": {
    "read": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 0,
      "errors_corrected_by_rereads_rewrites": 0,
      "total_errors_corrected": 0,
      "correction_algorithm_invocations": 0,
      "gigabytes_processed": "2.326",
      "total_uncorrected_errors": 0
    },
    "write": {
      "errors_corrected_by_eccfast": 0,
      "errors_corrected_by_eccdelayed": 0,
      "errors_corrected_by_rereads_rewrites": 0,
      "total_errors_corrected": 0,
      "correction_algorithm_invocations": 0,
      "gigabytes_processed": "1.052",
      "total_uncorrected_errors": 0
    }
  }
}
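
As a rough illustration of what a consumer of this output could read from it, here is a minimal Python sketch that pulls a few health fields out of the JSON. The `summarize` helper is hypothetical and is not the dashboard's actual parsing code; the embedded JSON is a trimmed copy of the output above, keeping only the keys that are used.

```python
import json

# Trimmed copy of the smartctl output above; only the fields used below are kept.
SMARTCTL_JSON = """
{
  "smartctl": {"exit_status": 0},
  "device": {"name": "/dev/sdd", "type": "scsi", "protocol": "SCSI"},
  "model_name": "SEAGATE XS3840SE70004",
  "serial_number": "HLL0497S0000822150Z3",
  "smart_status": {"passed": true},
  "temperature": {"current": 42, "drive_trip": 70},
  "power_on_time": {"hours": 540, "minutes": 48}
}
"""


def summarize(raw: str) -> dict:
    """Hypothetical helper: extract a few health fields from smartctl JSON."""
    data = json.loads(raw)
    return {
        "device": data["device"]["name"],
        # smartctl reports its own exit status inside the JSON document.
        "exit_ok": data["smartctl"]["exit_status"] == 0,
        # Optional sections are read defensively; missing ones yield None.
        "passed": data.get("smart_status", {}).get("passed"),
        "temp_c": data.get("temperature", {}).get("current"),
        "power_on_hours": data.get("power_on_time", {}).get("hours"),
    }


summary = summarize(SMARTCTL_JSON)
print(summary)
```

If fields such as `smart_status` are present and readable like this, the smartctl output itself is probably not what is missing, which points toward the path between the OSD and the dashboard.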

#4 Updated by Lenz Grimmer 3 months ago

  • Assignee set to Patrick Seidensal

#5 Updated by Patrick Seidensal 3 months ago

  • Status changed from New to In Progress
  • Assignee changed from Patrick Seidensal to joel waddell

Can you please try out if

ceph device ls

yields any results and let me know?

An identical output should be visible in the Ceph Dashboard's UI on the "Cluster -> Hosts" page, when opening the menu of a host and selecting the "Devices" tab.

Do you see anything there or are any devices returned by `ceph device ls`?

#6 Updated by Patrick Seidensal 3 months ago

  • Status changed from In Progress to Need More Info

#7 Updated by joel waddell 3 months ago

root@ddc-ceph-1:~# ceph device ls
DEVICE HOST:DEV DAEMONS LIFE EXPECTANCY
XS3840SE70004_5000c500a183ec4b ddc-ceph-1:sdh osd.16
XS3840SE70004_5000c500a183ec63 ddc-ceph-3:sdf osd.9
XS3840SE70004_5000c500a183ec67 ddc-ceph-1:sdf osd.11
XS3840SE70004_5000c500a183ec6b ddc-ceph-3:sdc osd.0
XS3840SE70004_5000c500a183ec7f ddc-ceph-3:sdg osd.12
XS3840SE70004_5000c500a183ec8b ddc-ceph-1:sdd osd.5
XS3840SE70004_5000c500a183ec93 ddc-ceph-3:sde osd.6
XS3840SE70004_5000c500a183ecc3 ddc-ceph-3:sdd osd.3
XS3840SE70004_5000c500a183ecdf ddc-ceph-2:sdc osd.2
XS3840SE70004_5000c500a183ed0b ddc-ceph-1:sdc osd.1
XS3840SE70004_5000c500a183ed13 ddc-ceph-2:sdg osd.14
XS3840SE70004_5000c500a183ed6f ddc-ceph-3:sdh osd.15
XS3840SE70004_5000c500a183ed93 ddc-ceph-1:sdg osd.13
XS3840SE70004_5000c500a1843ee7 ddc-ceph-1:sde osd.8
XS3840SE70004_5000c500a1843f67 ddc-ceph-2:sdh osd.17
XS3840SE70004_5000c500a18458b7 ddc-ceph-2:sdd osd.4
XS3840SE70004_5000c500a184691f ddc-ceph-2:sdf osd.10
XS3840SE70004_5000c500a1846923 ddc-ceph-2:sde osd.7

Now that you mention it, I can see some strange output in the dashboard: expanding the host dropdown displays the same OSDs for each node. The dropdown is exactly the same for each node; see the screenshots of all 3 nodes.
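
For comparison with the screenshots, the CLI output above can be grouped by host. This is a minimal Python sketch; `devices_by_host` is a hypothetical helper (not part of Ceph) and it assumes the whitespace-separated three-column layout shown above.

```python
from collections import defaultdict

# Rows copied from the `ceph device ls` output above (header omitted).
DEVICE_LS = """\
XS3840SE70004_5000c500a183ec4b ddc-ceph-1:sdh osd.16
XS3840SE70004_5000c500a183ec63 ddc-ceph-3:sdf osd.9
XS3840SE70004_5000c500a183ec67 ddc-ceph-1:sdf osd.11
XS3840SE70004_5000c500a183ec6b ddc-ceph-3:sdc osd.0
XS3840SE70004_5000c500a183ec7f ddc-ceph-3:sdg osd.12
XS3840SE70004_5000c500a183ec8b ddc-ceph-1:sdd osd.5
XS3840SE70004_5000c500a183ec93 ddc-ceph-3:sde osd.6
XS3840SE70004_5000c500a183ecc3 ddc-ceph-3:sdd osd.3
XS3840SE70004_5000c500a183ecdf ddc-ceph-2:sdc osd.2
XS3840SE70004_5000c500a183ed0b ddc-ceph-1:sdc osd.1
XS3840SE70004_5000c500a183ed13 ddc-ceph-2:sdg osd.14
XS3840SE70004_5000c500a183ed6f ddc-ceph-3:sdh osd.15
XS3840SE70004_5000c500a183ed93 ddc-ceph-1:sdg osd.13
XS3840SE70004_5000c500a1843ee7 ddc-ceph-1:sde osd.8
XS3840SE70004_5000c500a1843f67 ddc-ceph-2:sdh osd.17
XS3840SE70004_5000c500a18458b7 ddc-ceph-2:sdd osd.4
XS3840SE70004_5000c500a184691f ddc-ceph-2:sdf osd.10
XS3840SE70004_5000c500a1846923 ddc-ceph-2:sde osd.7
"""


def devices_by_host(text):
    """Hypothetical helper: map each host to its (dev, daemon) pairs."""
    hosts = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or incomplete rows
        host, dev = parts[1].split(":", 1)
        hosts[host].append((dev, parts[2]))
    return dict(hosts)


by_host = devices_by_host(DEVICE_LS)
for host in sorted(by_host):
    daemons = sorted(daemon for _, daemon in by_host[host])
    print(host, len(daemons), daemons)
```

With the rows above, each of the three hosts ends up with its own six OSDs and no OSD appears under two hosts, so the CLI view does not share the duplication seen in the dashboard dropdowns.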

#8 Updated by Patrick Seidensal 3 months ago

  • Status changed from Need More Info to In Progress
  • Assignee changed from joel waddell to Patrick Seidensal

Thanks, I'll look into it.

#9 Updated by Patrick Seidensal 3 months ago

  • Status changed from In Progress to Need More Info
  • Assignee changed from Patrick Seidensal to joel waddell

Unfortunately, I have not been able to reproduce this behavior in my environment. I can see the SMART data here.

That `ceph device ls` works and returns a result is one of two prerequisites for the SMART data to be displayed in the dashboard. The second one is that the `smartctl` tool is installed on the hosts where the OSDs are.

The only thing I noticed from the smartctl paste and the screenshots is that, in the screenshot, ddc-ceph-3 fails to display SMART data in the `Device Health` tab, but the smartctl output provided was produced on ddc-ceph-1. You've likely checked that, but just to be sure, can you please tell me if the `Device Health` tab of an OSD on ddc-ceph-1 also shows `No SMART data available`?

It is normal for the state of health in the Devices tab on the Hosts page to show `unknown` if disk prediction is disabled or has only just been enabled. Even after it has been enabled, it requires (I think) at least 6 snapshots of the SMART data to be able to make a prediction. By default, the SMART data is requested once per day, but that can be configured.
But if the `Device Health` tab does not work, the prediction probably won't work either.

Without further information about the issue, I might not be able to reproduce it. What I can do to get further information is to add some additional log output to the Ceph Dashboard's backend for such a case, so that we'll be able to have a closer look.

#10 Updated by joel waddell 3 months ago

All of the nodes fail to display SMART data.

Disk prediction has been enabled for as long as the nodes have been up. Proof that it is enabled:
```
root@ddc-ceph-1:~# ceph mgr module enable diskprediction_local
module 'diskprediction_local' is already enabled
```
It may not have been a full 6 days yet; I believe today is the sixth day, but today's snapshot may not have been taken yet. If you would like, we could do a Zoom meeting at some point and troubleshoot this in a more synchronous manner.

One thing I did notice is that if you go to the OSDs tab and open the dropdown on one of the OSDs, it still displays as unknown in the Devices tab and shows no SMART data in the Device Health tab. Probably not surprising, but it tells you that this is happening under OSDs as well as Hosts. I find it very strange that each dropdown shows exactly the same OSDs, including UUIDs and daemon numbers, as seen in my last uploaded screenshots.

I'm willing to work with whatever you need, so just let me know what could help.

#11 Updated by joel waddell 3 months ago

EDIT: I meant to say that all of the nodes fail to display SMART data in the Ceph dashboard in each of their dropdowns.

#12 Updated by Patrick Seidensal 3 months ago

  • Status changed from Need More Info to In Progress
  • Assignee changed from joel waddell to Patrick Seidensal

Okay, so that's not the case. I'll add additional debugging output, so that we can eventually get to the bottom of this.

#13 Updated by Dan van der Ster 3 months ago

We have the same problem in v14.2.12 on CentOS 7. In our case it returns {} even when asking the OSD directly:

# time ceph daemon osd.131 smart Hitachi_HUA5C3030ALA640_MJ0351YNGA1E0A
{
}

real    0m27.881s
user    0m0.154s
sys    0m0.058s

journalctl shows the osd is running smartctl correctly, but there is a problem starting the sudo session. I haven't managed to understand why it breaks yet:

Sep 16 13:47:29 p05151113782262.cern.ch sudo[2475001]:     ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a --json=o /dev/sdaa
Sep 16 13:47:29 p05151113782262.cern.ch systemd[1]: Started Session c143 of user root.
Sep 16 13:47:54 p05151113782262.cern.ch sudo[2475001]: pam_systemd(sudo:session): Failed to create session: Connection timed out
Sep 16 13:47:54 p05151113782262.cern.ch sudo[2475001]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 13:47:57 p05151113782262.cern.ch sudo[2475001]: pam_unix(sudo:session): session closed for user root
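
To make the delay in these lines easier to see, here is a minimal Python sketch that groups the sudo entries by PID and measures how long each invocation took. `sudo_sessions` is a hypothetical helper, the regular expression only targets the syslog-style layout shown above, and the year 2020 is an assumption because the journal lines do not carry one.

```python
import re
from datetime import datetime

# The sudo-related journal lines from above, copied verbatim.
JOURNAL = """\
Sep 16 13:47:29 p05151113782262.cern.ch sudo[2475001]:     ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/smartctl -a --json=o /dev/sdaa
Sep 16 13:47:29 p05151113782262.cern.ch systemd[1]: Started Session c143 of user root.
Sep 16 13:47:54 p05151113782262.cern.ch sudo[2475001]: pam_systemd(sudo:session): Failed to create session: Connection timed out
Sep 16 13:47:54 p05151113782262.cern.ch sudo[2475001]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 13:47:57 p05151113782262.cern.ch sudo[2475001]: pam_unix(sudo:session): session closed for user root
"""

# timestamp, host (ignored), sudo PID, message
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ sudo\[(\d+)\]: +(.*)$")


def sudo_sessions(text, year=2020):
    """Hypothetical helper: collect start, end and PAM errors per sudo PID."""
    sessions = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # not a sudo line (for example the systemd entry)
        stamp, pid, msg = match.groups()
        # Assumption: the year is supplied by the caller.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        entry = sessions.setdefault(
            pid, {"start": None, "end": None, "pam_errors": []}
        )
        if "COMMAND=" in msg:
            entry["start"] = when
        elif "session closed" in msg:
            entry["end"] = when
        elif "Failed" in msg:
            entry["pam_errors"].append(msg)
    return sessions


sessions = sudo_sessions(JOURNAL)
for pid, entry in sessions.items():
    seconds = (entry["end"] - entry["start"]).total_seconds()
    print(pid, seconds, entry["pam_errors"])
```

For the lines above this reports roughly 28 seconds between the sudo command and the session closing, in line with the `real 0m27.881s` measured for the `ceph daemon` call, and most of that time falls before the `pam_systemd` timeout message.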

#14 Updated by Dan van der Ster 3 months ago

OK, on CentOS 7 it's SELinux. I ran `setenforce Permissive` and now it works. Here's the output of audit2allow for the relevant logs:

#============= ceph_t ==============
allow ceph_t chkpwd_exec_t:file { execute execute_no_trans open read };
allow ceph_t self:capability { audit_write sys_resource };
allow ceph_t self:netlink_audit_socket { create nlmsg_relay };
allow ceph_t self:process setrlimit;
allow ceph_t shadow_t:file { getattr open read };
allow ceph_t sudo_exec_t:file { execute execute_no_trans open read };
allow ceph_t system_dbusd_t:dbus send_msg;

#!!!! The file '/run/dbus/system_bus_socket' is mislabeled on your system.  
#!!!! Fix with $ restorecon -R -v /run/dbus/system_bus_socket
allow ceph_t system_dbusd_t:unix_stream_socket connectto;
allow ceph_t systemd_logind_t:dbus send_msg;

#============= systemd_logind_t ==============
allow systemd_logind_t ceph_t:dbus send_msg;

The full audit.log is at https://termbin.com/2rvng

#15 Updated by joel waddell 3 months ago

When I tail /var/log/auth.log on Ubuntu, I don't get an error; the command seems to run OK. See the output below:

```
root@ddc-ceph-1:~# tail -f /var/log/auth.log
Sep 16 15:56:04 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:05 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdh
Sep 16 15:56:05 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:05 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:07 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdf
Sep 16 15:56:07 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:07 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:08 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdf
Sep 16 15:56:08 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:08 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:10 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdd
Sep 16 15:56:10 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:10 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:11 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdd
Sep 16 15:56:11 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:11 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:12 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdc
Sep 16 15:56:12 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:13 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:14 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdc
Sep 16 15:56:14 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:14 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:15 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdg
Sep 16 15:56:15 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:15 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:16 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdg
Sep 16 15:56:16 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:16 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:17 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sde
Sep 16 15:56:17 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:18 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
Sep 16 15:56:19 ddc-ceph-1 sudo: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sde
Sep 16 15:56:19 ddc-ceph-1 sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 16 15:56:19 ddc-ceph-1 sudo: pam_unix(sudo:session): session closed for user root
```

#16 Updated by Dan van der Ster 3 months ago

How about `journalctl -f` ?

Maybe AppArmor can interfere just like SELinux does on CentOS?

#17 Updated by Lenz Grimmer 3 months ago

  • Related to Bug #47494: mgr/dashboard: Dashboard becomes unresponsive when SMART data not available added

#18 Updated by Lenz Grimmer 3 months ago

  • Tags set to smart
  • Tags deleted (ceph-dashboard,smartctl,device health)

#19 Updated by joel waddell 3 months ago

journalctl -f shows the same as I posted above.

```
root@ddc-ceph-1:~# journalctl -f
-- Logs begin at Mon 2020-08-24 20:06:55 UTC. --
Sep 17 13:44:51 ddc-ceph-1 systemd[104618]: Listening on D-Bus User Message Bus Socket.
Sep 17 13:44:51 ddc-ceph-1 systemd[104618]: Reached target Sockets.
Sep 17 13:44:51 ddc-ceph-1 systemd[104618]: Reached target Basic System.
Sep 17 13:44:51 ddc-ceph-1 systemd[104618]: Reached target Main User Target.
Sep 17 13:44:51 ddc-ceph-1 systemd[104618]: Startup finished in 143ms.
Sep 17 13:44:51 ddc-ceph-1 systemd[1]: Started User Manager for UID 0.
Sep 17 13:44:51 ddc-ceph-1 systemd[1]: Started Session 449 of user root.
Sep 17 13:44:51 ddc-ceph-1 ceph-mgr[1495]: ::ffff:127.0.0.1 - - [17/Sep/2020:13:44:51] "GET /metrics HTTP/1.1" 200 267115 "" "Prometheus/2.7.2"
Sep 17 13:44:51 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.28 - - [17/Sep/2020:13:44:51] "GET /metrics HTTP/1.1" 200 267115 "" "Prometheus/2.7.2"
Sep 17 13:44:51 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.27 - - [17/Sep/2020:13:44:51] "GET /metrics HTTP/1.1" 200 267115 "" "Prometheus/2.7.2"
Sep 17 13:45:06 ddc-ceph-1 ceph-mgr[1495]: ::ffff:127.0.0.1 - - [17/Sep/2020:13:45:06] "GET /metrics HTTP/1.1" 200 267120 "" "Prometheus/2.7.2"
Sep 17 13:45:06 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.28 - - [17/Sep/2020:13:45:06] "GET /metrics HTTP/1.1" 200 267120 "" "Prometheus/2.7.2"
Sep 17 13:45:06 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.27 - - [17/Sep/2020:13:45:06] "GET /metrics HTTP/1.1" 200 267120 "" "Prometheus/2.7.2"
Sep 17 13:45:19 ddc-ceph-1 sudo[104761]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdh
Sep 17 13:45:19 ddc-ceph-1 sudo[104761]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:20 ddc-ceph-1 sudo[104761]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:21 ddc-ceph-1 sudo[104765]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdh
Sep 17 13:45:21 ddc-ceph-1 sudo[104765]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:21 ddc-ceph-1 sudo[104765]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:21 ddc-ceph-1 ceph-mgr[1495]: ::ffff:127.0.0.1 - - [17/Sep/2020:13:45:21] "GET /metrics HTTP/1.1" 200 267121 "" "Prometheus/2.7.2"
Sep 17 13:45:21 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.28 - - [17/Sep/2020:13:45:21] "GET /metrics HTTP/1.1" 200 267121 "" "Prometheus/2.7.2"
Sep 17 13:45:21 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.27 - - [17/Sep/2020:13:45:21] "GET /metrics HTTP/1.1" 200 267121 "" "Prometheus/2.7.2"
Sep 17 13:45:22 ddc-ceph-1 sudo[104768]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdf
Sep 17 13:45:22 ddc-ceph-1 sudo[104768]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:22 ddc-ceph-1 sudo[104768]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:23 ddc-ceph-1 sudo[104771]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdf
Sep 17 13:45:23 ddc-ceph-1 sudo[104771]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:23 ddc-ceph-1 sudo[104771]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:24 ddc-ceph-1 sudo[104774]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdd
Sep 17 13:45:24 ddc-ceph-1 sudo[104774]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:25 ddc-ceph-1 sudo[104774]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:26 ddc-ceph-1 sudo[104778]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdd
Sep 17 13:45:26 ddc-ceph-1 sudo[104778]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:26 ddc-ceph-1 sudo[104778]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:27 ddc-ceph-1 sudo[104781]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdc
Sep 17 13:45:27 ddc-ceph-1 sudo[104781]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:27 ddc-ceph-1 sudo[104781]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:28 ddc-ceph-1 sudo[104784]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdc
Sep 17 13:45:28 ddc-ceph-1 sudo[104784]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:28 ddc-ceph-1 sudo[104784]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:29 ddc-ceph-1 sudo[104787]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sdg
Sep 17 13:45:29 ddc-ceph-1 sudo[104787]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:30 ddc-ceph-1 sudo[104787]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:31 ddc-ceph-1 sudo[104791]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sdg
Sep 17 13:45:31 ddc-ceph-1 sudo[104791]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:31 ddc-ceph-1 sudo[104791]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:32 ddc-ceph-1 sudo[104794]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -a --json=o /dev/sde
Sep 17 13:45:32 ddc-ceph-1 sudo[104794]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:33 ddc-ceph-1 sudo[104794]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:34 ddc-ceph-1 sudo[104797]: ceph : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme seagate smart-log-add --json /dev/sde
Sep 17 13:45:34 ddc-ceph-1 sudo[104797]: pam_unix(sudo:session): session opened for user root by (uid=0)
Sep 17 13:45:34 ddc-ceph-1 sudo[104797]: pam_unix(sudo:session): session closed for user root
Sep 17 13:45:36 ddc-ceph-1 ceph-mgr[1495]: ::ffff:127.0.0.1 - - [17/Sep/2020:13:45:36] "GET /metrics HTTP/1.1" 200 267131 "" "Prometheus/2.7.2"
Sep 17 13:45:36 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.27 - - [17/Sep/2020:13:45:36] "GET /metrics HTTP/1.1" 200 267131 "" "Prometheus/2.7.2"
Sep 17 13:45:36 ddc-ceph-1 ceph-mgr[1495]: ::ffff:10.244.89.28 - - [17/Sep/2020:13:45:36] "GET /metrics HTTP/1.1" 200 267131 "" "Prometheus/2.7.2"
```

#20 Updated by Patrick Seidensal about 2 months ago

  • Backport set to octopus

#21 Updated by Patrick Seidensal about 2 months ago

I've created a PR to increase the logging capability of Ceph Dashboard. All additional logging output will be of type debug and only appear with the appropriate debug level set.

https://github.com/ceph/ceph/pull/37637

That should help us get closer to the cause of this issue. As you can see, I've decided to create a new issue for this improvement, so that this one stays open until the real problem is resolved.

#22 Updated by Patrick Seidensal about 2 months ago

  • Blocked by Feature #47834: mgr/dashboard: additional logging for SMART data retrieval added
