Bug #49830

disk failure prediction doesn't display prediction for all disks

Added by Denis Polom about 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have enabled local disk failure prediction on Ceph

ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)

but when I check the life expectancy of the disks, a prediction is shown for only some of them:

# ceph device ls-by-host cache1-osd8
DEVICE                                     DEV   DAEMONS  EXPECTED FAILURE
Samsung_SSD_860_EVO_250GB_S3YJNS0N528114H  sdc   osd.414
WDC_WD1003FZEX-00K3CA0_WD-WCC6Y4XZ8UJ1     sdap  osd.265  >6w
WDC_WD1003FZEX-00K3CA0_WD-WCC6Y5KD9N5D     sdaj  osd.259  >6w
WDC_WD5002ABYS-01B1B0_WD-WCASY5339459      sdae  osd.255
WDC_WD5002ABYS-01B1B0_WD-WCASY5362382      sdad  osd.254
WDC_WD5002ABYS-01B1B0_WD-WCASY5371513      sdac  osd.253
WDC_WD5002ABYS-01B1B0_WD-WCASY5395606      sdy   osd.249
WDC_WD5002ABYS-01B1B0_WD-WMASY8124826      sdaw  osd.302
WDC_WD5002ABYS-02B1B0_WD-WCASY7295622      sdd   osd.1
WDC_WD5002ABYS-02B1B0_WD-WCASY7744009      sdal  osd.261
WDC_WD5002ABYS-02B1B0_WD-WCASY8296130      sdah  osd.418
WDC_WD5002ABYS-02B1B0_WD-WCASY8324529      sdau  osd.270
WDC_WD5002ABYS-02B1B0_WD-WCASY8325343      sdat  osd.269
WDC_WD5002ABYS-02B1B0_WD-WCASY8325372      sdam  osd.262
WDC_WD5002ABYS-02B1B0_WD-WCASY8325392      sdak  osd.260
WDC_WD5002ABYS-02B1B0_WD-WCASY8417265      sdaq  osd.266
WDC_WD5002ABYS-02B1B0_WD-WCASY8421214      sdaf  osd.428
WDC_WD5002ABYS-02B1B0_WD-WCASY8421240      sdar  osd.267
WDC_WD5002ABYS-02B1B0_WD-WCASY8435134      sdao  osd.264
WDC_WD5002ABYS-02B1B0_WD-WCASY8905009      sdf   osd.3
WDC_WD5002ABYS-02B1B0_WD-WCASY8989609      sdm   osd.101
WDC_WD5002ABYS-02B1B0_WD-WCASY8989659      sdo   osd.190
WDC_WD5002ABYS-02B1B0_WD-WCASY8990274      sdl   osd.89
WDC_WD5002ABYS-02B1B0_WD-WCASY8990704      sdn   osd.128
WDC_WD5002ABYS-02B1B0_WD-WCASY8990896      sdk   osd.84
WDC_WD5002ABYS-02B1B0_WD-WCASY8991291      sdi   osd.65
WDC_WD5002ABYS-02B1B0_WD-WCASY9702953      sdab  osd.252
WDC_WD5002ABYS-02B1B0_WD-WCASY9705333      sde   osd.333
WDC_WD5002ABYS-02B1B0_WD-WCASY9706110      sdv   osd.382
WDC_WD5002ABYS-02B1B0_WD-WCASYC970858      sdp   osd.198
WDC_WD5002ABYS-02B1B0_WD-WCASYD516034      sdr   osd.232
WDC_WD5002ABYS-02B1B0_WD-WCASYE652445      sds   osd.246
WDC_WD5002ABYS-02B1B0_WD-WCASYE741540      sdt   osd.247
WDC_WD5002ABYS-02B1B0_WD-WCASYF168442      sdh   osd.28
WDC_WD5002ABYS-02B1B0_WD-WCASYF459176      sdw   osd.342
WDC_WD5002ABYS-02B1B0_WD-WCASYF466564      sdz   osd.345
WDC_WD5002ABYS-02B1B0_WD-WCASYF466616      sdaa  osd.372
WDC_WD5002ABYS-02B1B0_WD-WCASYF470075      sdx   osd.347
WDC_WD5002ABYS-02B1B0_WD-WCASYF470176      sdg   osd.332
WDC_WD5002ABYS-02B1B0_WD-WCASYF473488      sdq   osd.219
WDC_WD5002ABYS-02B1B0_WD-WCASYF496122      sdj   osd.67
WDC_WD5003ABYX-01WERA0_WD-WMAYP0090772     sdan  osd.263
WDC_WD5003ABYX-01WERA0_WD-WMAYP0095730     sdas  osd.268  >6w
WDC_WD5003ABYX-01WERA0_WD-WMAYP0096195     sdav  osd.424
WDC_WD5003ABYX-01WERA0_WD-WMAYP0217767     sdag  osd.427
WDC_WD5003ABYX-01WERA0_WD-WMAYP1290287     sdu   osd.248

Running smartctl against the disks that show no life expectancy reports them as healthy, and smartctl exits with code 0.
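
To pull out just the affected devices from a listing like the one above, a small filter can help. This is only a sketch: the function name `devices_without_prediction` is made up for illustration, and it assumes the four-column layout shown above, with one daemon per device and no spaces inside device IDs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print the device IDs whose
# EXPECTED FAILURE column is empty.
# Input: the output of `ceph device ls-by-host <host>` on stdin.
# Assumption: each data row has exactly three fields (DEVICE, DEV, DAEMONS)
# when no prediction is present, and four when one is.
devices_without_prediction() {
    # NR > 1 skips the header row; NF == 3 keeps rows with no prediction.
    awk 'NR > 1 && NF == 3 { print $1 }'
}

# Usage on a live cluster (host name taken from this report):
#   ceph device ls-by-host cache1-osd8 | devices_without_prediction
```

A host where several daemons share one device would break the field-count assumption, so the filter would need adjusting there.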

Actions #1

Updated by Yaarit Hatuka about 3 years ago

Hi Denis,

Thanks for reporting this.

Can you please specify what commands you ran in order to enable the diskprediction module?

Do you see a prediction when you run `ceph device predict-life-expectancy <devid>` with a <devid> that does not display a prediction in the output above?
For example:

ceph device predict-life-expectancy WDC_WD5002ABYS-01B1B0_WD-WCASY5339459
Actions #2

Updated by Denis Polom about 3 years ago

Hi Yaarit

I enabled it with:

# ceph mgr module enable diskprediction_local

The relevant part of my config looks like:

  mgr         advanced  mgr/devicehealth/scrape_frequency          21600
  mgr         advanced  mgr/diskprediction_local/predict_interval  21600        

I don't see a prediction for that device:

# ceph device predict-life-expectancy WDC_WD5002ABYS-01B1B0_WD-WCASY5339459
unknown
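
The same check can be repeated for many devices in one go. The loop below is a rough sketch, not an official tool: `predict_each` is a hypothetical name, and the only Ceph command it relies on is the `ceph device predict-life-expectancy <devid>` call already used in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): run the prediction query for each
# device ID read from stdin (one ID per line) and print "<devid>: <result>".
predict_each() {
    while IFS= read -r devid; do
        # Skip blank lines in the input.
        [ -n "$devid" ] || continue
        # </dev/null keeps the ceph call from consuming the loop's stdin;
        # stderr is captured so error messages appear next to the device ID.
        result=$(ceph device predict-life-expectancy "$devid" </dev/null 2>&1) || true
        printf '%s: %s\n' "$devid" "$result"
    done
}

# Usage on a live cluster, with device IDs from this report:
#   printf '%s\n' WDC_WD5002ABYS-01B1B0_WD-WCASY5339459 \
#                 WDC_WD5002ABYS-01B1B0_WD-WCASY5362382 | predict_each
```

Devices that print `unknown` here are the ones the module has produced no prediction for.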
Actions #3

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to mgr
  • Category deleted (OSD)