Bug #43287

closed

HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

Added by Kitt Tientanopajai over 4 years ago. Updated over 4 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: ceph-mgr
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I've just upgraded Ceph to 14.2.5. After the restart I got:

HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

A further check of the ceph-mgr log showed something like:

2019-12-12 18:47:21.396 7f7183955700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/crash/module.py", line 46, in serve
    self._refresh_health_checks()
  File "/usr/share/ceph/mgr/crash/module.py", line 70, in _refresh_health_checks
    crashid: crash for crashid, crash in self.crashes.items()
  File "/usr/share/ceph/mgr/crash/module.py", line 71, in <dictcomp>
    if self.time_from_string(crash['timestamp']) > cutoff and 'archived' not in crash
  File "/usr/share/ceph/mgr/crash/module.py", line 108, in time_from_string
    return datetime.datetime.strptime(timestr, DATEFMT)
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

So the crash module cannot load and emits HEALTH_ERR. As I understand it, the timestamp in that particular crash dump is malformed. Is there any way I could fix or clean this up?
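
The failure in the traceback can be reproduced outside the mgr, which also suggests a way to locate the offending entry. The following is a minimal sketch in Python 3 (the mgr in the log runs Python 2.7). It assumes the crash metadata is available as a dict keyed by crash ID with a 'timestamp' field, as the dict comprehension in the traceback implies. The helper name find_bad_timestamps and the sample data are hypothetical and are not part of the crash module.

```python
import datetime

# Format string taken from the traceback above.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'


def find_bad_timestamps(crashes):
    """Return the crash IDs whose 'timestamp' does not parse with DATEFMT."""
    bad = []
    for crashid, crash in crashes.items():
        try:
            datetime.datetime.strptime(crash.get('timestamp', ''), DATEFMT)
        except (ValueError, TypeError):
            # ValueError: string does not match DATEFMT (the case in this bug).
            # TypeError: the field is present but not a string.
            bad.append(crashid)
    return bad


# Hypothetical sample data: one well-formed entry, one truncated like the
# value in the error message.
crashes = {
    'good': {'timestamp': '2019-07-06 15:05:12.123456'},
    'truncated': {'timestamp': '2019-07-06 15:5'},
}
print(find_bad_timestamps(crashes))
```

Only the truncated entry should be reported, since '2019-07-06 15:5' has no seconds or fractional part for the format to match.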
Actions #1

Updated by Kefu Chai over 4 years ago

  • Description updated (diff)
Actions #2

Updated by Sage Weil over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
  • Priority changed from Normal to Immediate
Actions #3

Updated by Sage Weil over 4 years ago

  • Status changed from In Progress to Need More Info
  • Priority changed from Immediate to Urgent

It looks like one of the crashes in your cluster has a mangled timestamp value. Can you attach the output from 'ceph crash ls --format=json'?

Actions #4

Updated by Kitt Tientanopajai over 4 years ago

I tried, but it returned an error.

Error EIO: Module 'crash' has experienced an error and cannot handle commands: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

I think ceph-mgr cannot load the crash module, so all 'ceph crash' commands fail?

Actions #5

Updated by Kitt Tientanopajai over 4 years ago

Can I just delete /var/lib/ceph/crash and restart the daemon to get rid of the error?

Actions #6

Updated by Kitt Tientanopajai over 4 years ago

I've worked around this by hardcoding a special case in mgr/crash/module.py. The mgr can now load the crash module, and all 'ceph crash' commands work.
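
The actual change made to module.py is not shown in this thread. As an illustration only, a tolerant variant of the parsing step might look like the sketch below (Python 3). It reuses DATEFMT and the filter condition from the traceback. The names time_from_string_safe and recent_unarchived are hypothetical, and skipping unparseable entries is an assumption about the desired behaviour, not the module's real logic.

```python
import datetime

# Format string taken from the traceback in the description.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'


def time_from_string_safe(timestr):
    """Parse timestr with DATEFMT; return None instead of raising."""
    try:
        return datetime.datetime.strptime(timestr, DATEFMT)
    except (ValueError, TypeError):
        return None


def recent_unarchived(crashes, cutoff):
    """Apply the filter from the traceback, skipping unparseable timestamps."""
    result = {}
    for crashid, crash in crashes.items():
        ts = time_from_string_safe(crash.get('timestamp'))
        if ts is not None and ts > cutoff and 'archived' not in crash:
            result[crashid] = crash
    return result
```

With a guard like this, one malformed entry no longer aborts the whole health check, so the module stays loaded and the bad dump can be removed with the normal 'ceph crash' commands.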

I then purged all crash dumps and reverted module.py to the original.

I think this issue could be closed.

Actions #7

Updated by Sage Weil over 4 years ago

  • Status changed from Need More Info to Can't reproduce