Bug #43287: HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f' - mgr - Ceph


Bug #43287

closed

HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

Added by Kitt Tientanopajai over 4 years ago. Updated over 4 years ago.

Status: Can't reproduce
Priority: Urgent
Assignee: Sage Weil
Category: ceph-mgr
Target version: Ceph - v14.2.5
% Done:
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I've just upgraded Ceph to 14.2.5. After the restart I got:

HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

A further check of the ceph-mgr log showed something like:

2019-12-12 18:47:21.396 7f7183955700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/crash/module.py", line 46, in serve
    self._refresh_health_checks()
  File "/usr/share/ceph/mgr/crash/module.py", line 70, in _refresh_health_checks
    crashid: crash for crashid, crash in self.crashes.items()
  File "/usr/share/ceph/mgr/crash/module.py", line 71, in <dictcomp>
    if self.time_from_string(crash['timestamp']) > cutoff and 'archived' not in crash
  File "/usr/share/ceph/mgr/crash/module.py", line 108, in time_from_string
    return datetime.datetime.strptime(timestr, DATEFMT)
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

So the crash module cannot load and emits HEALTH_ERR. As I understand it, the timestamp in that particular crash dump might be malformed. Is there any way I could fix or clean this up?
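The failure in the traceback can be reproduced outside Ceph. The following is a minimal sketch, assuming DATEFMT has the value quoted in the error message; safe_time_from_string is a hypothetical helper for illustration only and is not part of the Ceph crash module or its eventual fix.

```python
import datetime

# Format string as quoted in the error message above.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'


def time_from_string(timestr):
    # Same strptime call as shown in the traceback; raises ValueError
    # when the string does not match DATEFMT.
    return datetime.datetime.strptime(timestr, DATEFMT)


def safe_time_from_string(timestr):
    # Hypothetical tolerant variant (an assumption, not Ceph code):
    # return None for a malformed timestamp instead of raising.
    try:
        return time_from_string(timestr)
    except ValueError:
        return None


good = safe_time_from_string('2019-07-06 15:05:12.123456')
bad = safe_time_from_string('2019-07-06 15:5')  # truncated value from this report
print(good, bad)
```

With a helper of this kind, a caller could skip records whose timestamp parses to None rather than letting one bad record fail the whole module.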


Updated by Kefu Chai over 4 years ago

Description updated (diff)


Updated by Sage Weil over 4 years ago

Status changed from New to In Progress
Assignee set to Sage Weil
Priority changed from Normal to Immediate


Updated by Sage Weil over 4 years ago

Status changed from In Progress to Need More Info
Priority changed from Immediate to Urgent

It looks like one of the crashes in your cluster has a mangled timestamp value. Can you attach the output from 'ceph crash ls --format=json'?
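Given such JSON output, a mangled record could be located with a short script. This is only a sketch under assumptions: it assumes the output is a JSON list of objects carrying 'crash_id' and 'timestamp' keys, find_bad_timestamps is a hypothetical name, and the sample records are made up for illustration rather than taken from this cluster.

```python
import datetime
import json

# Format string as quoted in the error message.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'


def find_bad_timestamps(crashes):
    # Return (crash_id, timestamp) pairs whose timestamp fails to parse.
    bad = []
    for crash in crashes:
        ts = crash.get('timestamp')
        try:
            datetime.datetime.strptime(ts, DATEFMT)
        except (ValueError, TypeError):
            # ValueError: wrong format; TypeError: missing or non-string value.
            bad.append((crash.get('crash_id'), ts))
    return bad


# Made-up sample data; the real record layout is an assumption.
sample = json.loads('''[
  {"crash_id": "example-crash-1", "timestamp": "2019-07-06 15:05:12.123456"},
  {"crash_id": "example-crash-2", "timestamp": "2019-07-06 15:5"}
]''')

print(find_bad_timestamps(sample))
```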


Updated by Kitt Tientanopajai over 4 years ago

I tried, but it returned an error.

Error EIO: Module 'crash' has experienced an error and cannot handle commands: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

I think ceph-mgr cannot load the crash module, so all "ceph crash" commands would fail?
