Bug #43287

HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

Added by Kitt Tientanopajai 4 months ago. Updated 3 months ago.

Status: Can't reproduce
Priority: Urgent
Assignee:
Category: ceph-mgr
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

I've just upgraded Ceph to 14.2.5. After the restart I got:

HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

Checking the ceph-mgr log further showed something like:

2019-12-12 18:47:21.396 7f7183955700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/crash/module.py", line 46, in serve
    self._refresh_health_checks()
  File "/usr/share/ceph/mgr/crash/module.py", line 70, in _refresh_health_checks
    crashid: crash for crashid, crash in self.crashes.items()
  File "/usr/share/ceph/mgr/crash/module.py", line 71, in <dictcomp>
    if self.time_from_string(crash['timestamp']) > cutoff and 'archived' not in crash
  File "/usr/share/ceph/mgr/crash/module.py", line 108, in time_from_string
    return datetime.datetime.strptime(timestr, DATEFMT)
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

So the crash module could not load and raised HEALTH_ERR. As I understand it, the timestamp of that particular crash dump is malformed. Is there any way I can fix or clean this up?
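The failure can be reproduced outside Ceph with plain Python. This is a minimal sketch, assuming that DATEFMT in mgr/crash/module.py equals the format string quoted in the error message; the helper name parses is hypothetical and only used for illustration.

```python
import datetime

# Assumption: this matches DATEFMT in mgr/crash/module.py; it is copied
# from the format string shown in the error message.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'


def parses(timestr):
    """Return True if timestr matches DATEFMT, False if strptime rejects it."""
    try:
        datetime.datetime.strptime(timestr, DATEFMT)
        return True
    except ValueError:
        return False


# A well-formed timestamp with seconds and microseconds.
print(parses('2019-07-06 15:05:12.123456'))
# The truncated value from the report: seconds and microseconds are missing.
print(parses('2019-07-06 15:5'))
```

Because strptime requires the whole string to match the format, a timestamp cut off after the minutes field raises ValueError, which is the exception seen in the traceback.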

History

#1 Updated by Kefu Chai 4 months ago

  • Description updated (diff)

#2 Updated by Sage Weil 4 months ago

  • Status changed from New to In Progress
  • Assignee set to Sage Weil
  • Priority changed from Normal to Immediate

#3 Updated by Sage Weil 4 months ago

  • Status changed from In Progress to Need More Info
  • Priority changed from Immediate to Urgent

It looks like one of the crashes in your cluster has a mangled timestamp value. Can you attach the output of 'ceph crash ls --format=json'?

#4 Updated by Kitt Tientanopajai 4 months ago

I tried, but it returned an error.

Error EIO: Module 'crash' has experienced an error and cannot handle commands: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'

I think ceph-mgr cannot load the crash module, so all 'ceph crash' commands would be invalid?

#5 Updated by Kitt Tientanopajai 4 months ago

Can I just delete /var/lib/ceph/crash and restart the daemon to get rid of the error?

#6 Updated by Kitt Tientanopajai 3 months ago

I've worked around this by hardcoding a change in mgr/crash/module.py to handle this case. The mgr can now load the crash module, and all 'ceph crash' commands work.
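The actual change to module.py was not posted in this report. As a rough illustration only, a tolerant parser along these lines could avoid the exception. The function name time_from_string_tolerant and the FALLBACK_FMTS list are hypothetical and are not part of the Ceph source; DATEFMT is assumed to be the format string from the error message.

```python
import datetime

# Assumption: copied from the format string in the error message.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'

# Hypothetical fallbacks for truncated timestamps; not part of Ceph.
FALLBACK_FMTS = ('%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%Y-%m-%d')


def time_from_string_tolerant(timestr):
    """Parse timestr with DATEFMT, then with shorter fallback formats.

    Returns a datetime on success, or None if no format matches,
    instead of raising ValueError.
    """
    for fmt in (DATEFMT,) + FALLBACK_FMTS:
        try:
            return datetime.datetime.strptime(timestr, fmt)
        except ValueError:
            continue
    return None


# The truncated value from the report now parses via the '%Y-%m-%d %H:%M' fallback.
print(time_from_string_tolerant('2019-07-06 15:5'))
```

A caller that filters crashes by age would also need to skip entries for which None is returned; whether to skip or to treat such entries as old is a design choice this sketch leaves open.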

I've purged all crash dumps, then reverted module.py to the original.

I think this issue could be closed.

#7 Updated by Sage Weil 3 months ago

  • Status changed from Need More Info to Can't reproduce
