Bug #43287
closed
HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'
Description
I've just upgraded Ceph to 14.2.5. After the restart I got:
HEALTH_ERR Module 'crash' has failed: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'
A further check of the ceph-mgr log showed something like:
2019-12-12 18:47:21.396 7f7183955700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/crash/module.py", line 46, in serve
    self._refresh_health_checks()
  File "/usr/share/ceph/mgr/crash/module.py", line 70, in _refresh_health_checks
    crashid: crash for crashid, crash in self.crashes.items()
  File "/usr/share/ceph/mgr/crash/module.py", line 71, in <dictcomp>
    if self.time_from_string(crash['timestamp']) > cutoff and 'archived' not in crash
  File "/usr/share/ceph/mgr/crash/module.py", line 108, in time_from_string
    return datetime.datetime.strptime(timestr, DATEFMT)
  File "/usr/lib/python2.7/_strptime.py", line 332, in _strptime
    (data_string, format))
ValueError: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'
So the crash module cannot load and emits HEALTH_ERR. As I understand it, the timestamp stored for that particular crash dump is malformed (it is cut off after '15:5'). Is there any way I could fix or clean this up?
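The failure described above can be reproduced outside the mgr with a few lines of Python. This is only a minimal sketch: DATEFMT is copied from the error message, while is_valid_timestamp, find_bad_crashes, and the sample crash IDs are hypothetical names made up for illustration and are not part of the crash module.

```python
import datetime

# Format string copied from the error message in this report.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'


def is_valid_timestamp(timestr):
    """Return True if timestr parses with DATEFMT, else False."""
    try:
        datetime.datetime.strptime(timestr, DATEFMT)
        return True
    except ValueError:
        return False


def find_bad_crashes(crashes):
    """Return the sorted IDs of crash entries whose 'timestamp' does not parse."""
    return sorted(
        crashid for crashid, crash in crashes.items()
        if not is_valid_timestamp(crash.get('timestamp', ''))
    )


# Hypothetical sample data; the crash IDs are invented for illustration.
sample = {
    'crash-a': {'timestamp': '2019-12-12 18:47:21.396000'},
    'crash-b': {'timestamp': '2019-07-06 15:5'},  # truncated value from this report
}
bad = find_bad_crashes(sample)
print(bad)
```

A check along these lines would point at the single entry with the truncated timestamp, which is the one that needs to be repaired or removed.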
Updated by Sage Weil over 4 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
- Priority changed from Normal to Immediate
Updated by Sage Weil over 4 years ago
- Status changed from In Progress to Need More Info
- Priority changed from Immediate to Urgent
It looks like one of the crashes in your cluster has a mangled timestamp value. Can you attach the output of 'ceph crash ls --format=json'?
Updated by Kitt Tientanopajai over 4 years ago
I tried, but it returned an error.
Error EIO: Module 'crash' has experienced an error and cannot handle commands: time data '2019-07-06 15:5' does not match format '%Y-%m-%d %H:%M:%S.%f'
I think ceph-mgr cannot load the crash module, so all 'ceph crash' commands fail?
Updated by Kitt Tientanopajai over 4 years ago
Can I just delete /var/lib/ceph/crash and restart the daemon to get rid of the error?
Updated by Kitt Tientanopajai over 4 years ago
I've worked around this by hardcoding a special case into mgr/crash/module.py to handle the malformed timestamp. With that change the mgr can load the crash module and all 'ceph crash' commands work.
I then purged all crash dumps and reverted module.py to the original.
I think this issue could be closed.
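For reference, one way such a temporary workaround could look is sketched below. This is not the change that was actually applied here, nor an upstream fix; it is an assumption-laden illustration that mirrors the names in the traceback (time_from_string, DATEFMT, the 'archived' key), with recent_crashes as a hypothetical helper standing in for the dict comprehension in _refresh_health_checks.

```python
import datetime

# Format string copied from the error message in this report.
DATEFMT = '%Y-%m-%d %H:%M:%S.%f'


def time_from_string(timestr):
    """Parse timestr with DATEFMT; return None instead of raising on bad input."""
    try:
        return datetime.datetime.strptime(timestr, DATEFMT)
    except (ValueError, TypeError):
        return None


def recent_crashes(crashes, cutoff):
    """Select unarchived crashes newer than cutoff, skipping unparseable ones."""
    recent = {}
    for crashid, crash in crashes.items():
        ts = time_from_string(crash.get('timestamp'))
        if ts is None:
            continue  # malformed timestamp: skip it rather than fail the module
        if ts > cutoff and 'archived' not in crash:
            recent[crashid] = crash
    return recent


# Hypothetical sample data; the crash IDs and the archived value are invented.
sample = {
    'crash-a': {'timestamp': '2019-12-12 18:47:21.396000'},
    'crash-b': {'timestamp': '2019-07-06 15:5'},  # truncated value from this report
    'crash-c': {'timestamp': '2019-12-13 09:00:00.000000', 'archived': 'yes'},
}
selected = recent_crashes(sample, datetime.datetime(2019, 12, 1))
print(sorted(selected))
```

Skipping the bad entry keeps the module running long enough to purge the offending crash dump, after which the original code can be restored.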
Updated by Sage Weil over 4 years ago
- Status changed from Need More Info to Can't reproduce