Bug #54267
Progress module keeps reporting Unhandled exception and puts the cluster into an error state.
Description
Hi.
We are currently doing a lot of recovering and moving misplaced data as we have removed and added new machines and OSDs to the cluster. During this process, the progress module keeps reporting an error and puts the cluster in an error state.
HEALTH_ERROR : MGR_MODULE_ERROR: Module 'progress' has failed: ('9f1a152c-2f09-4d01-bd81-95c0e2873d9d',)
I looked around in the logs and found this segment around when the error occurred.
--------------------------------------------
2022-02-13T01:08:31.848+0100 7ff4fb582700 1 log_channel(cluster) log [ERR] : Unhandled exception from module 'progress' while running on mgr.ceph-mgr3: ('9f1a152c-2f09-4d01-bd81-95c0e2873d9d',)
2022-02-13T01:08:31.852+0100 7ff4fb582700 -1 progress.serve:
2022-02-13T01:08:31.852+0100 7ff4fb582700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/progress/module.py", line 716, in serve
    self._process_pg_summary()
  File "/usr/share/ceph/mgr/progress/module.py", line 629, in _process_pg_summary
    ev = self._events[ev_id]
KeyError: '9f1a152c-2f09-4d01-bd81-95c0e2873d9d'
-------------------------------------------
Best regards
Daniel
Updated by Daniel Persson over 2 years ago
Seems to be resolved by https://github.com/ceph/ceph/commit/b70d4a9caae0eb859e10b68f93573d507625d267
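
For readers hitting the same traceback: the failure is a plain KeyError from looking up an event ID that is no longer present in the module's event dictionary. The following is only a minimal sketch of how such a lookup can tolerate a missing key. It is not the Ceph code and not the content of the linked commit; the Event class, process_events function and logger name are hypothetical and exist only for illustration.

```python
import logging
from typing import Dict, List

log = logging.getLogger("progress_sketch")  # hypothetical logger name


class Event:
    """Hypothetical stand-in for a progress event; not the Ceph class."""

    def __init__(self, ev_id: str) -> None:
        self.id = ev_id
        self.updates = 0

    def update(self) -> None:
        # Count how many times this event has been refreshed.
        self.updates += 1


def process_events(events: Dict[str, Event], ev_ids: List[str]) -> List[str]:
    """Update every event that is still tracked; return the ids that were missing."""
    missing = []
    for ev_id in ev_ids:
        # dict.get() returns None for an unknown key instead of raising KeyError.
        ev = events.get(ev_id)
        if ev is None:
            log.warning("event %s is no longer tracked, skipping", ev_id)
            missing.append(ev_id)
            continue
        ev.update()
    return missing


# One tracked event and one stale id, similar to the situation in the log above.
events = {"a": Event("a")}
missing = process_events(events, ["a", "9f1a152c-2f09-4d01-bd81-95c0e2873d9d"])
```

With this pattern a stale ID is logged and skipped rather than terminating the serve loop. Whether skipping is the right behaviour depends on why the event disappeared, which the log excerpt alone does not show.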