Bug #61912
Closed
mgr hang when purging OSDs
0%
Description
This issue is caused by incorrectly erasing a map element inside a for loop.
After the line https://github.com/ceph/ceph/blob/main/src/mon/PGMap.cc#L1218, the iterator "i" is invalid, so the for loop goes on to use an invalidated iterator, which is undefined behavior.
In my case it caused a hang, and because the hung thread never released its locks, it also starved all other threads.
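For illustration, here is a minimal sketch of the usual safe way to erase from a std::map while iterating over it. It is not the actual PGMap.cc code or the proposed fix: the OsdStatMap alias and the erase_purged helper are hypothetical names made up for this example. It assumes C++11 or later, where std::map::erase(iterator) returns the iterator following the erased element.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-in for a per-OSD stats map; not the real PGMap type.
using OsdStatMap = std::map<int32_t, int64_t>;

// Hypothetical helper: remove every entry whose key is in `purged`
// without ever using an invalidated iterator. Returns the number removed.
inline std::size_t erase_purged(OsdStatMap& stats,
                                const std::set<int32_t>& purged) {
  std::size_t removed = 0;
  // No increment in the for header: the body decides how to advance.
  for (auto i = stats.begin(); i != stats.end(); ) {
    if (purged.count(i->first) != 0) {
      // erase() invalidates `i` but returns the next valid iterator,
      // so `i` is reassigned instead of being incremented.
      i = stats.erase(i);
      ++removed;
    } else {
      ++i;
    }
  }
  return removed;
}
```

The broken pattern is the same loop with "++i" in the for header and a plain "stats.erase(i)" in the body: the increment then runs on an iterator that no longer refers to a live element.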
The mgr logs are full of auth rotating errors:
---
2023-07-02T00:00:10.946+0000 7f48aa7fc640 1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:10.952817+0000)
2023-07-02T00:00:11.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:11.952939+0000)
2023-07-02T00:00:12.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:12.953042+0000)
2023-07-02T00:00:13.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:13.953143+0000)
2023-07-02T00:00:14.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:14.953244+0000)
2023-07-02T00:00:15.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:15.953348+0000)
2023-07-02T00:00:16.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:16.953451+0000)
2023-07-02T00:00:17.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:17.953580+0000)
2023-07-02T00:00:18.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:18.953691+0000)
2023-07-02T00:00:19.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:19.953797+0000)
2023-07-02T00:00:20.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:20.953899+0000)
2023-07-02T00:00:21.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:21.954013+0000)
---
This issue was observed on a 17.2.5 cluster, but it is still present in main.
Updated by dongdong tao 10 months ago
Updated by Matan Breizman 10 months ago
- Project changed from mgr to RADOS
- Status changed from New to Fix Under Review
- Backport set to reef, quincy
- Pull request ID set to 52334
Updated by Konstantin Shalygin 10 months ago
- Assignee set to dongdong tao
- Target version set to v19.0.0
- Source set to Community (user)
Updated by Matan Breizman 10 months ago
- Has duplicate Bug #58303: active mgr crashes with segfault when running 'ceph osd purge' added
Updated by Matan Breizman 10 months ago
- Status changed from Fix Under Review to Duplicate