Bug #61912

closed

mgr hang when purging OSDs

Added by dongdong tao 10 months ago. Updated 10 months ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: reef, quincy
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This issue is caused by incorrectly erasing a map element inside a for loop.
After the line at https://github.com/ceph/ceph/blob/main/src/mon/PGMap.cc#L1218 executes, the iterator "i" is invalid,
so every later use of it by the for loop (the increment and the end comparison) is undefined behavior.
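
For illustration, here is a minimal sketch of the usual safe pattern for erasing from a std::map while iterating over it. It is not the Ceph code: StatMap and purge_osd are hypothetical names, and the (pool id, OSD id) key layout is only an assumption made for the example. The point is that std::map::erase(iterator) returns the next valid iterator, so the loop advances from that return value instead of incrementing the erased one.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-in for a per-(pool id, OSD id) statistics map.
// This is NOT the actual Ceph type, only an illustration.
using StatMap = std::map<std::pair<int64_t, int>, uint64_t>;

// Remove every entry that belongs to `osd`; return how many were removed.
std::size_t purge_osd(StatMap& stats, int osd) {
  std::size_t removed = 0;
  // No increment in the loop header: the body decides how to advance.
  for (auto i = stats.begin(); i != stats.end(); ) {
    if (i->first.second == osd) {
      // erase() invalidates only the erased iterator and returns the
      // iterator to the following element, which is safe to keep using.
      i = stats.erase(i);
      ++removed;
    } else {
      ++i;
    }
  }
  return removed;
}
```

Writing `stats.erase(i)` inside a loop whose header still does `++i` is the broken variant: the increment then operates on an invalidated iterator, and the result can be a crash or an endless loop.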

In my case it caused a hang, and because the hung thread never released its locks, it also starved all other threads.
The mgr logs are full of auth rotating errors:
---
2023-07-02T00:00:10.946+0000 7f48aa7fc640 1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:10.952817+0000)
2023-07-02T00:00:11.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:11.952939+0000)
2023-07-02T00:00:12.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:12.953042+0000)
2023-07-02T00:00:13.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:13.953143+0000)
2023-07-02T00:00:14.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:14.953244+0000)
2023-07-02T00:00:15.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:15.953348+0000)
2023-07-02T00:00:16.946+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:16.953451+0000)
2023-07-02T00:00:17.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:17.953580+0000)
2023-07-02T00:00:18.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:18.953691+0000)
2023-07-02T00:00:19.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:19.953797+0000)
2023-07-02T00:00:20.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:20.953899+0000)
2023-07-02T00:00:21.950+0000 7f48aa7fc640 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-07-01T23:00:21.954013+0000)
---

This issue was observed on a 17.2.5 cluster, but it is still present in main.


Related issues 1 (0 open, 1 closed)

Has duplicate Ceph - Bug #58303: active mgr crashes with segfault when running 'ceph osd purge' (Resolved, Christian Theune)

#2

Updated by Matan Breizman 10 months ago

  • Project changed from mgr to RADOS
  • Status changed from New to Fix Under Review
  • Backport set to reef, quincy
  • Pull request ID set to 52334
#3

Updated by Konstantin Shalygin 10 months ago

  • Assignee set to dongdong tao
  • Target version set to v19.0.0
  • Source set to Community (user)
#4

Updated by Matan Breizman 10 months ago

  • Has duplicate Bug #58303: active mgr crashes with segfault when running 'ceph osd purge' added
#5

Updated by Matan Breizman 10 months ago

  • Status changed from Fix Under Review to Duplicate