Bug #55687
pacific: Regressions with holding the GIL while attempting to lock a mutex
Description
The mgr process can deadlock if a thread holds the Python GIL while attempting to lock a mutex. Some recent regressions make this scenario possible again. We have seen this regression cause all 5 of our managers to deadlock and become unavailable in a large cluster.
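To illustrate the lock-ordering problem described above, here is a minimal sketch in plain Python. It is not Ceph code: `gil` and `mutex` are ordinary `threading.Lock` objects standing in for the Python GIL and a native mgr mutex, and `run`, `python_side`, and `native_side` are hypothetical names chosen for the example. It assumes one thread holds the "GIL" and wants the mutex while another holds the mutex and wants the "GIL". Timeouts are used only so the demonstration terminates.

```python
import threading

# Stand-ins only: `gil` models the Python GIL and `mutex` models a native
# mutex inside the mgr. Neither is a real Ceph or CPython object.
gil = threading.Lock()
mutex = threading.Lock()


def run(release_gil_first, timeout=0.5):
    """Return True if both threads get their second lock, else False.

    Timeouts are used only so the demonstration terminates; a real
    deadlock would block forever.
    """
    both_hold_first_lock = threading.Barrier(2)
    results = {}

    def python_side():
        # Runs with the GIL held, then needs the native mutex.
        gil.acquire()
        both_hold_first_lock.wait()
        if release_gil_first:
            gil.release()  # drop the GIL before blocking on the mutex
        got_mutex = mutex.acquire(timeout=timeout)
        if got_mutex:
            mutex.release()
        if not release_gil_first:
            gil.release()
        results["python_side"] = got_mutex

    def native_side():
        # Holds the native mutex, then needs the GIL to call into Python.
        mutex.acquire()
        both_hold_first_lock.wait()
        got_gil = gil.acquire(timeout=timeout)
        if got_gil:
            gil.release()
        mutex.release()
        results["native_side"] = got_gil

    threads = [threading.Thread(target=python_side),
               threading.Thread(target=native_side)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results.values())


if __name__ == "__main__":
    print("GIL held while locking mutex:", run(release_gil_first=False))
    print("GIL released before locking mutex:", run(release_gil_first=True))
```

With the "GIL" held across the mutex acquisition, at least one thread always times out, so `run(release_gil_first=False)` returns `False`; releasing the "GIL" first lets both threads proceed. This shows the general pattern only and is not a description of the actual fix.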
History
#1 Updated by Cory Snyder over 1 year ago
- Affected Versions v16.2.8 added
These regressions appear to have been introduced here: https://github.com/ceph/ceph/pull/44750
Note that the issues do not exist on the master branch or on Quincy; they were introduced by mistakes in the Pacific backport.
#2 Updated by Cory Snyder over 1 year ago
- Backport deleted (quincy, pacific)
#3 Updated by Cory Snyder over 1 year ago
- Regression changed from No to Yes
#4 Updated by Cory Snyder over 1 year ago
- Pull request ID set to 46302
#5 Updated by Neha Ojha over 1 year ago
- Subject changed from Regressions with holding the GIL while attempting to lock a mutex to pacific: Regressions with holding the GIL while attempting to lock a mutex
- Status changed from New to Resolved
#6 Updated by Ilya Dryomov over 1 year ago
- Target version set to v16.2.9
#7 Updated by Eugen Block 12 months ago
I upgraded our cluster to 16.2.10 last week, and I believe I saw this issue for the first time in this cluster an hour ago. Do I understand correctly that the deadlock would leave the pod "alive" but unresponsive? I was browsing the dashboard when it stopped working (pages didn't load); when I checked, a different MGR had taken over. I read somewhere that the prometheus module could play a role in this, but it is not active in our cluster. Unfortunately, the logs of the failed mgr pod don't contain much information, but if I can provide anything useful, please let me know.
#8 Updated by Eugen Block 12 months ago
- File pacific-mgr-deadlock-gdb.txt added
Adding a gdb.txt dump from a mgr in deadlock (slightly different ceph version than ours).