Bug #20628
ceph-osd deadlock in ?simple messenger?
Status: Closed
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
Hi,
We have a Jewel 10.2.8 OSD that just deadlocked. The OSD was marked down because the mon received no PG stats from it for over 60s:
2017-07-14 12:27:24.869733 mon.0 128.142.35.220:6789/0 161437 : cluster [INF] osd.331 marked down after no pg stats for 61.085540seconds
(Note that we use mon osd report timeout = 60 because we've seen this deadlock before, and in this scenario the deadlocked OSD's peers do not mark it as failed. In other words, OSDs that deadlock in this way generate slow requests until the PG stats report times out.)
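For reference, the setting mentioned above would look roughly like this in ceph.conf. The [mon] section placement is an assumption for illustration; the report does not say which section it was set in:

```ini
# Assumed placement: [mon]. The report only gives the option and its value.
[mon]
mon osd report timeout = 60
```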
The OSD and cluster logs are attached and I've ceph-post-file'd the coredump with tag 57a63b32-b3c8-4c40-a2f2-7f205ff475ad.
This is 10.2.8 on CentOS 7, installed from downloads.ceph.com.
# rpm -q ceph-osd
ceph-osd-10.2.8-0.el7.x86_64
# ceph --version
ceph version 10.2.8 (f5b1f1fd7c0be0506ba73502a675de9d048b744e)
Cheers, Dan