Bug #20628

closed

ceph-osd deadlock in ?simple messenger?

Added by Dan van der Ster almost 7 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
Community (user)
Regression:
No
Severity:
3 - minor

Description

Hi,
We have a Jewel 10.2.8 OSD that just deadlocked. The OSD was marked down after reporting no PG stats for just over 60 seconds:

2017-07-14 12:27:24.869733 mon.0 128.142.35.220:6789/0 161437 : cluster [INF] osd.331 marked down after no pg stats for 61.085540seconds

(Note that we set mon osd report timeout = 60 because we have seen this deadlock before, and in this scenario the deadlocked OSD's peers do not mark it as failed. In other words, an OSD that deadlocks in this way generates slow requests until its PG stats time out.)
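
For anyone checking whether their own cluster log shows the same symptom, here is a minimal sketch that scans log lines for the "marked down after no pg stats" message. The pattern is modeled only on the single line quoted above, so other Ceph releases may word the message differently; the names MARKED_DOWN and find_stats_timeouts are hypothetical and not part of any Ceph tool.

```python
import re

# Assumption: the message format matches the one cluster-log line quoted in
# this report (jewel 10.2.8). Other releases may format it differently.
MARKED_DOWN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"(?P<osd>osd\.\d+) marked down after no pg stats for "
    r"(?P<secs>\d+(?:\.\d+)?)\s*seconds"
)


def find_stats_timeouts(lines):
    """Return (timestamp, osd, seconds) for every line matching the pattern."""
    hits = []
    for line in lines:
        m = MARKED_DOWN.search(line)
        if m:
            hits.append((m.group("ts"), m.group("osd"), float(m.group("secs"))))
    return hits


# The log line from this report, used as sample input.
sample = (
    "2017-07-14 12:27:24.869733 mon.0 128.142.35.220:6789/0 161437 : "
    "cluster [INF] osd.331 marked down after no pg stats for 61.085540seconds"
)

for ts, osd, secs in find_stats_timeouts([sample]):
    print(ts, osd, secs)
```

To run it against a real log, pass an open file instead of the sample list, for example gzip.open("ceph.log.gz", "rt") for the attached cluster log.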

The OSD and cluster logs are attached and I've ceph-post-file'd the coredump with tag 57a63b32-b3c8-4c40-a2f2-7f205ff475ad.

This is 10.2.8 on CentOS 7, installed from downloads.ceph.com.

# rpm -q ceph-osd
ceph-osd-10.2.8-0.el7.x86_64
# ceph --version
ceph version 10.2.8 (f5b1f1fd7c0be0506ba73502a675de9d048b744e)

Cheers, Dan


Files

ceph-osd.331.log.gz (118 KB), Dan van der Ster, 07/14/2017 12:00 PM
ceph.log.gz (307 KB), Dan van der Ster, 07/14/2017 12:02 PM