Bug #17236

closed

MDS goes damaged on blacklist (failed to read JournalPointer: -108 ((108) Cannot send after transport endpoint shutdown)

Added by John Spray over 7 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
Correctness/Safety
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

http://qa-proxy.ceph.com/teuthology/teuthology-2016-09-05_17:25:02-kcephfs-master-testing-basic-mira/401388/

OSD log:

2016-09-05 19:15:04.996890 7f3a92cea700 10 osd.1 pg_epoch: 9 pg[2.7( v 8'7692 (7'4629,8'7692] local-les=7 n=376 ec=6 les/c/f 7/7/0 6/6/6) [1,2] r=0 lpr=6 luod=8'7691 lua=8'7689 crt=8'7688 lcod 8'7690 mlcod 8'7688 active+clean] do_op 172.21.5.140:6808/12206 is blacklisted

remote/mira037/log/ceph-osd.1.log.gz:2016-09-05 19:15:18.303488 7f3a92cea700  1 -- 172.21.5.140:6804/11520 >> 172.21.8.106:6808/19233 conn(0x55d18aee9000 sd=66 :6804 s=STATE_OPEN pgs=44 cs=1 l=1). == tx == 0x55d18c1e3e40 osd_op_reply(4 400.00000000 [read 0~0] v0'0 uv0 ack = -108 ((108) Cannot send after transport endpoint shutdown)) v7

There is at least one case here where r != 0 is taken to mean damage, when we should just respawn on seeing EBLACKLISTED. Almost everywhere else MDSIOContext handles this, but JournalPointer doesn't use it because it works outside of the MDS lock.
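As a rough illustration of the distinction described above (not the actual fix), the return code of the JournalPointer read could be classified along these lines. The enum, the helper name classify_pointer_read, and the constant kEBlacklisted are hypothetical names for this sketch. Ceph defines EBLACKLISTED as an alias for ESHUTDOWN, which matches the -108 in the log above, and the sketch assumes that mapping.

```cpp
#include <cerrno>

// Assumption: mirrors Ceph's EBLACKLISTED, an alias for ESHUTDOWN
// (108 on Linux). Redefined under a hypothetical name so the sketch
// compiles on its own.
constexpr int kEBlacklisted = ESHUTDOWN;

// Hypothetical outcomes for a JournalPointer read.
enum class LoadOutcome { Ok, Respawn, Damaged };

// Hypothetical helper: decide what a return code from the read means.
// A blacklisted client cannot talk to the OSDs at all, so the error says
// nothing about the on-disk state and should lead to a respawn, not to
// the rank being marked damaged.
LoadOutcome classify_pointer_read(int r) {
  if (r == 0) {
    return LoadOutcome::Ok;
  }
  if (r == -kEBlacklisted) {
    return LoadOutcome::Respawn;
  }
  // Any other nonzero code is still treated as damage in this sketch.
  return LoadOutcome::Damaged;
}
```

With this split, the -108 reply seen in the OSD log would map to Respawn, while a genuine read failure such as -EIO would still map to Damaged.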


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #17478: jewel: MDS goes damaged on blacklist (failed to read JournalPointer: -108 ((108) Cannot send after transport endpoint shutdown) (Resolved, Loïc Dachary)