Bug #42058: OSD reconnected across map epochs, inconsistent pg logs created - RADOS - Ceph

Bug #42058

closed

OSD reconnected across map epochs, inconsistent pg logs created

Added by 相洋于 over 4 years ago. Updated over 4 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: Peering
Target version: Ceph - v12.2.13
% Done:
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v12.2.12
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID: 30609
Crash signature (v1):
Crash signature (v2):

Description

Take the lossless cluster connection between osd.2 and osd.47 as an example.

Suppose osd.47 is restarted while osd.2 has an op that it needs to send to osd.47.

osd.2 then shuts down the current connection and begins reconnecting to osd.47.

Once osd.47 is up, osd.2 sends a connect message to osd.47, and osd.47 replies with a reset session message.

osd.2 then discards the messages in its out queue, and the op is lost.
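
The sequence above can be modelled with a small toy simulation. This is a minimal sketch, not Ceph's messenger code: the `Server` and `Client` classes, the tag strings, and the handling rules are simplified assumptions, used only to show the point at which the queued op disappears.

```python
from collections import deque

# Hypothetical tag names for this sketch, not Ceph constants.
RESET_SESSION = "reset_session"
READY = "ready"


class Server:
    """Toy stand-in for the restarted OSD (osd.47 in the report)."""

    def __init__(self):
        # A freshly started process has no memory of earlier sessions.
        self.known_sessions = {}
        self.received = []

    def handle_connect(self, peer_name, connect_seq):
        # The peer claims an existing session that this process has never
        # seen, so it tells the peer to reset the session.
        if connect_seq > 0 and peer_name not in self.known_sessions:
            return RESET_SESSION
        self.known_sessions[peer_name] = connect_seq
        return READY


class Client:
    """Toy stand-in for the sending OSD (osd.2 in the report)."""

    def __init__(self, name):
        self.name = name
        self.connect_seq = 0
        self.out_queue = deque()
        self.dropped = []

    def connect(self, server):
        reply = server.handle_connect(self.name, self.connect_seq)
        if reply == RESET_SESSION:
            # Session reset: everything still queued is discarded.
            self.dropped.extend(self.out_queue)
            self.out_queue.clear()
            self.connect_seq = 0
            reply = server.handle_connect(self.name, self.connect_seq)
        assert reply == READY
        self.connect_seq += 1
        # Flush whatever is still queued to the server.
        while self.out_queue:
            server.received.append(self.out_queue.popleft())


def simulate():
    client = Client("osd.2")
    client.connect(Server())          # initial session, connect_seq becomes 1
    restarted = Server()              # osd.47 comes back without session state
    client.out_queue.append("op-1")   # op queued while the peer was down
    client.connect(restarted)
    return restarted.received, client.dropped
```

In this model `simulate()` returns an empty list of received ops and `["op-1"]` as the dropped ops: the restarted server never sees the op, and nothing on the client side resends it.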

In our Luminous cluster we hit a rare case (3 replicas; the details are somewhat complex and hard to describe) in which one OSD in a placement group held one extra op, and we only discovered this during scrubbing. Eventually we found that the op had been lost at the client endpoint when it received a reset session message.

All in all, I think that when the connection is lossless, the server endpoint should not send a reset session message even if the client endpoint's csq is not 0; it would be better to send a retry session command with csq set to 0.
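
One way to read this proposal is as a change to the reply a restarted server gives when it sees an unknown session on a lossless connection. The sketch below contrasts the current and proposed policies in a toy model. The functions `server_reply` and `reconnect`, their parameters, and the tag strings are hypothetical names for illustration, and whether rewinding csq through a retry is safe in the real messenger protocol is not established here.

```python
def server_reply(client_csq, session_known, lossless, proposed):
    """Pick the reply a server gives to a connect request (toy model)."""
    if session_known or client_csq == 0:
        return ("ready", client_csq)
    if lossless and proposed:
        # Proposed behaviour: ask the client to retry with csq 0
        # instead of resetting the session.
        return ("retry_session", 0)
    # Current behaviour described in the report.
    return ("reset_session", 0)


def reconnect(out_queue, client_csq, lossless, proposed):
    """Return (delivered, lost) after reconnecting to a restarted server."""
    queue = list(out_queue)
    lost = []
    tag, csq = server_reply(client_csq, False, lossless, proposed)
    if tag == "reset_session":
        # A reset discards the client's queued messages.
        lost, queue = queue, []
        client_csq = 0
    elif tag == "retry_session":
        # A retry keeps the queue and only rewinds csq.
        client_csq = csq
    # With csq at 0 the next connect attempt is accepted.
    tag, _ = server_reply(client_csq, False, lossless, proposed)
    assert tag == "ready"
    return queue, lost
```

Under the current policy, `reconnect(["op-1"], 3, True, False)` reports the op as lost; under the proposed policy, `reconnect(["op-1"], 3, True, True)` reports it as delivered with nothing lost.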

Related issues 1 (0 open — 1 closed)

Updated by Greg Farnum over 4 years ago

Updated by 相洋于 over 4 years ago

Updated by 相洋于 over 4 years ago

Updated by Greg Farnum over 4 years ago

Updated by Greg Farnum over 4 years ago

Updated by Greg Farnum over 4 years ago

Updated by 相洋于 over 4 years ago

Updated by 相洋于 over 4 years ago

Updated by 相洋于 over 4 years ago
