Bug #42058: OSD reconnected across map epochs, inconsistent pg logs created - RADOS - Ceph

Bug #42058

closed

OSD reconnected across map epochs, inconsistent pg logs created

Added by 相洋于 over 4 years ago. Updated over 4 years ago.

Status: Duplicate
Priority: Normal
Assignee:
Category: Peering
Target version: Ceph - v12.2.13
% Done:
Source: Community (user)
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions: Ceph - v12.2.12
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID: 30609
Crash signature (v1):
Crash signature (v2):

Description

Take the lossless cluster connection between osd.2 and osd.47 as an example.

Suppose osd.47 is restarted while osd.2 has an op that it needs to send to osd.47.

osd.2 shuts down its current connection and begins reconnecting to osd.47.

Once osd.47 is up, osd.2 sends it a connect message, and osd.47 replies with a reset session message.

osd.2 then deletes the messages in its out queue, and the op is lost.

In our Luminous cluster we hit a rare case (3 replicas; the scenario is somewhat complex and hard to describe) in which one OSD in a placement group had one more op than the others. We only found this when scrubbing. Eventually we determined that the op was lost at the client endpoint when it received a reset session message.

All in all, I think that when the connection is lossless, the server endpoint should not send a reset session message even if the client endpoint's csq is not 0; it would be better to send a retry session command with csq set to 0.
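
The difference between the two replies can be illustrated with a toy model. This is only a sketch under stated assumptions, not Ceph messenger code: the names `server_reply`, `Client`, `reconnect`, and `use_proposal` are hypothetical, and `csq` stands for the connect sequence mentioned in the description. It assumes the client drops its out queue on a reset session reply and keeps it on a retry session reply.

```python
from enum import Enum


class Reply(Enum):
    READY = "ready"
    RESETSESSION = "resetsession"
    RETRY_SESSION = "retry_session"


def server_reply(has_session, client_csq, lossless, use_proposal):
    """Return (reply, csq) for a connect attempt (toy model, not Ceph code)."""
    if has_session or client_csq == 0:
        return Reply.READY, client_csq
    # The server has no state for this peer, but the client believes a
    # session exists (csq > 0), for example because the server restarted.
    if lossless and use_proposal:
        # Proposed behaviour: ask the client to retry with csq = 0.
        return Reply.RETRY_SESSION, 0
    # Behaviour described in the report: tell the client to reset.
    return Reply.RESETSESSION, 0


class Client:
    def __init__(self, csq, out_q):
        self.csq = csq
        self.out_q = list(out_q)

    def handle(self, reply, csq):
        if reply is Reply.RESETSESSION:
            # Queued messages are discarded, so a pending op is lost.
            self.out_q.clear()
            self.csq = 0
        elif reply is Reply.RETRY_SESSION:
            # Only the sequence number changes; the queue is kept.
            self.csq = csq


def reconnect(use_proposal):
    """Reconnect a client with one queued op to a restarted server."""
    client = Client(csq=3, out_q=["op"])
    client.handle(*server_reply(False, client.csq, True, use_proposal))
    return client.csq, client.out_q
```

In this model `reconnect(False)` leaves the client with an empty queue, while `reconnect(True)` resets `csq` to 0 but keeps the queued op.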

Related issues 1 (0 open — 1 closed)

Updated by Greg Farnum over 4 years ago

Status changed from New to In Progress
Pull request ID set to 30609

Updated by 相洋于 over 4 years ago

@Greg Farnum

Assume pg 1.1a maps to OSDs [1,5,9], with osd1 as the primary.

Time 1: osd1, osd5, and osd9 were online and could send messages to each other.

Time 2: osd5 and osd9 received a new osdmap that showed osd.1 as down. At the same time, osd1's public network was taken down manually (physically down), but osd.1's cluster network was still online.

Time 3:
Because the new osdmap showed osd1 as down, osd5 and osd9 shut down their connections to osd1 (through mark_down()), so they no longer had any existing connections for osd1.
On osd.1, the connections to osd.5/osd.9 encountered a failure (they had been disconnected explicitly by osd.5/osd.9) and were about to enter the STANDBY state. As a consequence, these connections still existed on osd.1 (their cs_seq > 0).
After a short while, osd1 generated two scrub operations (deep-scrub was enabled) that updated the version info of some objects (scrub_snapshot_metadata()), and tried to reestablish its connections to osd5 and osd9. When osd1 sent the first operation, op1 (via send_message()), the cluster messenger started reconnecting to osd5/osd9 and placed op1 in out_q. Before the connection entered STATE_OPEN, a RESETSESSION was exchanged between osd1 and osd5/osd9, which led osd1 to discard the messages in out_q (via was_session_reset()). After the connection was established, osd1 sent the second operation, op2, to osd5/osd9.
Eventually, two pg log entries were recorded on osd1 (op1, op2), but only one (op2) on osd5/osd9.
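
The Time 3 sequence can be replayed with a small simulation. This is a simplified sketch, not the AsyncMessenger implementation: `Replica`, `Connection`, `finish_connect`, and `run` are hypothetical names, and the model assumes that a session reset during connect simply empties the sender's queue before the connection opens.

```python
class Replica:
    """Toy replica that records every op it receives in its pg log."""

    def __init__(self, name):
        self.name = name
        self.pg_log = []

    def deliver(self, op):
        self.pg_log.append(op)


class Connection:
    """Toy lossless connection from the primary to one replica."""

    def __init__(self, peer):
        self.peer = peer
        self.is_open = False
        self.out_q = []

    def send_message(self, op):
        # While the connection is still being established, ops are queued.
        if self.is_open:
            self.peer.deliver(op)
        else:
            self.out_q.append(op)

    def finish_connect(self, session_reset):
        if session_reset:
            # Assumed effect of the reset: queued messages are discarded.
            self.out_q.clear()
        self.is_open = True
        for op in self.out_q:
            self.peer.deliver(op)
        self.out_q.clear()


def run(session_reset):
    """Replay op1 (sent while reconnecting) and op2 (sent once open)."""
    replicas = [Replica("osd5"), Replica("osd9")]
    conns = [Connection(r) for r in replicas]
    primary_log = []

    primary_log.append("op1")
    for conn in conns:
        conn.send_message("op1")  # queued, connection not open yet
    for conn in conns:
        conn.finish_connect(session_reset)

    primary_log.append("op2")
    for conn in conns:
        conn.send_message("op2")  # delivered directly

    return primary_log, {r.name: r.pg_log for r in replicas}
```

With `run(True)` the primary logs both ops while each replica logs only op2, which matches the divergence described above. With `run(False)` all three logs agree.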

Time 4: When osd1's public network recovered shortly afterwards, the primary (osd1) could not find any difference between its pg log and those of osd5 and osd9 during peering. Once a deep scrub of pg 1.1a completed, it reported an inconsistency in the object version info (the version info associated with op1).
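
One possible reading of why peering missed the gap while scrub caught it is sketched below. It rests on assumptions that are not stated in the report: that peering in this toy model compares only the newest log entry of each OSD, that op1 and op2 touch two different objects (the hypothetical `objA` and `objB`), and that scrub compares per-object version info. All function names are hypothetical.

```python
def apply_log(log, objects):
    """Apply (version, object) log entries on top of a starting object state."""
    state = dict(objects)
    for version, obj in log:
        state[obj] = version
    return state


def peering_sees_difference(primary_log, replica_log):
    # Assumption: only the newest entry (the log head) is compared.
    return primary_log[-1] != replica_log[-1]


def scrub_mismatches(primary_state, replica_state):
    # Assumption: scrub compares the version info of every object.
    return sorted(
        obj for obj in primary_state
        if primary_state[obj] != replica_state.get(obj)
    )


initial = {"objA": 0, "objB": 0}
primary_log = [(1, "objA"), (2, "objB")]  # op1, op2
replica_log = [(2, "objB")]               # op1 never arrived
```

Here `peering_sees_difference(primary_log, replica_log)` is false because both logs end at version 2, while `scrub_mismatches(apply_log(primary_log, initial), apply_log(replica_log, initial))` reports `objA`, the object that only op1 updated.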

This is a rare situation that we ran into. In some cases, I think it could also cause messages to be delivered out of order. If I have misdiagnosed it, please tell me.

Updated by 相洋于 over 4 years ago

See PR https://github.com/ceph/ceph/pull/25343, which also avoids triggering RESETSESSION.

Updated by Greg Farnum over 4 years ago

Project changed from Messengers to RADOS
Subject changed from 【msg/async】 bad ression at osd cluster messenger to OSD reconnected across map epochs, inconsistent pg logs created
Category changed from AsyncMessenger to Peering
Status changed from In Progress to New
Component(RADOS) OSD added

Okay, so the issue here is that osd.1 managed to reconnect to osd.5 and osd.9 without triggering a wider reset of the PG state that would have canceled the ongoing background (scrub) operations. osd.{5,9} should have detected the OSDMap version mismatch and rejected ops from osd.1.
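
The kind of check described here can be sketched as follows. This is a hypothetical illustration, not the OSD's actual logic: `OSDMap`, `Op`, and `should_accept` are made-up names, and the rule (reject an op when the receiver's map marks the sender down and the op was created under an older epoch) is an assumption chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OSDMap:
    epoch: int
    down: frozenset = frozenset()  # OSDs this map marks as down


@dataclass(frozen=True)
class Op:
    sender: str
    map_epoch: int  # epoch the sender was using when it created the op


def should_accept(op, local_map):
    """Toy replica-side check: drop ops from a sender our map marks down."""
    if op.sender in local_map.down and op.map_epoch < local_map.epoch:
        return False
    return True


stale_op = Op(sender="osd.1", map_epoch=10)
newer_map = OSDMap(epoch=11, down=frozenset({"osd.1"}))
older_map = OSDMap(epoch=10)
```

With the newer map applied, `should_accept(stale_op, newer_map)` is false. If the replica is still acting on the older map, `should_accept(stale_op, older_map)` is true and the op goes through.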

This is running on a Luminous 12.2.13 cluster, right?

Updated by Greg Farnum over 4 years ago

Status changed from New to Duplicate

Oh sorry I didn't look at that PR. It is the correct fix; if we do another luminous point release it should show up or you can pull it in yourself. :)
(Note the distinction that it only changes the rules during the connect phase!)

Updated by Greg Farnum over 4 years ago

Is duplicate of Bug #36612: msg/async: connection stall added

Updated by 相洋于 over 4 years ago

Our cluster is running Luminous 12.2.12.
I do not think PR https://github.com/ceph/ceph/pull/25343 can solve the problem in our case.
So could you reconsider my PR, https://github.com/ceph/ceph/pull/30609?
Or can you suggest a way to resolve the problem in other modules?

Updated by 相洋于 over 4 years ago

Greg Farnum wrote:

Okay, so the issue here is that osd.1 managed to reconnect to osd.5 and osd.9 without triggering a wider reset of the PG state that would have canceled the ongoing background (scrub) operations. osd.{5,9} should have detected the OSDMap version mismatch and rejected ops from osd.1.

osd.{5,9} accepted the later ops because they had not yet committed the new osdmap, even though they had already dropped their connections to osd.1.

This is running on a Luminous 12.2.13 cluster, right?

Updated by 相洋于 over 4 years ago

https://tracker.ceph.com/issues/22570

@Greg Farnum, my problem is related to this tracker issue.

This bug can be marked as resolved and closed.
