Bug #52761: OSDs announcing incorrect front_addr after upgrade to 16.2.6 - RADOS - Ceph

Added by Javier Cacheiro over 2 years ago. Updated over 2 years ago.

Status: New
Priority: Normal
Severity: 2 - major
Affected Versions: Ceph - v16.2.6

Description

The Ceph cluster is configured with separate public and cluster networks:

ceph config dump | grep network
global  advanced  cluster_network  10.114.0.0/16  *
mon     advanced  public_network   10.113.0.0/16  *

The cluster was upgraded from 16.2.4 to 16.2.6, and all nodes were rebooted after the upgrade.

While investigating clients that were unable to connect, I found that clients are being directed to the cluster_network address for some OSDs.

In the OSD metadata, most OSDs have their front addresses correctly assigned on the 10.113 public network, like this one:

osd.0
"back_addr": "[v2:10.114.29.10:6813/2947358317,v1:10.114.29.10:6819/2947358317]",
"front_addr": "[v2:10.113.29.10:6801/2947358317,v1:10.113.29.10:6807/2947358317]",
"hb_back_addr": "[v2:10.114.29.10:6837/2947358317,v1:10.114.29.10:6843/2947358317]",
"hb_front_addr": "[v2:10.113.29.10:6825/2947358317,v1:10.113.29.10:6832/2947358317]",

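To make the address format concrete, here is a minimal Python sketch that splits one of these address strings into its entries and reports which configured network each entry falls in. The two networks are copied from the config dump above. The helper names `parse_addrvec` and `network_of` are hypothetical, the regular expression handles IPv4 only, and reading the number after the slash as a nonce is an assumption about the format rather than something stated in this report.

```python
import ipaddress
import re

# Networks copied from the `ceph config dump` output in this report.
PUBLIC_NET = ipaddress.ip_network("10.113.0.0/16")
CLUSTER_NET = ipaddress.ip_network("10.114.0.0/16")

# One entry looks like "v2:10.113.29.10:6801/2947358317" (IPv4 only in this sketch).
ENTRY_RE = re.compile(r"(v[12]):(\d+\.\d+\.\d+\.\d+):(\d+)/(\d+)")


def parse_addrvec(text):
    """Return (protocol, ip, port, nonce) tuples found in an address string."""
    return [
        (proto, ipaddress.ip_address(ip), int(port), int(nonce))
        for proto, ip, port, nonce in ENTRY_RE.findall(text)
    ]


def network_of(ip):
    """Name the configured network that contains ip, or 'unknown'."""
    if ip in PUBLIC_NET:
        return "public"
    if ip in CLUSTER_NET:
        return "cluster"
    return "unknown"


# Values taken from the osd.0 metadata shown above.
front = parse_addrvec("[v2:10.113.29.10:6801/2947358317,v1:10.113.29.10:6807/2947358317]")
back = parse_addrvec("[v2:10.114.29.10:6813/2947358317,v1:10.114.29.10:6819/2947358317]")

print("front_addr:", [network_of(ip) for _, ip, _, _ in front])
print("back_addr: ", [network_of(ip) for _, ip, _, _ in back])
```

For osd.0 both front entries land in the public network and both back entries in the cluster network, which is the expected layout.
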
However, many OSDs announce incorrect addresses, and the error takes several forms.

For example, in some OSDs only the front_addr is wrong, while the hb_front_addr is fine:

osd.26
"back_addr": "[v2:10.114.29.5:6866/4155549673,v1:10.114.29.5:6867/4155549673]",
"front_addr": "[v2:10.114.29.5:6864/4155549673,v1:10.114.29.5:6865/4155549673]",
"hb_back_addr": "[v2:10.114.29.5:6870/4155549673,v1:10.114.29.5:6871/4155549673]",
"hb_front_addr": "[v2:10.113.29.5:6868/4155549673,v1:10.113.29.5:6869/4155549673]",

In others, only the hb_front_addr is wrong:

osd.34
"back_addr": "[v2:10.114.29.6:6802/3934363792,v1:10.114.29.6:6803/3934363792]",
"front_addr": "[v2:10.113.29.6:6800/3934363792,v1:10.113.29.6:6801/3934363792]",
"hb_back_addr": "[v2:10.114.29.6:6806/3934363792,v1:10.114.29.6:6807/3934363792]",
"hb_front_addr": "[v2:10.114.29.6:6804/3934363792,v1:10.114.29.6:6805/3934363792]",

And in others both are wrong:

osd.32
"back_addr": "[v2:10.114.29.10:6814/2403531529,v1:10.114.29.10:6820/2403531529]",
"front_addr": "[v2:10.114.29.10:6802/2403531529,v1:10.114.29.10:6808/2403531529]",
"hb_back_addr": "[v2:10.114.29.10:6836/2403531529,v1:10.114.29.10:6841/2403531529]",
"hb_front_addr": "[v2:10.114.29.10:6826/2403531529,v1:10.114.29.10:6830/2403531529]",

This affects only the front address assignment (front_addr and hb_front_addr); the back_addr of every OSD is in the cluster network (10.114).

Within the same node, some OSDs have the right configuration while others announce wrong front addresses.
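
The four cases above can be told apart mechanically. The following Python sketch classifies each OSD by whether its front_addr and hb_front_addr fall inside the public network. The sample records carry only the values quoted in this report; the `id` key, the list-of-dicts layout, and the function names `on_public` and `classify` are assumptions made for illustration, not a description of any Ceph tool.

```python
import ipaddress
import re

# Public network copied from the `ceph config dump` output in this report.
PUBLIC_NET = ipaddress.ip_network("10.113.0.0/16")

# Pulls the IPv4 address out of each "v2:ip:port/nonce" entry (IPv4 only).
IP_RE = re.compile(r"v[12]:(\d+\.\d+\.\d+\.\d+):\d+/")


def on_public(addrvec):
    """True if the string has at least one entry and all entries are on the public network."""
    ips = IP_RE.findall(addrvec)
    return bool(ips) and all(ipaddress.ip_address(ip) in PUBLIC_NET for ip in ips)


def classify(osd):
    """Label one OSD metadata record according to the four cases in the report."""
    front_ok = on_public(osd["front_addr"])
    hb_ok = on_public(osd["hb_front_addr"])
    if front_ok and hb_ok:
        return "ok"
    if hb_ok:
        return "front_addr wrong"
    if front_ok:
        return "hb_front_addr wrong"
    return "both wrong"


# Sample records built from the metadata excerpts quoted in this report.
SAMPLE = [
    {"id": 0,
     "front_addr": "[v2:10.113.29.10:6801/2947358317,v1:10.113.29.10:6807/2947358317]",
     "hb_front_addr": "[v2:10.113.29.10:6825/2947358317,v1:10.113.29.10:6832/2947358317]"},
    {"id": 26,
     "front_addr": "[v2:10.114.29.5:6864/4155549673,v1:10.114.29.5:6865/4155549673]",
     "hb_front_addr": "[v2:10.113.29.5:6868/4155549673,v1:10.113.29.5:6869/4155549673]"},
    {"id": 34,
     "front_addr": "[v2:10.113.29.6:6800/3934363792,v1:10.113.29.6:6801/3934363792]",
     "hb_front_addr": "[v2:10.114.29.6:6804/3934363792,v1:10.114.29.6:6805/3934363792]"},
    {"id": 32,
     "front_addr": "[v2:10.114.29.10:6802/2403531529,v1:10.114.29.10:6808/2403531529]",
     "hb_front_addr": "[v2:10.114.29.10:6826/2403531529,v1:10.114.29.10:6830/2403531529]"},
]

report = {osd["id"]: classify(osd) for osd in SAMPLE}
for osd_id, status in report.items():
    print(f"osd.{osd_id}: {status}")
```

Running the same classification over the metadata of every OSD, grouped by host, would show how many OSDs on each node are affected.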


Updated by Javier Cacheiro over 2 years ago

Updated by Greg Farnum over 2 years ago

Updated by Neha Ojha over 2 years ago

Updated by Javier Cacheiro over 2 years ago