Bug #56057

Add health error if one or more OSDs registered v1/v2 public ip addresses are not within defined public_network subnet

Added by Prashant D almost 2 years ago. Updated 4 months ago.

Status:
Pending Backport
Priority:
Normal
Assignee:
Prashant D
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
reef, quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
46692
Crash signature (v1):
Crash signature (v2):

Description

In a containerized environment, after an OSD node reboot, some OSDs registered their public v1/v2 addresses on the cluster network instead of on the defined public network. These v1/v2 addresses were well outside the public network configured in the cluster for these OSDs. This caused RADOS/RBD clients to fail to communicate with the cluster, because the clients could not establish socket connections to the OSDs that had registered wrong public addresses (the connections lingered in the SYN_SENT state).

For example, on node01 the public IP addresses of osd.2 and osd.4 are registered on the cluster network and are not within the public network 10.0.5.0/22:

node01 : public_addr [10.0.5.15] cluster_addr[10.0.6.15]
osd.2 up in weight 1 up_from 935 up_thru 943 down_at 934 last_clean_interval [435,926) [v2:10.0.6.15:6896/1320177090,v1:10.0.6.15:6897/1320177090] [v2:10.0.6.15:6898/1320177090,v1:10.0.6.15:6899/1320177090] exists,up aaabbbcc-dd11-1234-5678-8f8fbf147df5
osd.4 up in weight 1 up_from 935 up_thru 943 down_at 934 last_clean_interval [93,926) [v2:10.0.6.15:6872/4023558816,v1:10.0.6.15:6873/4023558816] [v2:10.0.6.15:6874/4023558816,v1:10.0.6.15:6875/4023558816] exists,up aaabbbcc-dd11-1234-5678-3febe1700ba1
osd.8 up in weight 1 up_from 934 up_thru 943 down_at 933 last_clean_interval [97,926) [v2:10.0.5.15:6848/845822693,v1:10.0.5.15:6849/845822693] [v2:10.0.6.15:6850/845822693,v1:10.0.6.15:6851/845822693] exists,up aaabbbcc-dd11-1234-5678-1aa039b1bb07

node02 : public_addr [10.0.5.16] cluster_addr[10.0.6.16]
osd.1 up in weight 1 up_from 933 up_thru 943 down_at 932 last_clean_interval [231,926) [v2:10.0.5.106:6896/3239795858,v1:10.0.5.106:6897/3239795858] [v2:10.0.6.16:6898/3239795858,v1:10.0.6.16:6899/3239795858] exists,up aaabbbcc-dd11-1234-5678-66bd9f3a1638
osd.5 up in weight 1 up_from 930 up_thru 943 down_at 929 last_clean_interval [231,926) [v2:10.0.5.106:6840/3437485403,v1:10.0.5.106:6841/3437485403] [v2:10.0.6.16:6842/3437485403,v1:10.0.6.16:6843/3437485403] exists,up aaabbbcc-dd11-1234-5678-6df7d9e91e2b
osd.7 up in weight 1 up_from 932 up_thru 943 down_at 931 last_clean_interval [232,926) [v2:10.0.5.106:6824/222439380,v1:10.0.5.106:6825/222439380] [v2:10.0.6.16:6826/222439380,v1:10.0.6.16:6827/222439380] exists,up aaabbbcc-dd11-1234-5678-58864f1b3c49

node03 : public_addr [10.0.5.17] cluster_addr[10.0.6.17]
osd.0 up in weight 1 up_from 931 up_thru 942 down_at 930 last_clean_interval [496,926) [v2:10.0.5.107:6803/457564359,v1:10.0.5.107:6805/457564359] [v2:10.0.6.17:6807/457564359,v1:10.0.6.17:6809/457564359] exists,up aaabbbcc-dd11-1234-5678-30fdc31d5597
osd.3 up in weight 1 up_from 930 up_thru 942 down_at 929 last_clean_interval [492,926) [v2:10.0.5.107:6820/1251607472,v1:10.0.5.107:6826/1251607472] [v2:10.0.6.17:6833/1251607472,v1:10.0.6.17:6838/1251607472] exists,up aaabbbcc-dd11-1234-5678-f22a195f4255
osd.6 up in weight 1 up_from 931 up_thru 942 down_at 930 last_clean_interval [497,926) [v2:10.0.5.107:6873/4049022179,v1:10.0.5.107:6875/4049022179] [v2:10.0.6.17:6877/4049022179,v1:10.0.6.17:6879/4049022179] exists,up aaabbbcc-dd11-1234-5678-945779f22d4a
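
For illustration, a minimal sketch (not the actual monitor code) of extracting the registered public v1/v2 addresses from ceph osd dump lines like the ones above. The regex and helper name are assumptions based on the line layout shown, where the first bracketed pair is the public address pair and the second is the cluster address pair:

import re

# Matches the OSD id and the first (public) address pair in an osd dump
# line, e.g. [v2:10.0.6.15:6896/1320177090,v1:10.0.6.15:6897/1320177090]
PUBLIC_ADDRS = re.compile(
    r"^(osd\.\d+)\s.*?\[v2:([\d.]+):\d+/\d+,v1:([\d.]+):\d+/\d+\]"
)

def parse_public_addrs(dump_line):
    """Return (osd_id, v2_ip, v1_ip), or None if the line does not match."""
    m = PUBLIC_ADDRS.match(dump_line)
    return m.groups() if m else None

For the osd.2 line above this yields ('osd.2', '10.0.6.15', '10.0.6.15'), i.e. both public addresses sit on the cluster network.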

Raise a health error if we observe this inconsistency in the cluster.
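
A minimal sketch of the proposed subnet check itself, using Python's standard ipaddress module. The /24 public network and the address values below are illustrative stand-ins only, not the actual implementation (the real check is proposed against the monitor in PR 46692):

import ipaddress

# Illustrative values only; the real check would read public_network from
# the cluster configuration and the registered addresses from the OSDMap.
public_network = ipaddress.ip_network("10.0.5.0/24")
registered = {
    "osd.2": ["10.0.6.15", "10.0.6.15"],  # v2/v1 public addrs on the cluster net
    "osd.8": ["10.0.5.15", "10.0.5.15"],  # correctly within the public net
}

for osd, addrs in registered.items():
    outside = [a for a in addrs if ipaddress.ip_address(a) not in public_network]
    if outside:
        # the inconsistency this ticket proposes surfacing as a health error
        print(f"HEALTH_ERR: {osd} public address(es) {outside} "
              f"not within public_network {public_network}")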

Refer to Bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=2083115 for more details.


Related issues 2 (2 open, 0 closed)

Copied to RADOS - Backport #63842: reef: Add health error if one or more OSDs registered v1/v2 public ip addresses are not within defined public_network subnet (In Progress, Prashant D)
Copied to RADOS - Backport #63843: quincy: Add health error if one or more OSDs registered v1/v2 public ip addresses are not within defined public_network subnet (In Progress, Prashant D)
#1

Updated by Prashant D almost 2 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Prashant D
  • Pull request ID set to 46692
#2

Updated by Prashant D 11 months ago

  • Backport set to reef, quincy
#3

Updated by Radoslaw Zarzynski 4 months ago

  • Status changed from Fix Under Review to Pending Backport
#4

Updated by Backport Bot 4 months ago

  • Copied to Backport #63842: reef: Add health error if one or more OSDs registered v1/v2 public ip addresses are not within defined public_network subnet added
#5

Updated by Backport Bot 4 months ago

  • Copied to Backport #63843: quincy: Add health error if one or more OSDs registered v1/v2 public ip addresses are not within defined public_network subnet added
#6

Updated by Backport Bot 4 months ago

  • Tags set to backport_processed