Bug #58915

map eXX had wrong heartbeat addr

Added by Laura Flores about 1 year ago. Updated 2 months ago.

Status: Pending Backport
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: reef, quincy, pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Occurred during "Unwinding manager ceph" task.

/a/yuriw-2023-02-22_20:55:15-rados-wip-yuri4-testing-2023-02-22-0817-quincy-distro-default-smithi/7184640/

2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:0
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:1
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:2
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:abort
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:assert
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:config diff
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:config diff get <var>
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:config get <var>
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:config help [<var>]
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:config set <var> <val>...
2023-02-23T07:07:23.464 INFO:teuthology.orchestra.run.smithi156.stderr:admin_socket: invalid command
2023-02-23T07:07:23.464 INFO:tasks.ceph.ceph_manager.ceph:waiting on admin_socket for osd-3, ['dump_ops_in_flight']
2023-02-23T07:07:23.464 INFO:tasks.ceph.osd.3.smithi156.stderr:2023-02-23T07:07:23.450+0000 7f46c93173c0 -1 osd.3 51 log_to_monitors true
2023-02-23T07:07:23.860 INFO:tasks.ceph.osd.3.smithi156.stderr:2023-02-23T07:07:23.874+0000 7f46c1884700 -1 osd.3 51 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2023-02-23T07:07:25.861 INFO:tasks.ceph.osd.3.smithi156.stderr:2023-02-23T07:07:25.874+0000 7f46ba075700 -1 log_channel(cluster) log [ERR] : map e66 had wrong heartbeat front addr ([v2:0.0.0.0:6828/132919,v1:0.0.0.0:6829/132919] != my [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919])

osd.3.log.gz

2023-02-23T07:07:25.874+0000 7f46c72de700 10 osd.3 64 ms_handle_authentication new session 0x555603f24000 con 0x555603fb9000 entity osd.2 addr
2023-02-23T07:07:25.874+0000 7f46c72de700 10 osd.3 64 ms_handle_authentication session 0x555603f24000 osd.2 has caps osdcap[grant(*)] 'allow *'
2023-02-23T07:07:25.874+0000 7f46ba075700 10 osd.3 64 _committed_osd_maps 65..66
2023-02-23T07:07:25.874+0000 7f46ba075700 10 osd.3 64  advance to epoch 65 (<= last 66 <= newest_map 66)
2023-02-23T07:07:25.874+0000 7f46ba075700 10 osd.3 65  advance to epoch 66 (<= last 66 <= newest_map 66)
2023-02-23T07:07:25.874+0000 7f46ba075700 10 osd.3 66 up_epoch is 66
2023-02-23T07:07:25.874+0000 7f46ba075700 10 osd.3 66 boot_epoch is 66
2023-02-23T07:07:25.874+0000 7f46ba075700  1 osd.3 66 state: booting -> active
2023-02-23T07:07:25.874+0000 7f46ba075700 -1 log_channel(cluster) log [ERR] : map e66 had wrong heartbeat front addr ([v2:0.0.0.0:6828/132919,v1:0.0.0.0:6829/132919] != my [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919])
2023-02-23T07:07:25.874+0000 7f46ba075700  1 osd.3 66 start_waiting_for_healthy
2023-02-23T07:07:25.874+0000 7f46c72de700 10 osd.3 66 ms_handle_authentication new session 0x555603f243c0 con 0x555603fb8400 entity osd.0 addr
2023-02-23T07:07:25.874+0000 7f46a2045700 20 osd.3 op_wq(7) _process empty q, waiting
2023-02-23T07:07:25.874+0000 7f46ba075700  1 -- [v2:172.21.15.156:6824/132919,v1:172.21.15.156:6825/132919] --> [v2:172.21.15.156:3300/0,v1:172.21.15.156:6789/0] -- mon_subscribe({osdmap=67}) v3 -- 0x555603ed7860 con 0x555603ccf000
2023-02-23T07:07:25.874+0000 7f46c72de700 10 osd.3 66 ms_handle_authentication session 0x555603f243c0 osd.0 has caps osdcap[grant(*)] 'allow *'
2023-02-23T07:07:25.874+0000 7f46ba075700  1 -- [v2:172.21.15.156:6826/132919,v1:172.21.15.156:6827/132919] rebind rebind avoid 6826,6827
2023-02-23T07:07:25.874+0000 7f46c72de700  1 -- [v2:172.21.15.156:6826/132919,v1:172.21.15.156:6827/132919] <== osd.1 v2:172.21.15.156:6810/132695 11 ==== pg_notify2(2.2 2.2 (query:66 sent:66 2.2( empty local-lis/les=38/39 n=0 ec=16/16 lis/c=38/38 les/c/f=39/39/0 sis=66) ([38,65] all_participants=1,3 intervals=([38,56] acting 1,3))) e66/66) v1 ==== 1132+0+0 (crc 0 0 0) 0x555603fa0e00 con 0x555603f64c00


Related issues 3 (2 open, 1 closed)

Copied to RADOS - Backport #64410: quincy: map eXX had wrong heartbeat addr (In Progress, Konstantin Shalygin)
Copied to RADOS - Backport #64411: pacific: map eXX had wrong heartbeat addr (Rejected, Radoslaw Zarzynski)
Copied to RADOS - Backport #64412: reef: map eXX had wrong heartbeat addr (In Progress, Konstantin Shalygin)
Actions #1

Updated by Radoslaw Zarzynski about 1 year ago

I wonder whether this is fallout from the public_bind changes (for the overlapping-IP problem), but it looks like the branch is from 22 Feb while we started testing the quincy backport on 27 Feb.

Actions #2

Updated by Radoslaw Zarzynski about 1 year ago

Verified whether the testing branch had the public_bind_addr support in OSD:

rzarz@ubulap:~/dev/ceph$ git checkout cbccb547f47ec697c2e2ecf23392cc636ea19450
M    src/crypto/isa-l/isa-l_crypto
M    src/fmt
M    src/rapidjson
M    src/s3select
M    src/seastar
HEAD is now at cbccb547f47 Merge branch 'wip-58212-quincy' of https://github.com/cfsnyder/ceph into wip-yuri4-testing-2023-02-22-0817-quincy
rzarz@ubulap:~/dev/ceph$ grep -r CEPH_PICK_ADDRESS_PUBLIC_BIND src; echo $?
1

The changes were NOT there -- the issue is unrelated.

Actions #3

Updated by Radoslaw Zarzynski about 1 year ago

The heartbeat msgr instances started with the INADDR_ANY addresses.

rzarzynski@teuthology:/a/yuriw-2023-02-22_20:55:15-rados-wip-yuri4-testing-2023-02-22-0817-quincy-distro-default-smithi/7184640$ less ./remote/smithi156/log/ceph-osd.3.log.gz
...
2023-02-23T07:07:23.874+0000 7f46c1884700 20 osd.3 51  initial client_addrs [v2:172.21.15.156:6824/132919,v1:172.21.15.156:6825/132919], cluster_addrs [v2:0.0.0.0:6826/132919,v1:0.0.0.0:6827/132919], hb_back_addrs [v2:0.0.0.0:6830/132919,v1:0.0.0.0:6831/132919], hb_front_addrs [v2:0.0.0.0:6828/132919,v1:0.0.0.0:6829/132919]
...
2023-02-23T07:07:23.874+0000 7f46c82e0700  1 --2- [v2:0.0.0.0:6828/132919,v1:0.0.0.0:6829/132919] >>  conn(0x555603f64000 0x555603f6a000 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_hello peer v2:172.21.15.156:54094/0 says I am v2:172.21.15.156:6828/70876 (socket says 172.21.15.156:6828)
...
2023-02-23T07:07:23.874+0000 7f46c1884700  1 -- [v2:0.0.0.0:6830/132919,v1:0.0.0.0:6831/132919] set_addr_unknowns assuming my addr v2:0.0.0.0:6830/132919 matches provided addr v2:172.21.15.156:6826/132919
2023-02-23T07:07:23.874+0000 7f46c1884700  1 -- [v2:0.0.0.0:6830/132919,v1:0.0.0.0:6831/132919] set_addr_unknowns assuming my addr v1:0.0.0.0:6831/132919 matches provided addr v2:172.21.15.156:6826/132919
2023-02-23T07:07:23.874+0000 7f46c82e0700  1 -- [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919] learned_addr learned my addr [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919] (peer_addr_for_me v2:172.21.15.156:0/70876)
...

Why is the nonce in the peer's addr-for-me 70876?

Just after learning the addrs:

2023-02-23T07:07:23.874+0000 7f46c82e0700  1 --2- [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919] >>  conn(0x555603f64000 0x555603f6a000 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_hello state changed while learned_addr, mark_down or  replacing must be happened just now
2023-02-23T07:07:23.874+0000 7f46a2045700 10 osd.3 pg_epoch: 51 pg[2.7( empty local-lis/les=38/39 n=0 ec=16/16 lis/c=38/38 les/c/f=39/39/0 sis=38) [3,0] r=0 lpr=39 crt=0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetInfo>:  no prior_set down osds, will clear prior_readable_until_ub before activating
2023-02-23T07:07:23.874+0000 7f46c1884700 10 osd.3 51  assuming hb_back_addrs match cluster_addrs [v2:172.21.15.156:6826/132919,v1:172.21.15.156:6827/132919]
2023-02-23T07:07:23.874+0000 7f46c7adf700  1 --2- [v2:172.21.15.156:6830/132919,v1:172.21.15.156:6831/132919] >>  conn(0x555603f64400 0x555603f6a580 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_hello state changed while learned_addr, mark_down or  replacing must be happened just now
2023-02-23T07:07:23.874+0000 7f46a2045700 10 osd.3 pg_epoch: 51 pg[2.7( empty local-lis/les=38/39 n=0 ec=16/16 lis/c=38/38 les/c/f=39/39/0 sis=38) [3,0] r=0 lpr=39 crt=0'0 mlcod 0'0 peering mbc={}] state<Started/Primary/Peering/GetInfo>:  querying info from osd.0
2023-02-23T07:07:23.874+0000 7f46c1884700  1 -- [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919] set_addr_unknowns [v2:172.21.15.156:6824/132919,v1:172.21.15.156:6825/132919]
2023-02-23T07:07:23.874+0000 7f46a2045700 20 osd.3 pg_epoch: 51 pg[2.7( empty local-lis/les=38/39 n=0 ec=16/16 lis/c=38/38 les/c/f=39/39/0 sis=38) [3,0] r=0 lpr=39 crt=0'0 mlcod 0'0 peering mbc={}] prepare_stats_for_publish reporting purged_snaps []
2023-02-23T07:07:23.874+0000 7f46a2045700 15 osd.3 pg_epoch: 51 pg[2.7( empty local-lis/les=38/39 n=0 ec=16/16 lis/c=38/38 les/c/f=39/39/0 sis=38) [3,0] r=0 lpr=39 crt=0'0 mlcod 0'0 peering mbc={}] publish_stats_to_osd 39:24
2023-02-23T07:07:23.874+0000 7f46c1884700  1 -- [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919] set_addr_unknowns now [v2:172.21.15.156:6828/132919,v1:172.21.15.156:6829/132919]
...
2023-02-23T07:07:23.874+0000 7f46c1884700 10 osd.3 51  final client_addrs [v2:172.21.15.156:6824/132919,v1:172.21.15.156:6825/132919], cluster_addrs [v2:172.21.15.156:6826/132919,v1:172.21.15.156:6827/132919], hb_back_addrs [v2:172.21.15.156:6830/132919,v1:172.21.15.156:6831/132919], hb_front_addrs [v2:0.0.0.0:6828/132919,v1:0.0.0.0:6829/132919]

So, hb_back_addrs was set but hb_front_addrs remained 0.0.0.0.

Actions #4

Updated by Radoslaw Zarzynski about 1 year ago

The direct reason the OSDMap got the wrong hb address is that the OSD itself sent it that way:

2023-02-23T07:07:23.874+0000 7f46c1884700 10 osd.3 51  final client_addrs [v2:172.21.15.156:6824/132919,v1:172.21.15.156:6825/132919], cluster_addrs [v2:172.21.15.156:6826/132919,v1:172.21.15.156:6827/132919], hb_back_addrs [v2:172.21.15.156:6830/132919,v1:172.21.15.156:6831/132919], hb_front_addrs [v2:0.0.0.0:6828/132919,v1:0.0.0.0:6829/132919]
void OSD::_send_boot()
{ 
  // ...
  MOSDBoot *mboot = new MOSDBoot(
    superblock, get_osdmap_epoch(), service.get_boot_epoch(),
    hb_back_addrs, hb_front_addrs, cluster_addrs,
    CEPH_FEATURES_ALL);
  dout(10) << " final client_addrs " << client_addrs
           << ", cluster_addrs " << cluster_addrs
           << ", hb_back_addrs " << hb_back_addrs
           << ", hb_front_addrs " << hb_front_addrs
           << dendl;
  _collect_metadata(&mboot->metadata);
  monc->send_mon_message(mboot);
  // ...
}
Actions #5

Updated by Radoslaw Zarzynski about 1 year ago

What's interesting is that the log contains:

2023-02-23T07:07:23.874+0000 7f46c1884700 10 osd.3 51  assuming hb_back_addrs match cluster_addrs [v2:172.21.15.156:6826/132919,v1:172.21.15.156:6827/132919]

BUT lacks the analogous debug for the front heartbeat messenger:

  local_connection = hb_back_server_messenger->get_loopback_connection().get();
  if (hb_back_server_messenger->set_addr_unknowns(cluster_addrs)) {
    dout(10) << " assuming hb_back_addrs match cluster_addrs " 
             << cluster_addrs << dendl;
    hb_back_addrs = hb_back_server_messenger->get_myaddrs();
  }
  if (auto session = local_connection->get_priv(); !session) {
    hb_back_server_messenger->ms_deliver_handle_fast_connect(local_connection);
  }

  local_connection = hb_front_server_messenger->get_loopback_connection().get();
  if (hb_front_server_messenger->set_addr_unknowns(client_addrs)) {
    dout(10) << " assuming hb_front_addrs match client_addrs " 
             << client_addrs << dendl;
    hb_front_addrs = hb_front_server_messenger->get_myaddrs();
  }

Why did set_addr_unknowns return false?

Actions #6

Updated by Radoslaw Zarzynski about 1 year ago

Why did set_addr_unknowns return false?

Because it was already learnt!

bool AsyncMessenger::set_addr_unknowns(const entity_addrvec_t &addrs)
{
  ldout(cct,1) << __func__ << " " << addrs << dendl;
  bool ret = false;
  std::lock_guard l{lock};

  entity_addrvec_t newaddrs = *my_addrs;
  for (auto& a : newaddrs.v) {
    if (a.is_blank_ip()) {
      int type = a.get_type();
      int port = a.get_port();
      uint32_t nonce = a.get_nonce();
      for (auto& b : addrs.v) {
        if (a.get_family() == b.get_family()) {
          ldout(cct,1) << __func__ << " assuming my addr " << a
                       << " matches provided addr " << b << dendl;
          a = b;
          a.set_nonce(nonce);
          a.set_type(type);
          a.set_port(port);
          ret = true;
          break;
        }
      }
    }
  }
  set_myaddrs(newaddrs);
  if (ret) {
    _init_local_connection();
  }
  ldout(cct,1) << __func__ << " now " << *my_addrs << dendl;
  return ret;
}
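
For intuition, here is a minimal standalone sketch of the same early-return behavior. The types below are simplified stand-ins for illustration only, not the actual Ceph classes:

#include <iostream>
#include <string>
#include <vector>

// Simplified stand-in for entity_addr_t: just an IP string.
struct Addr {
  std::string ip = "0.0.0.0";
  bool is_blank_ip() const { return ip == "0.0.0.0"; }
};

// Mimics the shape of AsyncMessenger::set_addr_unknowns(): only blank
// (0.0.0.0) components get overwritten; if nothing was blank, it
// returns false and the caller's refresh branch is skipped.
bool set_addr_unknowns(std::vector<Addr>& myaddrs, const Addr& provided) {
  bool ret = false;
  for (auto& a : myaddrs) {
    if (a.is_blank_ip()) {
      a = provided;
      ret = true;
    }
  }
  return ret;
}

int main() {
  std::vector<Addr> hb_front = {{"0.0.0.0"}};
  // First call: the address is still blank, so it is filled in -> true.
  std::cout << set_addr_unknowns(hb_front, {"172.21.15.156"}) << "\n"; // 1
  // Second call: the address was already learned (as learned_addr() does
  // for an incoming connection), so nothing is blank anymore -> false.
  std::cout << set_addr_unknowns(hb_front, {"172.21.15.156"}) << "\n"; // 0
}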

The root cause is that we cached the zeroed addresses acquired at the very beginning.

class Messenger {
  // ...
  const entity_addrvec_t& get_myaddrs() {
    return *my_addrs;
  }
  // ...
};

Please note that get_myaddrs() returns a const reference, but hb_front_addrs (and friends, BTW) in OSD::_send_boot() are local copies (lvalues) taken early:

void OSD::_send_boot()
{
  dout(10) << "_send_boot" << dendl;
  Connection *local_connection =
    cluster_messenger->get_loopback_connection().get();
  entity_addrvec_t client_addrs = client_messenger->get_myaddrs();
  entity_addrvec_t cluster_addrs = cluster_messenger->get_myaddrs();
  entity_addrvec_t hb_back_addrs = hb_back_server_messenger->get_myaddrs();
  entity_addrvec_t hb_front_addrs = hb_front_server_messenger->get_myaddrs();

  dout(20) << " initial client_addrs " << client_addrs
           << ", cluster_addrs " << cluster_addrs
           << ", hb_back_addrs " << hb_back_addrs
           << ", hb_front_addrs " << hb_front_addrs
           << dendl;
  // ...
  if (hb_front_server_messenger->set_addr_unknowns(client_addrs)) {
    dout(10) << " assuming hb_front_addrs match client_addrs " 
             << client_addrs << dendl;
    hb_front_addrs = hb_front_server_messenger->get_myaddrs();
  }
  // ...
  MOSDBoot *mboot = new MOSDBoot(
    superblock, get_osdmap_epoch(), service.get_boot_epoch(),
    hb_back_addrs, hb_front_addrs, cluster_addrs,
    CEPH_FEATURES_ALL);
  dout(10) << " final client_addrs " << client_addrs
           << ", cluster_addrs " << cluster_addrs
           << ", hb_back_addrs " << hb_back_addrs
           << ", hb_front_addrs " << hb_front_addrs
           << dendl;
  _collect_metadata(&mboot->metadata);
  monc->send_mon_message(mboot);
  set_state(STATE_BOOTING);
}
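
To make the pitfall concrete end to end, here is a minimal standalone sketch with hypothetical simplified types (not the actual fix that went into the PR): the local copy taken before the address is learned stays stale, because the set_addr_unknowns() branch that would refresh it is never entered; unconditionally re-reading get_myaddrs() right before building the boot message would avoid that.

#include <iostream>
#include <string>

// Hypothetical minimal messenger holding its current address.
struct MiniMessenger {
  std::string addrs = "v2:0.0.0.0:6828";
  const std::string& get_myaddrs() const { return addrs; }
  // Another thread (handle_hello -> learned_addr) can fill this in
  // concurrently with _send_boot().
  void learned_addr(const std::string& a) { addrs = a; }
  // Returns true only while the address is still blank (cf. the real
  // AsyncMessenger::set_addr_unknowns() above).
  bool set_addr_unknowns(const std::string&) {
    return addrs.find("0.0.0.0") != std::string::npos;
  }
};

int main() {
  MiniMessenger hb_front;
  // _send_boot() copies the (still zeroed) address early...
  std::string hb_front_addrs = hb_front.get_myaddrs();
  // ...then the address is learned concurrently via an incoming connection.
  hb_front.learned_addr("v2:172.21.15.156:6828");
  // set_addr_unknowns() now returns false, so the refresh is skipped and
  // the stale 0.0.0.0 copy is what would go into MOSDBoot.
  if (hb_front.set_addr_unknowns("v2:172.21.15.156:6824")) {
    hb_front_addrs = hb_front.get_myaddrs();  // not reached in this run
  }
  std::cout << "sent: " << hb_front_addrs << "\n";  // stale 0.0.0.0 addr
  // A possible mitigation: re-read unconditionally before sending.
  hb_front_addrs = hb_front.get_myaddrs();
  std::cout << "sent: " << hb_front_addrs << "\n";  // learned addr
}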
Actions #7

Updated by Radoslaw Zarzynski about 1 year ago

  • Status changed from New to In Progress
Actions #8

Updated by Radoslaw Zarzynski about 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Assignee set to Radoslaw Zarzynski
  • Pull request ID set to 50422
Actions #9

Updated by Radoslaw Zarzynski about 1 year ago

  • Backport changed from quincy to quincy, pacific
Actions #10

Updated by Radoslaw Zarzynski about 1 year ago

  • Backport changed from quincy, pacific to reef, quincy, pacific
Actions #11

Updated by Laura Flores 3 months ago

Pretty sure this is the same stale messenger address issue, except with hb_back_addrs instead of hb_front_addrs:

/a/yuriw-2024-02-03_16:26:04-rados-wip-yuri10-testing-2024-02-02-1149-pacific-distro-default-smithi/7545493/teuthology.log

2024-02-04T00:42:34.150 INFO:tasks.ceph.osd.4.smithi005.stderr:2024-02-04T00:42:34.145+0000 7f73b2ba5200 -1 osd.4 50 log_to_monitors {default=true}
2024-02-04T00:42:34.463 INFO:tasks.ceph.osd.4.smithi005.stderr:2024-02-04T00:42:34.461+0000 7f73ab0db700 -1 osd.4 50 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2024-02-04T00:42:36.466 INFO:tasks.ceph.osd.4.smithi005.stderr:2024-02-04T00:42:36.461+0000 7f73a28ca700 -1 log_channel(cluster) log [ERR] : map e69 had wrong heartbeat back addr ([v2:0.0.0.0:6838/129129,v1:0.0.0.0:6839/129129] != my [v2:172.21.15.5:6838/129129,v1:172.21.15.5:6839/129129])

After learning the addrs, hb_front_addrs was set, but hb_back_addrs remained at 0.0.0.0:

2024-02-04T00:42:34.461+0000 7f73b1b6e700  1 --2- [v2:172.21.15.5:6838/129129,v1:172.21.15.5:6839/129129] >>  conn(0x55b7b6d3c800 0x55b7b6cecd00 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0).handle_hello state changed while learned_addr, mark_down or  replacing must be happened just now
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:0.0.0.0:6834/129129,v1:0.0.0.0:6835/129129] set_addr_unknowns assuming my addr v1:0.0.0.0:6835/129129 matches provided addr v2:172.21.15.5:6832/129129
2024-02-04T00:42:34.461+0000 7f73ab0db700 10 osd.4 50  new session (outgoing) 0x55b7b6d36780 con=0x55b7b56b6c00 addr=v2:172.21.15.5:6834/129129
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:172.21.15.5:6834/129129,v1:172.21.15.5:6835/129129] set_addr_unknowns now [v2:172.21.15.5:6834/129129,v1:172.21.15.5:6835/129129]
2024-02-04T00:42:34.461+0000 7f73ab0db700 10 osd.4 50  assuming cluster_addrs match client_addrs [v2:172.21.15.5:6832/129129,v1:172.21.15.5:6833/129129]
2024-02-04T00:42:34.461+0000 7f73b1b6e700  1 --2- [v2:0.0.0.0:6836/129129,v1:0.0.0.0:6837/129129] >>  conn(0x55b7b6d3cc00 0x55b7b6ced200 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).accept
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:172.21.15.5:6838/129129,v1:172.21.15.5:6839/129129] set_addr_unknowns [v2:172.21.15.5:6834/129129,v1:172.21.15.5:6835/129129]
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:172.21.15.5:6838/129129,v1:172.21.15.5:6839/129129] set_addr_unknowns now [v2:172.21.15.5:6838/129129,v1:172.21.15.5:6839/129129]
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:0.0.0.0:6836/129129,v1:0.0.0.0:6837/129129] set_addr_unknowns [v2:172.21.15.5:6832/129129,v1:172.21.15.5:6833/129129]
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:0.0.0.0:6836/129129,v1:0.0.0.0:6837/129129] set_addr_unknowns assuming my addr v2:0.0.0.0:6836/129129 matches provided addr v2:172.21.15.5:6832/129129
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:0.0.0.0:6836/129129,v1:0.0.0.0:6837/129129] set_addr_unknowns assuming my addr v1:0.0.0.0:6837/129129 matches provided addr v2:172.21.15.5:6832/129129
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 -- [v2:172.21.15.5:6836/129129,v1:172.21.15.5:6837/129129] set_addr_unknowns now [v2:172.21.15.5:6836/129129,v1:172.21.15.5:6837/129129]
2024-02-04T00:42:34.461+0000 7f73ab0db700 10 osd.4 50  assuming hb_front_addrs match client_addrs [v2:172.21.15.5:6832/129129,v1:172.21.15.5:6833/129129]

...

2024-02-04T00:42:34.461+0000 7f73ab0db700 -1 osd.4 50 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
2024-02-04T00:42:34.461+0000 7f73ab0db700  1 osd.4 50 set_numa_affinity not setting numa affinity
2024-02-04T00:42:34.461+0000 7f73ab0db700 10 osd.4 50  final client_addrs [v2:172.21.15.5:6832/129129,v1:172.21.15.5:6833/129129], cluster_addrs [v2:172.21.15.5:6834/129129,v1:172.21.15.5:6835/129129], hb_back_addrs [v2:0.0.0.0:6838/129129,v1:0.0.0.0:6839/129129], hb_front_addrs [v2:172.21.15.5:6836/129129,v1:172.21.15.5:6837/129129]

Actions #12

Updated by Laura Flores 3 months ago

  • Subject changed from map eXX had wrong heartbeat front addr to map eXX had wrong heartbeat addr
Actions #13

Updated by Radoslaw Zarzynski 2 months ago

bump up

Actions #14

Updated by Radoslaw Zarzynski 2 months ago

  • Status changed from Fix Under Review to Pending Backport
Actions #15

Updated by Backport Bot 2 months ago

  • Copied to Backport #64410: quincy: map eXX had wrong heartbeat addr added
Actions #16

Updated by Backport Bot 2 months ago

  • Copied to Backport #64411: pacific: map eXX had wrong heartbeat addr added
Actions #17

Updated by Backport Bot 2 months ago

  • Copied to Backport #64412: reef: map eXX had wrong heartbeat addr added

Actions #18

Updated by Backport Bot 2 months ago

  • Tags set to backport_processed