Feature #63889

open

OSD (trim_maps): Missing periodic MOSDMap updates from monitor

Added by jianwei zhang 5 months ago. Updated 5 months ago.

Status:
New
Priority:
Low
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Reviewed:
12/25/2023
Affected Versions:
Component(RADOS):
OSD
Pull request ID:

Description

The OSDMap trimming lower bound originates from the monitor's `osdmap_first_committed` epoch.
This value is shared with the OSDs through OSDMap messages (MOSDMaps).
When there is no OSDMap-related activity, the monitor may advance its `osdmap_first_committed` epoch without informing the OSDs about it.
As a result, `OSDSuperblock::cluster_osdmap_trim_lower_bound` (the monitor's `osdmap_first_committed`)
remains stale until the next MOSDMap is shared by the monitor. Since the trim lower bound is stale, `trim_maps` calls won't actually trim any maps.
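
A minimal sketch of why this matters (illustrative only, not the actual `OSD::trim_maps()` code; the parameter names simply mirror the fields visible in the status output and logs below):

```
#include <cstddef>
#include <cstdint>

using epoch_t = uint32_t;

// Illustrative sketch, not Ceph code: the OSD may only drop maps strictly
// older than the trim lower bound it last learned from the monitor.
std::size_t count_trimmable(epoch_t oldest_map,                       // oldest epoch the OSD still stores
                            epoch_t cluster_osdmap_trim_lower_bound)  // mon's osdmap_first_committed, as last shared
{
  // With the values from this report (oldest_map = 21594 and a stale
  // lower bound of 21594, while the monitor has already advanced to
  // 24807), this returns 0 and trimming is a no-op.
  return cluster_osdmap_trim_lower_bound > oldest_map
             ? cluster_osdmap_trim_lower_bound - oldest_map
             : 0;
}
```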

To resolve this, the monitor's behavior can be enhanced to share an MOSDMap carrying the new `cluster_osdmap_trim_lower_bound`
regardless of any other MOSDMap activity.

ref-PR: (osd: fix scheduling of OSD::trim_maps is not timely) https://github.com/ceph/ceph/pull/54686
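
One possible shape of that monitor-side change, sketched below with hypothetical names (`maybe_share_trim_lower_bound`, `share_mosdmap`); this is not the code from the referenced PR, only an illustration of triggering on the lower bound advancing rather than on new epochs:

```
#include <cstdint>
#include <functional>

using epoch_t = uint32_t;

// Hypothetical sketch: nothing here maps 1:1 onto the Ceph tree or the PR.
void maybe_share_trim_lower_bound(
    epoch_t osdmap_first_committed,                      // monitor's oldest committed epoch
    epoch_t& last_shared_lower_bound,                    // value the OSDs were last told
    const std::function<void(epoch_t)>& share_mosdmap)   // "send an MOSDMap with this lower bound"
{
  // React to the lower bound advancing, not to new osdmap epochs, so OSDs
  // can make progress in trim_maps() even when the map itself is idle.
  if (osdmap_first_committed > last_shared_lower_bound) {
    share_mosdmap(osdmap_first_committed);
    last_shared_lower_bound = osdmap_first_committed;
  }
}
```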

Problem Description
1. `ceph -s` shows HEALTH_OK and all PGs are active+clean.
2. `ceph report | grep osdmap_` shows the latest osdmap epoch is 25307 and the oldest is 24807, a difference of 500 osdmaps.
3. `ceph tell osd.0 status` shows:
- oldest_map is 21594
- superblock.cluster_osdmap_trim_lower_bound=21594, but it should be 24807, which is not as expected
- newest_map is 25307
- the OSD holds `3714` osdmaps, which is not as expected
4. `ceph -s --debug_ms=1 > aa 2>&1` shows that the monitor's osd_map message already carries cluster_osdmap_trim_lower_bound = 24807.
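
Worked out from the numbers above: the OSD holds epochs 21594..25307, i.e. 25307 - 21594 + 1 = 3714 maps, while the monitor has already trimmed everything below 24807 and keeps only the last 500; so 24807 - 21594 = 3213 of the OSD's maps should already have been trimmable.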

```
[root@zjw-cmain-dev build]# ceph -s
  cluster:
    id:     713551e4-fec5-453b-ac9e-ae0716d235cb
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 5d)
    mgr: x(active, since 5d)
    osd: 2 osds: 2 up (since 2d), 2 in (since 3d)
         flags noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub

  data:
    pools:   2 pools, 256 pgs
    objects: 10.26k objects, 10 GiB
    usage:   10 GiB used, 9.2 TiB / 9.2 TiB avail
    pgs:     256 active+clean

[root@zjw-cmain-dev build]# ceph osd dump | head
epoch 25307

[root@zjw-cmain-dev build]# ceph report | grep osdmap_
report 974124877
    "osdmap_clean_epochs": {
    "osdmap_first_committed": 24807,
    "osdmap_last_committed": 25307,

25307 - 24807 = 500 epoch osdmap

[root@zjw-cmain-dev build]# ceph report | jq  .osdmap_clean_epochs
report 1902116976
{
  "min_last_epoch_clean": 25306,
  "last_epoch_clean": {
    "per_pool": [
      {
        "poolid": 1,
        "floor": 25307
      },
      {
        "poolid": 2,
        "floor": 25306
      }
    ]
  },
  "osd_epochs": [
    {
      "id": 0,
      "epoch": 25307
    },
    {
      "id": 1,
      "epoch": 25307
    }
  ]
}
```

osd.0.log
```
[root@zjw-cmain-dev build]# tail -n 1000000 out/osd.0.log |grep trim_maps | tail -n  10
2023-12-25T15:23:02.055+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:24:04.020+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:25:05.060+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:26:06.941+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:27:08.033+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:28:09.170+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:29:11.127+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:30:12.387+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:31:14.312+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
2023-12-25T15:32:16.066+0800 7f9d5dfa3700  5 osd.0 25307 trim_maps: min=21594 oldest_map=21594 superblock.cluster_osdmap_trim_lower_bound=21594 service.map_cache.cached_key_lower_bound=25258 
```

```
[root@zjw-cmain-dev build]# ceph tell osd.0 status
{
    "cluster_fsid": "713551e4-fec5-453b-ac9e-ae0716d235cb",
    "osd_fsid": "bc068df8-0acb-4164-8262-3813b40ce736",
    "whoami": 0,
    "state": "active",
    "maps": "[21594~3714]",
    "cluster_osdmap_trim_lower_bound": 21594,
    "num_pgs": 128
}
[root@zjw-cmain-dev build]# ceph tell osd.1 status
{
    "cluster_fsid": "713551e4-fec5-453b-ac9e-ae0716d235cb",
    "osd_fsid": "a62eb046-59cb-40f7-bf63-82c228ef5e30",
    "whoami": 1,
    "state": "active",
    "maps": "[21594~3714]",
    "cluster_osdmap_trim_lower_bound": 21594,
    "num_pgs": 128
}
```

```
[root@zjw-cmain-dev build]# ceph -s --debug_ms=1 > aa 2>&1

[root@zjw-cmain-dev build]# grep osdmap aa
2023-12-25T16:10:19.628+0800 7ffb2a3fc700  1 -- 127.0.0.1:0/810641284 --> v2:127.0.0.1:40815/0 -- mon_subscribe({osdmap=0}) v3 -- 0x7ffb24111330 con 0x7ffb240fef90

[root@zjw-cmain-dev build]# grep osd_map aa
2023-12-25T16:10:19.630+0800 7ffb20ff9700  1 -- 127.0.0.1:0/810641284 <== mon.0 v2:127.0.0.1:40815/0 5 ==== osd_map(25307..25307 src has 24807..25307) v4 ==== 2651+0+0 (crc 0 0 0) 0x7ffb14032bb0 con 0x7ffb240fef90

```

The `src has 24807..25307` part of that osd_map message is printed by MOSDMap::print(), so the monitor already knows and advertises cluster_osdmap_trim_lower_bound = 24807 when it answers the client's subscription:

```
Monitor::handle_subscribe
OSDMonitor::check_osdmap_sub
sub->session->con->send_message(build_latest_full(sub->session->con_features));
OSDMonitor::build_latest_full

class MOSDMap final : public Message {
  void print(std::ostream& out) const override {
    out << "osd_map(" << get_first() << ".." << get_last();
    if (cluster_osdmap_trim_lower_bound || newest_map)
      out << " src has " << cluster_osdmap_trim_lower_bound
          << ".." << newest_map;
    out << ")";
  }
};

cluster_osdmap_trim_lower_bound = 24807
newest_map = 25307
```

Reproduce steps

1. # cat start-cluster.sh
#!/bin/sh
set -x

rm -rf dev ceph.conf out

MON=1 MGR=1 OSD=1 FS=0 MDS=0 RGW=0 ../src/vstart.sh -n -l -X -b --msgr2 \
    --bluestore-devs /dev/disk/by-id/ata-ST10000NM0196-2AA131_ZA29A049 \
    --bluestore-db-devs /dev/disk/by-id/ata-INTEL_SSDSC2KG480G8_BTYG909204YN480BGN-part3 \
    --bluestore-wal-devs /dev/disk/by-id/ata-INTEL_SSDSC2KG480G8_BTYG909204YN480BGN-part1

2. # ../src/vstart.sh --inc-osd

3. # crush tree
   osd.0 in root default-root-0
   osd.1 in root default-root-1

4. # create pools
   create test-pool-0 under root default-root-0 (single replica)
   create test-pool-1 under root default-root-1 (single replica)

5. # kill -9 <osd.0.pid>

6. # change osdmap
for i in {1..10000}; do
    ceph osd pool application rm "test-pool-1" rgw "abc" 
    ceph osd pool application set "test-pool-1" rgw "abc" "efg" 
    ceph osd dump | head -n 1
done

7. # record osdmap info
  ceph osd dump | head
  ceph report | grep osdmap
  ceph report | jq .osdmap_clean_epochs
  ceph tell osd.1 status

8. # start osd.0
  init-ceph start osd.0

9. # record osdmap info
  ceph tell osd.0 status
  ceph tell osd.1 status
  ceph osd dump | head
  ceph report | grep osdmap
  ceph report | jq .osdmap_clean_epochs

10. # ceph daemon osd.0 config set debug_osd 20
      ceph daemon osd.1 config set debug_osd 20
      ceph daemon osd.0 config set osd_map_trim_min_interval 60
      ceph daemon osd.1 config set osd_map_trim_min_interval 60

11. tail -f out/osd.0.log | grep trim_maps


Files

osd0-osdmap-change.png (358 KB) — jianwei zhang, 12/26/2023 01:40 AM