Project

General

Profile

Actions

Bug #63609

open

osd acquire map_cache_lock high latency

Added by jianwei zhang 6 months ago. Updated 5 months ago.

Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
11/22/2023
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


1 osd, 128 pgs
5 shard-queue/thread, 128/5 = 26 pgs

MOSDMap receive 40 epochs osdmap once, store to bluestore, and consume_map for every pg

2023-11-22T13:02:46.218+0800 7ff03b597700  1 -- v2:127.0.0.1:6821/912494 <== mon.0 v2:127.0.0.1:40545/0 7 ==== osd_map(2571..2610 src has 252..2687) v4 ==== 639477+0+0 (crc 0 0 0) 0x55b311457500 con 0x55b3122cfc00
2023-11-22T13:02:47.078+0800 7ff03c599700 10 osd.0 2570 _committed_osd_maps 2571..2610

pg 1.4 consume osdmap  2571..2610  needs 2.271 seconds

26 pgs * 2.271 = 59.046 seconds  // very high

This will cause osd to be unable to catch up with the latest osdmap epoch in time.

The root cause is
1. The map_cache_lock lock range is too large, it needs to protect 3 lru caches

   //osd map cache (past osd maps)
   ceph::mutex map_cache_lock = ceph::make_mutex("OSDService::map_cache_lock");
   SharedLRU<epoch_t, const OSDMap> map_cache;
   SimpleLRU<epoch_t, ceph::buffer::list> map_bl_cache;
   SimpleLRU<epoch_t, ceph::buffer::list> map_bl_inc_cache;

2. When lru map_cache misses, it will read the disk and OSDMap::decode

2023-11-22T13:02:50.010 - 2023-11-22T13:02:47.739 = 2.271 秒

- OSDService::try_get_map

  - all_lat = 1.765460617

  - map_cache_lock_lat = 1.281302137  // high latency

  - map_cache_lookup_lat = 0.000025702

  - map_cache_key_lower_bound_lat = 0.00001841

  - _get_map_bl_lat = 0.059360301

  - map_decode_lat = 0.424754057
                                                                                                                           all_lat        map_cache_lock_lat        map_cache_lookup_lat        map_cache_key_lower_bound_lat        _get_map_bl_lat        map_decode_lat
2023-11-22T13:02:48.078    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.060132444    map_cache_lock_lat    0.049500227    map_cache_lookup_lat    0.049500927    map_cache_key_lower_bound_lat    0.000000467    _get_map_bl_lat    0.001472938    map_decode_lat    0.009158111
2023-11-22T13:02:48.092    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.009916984    map_cache_lock_lat    0.000000139    map_cache_lookup_lat    0.000000458    map_cache_key_lower_bound_lat    0.000000222    _get_map_bl_lat    0.001354447    map_decode_lat    0.008561857
2023-11-22T13:02:48.162    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.06527137    map_cache_lock_lat    0.054349292    map_cache_lookup_lat    0.054349929    map_cache_key_lower_bound_lat    0.000000302    _get_map_bl_lat    0.00148555    map_decode_lat    0.009435585
2023-11-22T13:02:48.228    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.062032465    map_cache_lock_lat    0.050890993    map_cache_lookup_lat    0.050891679    map_cache_key_lower_bound_lat    0.000000359    _get_map_bl_lat    0.001522906    map_decode_lat    0.009617516
2023-11-22T13:02:48.296    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.064207315    map_cache_lock_lat    0.053334683    map_cache_lookup_lat    0.053335417    map_cache_key_lower_bound_lat    0.000000482    _get_map_bl_lat    0.00156488    map_decode_lat    0.009306537
2023-11-22T13:02:48.356    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.055843525    map_cache_lock_lat    0.037233017    map_cache_lookup_lat    0.037233509    map_cache_key_lower_bound_lat    0.00000034    _get_map_bl_lat    0.001857088    map_decode_lat    0.016752589
2023-11-22T13:02:48.427    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.063933834    map_cache_lock_lat    0.042691372    map_cache_lookup_lat    0.042692229    map_cache_key_lower_bound_lat    0.000001041    _get_map_bl_lat    0.002116326    map_decode_lat    0.019124236
2023-11-22T13:02:48.496    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.061010312    map_cache_lock_lat    0.042960327    map_cache_lookup_lat    0.042961389    map_cache_key_lower_bound_lat    0.000000979    _get_map_bl_lat    0.002153001    map_decode_lat    0.015894946
2023-11-22T13:02:48.567    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.065189451    map_cache_lock_lat    0.050964955    map_cache_lookup_lat    0.050965529    map_cache_key_lower_bound_lat    0.000000656    _get_map_bl_lat    0.001760848    map_decode_lat    0.012462417
2023-11-22T13:02:48.649    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.075792655    map_cache_lock_lat    0.05511919    map_cache_lookup_lat    0.055120368    map_cache_key_lower_bound_lat    0.000000894    _get_map_bl_lat    0.002399989    map_decode_lat    0.018271407
2023-11-22T13:02:48.673    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.016864773    map_cache_lock_lat    0.000000163    map_cache_lookup_lat    0.000000714    map_cache_key_lower_bound_lat    0.000000423    _get_map_bl_lat    0.001980496    map_decode_lat    0.014883139
2023-11-22T13:02:48.746    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.066411771    map_cache_lock_lat    0.053015575    map_cache_lookup_lat    0.053016394    map_cache_key_lower_bound_lat    0.000000381    _get_map_bl_lat    0.001849816    map_decode_lat    0.011545174
2023-11-22T13:02:48.812    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.061037824    map_cache_lock_lat    0.050285898    map_cache_lookup_lat    0.050286617    map_cache_key_lower_bound_lat    0.000000472    _get_map_bl_lat    0.001592858    map_decode_lat    0.009157878
2023-11-22T13:02:48.886    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.070571721    map_cache_lock_lat    0.05933978    map_cache_lookup_lat    0.059340537    map_cache_key_lower_bound_lat    0.000000588    _get_map_bl_lat    0.001702452    map_decode_lat    0.009528146
2023-11-22T13:02:48.971    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.080418512    map_cache_lock_lat    0.060756236    map_cache_lookup_lat    0.060757279    map_cache_key_lower_bound_lat    0.000000724    _get_map_bl_lat    0.002362467    map_decode_lat    0.017298045
2023-11-22T13:02:48.994    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.016526777    map_cache_lock_lat    0.000000174    map_cache_lookup_lat    0.000000666    map_cache_key_lower_bound_lat    0.000000431    _get_map_bl_lat    0.002043063    map_decode_lat    0.014482617
2023-11-22T13:02:49.071    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.071925208    map_cache_lock_lat    0.052028365    map_cache_lookup_lat    0.052029613    map_cache_key_lower_bound_lat    0.000000667    _get_map_bl_lat    0.002546297    map_decode_lat    0.01734863
2023-11-22T13:02:49.157    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.078764386    map_cache_lock_lat    0.058724303    map_cache_lookup_lat    0.058725547    map_cache_key_lower_bound_lat    0.000000754    _get_map_bl_lat    0.002463211    map_decode_lat    0.017574869
2023-11-22T13:02:49.240    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.075854778    map_cache_lock_lat    0.055942215    map_cache_lookup_lat    0.055943493    map_cache_key_lower_bound_lat    0.000000868    _get_map_bl_lat    0.00240916    map_decode_lat    0.017501255
2023-11-22T13:02:49.263    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.016464595    map_cache_lock_lat    0.000000179    map_cache_lookup_lat    0.00000073    map_cache_key_lower_bound_lat    0.000000443    _get_map_bl_lat    0.001978494    map_decode_lat    0.014484928
2023-11-22T13:02:49.346    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.077999257    map_cache_lock_lat    0.058135744    map_cache_lookup_lat    0.058136925    map_cache_key_lower_bound_lat    0.000000484    _get_map_bl_lat    0.002515038    map_decode_lat    0.017346809
2023-11-22T13:02:49.431    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.078063644    map_cache_lock_lat    0.056774575    map_cache_lookup_lat    0.056775816    map_cache_key_lower_bound_lat    0.00000071    _get_map_bl_lat    0.002506333    map_decode_lat    0.018780785
2023-11-22T13:02:49.517    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.078985013    map_cache_lock_lat    0.057551544    map_cache_lookup_lat    0.057552725    map_cache_key_lower_bound_lat    0.000000874    _get_map_bl_lat    0.002575265    map_decode_lat    0.018856149
2023-11-22T13:02:49.597    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.073418237    map_cache_lock_lat    0.053344097    map_cache_lookup_lat    0.053345256    map_cache_key_lower_bound_lat    0.000000835    _get_map_bl_lat    0.002311108    map_decode_lat    0.017761037
2023-11-22T13:02:49.681    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.077397563    map_cache_lock_lat    0.056956548    map_cache_lookup_lat    0.056957699    map_cache_key_lower_bound_lat    0.000001048    _get_map_bl_lat    0.002435201    map_decode_lat    0.018003613
2023-11-22T13:02:49.762    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.074050911    map_cache_lock_lat    0.056898676    map_cache_lookup_lat    0.056899667    map_cache_key_lower_bound_lat    0.000000643    _get_map_bl_lat    0.00209178    map_decode_lat    0.015058822
2023-11-22T13:02:49.846    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.077292934    map_cache_lock_lat    0.057717863    map_cache_lookup_lat    0.057719007    map_cache_key_lower_bound_lat    0.000001004    _get_map_bl_lat    0.002327521    map_decode_lat    0.017245404
2023-11-22T13:02:49.927    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.074901029    map_cache_lock_lat    0.056785829    map_cache_lookup_lat    0.056787081    map_cache_key_lower_bound_lat    0.000000878    _get_map_bl_lat    0.00214289    map_decode_lat    0.015970185
2023-11-22T13:02:49.948    7ff02e57d700    1    osd.0    2610    try_get_map    OSDService::try_get_map    all_lat    0.015181329    map_cache_lock_lat    0.000000178    map_cache_lookup_lat    0.000000639    map_cache_key_lower_bound_lat    0.000000441    _get_map_bl_lat    0.001838878    map_decode_lat    0.013341371
                                                                                                                     1.765460617        1.281302137        1.281327839        0.00001841        0.059360301        0.424754057
OSDMapRef OSDService::try_get_map(epoch_t epoch)
{
  auto ts1 = ceph::real_clock::now();
  std::lock_guard l(map_cache_lock);
  auto ts1_1 = ceph::real_clock::now();
  OSDMapRef retval = map_cache.lookup(epoch);
  if (retval) {
    dout(30) << "get_map " << epoch << " -cached" << dendl;
    logger->inc(l_osd_map_cache_hit);
    return retval;
  }
  auto ts2 = ceph::real_clock::now();
  {
    logger->inc(l_osd_map_cache_miss);
    epoch_t lb = map_cache.cached_key_lower_bound();
    if (epoch < lb) {
      dout(30) << "get_map " << epoch << " - miss, below lower bound" << dendl;
      logger->inc(l_osd_map_cache_miss_low);
      logger->inc(l_osd_map_cache_miss_low_avg, lb - epoch);
    }
  }

  auto ts3 = ceph::real_clock::now();
  OSDMap *map = new OSDMap;
  if (epoch > 0) {
    dout(20) << "get_map " << epoch << " - loading and decoding " << map << dendl;
    bufferlist bl;
    if (!_get_map_bl(epoch, bl) || bl.length() == 0) {
      derr << "failed to load OSD map for epoch " << epoch << ", got " << bl.length() << " bytes" << dendl;
      delete map;
      return OSDMapRef();
    }
    auto ts4 = ceph::real_clock::now();
    map->decode(bl);
    auto ts5 = ceph::real_clock::now();
    auto all_lat = ts5 - ts1;
    auto map_cache_lock_lat = ts1_1 - ts1;
    auto map_cache_lookup_lat = ts2 - ts1_1;
    auto map_cache_key_lower_bound_lat = ts3 - ts2;
    auto _get_map_bl_lat = ts4 - ts3;
    auto map_decode_lat = ts5 - ts4;
    dout(1) << __func__ << " OSDService::try_get_map zhangjianwei2: " 
           << " all_lat=" << all_lat
           << " map_cache_lock_lat=" << map_cache_lock_lat
           << " map_cache_lookup_lat=" << map_cache_lookup_lat
           << " map_cache_key_lower_bound_lat=" << map_cache_key_lower_bound_lat
           << " _get_map_bl_lat=" << _get_map_bl_lat
           << " map_decode_lat=" << map_decode_lat
           << dendl;
  } else {
    dout(20) << "get_map " << epoch << " - return initial " << map << dendl;
  }
  return _add_map(map);
}
Actions #1

Updated by jianwei zhang 5 months ago

https://github.com/ceph/ceph/pull/54612

commit-1
```
- MOSDMap receive 40 epochs osdmap once,
  - store to bluestore
  - then consume_map for every pg
    - OSD::advance_pg call try_get_map
    - try_get_map will acquire map_cache_lock
      - but very high latency
      - 40 epoch osdmaps need cost 1.7~2.5s
- resolve
  - Split map_cache_lock to reduce the lock range
```

commit-2

    fix: osd stuck with preboot when boot due to OSDMap consumes slowly in advance_pg

    question
    - handle_osd_map receives 40 osdmaps and persists them to the osd store
    - after persistence is completed, these 40 osdmaps will be applied to 128 pg
      - 1 osd has 128 pg, 5 shard-queue/thread
      - 1 shard-queue/thread is divided into about 26 pg
      - When pg consumes 40 osdmaps and map_cache misses, it takes about 2 seconds.
      - 26 pg, takes about 52 seconds
        - pg consumes osdmap with too long delay
        - This will cause handle_osd_map to block receiving subsequent osdmaps
        - Makes osd unable to track the latest osdmap epoch
        - This will cause the osd to stay in the preboot state for a long time
        - and even affect business IO.

    Solution: Exchange space for time
    - separate parameters to avoid mutual interference after setting
      - osd_map_bl_cache_size
      - osd_map_bl_inc_cache_size
      - osd_map_cache_size
      - osd_map_cache_burst_size
      - osd_map_cache_reduce_step
    - added osd_map_cache_burst_size configuration parameter
      - When OSD::handle_osd_map receives a new osdmap
        - allows temporarily increasing the max_size of map_cache to cache more osdmaps
      - OSD::tick periodically checks the max_size of map_cache
        - when 128 pg osdmap epochs have caught up with superblock.newest_map
        - restore the max_size of map_cache to osd_map_cache_size
        - avoid taking up too much physical memory

- issue: https://tracker.ceph.com/issues/63609

Actions #2

Updated by jianwei zhang 5 months ago

after this PR

1 个 shard consume 3617 ==> 3877 ,all 261 osdmap epoch,sum latency 0.536 seconds
- 0.104 + 0.070 + 0.092 + 0.054 + 0.082 + 0.061 + 0.046 + 0.017 + 0.010
- Note: Each value represents the total time taken by the same shard thread to process the same batch of osdmaps (for example, 40 images) of 25 pg.

2023-11-27T19:54:17.480+0800 7f3b50908700  1 -- v2:127.0.0.1:6829/978719 <== mon.0 v2:127.0.0.1:40545/0 7 ==== osd_map(3617..3656 src has 252..3875) v4 ==== 13060+0+0 (crc 0 0 0) 0x556342513880 con 0x55634338b800
2023-11-27T19:54:19.212+0800 7f3b5190a700 10 osd.0 3616 _committed_osd_maps 3617..3656

[root@zjw-cmain-dev build]# awk '{print $2}' 61 | sort -u
7f3b438ee700
7f3b440ef700
7f3b448f0700
7f3b450f1700
7f3b458f2700

[root@zjw-cmain-dev build]# awk '{print $6}' 63 |sort -u
3656
3696
3736
3776
3816
3856
3875
3876
3877

[root@zjw-cmain-dev build]# for i in $(awk '{print $6}' 63 |sort -u); do grep -w $i 63 | head -n 1 ; grep -w $i 63 | tail -n 1 ; echo&&echo;done
2023-11-27 19:54:19.231 7f3b438ee700  1 osd.0 3656 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.006511487 for_lat 0.006417336 active_map_lat 0.000092693
2023-11-27 19:54:19.335 7f3b438ee700  1 osd.0 3656 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.003731232 for_lat 0.003587536 active_map_lat 0.000143069
0.104

2023-11-27 19:54:20.201 7f3b438ee700  1 osd.0 3696 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.003752530 for_lat 0.003723646 active_map_lat 0.000027418
2023-11-27 19:54:20.271 7f3b438ee700  1 osd.0 3696 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.002460087 for_lat 0.002448241 active_map_lat 0.000011269
0.070

2023-11-27 19:54:21.159 7f3b438ee700  1 osd.0 3736 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.004881147 for_lat 0.004811188 active_map_lat 0.000068437
2023-11-27 19:54:21.251 7f3b438ee700  1 osd.0 3736 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.002777449 for_lat 0.002753038 active_map_lat 0.000023707
0.092

2023-11-27 19:54:22.125 7f3b438ee700  1 osd.0 3776 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.003139652 for_lat 0.003121953 active_map_lat 0.000016450
2023-11-27 19:54:22.179 7f3b438ee700  1 osd.0 3776 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.002182543 for_lat 0.002170591 active_map_lat 0.000011480
0.054

2023-11-27 19:54:23.051 7f3b438ee700  1 osd.0 3816 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.004775339 for_lat 0.004729768 active_map_lat 0.000043962
2023-11-27 19:54:23.133 7f3b438ee700  1 osd.0 3816 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.003050267 for_lat 0.003015324 active_map_lat 0.000034058
0.082

2023-11-27 19:54:24.006 7f3b438ee700  1 osd.0 3856 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.003120644 for_lat 0.003103860 active_map_lat 0.000015432
2023-11-27 19:54:24.067 7f3b438ee700  1 osd.0 3856 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.001953311 for_lat 0.001942356 active_map_lat 0.000010342
0.061

2023-11-27 19:54:24.518 7f3b438ee700  1 osd.0 3875 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.002515004 for_lat 0.002465258 active_map_lat 0.000048024
2023-11-27 19:54:24.564 7f3b438ee700  1 osd.0 3875 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.001613933 for_lat 0.001587265 active_map_lat 0.000025903
0.046

2023-11-27 19:54:25.830 7f3b438ee700  1 osd.0 3876 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.000649957 for_lat 0.000433233 active_map_lat 0.000215730
2023-11-27 19:54:25.847 7f3b438ee700  1 osd.0 3876 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.000828225 for_lat 0.000520408 active_map_lat 0.000307067
0.017

2023-11-27 19:54:27.799 7f3b438ee700  1 osd.0 3877 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.000473883 for_lat 0.000208642 active_map_lat 0.000263508
2023-11-27 19:54:27.809 7f3b438ee700  1 osd.0 3877 advance_pg OSD::advance_pg zhangjianwei2:  all_lat 0.000349016 for_lat 0.000122499 active_map_lat 0.000225553
0.010

Actions #3

Updated by Radoslaw Zarzynski 5 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 54612
Actions

Also available in: Atom PDF