Actions
Bug #63609
openosd acquire map_cache_lock high latency
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
11/22/2023
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
1 osd, 128 pgs 5 shard-queue/thread, 128/5 = 26 pgs MOSDMap receive 40 epochs osdmap once, store to bluestore, and consume_map for every pg 2023-11-22T13:02:46.218+0800 7ff03b597700 1 -- v2:127.0.0.1:6821/912494 <== mon.0 v2:127.0.0.1:40545/0 7 ==== osd_map(2571..2610 src has 252..2687) v4 ==== 639477+0+0 (crc 0 0 0) 0x55b311457500 con 0x55b3122cfc00 2023-11-22T13:02:47.078+0800 7ff03c599700 10 osd.0 2570 _committed_osd_maps 2571..2610 pg 1.4 consume osdmap 2571..2610 needs 2.271 seconds 26 pgs * 2.271 = 59.046 seconds // very high This will cause osd to be unable to catch up with the latest osdmap epoch in time. The root cause is 1. The map_cache_lock lock range is too large, it needs to protect 3 lru caches //osd map cache (past osd maps) ceph::mutex map_cache_lock = ceph::make_mutex("OSDService::map_cache_lock"); SharedLRU<epoch_t, const OSDMap> map_cache; SimpleLRU<epoch_t, ceph::buffer::list> map_bl_cache; SimpleLRU<epoch_t, ceph::buffer::list> map_bl_inc_cache; 2. When lru map_cache misses, it will read the disk and OSDMap::decode 2023-11-22T13:02:50.010 - 2023-11-22T13:02:47.739 = 2.271 秒 - OSDService::try_get_map - all_lat = 1.765460617 - map_cache_lock_lat = 1.281302137 // high latency - map_cache_lookup_lat = 0.000025702 - map_cache_key_lower_bound_lat = 0.00001841 - _get_map_bl_lat = 0.059360301 - map_decode_lat = 0.424754057
all_lat map_cache_lock_lat map_cache_lookup_lat map_cache_key_lower_bound_lat _get_map_bl_lat map_decode_lat 2023-11-22T13:02:48.078 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.060132444 map_cache_lock_lat 0.049500227 map_cache_lookup_lat 0.049500927 map_cache_key_lower_bound_lat 0.000000467 _get_map_bl_lat 0.001472938 map_decode_lat 0.009158111 2023-11-22T13:02:48.092 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.009916984 map_cache_lock_lat 0.000000139 map_cache_lookup_lat 0.000000458 map_cache_key_lower_bound_lat 0.000000222 _get_map_bl_lat 0.001354447 map_decode_lat 0.008561857 2023-11-22T13:02:48.162 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.06527137 map_cache_lock_lat 0.054349292 map_cache_lookup_lat 0.054349929 map_cache_key_lower_bound_lat 0.000000302 _get_map_bl_lat 0.00148555 map_decode_lat 0.009435585 2023-11-22T13:02:48.228 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.062032465 map_cache_lock_lat 0.050890993 map_cache_lookup_lat 0.050891679 map_cache_key_lower_bound_lat 0.000000359 _get_map_bl_lat 0.001522906 map_decode_lat 0.009617516 2023-11-22T13:02:48.296 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.064207315 map_cache_lock_lat 0.053334683 map_cache_lookup_lat 0.053335417 map_cache_key_lower_bound_lat 0.000000482 _get_map_bl_lat 0.00156488 map_decode_lat 0.009306537 2023-11-22T13:02:48.356 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.055843525 map_cache_lock_lat 0.037233017 map_cache_lookup_lat 0.037233509 map_cache_key_lower_bound_lat 0.00000034 _get_map_bl_lat 0.001857088 map_decode_lat 0.016752589 2023-11-22T13:02:48.427 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.063933834 map_cache_lock_lat 0.042691372 map_cache_lookup_lat 0.042692229 map_cache_key_lower_bound_lat 0.000001041 _get_map_bl_lat 0.002116326 map_decode_lat 0.019124236 2023-11-22T13:02:48.496 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.061010312 map_cache_lock_lat 0.042960327 map_cache_lookup_lat 0.042961389 map_cache_key_lower_bound_lat 0.000000979 _get_map_bl_lat 0.002153001 map_decode_lat 0.015894946 2023-11-22T13:02:48.567 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.065189451 map_cache_lock_lat 0.050964955 map_cache_lookup_lat 0.050965529 map_cache_key_lower_bound_lat 0.000000656 _get_map_bl_lat 0.001760848 map_decode_lat 0.012462417 2023-11-22T13:02:48.649 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.075792655 map_cache_lock_lat 0.05511919 map_cache_lookup_lat 0.055120368 map_cache_key_lower_bound_lat 0.000000894 _get_map_bl_lat 0.002399989 map_decode_lat 0.018271407 2023-11-22T13:02:48.673 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.016864773 map_cache_lock_lat 0.000000163 map_cache_lookup_lat 0.000000714 map_cache_key_lower_bound_lat 0.000000423 _get_map_bl_lat 0.001980496 map_decode_lat 0.014883139 2023-11-22T13:02:48.746 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.066411771 map_cache_lock_lat 0.053015575 map_cache_lookup_lat 0.053016394 map_cache_key_lower_bound_lat 0.000000381 _get_map_bl_lat 0.001849816 map_decode_lat 0.011545174 2023-11-22T13:02:48.812 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.061037824 map_cache_lock_lat 0.050285898 map_cache_lookup_lat 0.050286617 map_cache_key_lower_bound_lat 0.000000472 _get_map_bl_lat 0.001592858 map_decode_lat 0.009157878 2023-11-22T13:02:48.886 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.070571721 map_cache_lock_lat 0.05933978 map_cache_lookup_lat 0.059340537 map_cache_key_lower_bound_lat 0.000000588 _get_map_bl_lat 0.001702452 map_decode_lat 0.009528146 2023-11-22T13:02:48.971 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.080418512 map_cache_lock_lat 0.060756236 map_cache_lookup_lat 0.060757279 map_cache_key_lower_bound_lat 0.000000724 _get_map_bl_lat 0.002362467 map_decode_lat 0.017298045 2023-11-22T13:02:48.994 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.016526777 map_cache_lock_lat 0.000000174 map_cache_lookup_lat 0.000000666 map_cache_key_lower_bound_lat 0.000000431 _get_map_bl_lat 0.002043063 map_decode_lat 0.014482617 2023-11-22T13:02:49.071 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.071925208 map_cache_lock_lat 0.052028365 map_cache_lookup_lat 0.052029613 map_cache_key_lower_bound_lat 0.000000667 _get_map_bl_lat 0.002546297 map_decode_lat 0.01734863 2023-11-22T13:02:49.157 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.078764386 map_cache_lock_lat 0.058724303 map_cache_lookup_lat 0.058725547 map_cache_key_lower_bound_lat 0.000000754 _get_map_bl_lat 0.002463211 map_decode_lat 0.017574869 2023-11-22T13:02:49.240 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.075854778 map_cache_lock_lat 0.055942215 map_cache_lookup_lat 0.055943493 map_cache_key_lower_bound_lat 0.000000868 _get_map_bl_lat 0.00240916 map_decode_lat 0.017501255 2023-11-22T13:02:49.263 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.016464595 map_cache_lock_lat 0.000000179 map_cache_lookup_lat 0.00000073 map_cache_key_lower_bound_lat 0.000000443 _get_map_bl_lat 0.001978494 map_decode_lat 0.014484928 2023-11-22T13:02:49.346 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.077999257 map_cache_lock_lat 0.058135744 map_cache_lookup_lat 0.058136925 map_cache_key_lower_bound_lat 0.000000484 _get_map_bl_lat 0.002515038 map_decode_lat 0.017346809 2023-11-22T13:02:49.431 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.078063644 map_cache_lock_lat 0.056774575 map_cache_lookup_lat 0.056775816 map_cache_key_lower_bound_lat 0.00000071 _get_map_bl_lat 0.002506333 map_decode_lat 0.018780785 2023-11-22T13:02:49.517 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.078985013 map_cache_lock_lat 0.057551544 map_cache_lookup_lat 0.057552725 map_cache_key_lower_bound_lat 0.000000874 _get_map_bl_lat 0.002575265 map_decode_lat 0.018856149 2023-11-22T13:02:49.597 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.073418237 map_cache_lock_lat 0.053344097 map_cache_lookup_lat 0.053345256 map_cache_key_lower_bound_lat 0.000000835 _get_map_bl_lat 0.002311108 map_decode_lat 0.017761037 2023-11-22T13:02:49.681 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.077397563 map_cache_lock_lat 0.056956548 map_cache_lookup_lat 0.056957699 map_cache_key_lower_bound_lat 0.000001048 _get_map_bl_lat 0.002435201 map_decode_lat 0.018003613 2023-11-22T13:02:49.762 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.074050911 map_cache_lock_lat 0.056898676 map_cache_lookup_lat 0.056899667 map_cache_key_lower_bound_lat 0.000000643 _get_map_bl_lat 0.00209178 map_decode_lat 0.015058822 2023-11-22T13:02:49.846 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.077292934 map_cache_lock_lat 0.057717863 map_cache_lookup_lat 0.057719007 map_cache_key_lower_bound_lat 0.000001004 _get_map_bl_lat 0.002327521 map_decode_lat 0.017245404 2023-11-22T13:02:49.927 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.074901029 map_cache_lock_lat 0.056785829 map_cache_lookup_lat 0.056787081 map_cache_key_lower_bound_lat 0.000000878 _get_map_bl_lat 0.00214289 map_decode_lat 0.015970185 2023-11-22T13:02:49.948 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.015181329 map_cache_lock_lat 0.000000178 map_cache_lookup_lat 0.000000639 map_cache_key_lower_bound_lat 0.000000441 _get_map_bl_lat 0.001838878 map_decode_lat 0.013341371 1.765460617 1.281302137 1.281327839 0.00001841 0.059360301 0.424754057
OSDMapRef OSDService::try_get_map(epoch_t epoch) { auto ts1 = ceph::real_clock::now(); std::lock_guard l(map_cache_lock); auto ts1_1 = ceph::real_clock::now(); OSDMapRef retval = map_cache.lookup(epoch); if (retval) { dout(30) << "get_map " << epoch << " -cached" << dendl; logger->inc(l_osd_map_cache_hit); return retval; } auto ts2 = ceph::real_clock::now(); { logger->inc(l_osd_map_cache_miss); epoch_t lb = map_cache.cached_key_lower_bound(); if (epoch < lb) { dout(30) << "get_map " << epoch << " - miss, below lower bound" << dendl; logger->inc(l_osd_map_cache_miss_low); logger->inc(l_osd_map_cache_miss_low_avg, lb - epoch); } } auto ts3 = ceph::real_clock::now(); OSDMap *map = new OSDMap; if (epoch > 0) { dout(20) << "get_map " << epoch << " - loading and decoding " << map << dendl; bufferlist bl; if (!_get_map_bl(epoch, bl) || bl.length() == 0) { derr << "failed to load OSD map for epoch " << epoch << ", got " << bl.length() << " bytes" << dendl; delete map; return OSDMapRef(); } auto ts4 = ceph::real_clock::now(); map->decode(bl); auto ts5 = ceph::real_clock::now(); auto all_lat = ts5 - ts1; auto map_cache_lock_lat = ts1_1 - ts1; auto map_cache_lookup_lat = ts2 - ts1_1; auto map_cache_key_lower_bound_lat = ts3 - ts2; auto _get_map_bl_lat = ts4 - ts3; auto map_decode_lat = ts5 - ts4; dout(1) << __func__ << " OSDService::try_get_map zhangjianwei2: " << " all_lat=" << all_lat << " map_cache_lock_lat=" << map_cache_lock_lat << " map_cache_lookup_lat=" << map_cache_lookup_lat << " map_cache_key_lower_bound_lat=" << map_cache_key_lower_bound_lat << " _get_map_bl_lat=" << _get_map_bl_lat << " map_decode_lat=" << map_decode_lat << dendl; } else { dout(20) << "get_map " << epoch << " - return initial " << map << dendl; } return _add_map(map); }
Updated by jianwei zhang 5 months ago
https://github.com/ceph/ceph/pull/54612
commit-1 ``` - MOSDMap receive 40 epochs osdmap once, - store to bluestore - then consume_map for every pg - OSD::advance_pg call try_get_map - try_get_map will acquire map_cache_lock - but very high latency - 40 epoch osdmaps need cost 1.7~2.5s - resolve - Split map_cache_lock to reduce the lock range ``` commit-2 fix: osd stuck with preboot when boot due to OSDMap consumes slowly in advance_pg question - handle_osd_map receives 40 osdmaps and persists them to the osd store - after persistence is completed, these 40 osdmaps will be applied to 128 pg - 1 osd has 128 pg, 5 shard-queue/thread - 1 shard-queue/thread is divided into about 26 pg - When pg consumes 40 osdmaps and map_cache misses, it takes about 2 seconds. - 26 pg, takes about 52 seconds - pg consumes osdmap with too long delay - This will cause handle_osd_map to block receiving subsequent osdmaps - Makes osd unable to track the latest osdmap epoch - This will cause the osd to stay in the preboot state for a long time - and even affect business IO. Solution: Exchange space for time - separate parameters to avoid mutual interference after setting - osd_map_bl_cache_size - osd_map_bl_inc_cache_size - osd_map_cache_size - osd_map_cache_burst_size - osd_map_cache_reduce_step - added osd_map_cache_burst_size configuration parameter - When OSD::handle_osd_map receives a new osdmap - allows temporarily increasing the max_size of map_cache to cache more osdmaps - OSD::tick periodically checks the max_size of map_cache - when 128 pg osdmap epochs have caught up with superblock.newest_map - restore the max_size of map_cache to osd_map_cache_size - avoid taking up too much physical memory - issue: https://tracker.ceph.com/issues/63609
Updated by jianwei zhang 5 months ago
after this PR
1 个 shard consume 3617 ==> 3877 ,all 261 osdmap epoch,sum latency 0.536 seconds
- 0.104 + 0.070 + 0.092 + 0.054 + 0.082 + 0.061 + 0.046 + 0.017 + 0.010
- Note: Each value represents the total time taken by the same shard thread to process the same batch of osdmaps (for example, 40 images) of 25 pg.
2023-11-27T19:54:17.480+0800 7f3b50908700 1 -- v2:127.0.0.1:6829/978719 <== mon.0 v2:127.0.0.1:40545/0 7 ==== osd_map(3617..3656 src has 252..3875) v4 ==== 13060+0+0 (crc 0 0 0) 0x556342513880 con 0x55634338b800 2023-11-27T19:54:19.212+0800 7f3b5190a700 10 osd.0 3616 _committed_osd_maps 3617..3656 [root@zjw-cmain-dev build]# awk '{print $2}' 61 | sort -u 7f3b438ee700 7f3b440ef700 7f3b448f0700 7f3b450f1700 7f3b458f2700 [root@zjw-cmain-dev build]# awk '{print $6}' 63 |sort -u 3656 3696 3736 3776 3816 3856 3875 3876 3877 [root@zjw-cmain-dev build]# for i in $(awk '{print $6}' 63 |sort -u); do grep -w $i 63 | head -n 1 ; grep -w $i 63 | tail -n 1 ; echo&&echo;done 2023-11-27 19:54:19.231 7f3b438ee700 1 osd.0 3656 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.006511487 for_lat 0.006417336 active_map_lat 0.000092693 2023-11-27 19:54:19.335 7f3b438ee700 1 osd.0 3656 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.003731232 for_lat 0.003587536 active_map_lat 0.000143069 0.104 2023-11-27 19:54:20.201 7f3b438ee700 1 osd.0 3696 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.003752530 for_lat 0.003723646 active_map_lat 0.000027418 2023-11-27 19:54:20.271 7f3b438ee700 1 osd.0 3696 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.002460087 for_lat 0.002448241 active_map_lat 0.000011269 0.070 2023-11-27 19:54:21.159 7f3b438ee700 1 osd.0 3736 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.004881147 for_lat 0.004811188 active_map_lat 0.000068437 2023-11-27 19:54:21.251 7f3b438ee700 1 osd.0 3736 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.002777449 for_lat 0.002753038 active_map_lat 0.000023707 0.092 2023-11-27 19:54:22.125 7f3b438ee700 1 osd.0 3776 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.003139652 for_lat 0.003121953 active_map_lat 0.000016450 2023-11-27 19:54:22.179 7f3b438ee700 1 osd.0 3776 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.002182543 for_lat 0.002170591 active_map_lat 0.000011480 0.054 2023-11-27 19:54:23.051 7f3b438ee700 1 osd.0 3816 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.004775339 for_lat 0.004729768 active_map_lat 0.000043962 2023-11-27 19:54:23.133 7f3b438ee700 1 osd.0 3816 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.003050267 for_lat 0.003015324 active_map_lat 0.000034058 0.082 2023-11-27 19:54:24.006 7f3b438ee700 1 osd.0 3856 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.003120644 for_lat 0.003103860 active_map_lat 0.000015432 2023-11-27 19:54:24.067 7f3b438ee700 1 osd.0 3856 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.001953311 for_lat 0.001942356 active_map_lat 0.000010342 0.061 2023-11-27 19:54:24.518 7f3b438ee700 1 osd.0 3875 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.002515004 for_lat 0.002465258 active_map_lat 0.000048024 2023-11-27 19:54:24.564 7f3b438ee700 1 osd.0 3875 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.001613933 for_lat 0.001587265 active_map_lat 0.000025903 0.046 2023-11-27 19:54:25.830 7f3b438ee700 1 osd.0 3876 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.000649957 for_lat 0.000433233 active_map_lat 0.000215730 2023-11-27 19:54:25.847 7f3b438ee700 1 osd.0 3876 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.000828225 for_lat 0.000520408 active_map_lat 0.000307067 0.017 2023-11-27 19:54:27.799 7f3b438ee700 1 osd.0 3877 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.000473883 for_lat 0.000208642 active_map_lat 0.000263508 2023-11-27 19:54:27.809 7f3b438ee700 1 osd.0 3877 advance_pg OSD::advance_pg zhangjianwei2: all_lat 0.000349016 for_lat 0.000122499 active_map_lat 0.000225553 0.010
Updated by Radoslaw Zarzynski 5 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 54612
Actions