Actions
Bug #63609
openosd acquire map_cache_lock high latency
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
11/22/2023
Affected Versions:
ceph-qa-suite:
rados
Component(RADOS):
OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
1 osd, 128 pgs 5 shard-queue/thread, 128/5 = 26 pgs MOSDMap receive 40 epochs osdmap once, store to bluestore, and consume_map for every pg 2023-11-22T13:02:46.218+0800 7ff03b597700 1 -- v2:127.0.0.1:6821/912494 <== mon.0 v2:127.0.0.1:40545/0 7 ==== osd_map(2571..2610 src has 252..2687) v4 ==== 639477+0+0 (crc 0 0 0) 0x55b311457500 con 0x55b3122cfc00 2023-11-22T13:02:47.078+0800 7ff03c599700 10 osd.0 2570 _committed_osd_maps 2571..2610 pg 1.4 consume osdmap 2571..2610 needs 2.271 seconds 26 pgs * 2.271 = 59.046 seconds // very high This will cause osd to be unable to catch up with the latest osdmap epoch in time. The root cause is 1. The map_cache_lock lock range is too large, it needs to protect 3 lru caches //osd map cache (past osd maps) ceph::mutex map_cache_lock = ceph::make_mutex("OSDService::map_cache_lock"); SharedLRU<epoch_t, const OSDMap> map_cache; SimpleLRU<epoch_t, ceph::buffer::list> map_bl_cache; SimpleLRU<epoch_t, ceph::buffer::list> map_bl_inc_cache; 2. When lru map_cache misses, it will read the disk and OSDMap::decode 2023-11-22T13:02:50.010 - 2023-11-22T13:02:47.739 = 2.271 秒 - OSDService::try_get_map - all_lat = 1.765460617 - map_cache_lock_lat = 1.281302137 // high latency - map_cache_lookup_lat = 0.000025702 - map_cache_key_lower_bound_lat = 0.00001841 - _get_map_bl_lat = 0.059360301 - map_decode_lat = 0.424754057
all_lat map_cache_lock_lat map_cache_lookup_lat map_cache_key_lower_bound_lat _get_map_bl_lat map_decode_lat 2023-11-22T13:02:48.078 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.060132444 map_cache_lock_lat 0.049500227 map_cache_lookup_lat 0.049500927 map_cache_key_lower_bound_lat 0.000000467 _get_map_bl_lat 0.001472938 map_decode_lat 0.009158111 2023-11-22T13:02:48.092 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.009916984 map_cache_lock_lat 0.000000139 map_cache_lookup_lat 0.000000458 map_cache_key_lower_bound_lat 0.000000222 _get_map_bl_lat 0.001354447 map_decode_lat 0.008561857 2023-11-22T13:02:48.162 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.06527137 map_cache_lock_lat 0.054349292 map_cache_lookup_lat 0.054349929 map_cache_key_lower_bound_lat 0.000000302 _get_map_bl_lat 0.00148555 map_decode_lat 0.009435585 2023-11-22T13:02:48.228 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.062032465 map_cache_lock_lat 0.050890993 map_cache_lookup_lat 0.050891679 map_cache_key_lower_bound_lat 0.000000359 _get_map_bl_lat 0.001522906 map_decode_lat 0.009617516 2023-11-22T13:02:48.296 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.064207315 map_cache_lock_lat 0.053334683 map_cache_lookup_lat 0.053335417 map_cache_key_lower_bound_lat 0.000000482 _get_map_bl_lat 0.00156488 map_decode_lat 0.009306537 2023-11-22T13:02:48.356 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.055843525 map_cache_lock_lat 0.037233017 map_cache_lookup_lat 0.037233509 map_cache_key_lower_bound_lat 0.00000034 _get_map_bl_lat 0.001857088 map_decode_lat 0.016752589 2023-11-22T13:02:48.427 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.063933834 map_cache_lock_lat 0.042691372 map_cache_lookup_lat 0.042692229 map_cache_key_lower_bound_lat 0.000001041 _get_map_bl_lat 0.002116326 map_decode_lat 0.019124236 2023-11-22T13:02:48.496 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.061010312 map_cache_lock_lat 0.042960327 map_cache_lookup_lat 0.042961389 map_cache_key_lower_bound_lat 0.000000979 _get_map_bl_lat 0.002153001 map_decode_lat 0.015894946 2023-11-22T13:02:48.567 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.065189451 map_cache_lock_lat 0.050964955 map_cache_lookup_lat 0.050965529 map_cache_key_lower_bound_lat 0.000000656 _get_map_bl_lat 0.001760848 map_decode_lat 0.012462417 2023-11-22T13:02:48.649 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.075792655 map_cache_lock_lat 0.05511919 map_cache_lookup_lat 0.055120368 map_cache_key_lower_bound_lat 0.000000894 _get_map_bl_lat 0.002399989 map_decode_lat 0.018271407 2023-11-22T13:02:48.673 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.016864773 map_cache_lock_lat 0.000000163 map_cache_lookup_lat 0.000000714 map_cache_key_lower_bound_lat 0.000000423 _get_map_bl_lat 0.001980496 map_decode_lat 0.014883139 2023-11-22T13:02:48.746 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.066411771 map_cache_lock_lat 0.053015575 map_cache_lookup_lat 0.053016394 map_cache_key_lower_bound_lat 0.000000381 _get_map_bl_lat 0.001849816 map_decode_lat 0.011545174 2023-11-22T13:02:48.812 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.061037824 map_cache_lock_lat 0.050285898 map_cache_lookup_lat 0.050286617 map_cache_key_lower_bound_lat 0.000000472 _get_map_bl_lat 0.001592858 map_decode_lat 0.009157878 2023-11-22T13:02:48.886 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.070571721 map_cache_lock_lat 0.05933978 map_cache_lookup_lat 0.059340537 map_cache_key_lower_bound_lat 0.000000588 _get_map_bl_lat 0.001702452 map_decode_lat 0.009528146 2023-11-22T13:02:48.971 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.080418512 map_cache_lock_lat 0.060756236 map_cache_lookup_lat 0.060757279 map_cache_key_lower_bound_lat 0.000000724 _get_map_bl_lat 0.002362467 map_decode_lat 0.017298045 2023-11-22T13:02:48.994 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.016526777 map_cache_lock_lat 0.000000174 map_cache_lookup_lat 0.000000666 map_cache_key_lower_bound_lat 0.000000431 _get_map_bl_lat 0.002043063 map_decode_lat 0.014482617 2023-11-22T13:02:49.071 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.071925208 map_cache_lock_lat 0.052028365 map_cache_lookup_lat 0.052029613 map_cache_key_lower_bound_lat 0.000000667 _get_map_bl_lat 0.002546297 map_decode_lat 0.01734863 2023-11-22T13:02:49.157 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.078764386 map_cache_lock_lat 0.058724303 map_cache_lookup_lat 0.058725547 map_cache_key_lower_bound_lat 0.000000754 _get_map_bl_lat 0.002463211 map_decode_lat 0.017574869 2023-11-22T13:02:49.240 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.075854778 map_cache_lock_lat 0.055942215 map_cache_lookup_lat 0.055943493 map_cache_key_lower_bound_lat 0.000000868 _get_map_bl_lat 0.00240916 map_decode_lat 0.017501255 2023-11-22T13:02:49.263 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.016464595 map_cache_lock_lat 0.000000179 map_cache_lookup_lat 0.00000073 map_cache_key_lower_bound_lat 0.000000443 _get_map_bl_lat 0.001978494 map_decode_lat 0.014484928 2023-11-22T13:02:49.346 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.077999257 map_cache_lock_lat 0.058135744 map_cache_lookup_lat 0.058136925 map_cache_key_lower_bound_lat 0.000000484 _get_map_bl_lat 0.002515038 map_decode_lat 0.017346809 2023-11-22T13:02:49.431 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.078063644 map_cache_lock_lat 0.056774575 map_cache_lookup_lat 0.056775816 map_cache_key_lower_bound_lat 0.00000071 _get_map_bl_lat 0.002506333 map_decode_lat 0.018780785 2023-11-22T13:02:49.517 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.078985013 map_cache_lock_lat 0.057551544 map_cache_lookup_lat 0.057552725 map_cache_key_lower_bound_lat 0.000000874 _get_map_bl_lat 0.002575265 map_decode_lat 0.018856149 2023-11-22T13:02:49.597 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.073418237 map_cache_lock_lat 0.053344097 map_cache_lookup_lat 0.053345256 map_cache_key_lower_bound_lat 0.000000835 _get_map_bl_lat 0.002311108 map_decode_lat 0.017761037 2023-11-22T13:02:49.681 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.077397563 map_cache_lock_lat 0.056956548 map_cache_lookup_lat 0.056957699 map_cache_key_lower_bound_lat 0.000001048 _get_map_bl_lat 0.002435201 map_decode_lat 0.018003613 2023-11-22T13:02:49.762 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.074050911 map_cache_lock_lat 0.056898676 map_cache_lookup_lat 0.056899667 map_cache_key_lower_bound_lat 0.000000643 _get_map_bl_lat 0.00209178 map_decode_lat 0.015058822 2023-11-22T13:02:49.846 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.077292934 map_cache_lock_lat 0.057717863 map_cache_lookup_lat 0.057719007 map_cache_key_lower_bound_lat 0.000001004 _get_map_bl_lat 0.002327521 map_decode_lat 0.017245404 2023-11-22T13:02:49.927 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.074901029 map_cache_lock_lat 0.056785829 map_cache_lookup_lat 0.056787081 map_cache_key_lower_bound_lat 0.000000878 _get_map_bl_lat 0.00214289 map_decode_lat 0.015970185 2023-11-22T13:02:49.948 7ff02e57d700 1 osd.0 2610 try_get_map OSDService::try_get_map all_lat 0.015181329 map_cache_lock_lat 0.000000178 map_cache_lookup_lat 0.000000639 map_cache_key_lower_bound_lat 0.000000441 _get_map_bl_lat 0.001838878 map_decode_lat 0.013341371 1.765460617 1.281302137 1.281327839 0.00001841 0.059360301 0.424754057
OSDMapRef OSDService::try_get_map(epoch_t epoch) { auto ts1 = ceph::real_clock::now(); std::lock_guard l(map_cache_lock); auto ts1_1 = ceph::real_clock::now(); OSDMapRef retval = map_cache.lookup(epoch); if (retval) { dout(30) << "get_map " << epoch << " -cached" << dendl; logger->inc(l_osd_map_cache_hit); return retval; } auto ts2 = ceph::real_clock::now(); { logger->inc(l_osd_map_cache_miss); epoch_t lb = map_cache.cached_key_lower_bound(); if (epoch < lb) { dout(30) << "get_map " << epoch << " - miss, below lower bound" << dendl; logger->inc(l_osd_map_cache_miss_low); logger->inc(l_osd_map_cache_miss_low_avg, lb - epoch); } } auto ts3 = ceph::real_clock::now(); OSDMap *map = new OSDMap; if (epoch > 0) { dout(20) << "get_map " << epoch << " - loading and decoding " << map << dendl; bufferlist bl; if (!_get_map_bl(epoch, bl) || bl.length() == 0) { derr << "failed to load OSD map for epoch " << epoch << ", got " << bl.length() << " bytes" << dendl; delete map; return OSDMapRef(); } auto ts4 = ceph::real_clock::now(); map->decode(bl); auto ts5 = ceph::real_clock::now(); auto all_lat = ts5 - ts1; auto map_cache_lock_lat = ts1_1 - ts1; auto map_cache_lookup_lat = ts2 - ts1_1; auto map_cache_key_lower_bound_lat = ts3 - ts2; auto _get_map_bl_lat = ts4 - ts3; auto map_decode_lat = ts5 - ts4; dout(1) << __func__ << " OSDService::try_get_map zhangjianwei2: " << " all_lat=" << all_lat << " map_cache_lock_lat=" << map_cache_lock_lat << " map_cache_lookup_lat=" << map_cache_lookup_lat << " map_cache_key_lower_bound_lat=" << map_cache_key_lower_bound_lat << " _get_map_bl_lat=" << _get_map_bl_lat << " map_decode_lat=" << map_decode_lat << dendl; } else { dout(20) << "get_map " << epoch << " - return initial " << map << dendl; } return _add_map(map); }
Actions