Bug #21761


ceph-osd consumes way too much memory during recovery

Added by Wyllys Ingersoll over 6 years ago. Updated over 5 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: OSD
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The ceph-osd processes have been seen consuming upwards of 20GB of RAM per process while a system is recovering. Even on systems provisioned with 192GB of RAM, 8 or 10 OSD processes can consume all of the memory and make it impossible to bring up all of the OSDs. This further complicates the recovery, since the OSDs that don't have enough RAM to start get marked out, which triggers even more rebalancing.

This is a critical issue for Jewel users (10.2.9) using filestore.
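A possible stopgap while the memory-starved OSDs are being restarted (not tested here, just sketched with the standard cluster flags) is to stop the monitors from marking down OSDs out and from kicking off additional rebalancing:

$ ceph osd set noout        # don't mark down OSDs out
$ ceph osd set norebalance  # don't start new rebalancing

# once all OSDs are back up and stable, clear the flags again:
$ ceph osd unset norebalance
$ ceph osd unset noout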

Example of a system with 128GB RAM, 11 OSDs (4TB drives) running a recovery operation (triggered by cephfs snapshot bug #21412):


$ top

Tasks: 460 total,   1 running, 459 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.4 us,  2.0 sy,  0.0 ni, 83.2 id, 10.3 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 13193300+total,  1150304 free, 12910152+used,  1681184 buff/cache
KiB Swap:   999420 total,     2368 free,   997052 used.   483084 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                            
   4771 ceph      20   0 18.831g 0.014t  30392 S  25.0 11.3 158:41.12 ceph-osd                                                                                           
  35087 ceph      20   0 21.398g 0.017t  20092 S  25.0 13.5 145:30.42 ceph-osd                                                                                           
   3929 ceph      20   0 22.269g 0.018t  27516 S  18.8 14.5 105:34.46 ceph-osd                                                                                           
  86341 ceph      20   0 23.413g 0.019t  33712 S  12.5 15.4 100:29.06 ceph-osd                                                                                           
 108322 ceph      20   0 4398468 2.628g 483264 D  12.5  2.1   0:22.57 ceph-osd                                                                                           
   4068 ceph      20   0 22.543g 0.018t  49780 S   6.2 14.4 161:15.71 ceph-osd                                                                                           
  11898 ceph      20   0 19.647g 0.014t  34524 S   6.2 11.6 144:21.33 ceph-osd  

  $ free -h
              total        used        free      shared  buff/cache   available
Mem:           125G        123G        1.1G         88M        1.3G        385M
Swap:          975M        973M        2.3M                                                                                      
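To see where the memory is actually going per daemon, the allocator's own view can help. This assumes the OSDs are linked against tcmalloc (the default build); otherwise the heap commands are not available:

$ ceph tell osd.72 heap stats    # tcmalloc's view of the osd.72 heap
$ ceph tell osd.72 heap release  # return freed-but-cached pages to the kernel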

Starting new OSDs becomes nearly impossible once system memory runs low. The ceph-osd process starts, but soon crashes and restarts in an endless loop. The crash log looks like this:

   -10> 2017-10-11 10:18:44.480698 7f6cf18658c0  5 osd.72 pg_epoch: 303792 pg[28.36b(unlocked)] enter Initial
    -9> 2017-10-11 10:18:44.480996 7f6cf18658c0  5 osd.72 pg_epoch: 303792 pg[28.36b( v 269495'33 (0'0,269495'33] local-les=303763 n=12 ec=269489 les/c/f 303608/300422/0 303758/303760/284901) [100,72,37] r=1 lpr=0 pi=269489-303759/139 crt=269495'33 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.000298 0 0.000000
    -8> 2017-10-11 10:18:44.481010 7f6cf18658c0  5 osd.72 pg_epoch: 303792 pg[28.36b( v 269495'33 (0'0,269495'33] local-les=303763 n=12 ec=269489 les/c/f 303608/300422/0 303758/303760/284901) [100,72,37] r=1 lpr=0 pi=269489-303759/139 crt=269495'33 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
    -7> 2017-10-11 10:18:44.645087 7f6cf18658c0  5 osd.72 pg_epoch: 303768 pg[1.4f4(unlocked)] enter Initial
    -6> 2017-10-11 10:18:46.762577 7f6cf18658c0  5 osd.72 pg_epoch: 303768 pg[1.4f4( v 279387'245575 (202648'235117,279387'245575] local-les=303763 n=8170 ec=23117 les/c/f 303101/280936/0 303758/303760/302603) [67,72,100] r=1 lpr=0 pi=278053-303759/75 crt=279387'245575 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 2.117488 0 0.000000
    -5> 2017-10-11 10:18:46.762616 7f6cf18658c0  5 osd.72 pg_epoch: 303768 pg[1.4f4( v 279387'245575 (202648'235117,279387'245575] local-les=303763 n=8170 ec=23117 les/c/f 303101/280936/0 303758/303760/302603) [67,72,100] r=1 lpr=0 pi=278053-303759/75 crt=279387'245575 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
    -4> 2017-10-11 10:18:46.816171 7f6cf18658c0  5 osd.72 pg_epoch: 303791 pg[9.644(unlocked)] enter Initial
    -3> 2017-10-11 10:18:46.825798 7f6cf18658c0  5 osd.72 pg_epoch: 303791 pg[9.644( v 269495'38555 (35683'35552,269495'38555] local-les=303763 n=27 ec=1076 les/c/f 300534/300248/0 303758/303759/295662) [62,68,72] r=2 lpr=0 pi=203019-303758/217 crt=269495'38555 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.009627 0 0.000000
    -2> 2017-10-11 10:18:46.825815 7f6cf18658c0  5 osd.72 pg_epoch: 303791 pg[9.644( v 269495'38555 (35683'35552,269495'38555] local-les=303763 n=27 ec=1076 les/c/f 300534/300248/0 303758/303759/295662) [62,68,72] r=2 lpr=0 pi=203019-303758/217 crt=269495'38555 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset
    -1> 2017-10-11 10:18:46.974700 7f6cf18658c0  5 osd.72 pg_epoch: 303798 pg[1.5a4(unlocked)] enter Initial
     0> 2017-10-11 10:18:47.517124 7f6cf18658c0 -1 *** Caught signal (Aborted) **
 in thread 7f6cf18658c0 thread_name:ceph-osd

 ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0)
 1: (()+0x984c4e) [0x56400a2d3c4e]
 2: (()+0x11390) [0x7f6cf0723390]
 3: (gsignal()+0x38) [0x7f6cee6c1428]
 4: (abort()+0x16a) [0x7f6cee6c302a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f6cef00384d]
 6: (()+0x8d6b6) [0x7f6cef0016b6]
 7: (()+0x8d701) [0x7f6cef001701]
 8: (()+0x8d919) [0x7f6cef001919]
 9: (()+0x1230f) [0x7f6cf13fb30f]
 10: (operator new[](unsigned long)+0x4e7) [0x7f6cf141f4b7]
 11: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x313) [0x7f6cf0dace63]
 12: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x276) [0x7f6cf0db1426]
 13: (()+0x421be) [0x7f6cf0db51be]
 14: (()+0x42240) [0x7f6cf0db5240]
 15: (()+0x4261e) [0x7f6cf0db561e]
 16: (()+0x3d835) [0x7f6cf0db0835]
 17: (()+0x1fffb) [0x7f6cf0d92ffb]
 18: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x8f) [0x56400a18f40f]
 19: (DBObjectMap::DBObjectMapIteratorImpl::next(bool)+0x34) [0x56400a142954]
 20: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xafb) [0x564009f7e97b]
 21: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x313) [0x564009dacfb3]
 22: (OSD::load_pgs()+0x87a) [0x564009ce796a]
 23: (OSD::init()+0x2026) [0x564009cf2c56]
 24: (main()+0x2ef1) [0x564009c64391]
 25: (__libc_start_main()+0xf0) [0x7f6cee6ac830]
 26: (_start()+0x29) [0x564009ca5b99]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
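The backtrace shows the allocation failing while OSD::load_pgs() replays PG logs from leveldb (PGLog::read_log), so the memory is already being consumed before the daemon finishes starting. Throttling recovery and backfill concurrency before restarting the OSDs might keep the footprint down once they are up; a rough sketch with illustrative (untuned) values:

$ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

Shortening the PG logs (osd_min_pg_log_entries / osd_max_pg_log_entries in ceph.conf) might also reduce what read_log has to keep in memory, although existing long logs would only shrink once the PGs become healthy enough to trim them.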


Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #21331: pg recovery priority inversion (Resolved, Sage Weil, 09/09/2017)
