Bug #21761
closedceph-osd consumes way too much memory during recovery
0%
Description
The ceph-osd processes have been seen consuming upwards of 20GB of RAM per process when a system is recovering. Even on systems that are provisioned with 192GB RAM, 8 or 10 OSD processes can consume all of the RAM and make it impossible to bring up all of the OSDS which further complicates the recovery since the ones that dont have enough RAM to start get marked out which causes more rebalancing.
This is a critical issue for Jewel users (10.2.9) using filestore.
Example of a system with 128GB RAM, 11 OSDs (4TB drives) running a recovery operation (triggered by cephfs snapshot bug #21412):
$ top Tasks: 460 total, 1 running, 459 sleeping, 0 stopped, 0 zombie %Cpu(s): 4.4 us, 2.0 sy, 0.0 ni, 83.2 id, 10.3 wa, 0.0 hi, 0.2 si, 0.0 st KiB Mem : 13193300+total, 1150304 free, 12910152+used, 1681184 buff/cache KiB Swap: 999420 total, 2368 free, 997052 used. 483084 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4771 ceph 20 0 18.831g 0.014t 30392 S 25.0 11.3 158:41.12 ceph-osd 35087 ceph 20 0 21.398g 0.017t 20092 S 25.0 13.5 145:30.42 ceph-osd 3929 ceph 20 0 22.269g 0.018t 27516 S 18.8 14.5 105:34.46 ceph-osd 86341 ceph 20 0 23.413g 0.019t 33712 S 12.5 15.4 100:29.06 ceph-osd 108322 ceph 20 0 4398468 2.628g 483264 D 12.5 2.1 0:22.57 ceph-osd 4068 ceph 20 0 22.543g 0.018t 49780 S 6.2 14.4 161:15.71 ceph-osd 11898 ceph 20 0 19.647g 0.014t 34524 S 6.2 11.6 144:21.33 ceph-osd $ free -h total used free shared buff/cache available Mem: 125G 123G 1.1G 88M 1.3G 385M Swap: 975M 973M 2.3M
Starting new OSDs becomes nearly impossible once the system memory becomes low. The ceph-osd process will start, but then soon crashes and restarts again endlessly, the crash log looks like this:
-10> 2017-10-11 10:18:44.480698 7f6cf18658c0 5 osd.72 pg_epoch: 303792 pg[28.36b(unlocked)] enter Initial -9> 2017-10-11 10:18:44.480996 7f6cf18658c0 5 osd.72 pg_epoch: 303792 pg[28.36b( v 269495'33 (0'0,269495'33] local-les=303763 n=12 ec=269489 les/c/f 303608/300422/0 303758/303760/284901) [100,72,37] r=1 lpr=0 pi=269489-303759/139 crt=269495'33 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.000298 0 0.000000 -8> 2017-10-11 10:18:44.481010 7f6cf18658c0 5 osd.72 pg_epoch: 303792 pg[28.36b( v 269495'33 (0'0,269495'33] local-les=303763 n=12 ec=269489 les/c/f 303608/300422/0 303758/303760/284901) [100,72,37] r=1 lpr=0 pi=269489-303759/139 crt=269495'33 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset -7> 2017-10-11 10:18:44.645087 7f6cf18658c0 5 osd.72 pg_epoch: 303768 pg[1.4f4(unlocked)] enter Initial -6> 2017-10-11 10:18:46.762577 7f6cf18658c0 5 osd.72 pg_epoch: 303768 pg[1.4f4( v 279387'245575 (202648'235117,279387'245575] local-les=303763 n=8170 ec=23117 les/c/f 303101/280936/0 303758/303760/302603) [67,72,100] r=1 lpr=0 pi=278053-303759/75 crt=279387'245575 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 2.117488 0 0.000000 -5> 2017-10-11 10:18:46.762616 7f6cf18658c0 5 osd.72 pg_epoch: 303768 pg[1.4f4( v 279387'245575 (202648'235117,279387'245575] local-les=303763 n=8170 ec=23117 les/c/f 303101/280936/0 303758/303760/302603) [67,72,100] r=1 lpr=0 pi=278053-303759/75 crt=279387'245575 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset -4> 2017-10-11 10:18:46.816171 7f6cf18658c0 5 osd.72 pg_epoch: 303791 pg[9.644(unlocked)] enter Initial -3> 2017-10-11 10:18:46.825798 7f6cf18658c0 5 osd.72 pg_epoch: 303791 pg[9.644( v 269495'38555 (35683'35552,269495'38555] local-les=303763 n=27 ec=1076 les/c/f 300534/300248/0 303758/303759/295662) [62,68,72] r=2 lpr=0 pi=203019-303758/217 crt=269495'38555 lcod 0'0 inactive NOTIFY NIBBLEWISE] exit Initial 0.009627 0 0.000000 -2> 2017-10-11 10:18:46.825815 7f6cf18658c0 5 osd.72 pg_epoch: 303791 pg[9.644( v 269495'38555 (35683'35552,269495'38555] local-les=303763 n=27 ec=1076 les/c/f 300534/300248/0 303758/303759/295662) [62,68,72] r=2 lpr=0 pi=203019-303758/217 crt=269495'38555 lcod 0'0 inactive NOTIFY NIBBLEWISE] enter Reset -1> 2017-10-11 10:18:46.974700 7f6cf18658c0 5 osd.72 pg_epoch: 303798 pg[1.5a4(unlocked)] enter Initial 0> 2017-10-11 10:18:47.517124 7f6cf18658c0 -1 *** Caught signal (Aborted) ** in thread 7f6cf18658c0 thread_name:ceph-osd ceph version 10.2.9 (2ee413f77150c0f375ff6f10edd6c8f9c7d060d0) 1: (()+0x984c4e) [0x56400a2d3c4e] 2: (()+0x11390) [0x7f6cf0723390] 3: (gsignal()+0x38) [0x7f6cee6c1428] 4: (abort()+0x16a) [0x7f6cee6c302a] 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f6cef00384d] 6: (()+0x8d6b6) [0x7f6cef0016b6] 7: (()+0x8d701) [0x7f6cef001701] 8: (()+0x8d919) [0x7f6cef001919] 9: (()+0x1230f) [0x7f6cf13fb30f] 10: (operator new[](unsigned long)+0x4e7) [0x7f6cf141f4b7] 11: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x313) [0x7f6cf0dace63] 12: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x276) [0x7f6cf0db1426] 13: (()+0x421be) [0x7f6cf0db51be] 14: (()+0x42240) [0x7f6cf0db5240] 15: (()+0x4261e) [0x7f6cf0db561e] 16: (()+0x3d835) [0x7f6cf0db0835] 17: (()+0x1fffb) [0x7f6cf0d92ffb] 18: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::next()+0x8f) [0x56400a18f40f] 19: (DBObjectMap::DBObjectMapIteratorImpl::next(bool)+0x34) [0x56400a142954] 20: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*)+0xafb) [0x564009f7e97b] 21: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x313) [0x564009dacfb3] 22: (OSD::load_pgs()+0x87a) [0x564009ce796a] 23: (OSD::init()+0x2026) [0x564009cf2c56] 24: (main()+0x2ef1) [0x564009c64391] 25: (__libc_start_main()+0xf0) [0x7f6cee6ac830] 26: (_start()+0x29) [0x564009ca5b99] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.