Bug #9619

excessive mon memory usage when rbd rm 1PB

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status:
Can't reproduce
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps to reproduce:

  • create a 1 petabyte rbd image
  • remove the image

The mon memory usage will grow to over 10GB.

mon.a.profile.0001.heap - 1.heap (402 KB) Loïc Dachary, 10/02/2014 07:55 AM

mon.a.profile.0007.heap - 7.heap (485 KB) Loïc Dachary, 10/02/2014 07:55 AM

mon.a.profile.0008.heap - 8 heap (1020 KB) Loïc Dachary, 10/03/2014 12:38 AM

History

#1 Updated by Sage Weil over 9 years ago

  • Project changed from Ceph to rbd
  • Priority changed from Normal to Urgent
  • Source changed from other to Development

#2 Updated by Sage Weil over 9 years ago

  • Project changed from rbd to Ceph

#3 Updated by Loïc Dachary over 9 years ago

  • Status changed from New to Can't reproduce
  • Assignee set to Loïc Dachary

With a vstart cluster with one monitor and three OSDs and

$ rbd create --size $((1024 * 1024 * 1024))  big
$ rbd rm big

and starting the profiler on the largest OSD
$ ceph tell osd.0 heap start_profiler

and checking how it grew after a while shows it is stable
$ google-pprof --text --base out/osd.0.profile.0002.heap ceph-osd out/osd.0.profile.0004.heap | head -20
Using local file ceph-osd.
Using local file out/osd.0.profile.0004.heap.
Total: 0.0 MB
     0.0  56.8%  56.8%      0.0  72.7% ReplicatedPG::do_op
     0.0  26.8%  83.6%      0.0  34.0% SharedPtrRegistry::lookup_or_create
     0.0  16.4% 100.0%      0.0  16.4% __gnu_cxx::new_allocator::allocate
     0.0   8.7% 108.7%     -0.0  -2.8% ReplicatedPG::get_snapset_context

(Note that 0.0 MB means the growth is <100KB.) The other OSDs have a size (according to ps) that is still lower.
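
For reference, the size passed to rbd create above works out to one pebibyte, assuming --size is interpreted in megabytes (the rbd default unit). The sketch below checks that arithmetic and, assuming the default 4 MB object size (an assumption, not stated in this ticket), estimates how many objects rbd rm may have to walk.

```shell
# Size passed to "rbd create --size", assumed to be in MB.
size_mb=$((1024 * 1024 * 1024))
# Convert to bytes: 1 MB is taken as 1024 * 1024 bytes here.
size_bytes=$((size_mb * 1024 * 1024))
# One PiB is 1024^5 bytes.
pib=$((1024 * 1024 * 1024 * 1024 * 1024))
# Assumed default rbd object size of 4 MB (not stated in the ticket).
object_mb=4
objects=$((size_mb / object_mb))

echo "image size: $size_mb MB = $((size_bytes / pib)) PiB"
echo "objects at $object_mb MB each: $objects"
```

If the object size assumption holds, the removal has to cover a few hundred million potential objects, which is consistent with rbd rm still running after many hours.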

#4 Updated by Loïc Dachary over 9 years ago

  • Status changed from Can't reproduce to New

Checking the OSD memory usage when the problem is MON growth is not a good idea.
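
A minimal sketch of pointing the heap profiler at the monitor instead of an OSD. It assumes the monitor is named mon.a (as in the attached profiles) and that it accepts the same heap subcommands used on osd.0 above. The helpers run and profile_daemon are hypothetical names introduced here, and by default the script only prints the commands.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it against the running cluster.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: start the heap profiler on one daemon and
# request a first heap dump to serve as the baseline.
profile_daemon() {
    daemon=$1    # e.g. mon.a
    run ceph tell "$daemon" heap start_profiler
    run ceph tell "$daemon" heap dump
}

profile_daemon mon.a
```

When reading the resulting mon.a.profile.*.heap dumps with google-pprof, passing the ceph-mon binary rather than ceph-osd is presumably what lets symbols resolve against the right executable.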

#5 Updated by Loïc Dachary over 9 years ago

The mon memory does grow, but after 30 minutes of running I'm not sure the growth is related to the rbd rm. It is also growing slowly.

$ google-pprof --text --base out/mon.a.profile.0001.heap ceph-osd out/mon.a.profile.0007.heap | head -20
Using local file ceph-osd.
Using local file out/mon.a.profile.0007.heap.
Total: 3.2 MB
     2.3  73.1%  73.1%      2.3  73.2% leveldb::Arena::AllocateNewBlock
     0.7  21.3%  94.4%      3.2 100.0% _init
     0.2   5.4%  99.9%      0.2   5.4% std::string::_Rep::_S_create
     0.0   0.1% 100.0%      0.0   0.1% std::vector::_M_emplace_back_aux
     0.0   0.0% 100.0%      3.2 100.0% clone
     0.0   0.0% 100.0%      0.2   7.0% leveldb::Arena::AllocateFallback
     0.0   0.0% 100.0%      2.3  73.2% leveldb::DBImpl::Write
     0.0   0.0% 100.0%      2.3  73.2% leveldb::MemTable::Add
     0.0   0.0% 100.0%      0.1   2.2% leveldb::SkipList::Insert
     0.0   0.0% 100.0%      2.3  73.2% leveldb::WriteBatch::Handler::~Handler
     0.0   0.0% 100.0%      2.3  73.2% leveldb::WriteBatch::Iterate
     0.0   0.0% 100.0%      2.3  73.2% leveldb::WriteBatchInternal::InsertInto
     0.0   0.0% 100.0%      3.2 100.0% start_thread
     0.0   0.0% 100.0%      0.2   5.8% std::__ostream_insert
     0.0   0.0% 100.0%      0.2   5.4% std::basic_streambuf::xsputn
     0.0   0.0% 100.0%     -0.0  -0.4% std::num_put::_M_insert_int
     0.0   0.0% 100.0%     -0.0  -0.4% std::num_put::do_put
     0.0   0.0% 100.0%      0.2   5.2% std::operator<< 
     0.0   0.0% 100.0%     -0.0  -0.4% std::ostream::_M_insert

#6 Updated by Loïc Dachary over 9 years ago

  • Status changed from New to Can't reproduce

At 83% completion (rbd rm big)

$ google-pprof --text --base out/mon.a.profile.0001.heap ceph-osd out/mon.a.profile.0008.heap | head -20
Using local file ceph-osd.
Using local file out/mon.a.profile.0008.heap.
Total: 3.4 MB
     2.4  69.2%  69.2%      2.4  69.2% leveldb::Arena::AllocateNewBlock
     0.8  22.9%  92.1%      3.4  99.9% _init
     0.2   5.9%  98.0%      0.2   5.9% std::string::_Rep::_S_create
     0.1   1.9%  99.9%      0.1   1.9% leveldb::ReadBlock
     0.0   0.0%  99.9%      0.0   0.0% leveldb::Table::Open
     0.0   0.0%  99.9%      0.0   0.0% allocate_dtv
     0.0   0.0%  99.9%      0.0   0.0% __fopen_internal
     0.0   0.0%  99.9%      0.0   0.0% leveldb::VersionSet::LogAndApply
     0.0   0.0% 100.0%      0.0   0.0% leveldb::Cache::~Cache
     0.0   0.0% 100.0%      0.0   0.0% leveldb::NewTwoLevelIterator
     0.0   0.0% 100.0%      0.0   0.1% leveldb::EnvWrapper::SleepForMicroseconds
     0.0   0.0% 100.0%      0.0   0.1% leveldb::DBImpl::MakeRoomForWrite
     0.0   0.0% 100.0%      0.0   0.0% leveldb::NewMergingIterator
     0.0   0.0% 100.0%      0.0   0.0% leveldb::NewDBIterator
     0.0   0.0% 100.0%      0.0   0.0% leveldb::Block::NewIterator
     0.0   0.0% 100.0%      0.0   0.0% leveldb::Version::NewConcatenatingIterator
     0.0   0.0% 100.0%      0.0   0.0% std::vector::reserve
     0.0   0.0% 100.0%      0.0   0.1% leveldb::TableCache::FindTable
     0.0   0.0% 100.0%      0.0   0.0% leveldb::MemTable::NewIterator
loic@fold:~/software/ceph/ceph/src$ ls -l out/mon.a.profile.0001.heap out/mon.a.profile.0008.heap
-rw-rw-r-- 1 loic loic  411295 oct.   2 15:39 out/mon.a.profile.0001.heap
-rw-rw-r-- 1 loic loic 1048572 oct.   3 08:10 out/mon.a.profile.0008.heap

It is stable.
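
As a cross-check on the profiler output, the resident size reported by ps can be sampled while rbd rm runs. This is only a sketch: rss_kb and sample_rss are hypothetical helper names, and the example samples the current shell rather than a real ceph-mon process, whose pid would have to be supplied.

```shell
# Print the resident set size (in KB) of a process, as reported by ps.
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# Sample the RSS of a process a fixed number of times.
# Arguments: pid, number of samples, seconds between samples.
# Each output line is "<unix timestamp> <rss in KB>".
sample_rss() {
    pid=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(date +%s) $(rss_kb "$pid")"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: sample this shell itself twice, one second apart.
sample_rss $$ 2 1
```

Comparing the first and last samples gives a rough growth figure to set against the pprof totals above.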
