Bug #3883 (closed)

osd: leaks memory (possibly triggered by scrubbing) on argonaut

Added by Sage Weil over 11 years ago. Updated almost 11 years ago.

Status: Won't Fix
Priority: Urgent
Category: OSD
Target version: -
% Done: 0%
Source: Community (user)
Severity: 3 - minor

Description

Memory growth of ~100MB/day reported by multiple users, on both 0.48 and 0.56.1.

Some correlation with scrubbing. Possibly specific to the RBD pool.

See ML thread 'Re: [0.48.3] OSD memory leak when scrubbing'.


Files

osd_mem_noscrub.png (149 KB): OSD memory over the last 3.5 days (Sylvain Munaut, 01/31/2013 05:18 AM)
Actions #1

Updated by Sage Weil over 11 years ago

  • Priority changed from Urgent to High
Actions #2

Updated by Sage Weil about 11 years ago

  • Subject changed from osd: new memory leak in wip-f? to osd: leaks memory (possibly triggered by scrubbing)
  • Description updated (diff)
  • Priority changed from High to Urgent
  • Source changed from Development to Community (user)
Actions #3

Updated by Samuel Just about 11 years ago

The burnupi57 cluster (wip-f) does not appear to be leaking after all; the osds seem to have leveled off at around 350MB of memory. I left the profiler running on osd.1 for two days and its heap appears not to have grown at all in that time. The burnupi15 cluster running wip_bobtail_f (0.56 series) also does not appear to be leaking (still watching). Both appear to be scrubbing regularly.
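As an aside, a lightweight way to spot-check whether an osd's heap is really growing between looks, without a full profiling run, is the tcmalloc heap stats command (osd id here is just an example; assuming the heap commands available in this release):

ceph osd tell 1 heap stats    # one-shot tcmalloc heap summary; compare successive outputs over a day or two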

Actions #4

Updated by Sylvain Munaut about 11 years ago

I disabled scrubbing using

ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'

and the leak seems to be gone.

See the attached graph of OSD memory for the 12 osd processes over the last 3.5 days. Memory was rising every 24h. I made the change yesterday around 13:00 and the OSDs stopped growing; OSD memory even seems to decrease slowly in small steps.
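For what it's worth, one way to confirm the injected values actually landed on a daemon is to read them back through its admin socket (socket path and osd id below are just an example):

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep scrub
# the injected osd_scrub_min_interval / osd_scrub_max_interval values should show up here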

Actions #5

Updated by Wido den Hollander about 11 years ago

I've also started to see this and will try to get some heap profiling done to report back.

  • ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
  • kernel version 3.2.0-36-generic
  • OSD filesystem: btrfs

In this case it's RBD running behind CloudStack, so regular KVM, no kernel RBD.

It was a 32GB machine with just two OSDs; it started swapping (16GB) and eventually went OOM.

To be sure that I collect the right information:

If I see osd.7 go wild, I do:

ceph osd tell 7 heap start_profiler
ceph osd tell 7 heap dump
ceph osd tell 7 heap stop_profiler
kill -SEGV <osd.7's pid>

That would give me a heap file and core, correct?
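For the analysis step, the heap dump typically lands in the osd's log directory with a name along the lines of osd.7.profile.0001.heap (exact naming varies by release), and can then be read with pprof against the osd binary; the paths below are illustrative:

pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.7.profile.0001.heap
# the tool is packaged as 'google-pprof' on some distros; output shows which call sites hold the memory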

Actions #6

Updated by Josh Durgin about 11 years ago

On wip-f, one osd grew to consume 70% of RAM. The heap profiler tells us:

(pprof) top20
Total: 2.5 MB
     1.9  75.4%  75.4%      1.9  75.4% ceph::log::Log::create_entry
     0.6  24.5%  99.9%      0.6  24.5% std::string::_Rep::_S_create
     0.0   0.1% 100.0%      0.0   0.3% ceph_heap_profiler_handle_command
     0.0   0.0% 100.0%      0.0   0.0% std::deque::_M_push_back_aux
     0.0   0.0% 100.0%      0.0   0.2% LogClient::do_log
     0.0   0.0% 100.0%      0.0   0.0% MMonCommand::print
     0.0   0.0% 100.0%      0.0   0.0% MMonSubscribeAck::print
     0.0   0.0% 100.0%      0.0   0.0% MOSDMap::print
     0.0   0.0% 100.0%      0.1   4.1% MOSDPing::print
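For isolating growth over time, pprof can also diff two successive dumps so that only allocations made between them are shown (binary and dump file names here are illustrative):

pprof --base=/var/log/ceph/osd.1.profile.0001.heap /usr/bin/ceph-osd /var/log/ceph/osd.1.profile.0002.heap
(pprof) top20    # now reports only what grew between the two dumps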
Actions #7

Updated by Greg Farnum about 11 years ago

Oh, how interesting... I wonder if this is syslog not having enough network bandwidth? Or (more generally) log messages not getting out of the OSD process and into their target because of throughput issues triggered by the additional activity from scrub?

Actions #8

Updated by Greg Farnum about 11 years ago

Something like that (or some kind of bug in the logging system that only gets hit when using syslog, or when not logging at all) would explain why we've been unable to reproduce this locally while others have been seeing it so consistently, FYI.

Actions #9

Updated by Josh Durgin about 11 years ago

A log dump shows nothing, so I'm guessing the log is corrupted in such a way that it keeps accumulating entries in memory without bound.

ubuntu@burnupi57:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok log dump 

ubuntu@burnupi57:~$

It's just using default log settings, so there's no syslog involved.
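One quick way to double-check that from the same admin socket (these are the standard log option names; output format varies by release):

ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | egrep 'log_to_syslog|log_max_new|log_max_recent'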

Actions #10

Updated by Josh Durgin about 11 years ago

(gdb) p *(g_ceph_context->_log)
$4 = {<Thread> = {_vptr.Thread = 0xb42910, thread_id = 140568769971968}, m_indirect_this = 0x3300010, m_subs = 0x330d090, m_lock = 1, m_queue_mutex = {__data = {
      __lock = 0, __count = 0, __owner = 0, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = '\000' <repeats 12 times>, "\001", '\000' <repeats 26 times>, __align = 0}, m_flush_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, 
      __nusers = 4294967290, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = '\000' <repeats 12 times>"\372, \377\377\377", '\000' <repeats 23 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 322509383, 
      __total_seq = 161254692, __wakeup_seq = 161254691, __woken_seq = 161254691, __mutex = 0x3310028, __nwaiters = 2, __broadcast_seq = 0}, 
    __size = "\000\000\000\000G\032\071\023$\215\234\t\000\000\000\000#\215\234\t\000\000\000\000#\215\234\t\000\000\000\000(\000\061\003\000\000\000\000\002\000\000\000\000\000\000", __align = 1385167252638138368}, m_new = {m_len = 0, m_head = 0x0, m_tail = 0x0}, m_recent = {m_len = 1934, m_head = 0x8efea90, m_tail = 0x6719dd0}, 
  m_log_file = {static npos = <optimized out>, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
      _M_p = 0x330a858 "/var/log/ceph/ceph-osd.1.log"}}, m_fd = 6, m_syslog_log = -2, m_syslog_crash = -2, m_stderr_log = -1, m_stderr_crash = -1, m_stop = false, 
  m_max_new = 1000000, m_max_recent = 10000}
Actions #11

Updated by Sage Weil about 11 years ago

The bit that looks fishy here is m_flush_mutex.__nusers. Can you see what that thread is doing in gdb?

Maybe it's not being initialized properly or something...
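A rough sketch of how to chase that from the attached gdb session (thread numbers will differ; the flusher should be parked somewhere in ceph::log::Log::entry / Log::flush):

(gdb) info threads
(gdb) thread apply all bt      # find the log flusher thread
(gdb) thread 42                # switch to it (use whatever number 'info threads' reported)
(gdb) bt full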

Actions #12

Updated by Ian Colle about 11 years ago

  • Assignee set to Josh Durgin
Actions #13

Updated by Sage Weil about 11 years ago

  • Subject changed from osd: leaks memory (possibly triggered by scrubbing) to osd: leaks memory (possibly triggered by scrubbing) on argonaut
  • Status changed from 12 to Won't Fix
Actions #14

Updated by Vladislav Gorbunov almost 11 years ago

A simple workaround: run this from cron every 5 minutes:

for osd in $(cat /var/run/ceph/osd.*.pid \
        | xargs -I {} ps -p {} -o pid,\%mem,args \
        | grep -v PID \
        | awk '{if ($2 > 50) print "memory leak on process: "$0}' \
        | awk '{print $9}'); do
    ceph osd set noout \
        && service ceph stop osd.$osd \
        && sleep 30 \
        && service ceph start osd.$osd \
        && ceph osd unset noout
done

This restarts an osd process whenever its memory usage exceeds 50%.
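If it helps, one way to wire that up is to save the loop above as a script and call it from cron; the script path below is made up:

# /etc/cron.d/restart-leaky-osds  (example only)
*/5 * * * * root /usr/local/sbin/restart-leaky-osds.sh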
