Bug #3883
osd: leaks memory (possibly triggered by scrubbing) on argonaut
Status: Closed
Description
100MB/day reported by multiple users, both on 0.48 and 0.56.1.
Some correlation with scrubbing. Possibly specific to the RBD pool.
See ML thread 'Re: [0.48.3] OSD memory leak when scrubbing'
Updated by Sage Weil about 11 years ago
- Subject changed from osd: new memory leak in wip-f? to osd: leaks memory (possibly triggered by scrubbing)
- Description updated (diff)
- Priority changed from High to Urgent
- Source changed from Development to Community (user)
Updated by Samuel Just about 11 years ago
The burnupi57 cluster (wip-f) does not appear to be leaking after all; the osds seem to have leveled off at around 350MB of memory. I left the profiler running for two days on osd.1 and its heap appears not to have grown at all in that time. The burnupi15 cluster running wip_bobtail_f (56 series) also does not appear to be leaking (still watching). Both appear to be scrubbing regularly.
Updated by Sylvain Munaut about 11 years ago
- File osd_mem_noscrub.png osd_mem_noscrub.png added
I disabled scrubbing using
ceph osd tell \* injectargs '--osd-scrub-min-interval 1000000'
ceph osd tell \* injectargs '--osd-scrub-max-interval 10000000'
and the leak seems to be gone.
See the graph of OSD memory for the 12 osd processes over the last 3.5 days. Memory was rising every 24h. I made the change yesterday around 13:00 and the OSDs stopped growing. OSD memory even seems to go down slowly in small steps.
Updated by Wido den Hollander about 11 years ago
I've also started to see this and will try to get some heap profiling done to report back.
- ceph version 0.56.1 (e4a541624df62ef353e754391cbbb707f54b16f7)
- kernel version 3.2.0-36-generic
- OSD filesystem: btrfs
In this case it's RBD running behind CloudStack, so regular KVM, no kernel RBD.
It was a 32GB machine with just two OSDs which started swapping (16GB) and eventually went OOM.
To be sure that I collect the right information:
I see osd.7 go wild and do:
ceph osd tell 7 heap start_profiler
ceph osd tell 7 heap dump
ceph osd tell 7 heap stop_profiler
kill -SEGV <osd.7's pid>
That would give me a heap file and core, correct?
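For reference, the resulting heap dump can then be inspected offline with google-pprof against the osd binary. The paths below are illustrative (the dump filename depends on the osd name and the cluster's log directory):

```shell
# Illustrative paths; the profiler writes osd.<id>.profile.NNNN.heap
# files into the osd's log directory.
google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.7.profile.0001.heap | head -20
```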
Updated by Josh Durgin about 11 years ago
On wip-f, one osd grew to consume 70% of ram. The heap profiler tells us:
(pprof) top20
Total: 2.5 MB
   1.9  75.4%  75.4%   1.9  75.4%  ceph::log::Log::create_entry
   0.6  24.5%  99.9%   0.6  24.5%  std::string::_Rep::_S_create
   0.0   0.1% 100.0%   0.0   0.3%  ceph_heap_profiler_handle_command
   0.0   0.0% 100.0%   0.0   0.0%  std::deque::_M_push_back_aux
   0.0   0.0% 100.0%   0.0   0.2%  LogClient::do_log
   0.0   0.0% 100.0%   0.0   0.0%  MMonCommand::print
   0.0   0.0% 100.0%   0.0   0.0%  MMonSubscribeAck::print
   0.0   0.0% 100.0%   0.0   0.0%  MOSDMap::print
   0.0   0.0% 100.0%   0.1   4.1%  MOSDPing::print
Updated by Greg Farnum about 11 years ago
Oh, how interesting...I wonder if this is syslog not having enough network bandwidth? Or (in the more general sense) logging messages not getting out of the OSD process and into the target due to throughput issues which are triggered by additional activity from scrub?
Updated by Greg Farnum about 11 years ago
Something like that (or some kind of bug in the logging system that only gets hit with syslog or when not logging) would explain why we've been unable to reproduce locally but others have been seeing it so consistently, fyi.
Updated by Josh Durgin about 11 years ago
A log dump shows nothing, so I'm guessing the log is corrupted such that it keeps logging to more and more memory without bound.
ubuntu@burnupi57:~$ sudo ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok log dump
ubuntu@burnupi57:~$
It's just using default log settings, so there's no syslog involved.
Updated by Josh Durgin about 11 years ago
(gdb) p *(g_ceph_context->_log)
$4 = {<Thread> = {_vptr.Thread = 0xb42910, thread_id = 140568769971968},
  m_indirect_this = 0x3300010, m_subs = 0x330d090, m_lock = 1,
  m_queue_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 1,
      __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\000' <repeats 12 times>, "\001", '\000' <repeats 26 times>, __align = 0},
  m_flush_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 4294967290,
      __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\000' <repeats 12 times>, "\372\377\377\377", '\000' <repeats 23 times>, __align = 0},
  m_cond = {__data = {__lock = 0, __futex = 322509383, __total_seq = 161254692,
      __wakeup_seq = 161254691, __woken_seq = 161254691, __mutex = 0x3310028,
      __nwaiters = 2, __broadcast_seq = 0},
    __size = "\000\000\000\000G\032\071\023$\215\234\t\000\000\000\000#\215\234\t\000\000\000\000#\215\234\t\000\000\000\000(\000\061\003\000\000\000\000\002\000\000\000\000\000\000",
    __align = 1385167252638138368},
  m_new = {m_len = 0, m_head = 0x0, m_tail = 0x0},
  m_recent = {m_len = 1934, m_head = 0x8efea90, m_tail = 0x6719dd0},
  m_log_file = {static npos = <optimized out>,
    _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
      _M_p = 0x330a858 "/var/log/ceph/ceph-osd.1.log"}},
  m_fd = 6, m_syslog_log = -2, m_syslog_crash = -2, m_stderr_log = -1, m_stderr_crash = -1,
  m_stop = false, m_max_new = 1000000, m_max_recent = 10000}
Updated by Sage Weil about 11 years ago
The bit that looks fishy here is m_flush_mutex.__nusers. Can you see what that thread is doing in gdb?
Maybe it's not being initialized properly or something...
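A rough way to check that from gdb (illustrative; attach to the running osd and look for the log flush thread in the backtraces):

```
gdb -p <osd.1's pid>
(gdb) info threads
(gdb) thread apply all bt
```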
Updated by Sage Weil about 11 years ago
- Subject changed from osd: leaks memory (possibly triggered by scrubbing) to osd: leaks memory (possibly triggered by scrubbing) on argonaut
- Status changed from 12 to Won't Fix
Updated by Vladislav Gorbunov almost 11 years ago
Simple workaround: run this command from cron every 5 minutes:
for osd in $(cat /var/run/ceph/osd.*.pid \
    | xargs -I {} ps -p {} -o pid,\%mem,args \
    | grep -v PID \
    | awk '$2 > 50 {print "memory leak on process: " $0}' \
    | awk '{print $9}'); do
  ceph osd set noout \
    && service ceph stop osd.$osd \
    && sleep 30 \
    && service ceph start osd.$osd \
    && ceph osd unset noout
done
This script restarts an osd process when its memory usage exceeds 50%.
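To make the filtering step concrete: the first awk stage keeps only processes whose %mem column exceeds 50. A quick check with fabricated `ps -o pid,%mem,args` output (hypothetical PIDs and values):

```shell
# Fabricated sample ps output; only the first process is over the
# 50% threshold, so only its line survives the filter.
printf '1234 72.3 /usr/bin/ceph-osd -i 7\n5678 12.1 /usr/bin/ceph-osd -i 3\n' \
  | awk '$2 > 50 {print "memory leak on process: " $0}'
# → memory leak on process: 1234 72.3 /usr/bin/ceph-osd -i 7
```

The trailing `awk '{print $9}'` in the workaround then picks out the osd id, assuming the daemon was started as `ceph-osd -i <id> ...` so the id lands in field 9 of the tagged line.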