Bug #5700
very high memory usage after update
Description
With bobtail a few months ago my osds used around 500 MB after restart but grew over time, due to memory leaks.
When I upgraded from bobtail to cuttlefish 0.61.3, osds started to consume around 1 GB right after restart. They didn't grow much over time, so it seems there are far fewer memory leaks. But still, that's a lot of memory.
Today I upgraded to the latest cuttlefish, 0.61.5, and now my osds consume 1.6 - 2.5 GB right after restart. This is really bad, as it totally changes how many osds I can run on a single machine, and so I also wasn't able to restart all osds as my machines wouldn't have had enough memory.
I really wonder why the osds need that much memory; as the documentation says osds should take around 500 MB, I clearly think this is a big bug/regression.
Please let me know how I can help debug so this issue can be resolved ASAP.
History
#1 Updated by Corin Langosch over 10 years ago
Just a small update: I hoped the memory usage would go down after some hours, but it stays high:
ceph version 0.61.5 (8ee10dc4bb73bdd918873f29c70eedc3c7ef1979)
root 10283 2.1 0.2 761104 134364 ? Sl Jul20 17:24 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
root 14330 9.0 3.0 3286468 2012940 ? Ssl Jul20 71:40 /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf
root 25295 8.1 2.8 2999896 1894104 ? Ssl Jul20 61:47 /usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf
root 25406 8.7 2.9 3217812 1973528 ? Ssl Jul20 65:45 /usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf
on another host
root 9783 6.6 6.5 2943520 2146796 ? Ssl Jul20 52:37 /usr/bin/ceph-osd -i 12 --pid-file /var/run/ceph/osd.12.pid -c /etc/ceph/ceph.conf
root 11037 7.0 4.2 2649968 1403744 ? Ssl Jul20 55:14 /usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf
#2 Updated by Mark Nelson over 10 years ago
Hi,
Could you tell me a couple of things about your cluster?
How many PGs total across all of your pools?
How much replication?
When you start the cluster up and there is high memory usage, is the cluster scrubbing? (ceph -w should tell you).
Are these packages (what OS?) or did you compile yourself?
Is tcmalloc enabled?
#3 Updated by Corin Langosch over 10 years ago
Hi Mark,
Here's the output of ceph osd dump:
epoch 3388
fsid 4ac0e21b-6ea2-4ac7-8114-122bd9ba55d6
created 2013-02-17 12:50:11.549322
modified 2013-07-20 20:29:06.110250
flags
pool 5 'ssd' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 29 owner 0
pool 6 'hdd' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 30 owner 0
max_osd 15
osd.0 up in weight 1 up_from 3323 up_thru 3339 down_at 3321 last_clean_interval [3319,3320) 10.0.0.4:6804/22388 10.0.0.4:6805/22388 10.0.0.4:6806/22388 exists,up c33d9eb5-3466-4078-9da8-553768fa98fe
osd.1 up in weight 1 up_from 3298 up_thru 3385 down_at 3296 last_clean_interval [3010,3295) 10.0.0.4:6802/22412 10.0.0.4:6809/22412 10.0.0.4:6810/22412 exists,up 6812e61b-d3c7-44b1-ab74-c13e4805be00
osd.2 up in weight 1 up_from 3353 up_thru 3384 down_at 3351 last_clean_interval [3302,3352) 10.0.0.4:6800/23716 10.0.0.4:6807/23716 10.0.0.4:6808/23716 exists,up a3747638-0ee8-4839-a7c8-f6936b27b2cb
osd.3 up in weight 1 up_from 3327 up_thru 3339 down_at 3325 last_clean_interval [3282,3324) 10.0.0.5:6805/14328 10.0.0.5:6806/14328 10.0.0.5:6807/14328 exists,up 63a1e842-0103-48fd-91fd-8cc6b8a35859
osd.4 up in weight 1 up_from 3381 up_thru 3381 down_at 3376 last_clean_interval [3286,3375) 10.0.0.5:6800/25293 10.0.0.5:6801/25293 10.0.0.5:6802/25293 exists,up b92c0c38-4595-4b0e-a98a-78de1874100d
osd.5 up in weight 1 up_from 3383 up_thru 3383 down_at 3378 last_clean_interval [3290,3377) 10.0.0.5:6803/25404 10.0.0.5:6804/25404 10.0.0.5:6808/25404 exists,up c9150d50-dfbc-4d55-9951-0caf18629444
osd.6 up in weight 1 up_from 3331 up_thru 3339 down_at 3329 last_clean_interval [3270,3328) 10.0.0.6:6800/18318 10.0.0.6:6802/18318 10.0.0.6:6803/18318 exists,up 15e5b405-adbf-46aa-b698-7bc4c69778d3
osd.7 up in weight 1 up_from 3368 up_thru 3383 down_at 3358 last_clean_interval [3278,3357) 10.0.0.6:6801/28326 10.0.0.6:6804/28326 10.0.0.6:6805/28326 exists,up 6a9014cf-19b5-48a8-a468-bfd279c1c7b5
osd.8 up in weight 1 up_from 3366 up_thru 3383 down_at 3360 last_clean_interval [3274,3359) 10.0.0.6:6806/28434 10.0.0.6:6807/28434 10.0.0.6:6808/28434 exists,up 5a4cea68-5fc8-4320-ba58-5a276dc95511
osd.9 up in weight 1 up_from 3335 up_thru 3339 down_at 3333 last_clean_interval [3262,3332) 10.0.0.7:6803/14828 10.0.0.7:6804/14828 10.0.0.7:6805/14828 exists,up a36db462-0862-4064-8200-8ff91dc7316e
osd.10 up in weight 1 up_from 3349 up_thru 3383 down_at 3347 last_clean_interval [3307,3346) 10.0.0.7:6800/15947 10.0.0.7:6801/15947 10.0.0.7:6802/15947 exists,up 21915a11-30e1-4612-a35d-d2405bb63617
osd.11 down out weight 0 up_from 1062 up_thru 1367 down_at 1372 last_clean_interval [379,1056) 10.0.0.7:6803/23663 10.0.0.7:6805/23663 10.0.0.7:6806/23663 autoout,exists 576e642d-ed76-43db-9815-14bdb438e533
osd.12 up in weight 1 up_from 3339 up_thru 3339 down_at 3337 last_clean_interval [3314,3336) 10.0.0.8:6800/9781 10.0.0.8:6802/9781 10.0.0.8:6806/9781 exists,up fbb71a0c-5d97-4a23-8f74-7e1a130bb60d
osd.13 up in weight 1 up_from 3343 up_thru 3383 down_at 3341 last_clean_interval [3258,3340) 10.0.0.8:6801/11035 10.0.0.8:6803/11035 10.0.0.8:6804/11035 exists,up c0acd06a-4bff-4b92-afcb-f10b8b0268c7
Every few minutes some osd is scrubbing, but I don't think it has any impact on the memory usage. I restarted almost all osds and they all have quite high memory usage (1.3 GB to 2.3 GB).
Here's the output of ps aux | grep ceph:
10283 2.1 0.2 728996 137620 ? Sl Jul20 61:27 /usr/bin/ceph-mon -i a --pid-file /var/run/ceph/mon.a.pid -c /etc/ceph/ceph.conf
12929 1.6 0.2 729636 161432 ? Sl Jul20 46:34 /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /etc/ceph/ceph.conf
8747 1.6 0.2 633504 160416 ? Sl Jul20 46:50 /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf
22390 34.8 19.2 3166452 1573844 ? Ssl Jul20 982:04 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
22414 14.1 16.4 2737916 1344548 ? Ssl Jun17 7138:16 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf
23718 15.4 18.6 3608532 1521792 ? Ssl Jun17 7766:29 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf
14330 9.0 2.3 2982828 1518052 ? Ssl Jul20 255:49 /usr/bin/ceph-osd -i 3 --pid-file /var/run/ceph/osd.3.pid -c /etc/ceph/ceph.conf
25295 8.3 1.8 2692020 1242940 ? Ssl Jul20 232:51 /usr/bin/ceph-osd -i 4 --pid-file /var/run/ceph/osd.4.pid -c /etc/ceph/ceph.conf
25406 8.4 2.3 3017540 1526996 ? Ssl Jul20 235:07 /usr/bin/ceph-osd -i 5 --pid-file /var/run/ceph/osd.5.pid -c /etc/ceph/ceph.conf
18320 6.0 2.1 2976612 1423440 ? Ssl Jul20 170:53 /usr/bin/ceph-osd -i 6 --pid-file /var/run/ceph/osd.6.pid -c /etc/ceph/ceph.conf
28328 7.4 1.9 2766676 1263956 ? Ssl Jul20 206:47 /usr/bin/ceph-osd -i 7 --pid-file /var/run/ceph/osd.7.pid -c /etc/ceph/ceph.conf
28436 7.1 2.0 2699084 1329996 ? Ssl Jul20 198:36 /usr/bin/ceph-osd -i 8 --pid-file /var/run/ceph/osd.8.pid -c /etc/ceph/ceph.conf
14830 7.6 3.0 3618872 1998092 ? Ssl Jul20 215:25 /usr/bin/ceph-osd -i 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf
15950 6.8 1.9 2671816 1265128 ? Ssl Jul20 192:37 /usr/bin/ceph-osd -i 10 --pid-file /var/run/ceph/osd.10.pid -c /etc/ceph/ceph.conf
9783 6.3 6.7 3517092 2221328 ? Ssl Jul20 177:16 /usr/bin/ceph-osd -i 12 --pid-file /var/run/ceph/osd.12.pid -c /etc/ceph/ceph.conf
11037 6.8 3.8 2623248 1280464 ? Ssl Jul20 192:51 /usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf
As you can see I restarted everything, except osd 1 and osd 2.
System is Ubuntu 12.10, the packages are from the official ceph repository:
ii ceph 0.61.5-1quantal amd64 distributed storage and file system
ii ceph-common 0.61.5-1quantal amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fs-common 0.61.5-1quantal amd64 common utilities to mount and interact with a ceph file system
ii ceph-fuse 0.61.5-1quantal amd64 FUSE-based client for the Ceph distributed file system
ii ceph-mds 0.61.5-1quantal amd64 metadata server for the ceph distributed file system
ii libcephfs1 0.61.5-1quantal amd64 Ceph distributed file system client library
Not sure if tcmalloc is enabled, I didn't specify anything special. How can I check?
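[Editorial note: a generic way to answer this question, a sketch rather than Ceph-specific tooling, is to check whether the installed ceph-osd binary links against the tcmalloc shared library:]

```shell
# Check whether ceph-osd was built against tcmalloc by listing its
# shared-library dependencies and grepping for the tcmalloc soname.
osd_bin=/usr/bin/ceph-osd          # path taken from the ps output above
if ldd "$osd_bin" 2>/dev/null | grep -q tcmalloc; then
    echo "tcmalloc: linked"
else
    echo "tcmalloc: not linked (or binary not found)"
fi
```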
Finally here's my ceph.conf (same on all nodes):
[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
cephx require signatures = true
public network = 10.0.0.0/24
cluster network = 10.0.0.0/24
[client]
rbd cache = true
rbd cache size = 33554432
rbd cache max dirty = 25165824
rbd cache target dirty = 16777216
rbd cache max dirty age = 3
[osd]
osd journal size = 1000
osd journal dio = true
osd journal aio = true
osd journal = /xfs-drive1/$cluster-osd-$id.journal
osd op threads = 8
filestore op threads = 16
filestore max sync interval = 10
filestore min sync interval = 3
[mon.a]
host = n103
mon addr = 10.0.0.5:6789
[mon.b]
host = n104
mon addr = 10.0.0.6:6789
[mon.c]
host = n105
mon addr = 10.0.0.7:6789
[osd.0]
host = n102
[osd.1]
host = n102
[osd.2]
host = n102
[osd.3]
host = n103
[osd.4]
host = n103
[osd.5]
host = n103
[osd.6]
host = n104
[osd.7]
host = n104
[osd.8]
host = n104
[osd.9]
host = n105
[osd.10]
host = n105
[osd.11]
host = n105
[osd.12]
host = n106
[osd.13]
host = n106
[osd.14]
host = n106
#4 Updated by Mark Nelson over 10 years ago
Hi Corin,
tcmalloc should be enabled if you are using our packages. Would you mind generating a core dump from one of the high memory OSD processes, and package it up with the ceph-osd binary?
You can get the core file by running gcore:
gcore [-o filename] pid
I've emailed you with instructions on where to send the file once you've got a core dump.
Thanks!
Mark
#5 Updated by Corin Langosch over 10 years ago
Hi Mark,
I just uploaded the archive. It's called corin.tar.gz.
While taking the core dump (which took only a couple of seconds) the osd memory usage grew from around 1.9 GB to 2.8 GB. Now, around 30 minutes later, it's still almost 2.8 GB, but it seems to be slowly decreasing. Also, all other osds went down a few hundred MB each during the last few days, but they are still all consuming 1.2 GB+.
Let me know if you need anything else.
Corin
#6 Updated by Sage Weil over 10 years ago
- Assignee set to David Zafman
#7 Updated by Ian Colle over 10 years ago
- Priority changed from Urgent to High
#8 Updated by Sage Weil over 10 years ago
- Status changed from New to Can't reproduce
I don't see anything strange from the core. I suspect this is just lots of pgs...
#9 Updated by Corin Langosch over 10 years ago
Hi Sage,
to be honest I'm a little disappointed by your answer. 8192 isn't a lot of pgs? The docs say 50-100 pgs per osd per pool. So 2 pools with 4096 pgs each only allow for 40 - 80 osds - which isn't really that much?
Why did the memory consumption change that much from bobtail to cuttlefish? I didn't change the pools in any way.
There's also a big discrepancy with the docs, as they state an osd would consume 200 - 500 MB (http://ceph.com/docs/next/install/hardware-recommendations/). I know David is still waiting for a detailed debug log from me, which I'll provide within a short while. But if ceph's memory requirements are really that high, fixing the docs to allow for proper resource planning is really a must.
Corin
#10 Updated by Sage Weil over 10 years ago
Corin Langosch wrote:
Hi Sage,
to be honest I'm a little disappointed by your answer. 8192 isn't a lot of pgs? The docs say 50-100 pgs per osd per pool. So 2 pools with 4096 pgs each only allow for 40 - 80 osds - which isn't really that much?
It's the pgs per osd that matters. But yeah, I'm not happy with my answer either, but I don't have much else to go on. My suspicion is that we will see the memory consumed in the usual places with the per-PG in-memory state. Having the massif results will let us confirm that. Maybe we have more hash_map's in use than before (those can quickly eat RAM) and we just didn't notice.
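[Editorial note: to make "pgs per osd" concrete, here is a rough count using the numbers from the osd dump earlier in this thread. The 13 in-OSD count is read from that dump (osd.11 is down/out); counting each PG once per replica is an assumption for the sketch:]

```shell
# Rough PGs-per-OSD estimate for this cluster: 2 pools x 4096 PGs,
# 2x replication, 13 OSDs up/in per the osd dump above.
pools=2; pgs_per_pool=4096; replicas=2; osds_in=13
pg_copies=$(( pools * pgs_per_pool * replicas ))   # PG copies cluster-wide
per_osd=$(( pg_copies / osds_in ))                 # copies each OSD tracks
echo "PG copies per OSD: ~${per_osd}"              # ~1260, far above the 50-100/pool guideline
```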
Why did the memory consumption change that much from bobtail to cuttlefish? I didn't change the pools in any way.
There's also a big discrepancy with the docs, as they state an osd would consume 200 - 500 MB (http://ceph.com/docs/next/install/hardware-recommendations/). I know David is still waiting for a detailed debug log from me, which I'll provide within a short while. But if ceph's memory requirements are really that high, fixing the docs to allow for proper resource planning is really a must.
We should probably update that to estimate in terms of PGs per OSD to be a bit more accurate.
Either way, if you can provide the massif output that will help tremendously.
Thanks, Corin!
#11 Updated by Corin Langosch over 10 years ago
It's the pgs per osd that matters. But yeah, I'm not happy with my answer either, but I don't have much else to go on. My suspicion is that we will see the memory consumed in the usual places with the per-PG in-memory state. Having the massif results will let us confirm that. Maybe we have more hash_map's in use than before (those can quickly eat RAM) and we just didn't notice.
Then it'd really be great if one could start with a small number of pgs and grow as more osds are added. Afaik ceph has some support for this, but it's not stable yet? Does expanding cause a lot of data shifting?
To check how the number of pgs affects the osd memory consumption I just added a new pool with "rados mkpool test 4096 4096". Now ceph -w shows 12288 pgs, but I couldn't notice any change of the osds' memory usage. Should this have affected the memory usage or do I need to place some objects into that pool first?
Also, osd memory usage differs quite strongly from osd to osd. They were all started at the same time, but after some weeks one osd, for example, uses 1.5 GB of RAM while another uses 3 GB. There were no recoveries during the last 2-3 weeks, so memory usage should be almost equal?
I'll do the debugging now.. :)
Corin
#12 Updated by Corin Langosch over 10 years ago
Here we go:
** restart with valgrind **
valgrind --tool=massif /usr/bin/ceph-osd -i 12 --pid-file /var/run/ceph/osd.12.pid -c /etc/ceph/ceph.conf -f
startup (till cluster reports the osd up/in again):
- took around 20 minutes
- always taking 100% cpu
- grew to around 3.5 gb
peering/recovery
- took around 10 minutes, then aborted
- aborted because osd was reported as down by other osds (too slow)
** restart without valgrind **
service ceph start osd
startup (till cluster reports the osd up/in again):
- took only a few seconds
- always taking 100% cpu
- grew to around 2.8 gb
peering/recovery
- took only a few seconds
- grew to around 3.0 gb
- finished successfully
I uploaded my ceph-osd and the massif output to cephdrop. The filename is corin.tar.gz.
md5sum /usr/bin/ceph-osd
f3e762a608b2bedaea1b9baf4066cedf /usr/bin/ceph-osd
md5sum massif.out.14535
6030beeaf2232a8f7e71502583fac6c5 massif.out.14535
md5sum corin.tar.gz
83e1aba34fb700ab9f4a4dbcaf395a47 corin.tar.gz
To me it looks like a startup problem, as the process grows that much before even joining the cluster?
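[Editorial note: the massif output uploaded above can be rendered locally into a readable allocation graph and tree with ms_print, which ships with valgrind; the file name here is the one from the md5sums above:]

```shell
# ms_print turns a raw massif.out.<pid> file into a human-readable
# report showing which call chains allocated the heap.
ms_print massif.out.14535 | head -n 40
```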
#13 Updated by Corin Langosch over 10 years ago
Looks like ceph is reading the whole log file (1 GB) into memory and not freeing it again?
#15 Updated by Corin Langosch over 10 years ago
I just thought recreating the journal would help, but it didn't help at all.
kill osd.12
/usr/bin/ceph-osd -i 12 --flush-journal
/usr/bin/ceph-osd -i 12 --mkjournal
restart osd.12
Memory usage is as high as it was with the old journal :-(
#16 Updated by Sage Weil over 10 years ago
Corin Langosch wrote:
I just thought recreating the journal would help, but it didn't help at all.
kill osd.12
/usr/bin/ceph-osd -i 12 --flush-journal
/usr/bin/ceph-osd -i 12 --mkjournal
restart osd.12
Memory usage is as high as it was with the old journal :-(
It's a different log than the journal.
Try reducing these by a factor of 10:
osd_min_pg_log_entries = 3000
osd_max_pg_log_entries = 10000
so, 300 and 1000. Note that it will not be able to trim until after it peers, and trimming is controlled by the primary, so all osds will need to restart with that setting. Once it does trim, the heap isn't always freed back to the OS, but after a second restart (of a single osd) I think you will see its memory utilization go down.
The memory utilization is basically num_pgs * num_log_entries, and the log entries only appear after you've done lots of write activity. I think this is why you don't see usage go up when you create a new pool.
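[Editorial note: a back-of-the-envelope instance of that num_pgs * num_log_entries model, using the PG count from the osd dump above. The bytes-per-entry figure is purely an assumption for illustration; the source does not state a per-entry size:]

```shell
# Plug the cluster's numbers into the model. Only the structure of the
# formula comes from the thread; bytes_per_entry is an assumed average.
pgs_per_osd=1260          # ~16384 PG copies over 13 in OSDs (see osd dump)
log_entries=3000          # cuttlefish default for osd_max_pg_log_entries
bytes_per_entry=250       # illustrative assumption, not a Ceph constant
mb=$(( pgs_per_osd * log_entries * bytes_per_entry / 1024 / 1024 ))
echo "~${mb} MB of PG log state per OSD"   # ~901 MB with these assumptions
```

With those assumed inputs the model already accounts for most of a gigabyte per OSD, which is consistent with the order of magnitude reported in this thread.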
#17 Updated by Corin Langosch over 10 years ago
Do these settings affect data safety in any way? The cluster is an important production one, so I cannot really play around with it. Probably we could combine it with the upgrade of the cluster to the latest dumpling, when the next point release is out (if it contains all fixes for the reported slowdowns since cuttlefish)?
From the docs osd_min_pg_log_entries is 1000 by default (not 3000?). I couldn't find osd_max_pg_log_entries in the docs. How can I check the values my cluster is currently using?
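[Editorial note on checking the running values: in releases of this era a daemon's admin socket can dump its active configuration. A sketch, assuming the default socket path and one of the osd ids from this cluster:]

```shell
# Query a running OSD's effective config through its admin socket and
# filter for the pg log options. Adjust the osd id to a local daemon.
asok=/var/run/ceph/ceph-osd.12.asok
ceph --admin-daemon "$asok" config show | grep -E 'osd_(min|max)_pg_log_entries'
```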
But still, I really wonder why ceph needs to read that much information on startup? Can't this be greatly reduced? Why isn't the memory freed as soon as it's not needed anymore - shouldn't this be fixed as well? If we had an osd binary with these fixes I'd be happy to give it a try on a single osd.
#18 Updated by Corin Langosch over 10 years ago
BTW, would be nice if this issue would be re-opened. I cannot do this.. :(
#19 Updated by Sage Weil over 10 years ago
- Status changed from Can't reproduce to 7
- Assignee changed from David Zafman to Sage Weil
Corin Langosch wrote:
Do these settings affect data safety in any way? The cluster is an important production one, so I cannot really play around with it. Probably we could combine it with the upgrade of the cluster to the latest dumpling, when the next point release is out (if it contains all fixes for the reported slowdowns since cuttlefish)?
From the docs osd_min_pg_log_entries is 1000 by default (not 3000?). I couldn't find osd_max_pg_log_entries in the docs. How can I check the values my cluster is currently using?
But still, I really wonder why ceph needs to read that much information on startup? Can't this be greatly reduced? Why isn't the memory freed as soon as it's not needed anymore - shouldn't this be fixed as well? If we had an osd binary with these fixes I'd be happy to give it a try on a single osd.
They don't affect data safety. They do need to be high enough to cover a reasonable window of activity as the log is used to prevent resent ops (i.e., after a client is temporarily disconnected or the data mapping changes) from being reapplied. Longer logs also expand the window during which an OSD can be down and come back up and rejoin without doing a full backfill/sync on its data.
Can you verify that lowering these values reduces your memory consumption?
#20 Updated by Sage Weil over 10 years ago
Also, massif should have generated a report file that indicates which callers are allocating all of the memory. Can you attach that? Thanks!
#21 Updated by Corin Langosch over 10 years ago
For testing I'd like to wait for the next dumpling release (hopefully with a fix for http://tracker.ceph.com/issues/6040). I'll then restart the whole cluster with the new settings (osd_min_pg_log_entries = 100 and osd_max_pg_log_entries = 1000). Is this ok for you?
There was no report file, only one with extension massif. But I'll double check after upgrading and testing again.
#22 Updated by Sage Weil over 10 years ago
Corin Langosch wrote:
For testing I'd like to wait for the next dumpling release (hopefully with a fix for http://tracker.ceph.com/issues/6040). I'll then restart the whole cluster with the new settings (osd_min_pg_log_entries = 100 and osd_max_pg_log_entries = 1000). Is this ok for you?
Sounds good. Should be out today or tomorrow.
There was no report file, only one with extension massif. But I'll double check after upgrading and testing again.
That's the one! Can you attach?
thanks-
#23 Updated by Corin Langosch over 10 years ago
- File corin.tar.gz added
Great! There are no other show stoppers and upgrade should be smooth, right? :)
I uploaded my binary and the massif file to cephdrop, see http://tracker.ceph.com/issues/5700#note-12. But I still have it here, and so attached it :)
#24 Updated by Sage Weil over 10 years ago
Ah, sorry, I missed that.
And yeah, the massif output confirms that ~80% of the heap is consumed by the pg logs. Reducing those values will help considerably, as will keeping the pg count fixed as your cluster expands over time and PGs spread out over a larger number of OSDs.
FWIW we increased that value from 1000 -> 3000 just before cuttlefish.
#25 Updated by Corin Langosch over 10 years ago
Ah ok, that increase might explain the increased memory usage. We'll know for sure in a few days :)
But anyway, is it really necessary to load all the logs into memory and not free them again? Sorry for bugging you with that, but it just doesn't feel right to me. Especially as from your comment I assume the logs are only needed during recovery or startup (to see what changed), but not during normal operation?
#26 Updated by Corin Langosch over 10 years ago
I just upgraded to dumpling: adjusted ceph.conf, restarted all mons, restarted all osds
After the restart, the osds still consume a lot of memory:
root 19135 0.0 0.0 8112 924 pts/0 S+ 14:30 0:00 grep --color=auto ceph
root 27851 12.8 8.0 3363176 2638292 ? Ssl 13:48 5:17 /usr/bin/ceph-osd -i 12 --pid-file /var/run/ceph/osd.12.pid -c /etc/ceph/ceph.conf
root 30685 18.1 5.7 2711064 1907044 ? Ssl 13:51 6:58 /usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf
I'll let them run 1-2 days to see what happens.
This is my current osd config:
[osd]
osd journal size = 1000
osd journal dio = true
osd journal aio = true
osd journal = /xfs-drive1/$cluster-osd-$id.journal
osd op threads = 8
osd min pg log entries = 1000
osd max pg log entries = 3000
filestore op threads = 16
filestore max sync interval = 10
filestore min sync interval = 3
Please note that I didn't change to min = 100 and max = 500, as I read that it could cause problems (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/002770.html). Is it really safe to go to min = 100 and max = 500 (it's a kvm production cluster)? Can I change those values without restarting all daemons again?
#27 Updated by Corin Langosch over 10 years ago
So after 5 days of having latest dumpling running, memory usage is still quite high:
root 17320 0.0 0.0 8112 924 pts/0 S+ 21:58 0:00 grep --color=auto ceph
root 27851 7.8 5.1 4020792 1695804 ? Ssl Sep12 489:57 /usr/bin/ceph-osd -i 12 --pid-file /var/run/ceph/osd.12.pid -c /etc/ceph/ceph.conf
root 30685 13.5 4.1 2723340 1351056 ? Ssl Sep12 848:44 /usr/bin/ceph-osd -i 13 --pid-file /var/run/ceph/osd.13.pid -c /etc/ceph/ceph.conf
root 22984 44.3 19.6 3287164 1604404 ? Ssl Sep12 2761:22 /usr/bin/ceph-osd -i 0 --pid-file /var/run/ceph/osd.0.pid -c /etc/ceph/ceph.conf
root 25798 0.0 0.0 8112 936 pts/1 S+ 22:00 0:00 grep --color=auto ceph
root 28596 33.0 13.5 3094128 1108760 ? Ssl Sep12 2057:44 /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf
root 32160 28.8 12.7 2845976 1041860 ? Ssl Sep12 1796:40 /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf
Is there any chance that ceph can be made much less memory hungry?
#28 Updated by Sage Weil over 10 years ago
- Status changed from 7 to Resolved
after reviewing this again, there are 2 things:
1- the default # of pg log entries increased from bobtail to cuttlefish
2- you have a lot of pgs given your number of osds. this will eventually get better as you expand your cluster over time.
all indications are that there are no leaks or other regressions. closing this bug!
#29 Updated by Corin Langosch over 10 years ago
Allow me this last question: why does all this log information have to be kept in memory all the time?
#30 Updated by Greg Farnum over 10 years ago
I created #6570 for that, Corin. There are tradeoffs involved and some of them are probably worth making, but it's not a quick fix. :)