Bug #44939

The mon and/or osd pod memory consumption is not even. One of them consumes about 50% more.

Added by Yan Zhao about 4 years ago. Updated about 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Performance/Resource Usage
Target version:
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor, OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This is a Ceph deployment with Rook release 1.2.7 / Ceph 14.2.8. After deployment, one of the mon pods and/or osd pods consumes about 50% more memory (RSS) than the other two. All OSD and MON pods run on identical hardware / OpenStack configurations. A Rook ticket has been opened for the same issue ( https://github.com/rook/rook/issues/5175 ). This is a sunny-day case.

1). pidstat output for each ceph-mon process on the 3 identical nodes. The "ceph-mon" process on the control-03 node consumes about 50% more memory than the other two (a loop that gathers the same snapshot from all three nodes in one pass is sketched after the output).

[root@bcmt-control-01 ~/rook245]# pidstat -r -p 3441
Linux 4.18.0-147.5.1.el8_1.x86_64 (bcmt-control-01) 04/04/2020 x86_64 (48 CPU)

07:00:39 PM UID PID minflt/s majflt/s VSZ RSS %MEM Command
07:00:39 PM 167 3441 13.33 0.00 1119708 655796 0.25 ceph-mon

[root@bcmt-control-01 ~/rook245]# ssh -q -i ./0207.pem cloud-user@bcmt-control-02 "sudo pidstat -r -p 45329" | grep ceph-mon
07:01:50 PM 167 45329 10.63 0.00 1117620 648360 0.25 ceph-mon
[root@bcmt-control-01 ~/rook245]# ssh -q -i ./0207.pem cloud-user@bcmt-control-03 "sudo pidstat -r -p 47131" | grep ceph-mon
07:02:24 PM 167 47131 18.86 0.00 1508912 983248 0.37 ceph-mon
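
A minimal sketch of that loop, reusing the hostnames and ssh key from the commands above and assuming control-01 can be reached over ssh the same way; pidstat -C selects the process by command name, so the per-node PIDs are not needed:

for node in bcmt-control-01 bcmt-control-02 bcmt-control-03; do
  echo "== $node =="
  # one-shot memory report for any process whose command name contains "ceph-mon"
  ssh -q -i ./0207.pem cloud-user@$node "sudo pidstat -r -C ceph-mon" | grep ceph-mon
done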

2). pmap output for ceph-mon on the control-03 node

47131: ceph-mon --fsid=1e2e65aa-0bd4-45fc-8517-7244ee30362b --keyring=/etc/ceph/keyring-store/keyring --log-to-stderr=true --err-to-stderr=true --mon-cluster-log-to-stderr=true --log-stderr-prefix=debug --default-log-to-file=false --default-mon-cluster-log-to-file=false --mon-host=[v2:10.254.92.35:3300,v1:10.254.92.35:6789],[v2:10.254.149.240:3300,v1:10.254.149.240:6789],[v2:10.254.13.209:3300,v1:10.254.13.209:6789] --mon-initial-members=a,b,c --id=a --setuser=ceph --setgroup=ceph --foreground --public-addr=10.
Address Kbytes RSS Dirty Mode Mapping
0000557f66cb5000 8180 7596 0 r-x-- ceph-mon
0000557f676b2000 100 100 100 r---- ceph-mon
0000557f676cb000 12 12 12 rw--- ceph-mon
0000557f676ce000 48 40 40 rw--- [ anon ]
0000557f68d1a000 1027016 955852 955852 rw--- [ anon ]
00007fb01623f000 4 0 0 ----- [ anon ]
00007fb016240000 8192 40 40 rw--- [ anon ]
00007fb016a40000 4 0 0 ----- [ anon ]

...

---------------- ------- ------- -------
total kB 1508916 987536 960424

3). The ceph-mon memory consumption starts at 373Mi and increases to 954Mi in about 4 hours under traffic.
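
The large anonymous mapping in the pmap output above is the mon's heap; to see what it holds as it grows, the mempool accounting can be dumped over the admin socket from inside the mon container. A sketch, assuming the rook-ceph namespace, <mon-a-pod> as a placeholder for the rook-ceph-mon-a pod name, and that the admin socket is present at its default path in the container:

kubectl -n rook-ceph exec <mon-a-pod> -- ceph daemon mon.a dump_mempools
# with tcmalloc, heap statistics are also available:
kubectl -n rook-ceph exec <mon-a-pod> -- ceph daemon mon.a heap stats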

#1

Updated by Yan Zhao about 4 years ago

Here's the overall memory consumption under traffic. The OSDs consume much more memory as well.

  knc top pods | egrep "osd|mon"
    rook-ceph-mon-a-57854657b8-8c8h9 4m 952Mi
    rook-ceph-mon-b-655d5d4b97-krngs 5m 634Mi
    rook-ceph-mon-c-6bcffb8cbf-p8nq9 3m 624Mi
    rook-ceph-osd-0-5f76d76bfb-clpl7 146m 3408Mi
    rook-ceph-osd-1-d96fbfd8d-s7g5r 156m 3433Mi
    rook-ceph-osd-2-568d4ccd5b-6kx8r 154m 3282Mi
#2

Updated by Josh Durgin about 4 years ago

What is your mon_memory_target and osd_memory_target?

Uneven memory on the mons is likely due to the leader doing more work/caching more.

In general the daemons attempt to stay below their targets, but do not try to maintain even usage with each other. In a busy cluster you should see OSD memory stay close to the target due to memory being used for caching.
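
For reference, the configured and effective targets can be read with the ceph CLI; a sketch run from a toolbox/client pod with admin access, using mon.a and osd.0 as example daemon names:

ceph config get mon.a mon_memory_target    # configured value
ceph config get osd.0 osd_memory_target
ceph config show osd.0 osd_memory_target   # value currently in effect on the running daemon

If the 14.2 defaults are unchanged (2 GiB for mon_memory_target, 4 GiB for osd_memory_target), the RSS figures reported above are still below target.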
