Bug #58685

RHEL 9 (cgroups v2) - the pid limits ARE enforced as compared to RHEL8 (cgroup v1)

Added by Vikhyat Umrao about 1 year ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
cephadm
Target version:
-
% Done:

0%

Source:
Tags:
backport_processed
Backport:
quincy, pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Downstream bug - https://bugzilla.redhat.com/show_bug.cgi?id=2165644

Mark Kogan and I have been working on and discussing this issue. As the tracker title says, we need to take care of this change because, in the RGW container, we saw the container crash when we wanted to use an RGW thread pool size of 2048.

2023-01-30T14:01:04.421+0000 7f650419f600 -1 *** Caught signal (Aborted) **
 in thread 7f650419f600 thread_name:radosgw

 ceph version 17.2.5-63.el9cp (5c1d62abbfba4f16a4ecda23145329df253ac85a) quincy (stable)
 1: /lib64/libc.so.6(+0x54d90) [0x7f65079dfd90]
 2: /lib64/libc.so.6(+0xa154c) [0x7f6507a2c54c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa1a21) [0x7f6507c50a21]
 6: /lib64/libstdc++.so.6(+0xad39c) [0x7f6507c5c39c]
 7: /lib64/libstdc++.so.6(+0xad407) [0x7f6507c5c407]
 8: /lib64/libstdc++.so.6(+0xad669) [0x7f6507c5c669]
 9: (std::__throw_system_error(int)+0x9b) [0x7f6507c537f8]
 10: /lib64/libstdc++.so.6(+0xdbafd) [0x7f6507c8aafd]
 11: (RGWAsioFrontend::run()+0x1bc) [0x7f65081f181c]
 12: (radosgw_Main(int, char const**)+0x4bbe) [0x7f65083453de]
 13: /lib64/libc.so.6(+0x3feb0) [0x7f65079caeb0]
 14: __libc_start_main()
 15: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events -

Mark and my internal discussion:

Mark Kogan, 12:48 AM
@Vikhyat Umrao - please note cgroupVersion

// RHEL 8:
podman info --debug | grep -i cgroup
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1

// RHEL 9:
podman info --debug | grep -i cgroup
  cgroupControllers:
  cgroupManager: systemd
  cgroupVersion: v2

Mark Kogan, 1:24 AM, Edited

@Vikhyat Umrao 
updating that with this test application running in a Fedora pod:

podman run -it --rm --replace --name fedora fedora

[root@331514e9f2a9 /]# cat threads.cpp
#include <iostream>
#include <thread>
#include <vector>
#include <unistd.h>
#include <chrono>

void print_thread_id(int id) {
  using namespace std::chrono_literals;
  std::cout << "Thread " << id << std::endl;
  std::this_thread::sleep_for(2000ms);
}

int main() {
  std::vector<std::thread> threads;

  for (int i = 0; i < 4096 ; ++i) {
    threads.push_back(std::thread(print_thread_id, i));
  }

  for (auto& t : threads) {
    t.join();
  }

  return 0;
}
[root@331514e9f2a9 /]# clang++ threads.cpp -pthread -o threads

results show that under RHEL 8 (cgroups v1) there is no enforcement - all 4096 threads can be created:

Thread 4093
Thread 4094
Thread 4095

BUT on RHEL 9 (cgroups v2) the pid limits ARE enforced!

Thread 2043

Thread 2044
Thread 2045
terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
Aborted (core dumped)

@Mark Kogan as always great work mate. I can add that I have also verified it has nothing to do with the host OS version, whether RHEL 8 or RHEL 9 ... that is why the 5.3 version works fine on a RHEL 9 host while the RHCS 6.0 version does not.

It is mainly down to the container image's RHEL version, specifically the cgroup version, as you said.

5.3 is built with a RHEL 8 container image and 6.0 is built with a RHEL 9 container image.

I have already tested that 6.0, which is built with the RHEL 9 image, has the same issue on a RHEL 8 host.

this confirms that it is an issue with the container image being used to build RHCS 6

with this, we need to think wider: it does not only impact the RGW container, it can impact all other Ceph containers - MDS, OSD, MON, MGR, etc ...

I will move this bug to cephadm and will also open a Rook bug; we need to take care of it in both product lines ...

Matt Benjamin, 11:28 AM
yes...

Vikhyat Umrao, 11:59 AM
@Adam King please review this thread. I will be moving this BZ to cephadm - https://bugzilla.redhat.com/show_bug.cgi?id=2165644

and will be opening a new bug for rook ...

this is a behavior change in podman/cgroups v2 in RHEL 9

Vikhyat Umrao, 12:04 PM, Edited
I am testing the solution to set --pids-limit=-1 to see if it fixes the issue

Vikhyat Umrao, 12:17 PM
# cat unit.run | grep pids-limit
/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/bin/radosgw --init --name ceph-04bf3460-a7b4-11ed-bf7c-000af7995756-rgw-rgws-f22-h21-000-6048r-ghroyj -d --log-driver journald --conmon-pidfile /run/ceph-06bf3460-a7b4-11ed-bf7c-000af7995756@rgw.rgws.f22-h21-000-6048r.ghroyj.service-pid --cidfile /run/ceph-06bf3460-a7b4-11ed-bf7c-000af7995756@rgw.rgws.f22-h21-000-6048r.ghroyj.service-cid --pids-limit -1 --cgroups=split -e CONTAINER_IMAGE=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:7672426dd2265ccabc6550cce3ffa0711c44e1ce04c20a93b2955707c4494f85 -e NODE_NAME=f22-h21-000-6048r.rdu2.scalelab.redhat.com -e CEPH_USE_RANDOM_NONCE=1 -e TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 -v /var/run/ceph/06bf3460-a7b4-11ed-bf7c-000af7995756:/var/run/ceph:z -v /var/log/ceph/06bf3460-a7b4-11ed-bf7c-000af7995756:/var/log/ceph:z -v /var/lib/ceph/06bf3460-a7b4-11ed-bf7c-000af7995756/crash:/var/lib/ceph/crash:z -v /run/systemd/journal:/run/systemd/journal -v /var/lib/ceph/06bf3460-a7b4-11ed-bf7c-000af7995756/rgw.rgws.f22-h21-000-6048r.ghroyj:/var/lib/ceph/radosgw/ceph-rgw.rgws.f22-h21-000-6048r.ghroyj:z -v /var/lib/ceph/06bf3460-a7b4-11ed-bf7c-000af7995756/rgw.rgws.f22-h21-000-6048r.ghroyj/config:/etc/ceph/ceph.conf:z registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:7672426dd2265ccabc6550cce3ffa0711c44e1ce04c20a93b2955707c4494f85 -n client.rgw.rgws.f22-h21-000-6048r.ghroyj -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false

bingo so the solution is working

Vikhyat Umrao, 12:19 PM
I also found out the man page in RHEL9 has a typo :( will open a RHEL9 bug

https://docs.podman.io/en/latest/markdown/podman-run.1.html

--pids-limit=limit
Tune the container’s pids limit. Set to -1 to have unlimited pids for the container. The default is 2048 on systems that support “pids” cgroup controller.

this is from the above link

but if I check the man page in RHEL 9

# man podman-run | grep -A3 pids-limit
   --pids-limit=limit
       Tune  the container's pids limit. Set to -1 to have unlimited pids for the container. The default
       is 4096 on systems that support "pids" cgroup controller.

so the man page says 4096 but the URL says 2048, and per our testing it is 2048, so it looks like the man page needs to be fixed

Vikhyat Umrao, 27 min
rook bug - https://bugzilla.redhat.com/show_bug.cgi?id=2168722

Vikhyat Umrao, 20 min
man page bug - https://bugzilla.redhat.com/show_bug.cgi?id=2168727


Related issues

Copied to Orchestrator - Backport #58882: pacific: RHEL 9 (cgroups v2) - the pid limits ARE enforced as compared to RHEL8 (cgroup v1) Resolved
Copied to Orchestrator - Backport #58883: quincy: RHEL 9 (cgroups v2) - the pid limits ARE enforced as compared to RHEL8 (cgroup v1) Resolved

History

#1 Updated by Vikhyat Umrao about 1 year ago

Adam - FYI!

#2 Updated by Adam King about 1 year ago

  • Status changed from New to In Progress
  • Assignee set to Adam King
  • Backport set to quincy, pacific
  • Pull request ID set to 50083

#3 Updated by Adam King about 1 year ago

  • Status changed from In Progress to Pending Backport

#4 Updated by Backport Bot about 1 year ago

  • Copied to Backport #58882: pacific: RHEL 9 (cgroups v2) - the pid limits ARE enforced as compared to RHEL8 (cgroup v1) added

#5 Updated by Backport Bot about 1 year ago

  • Copied to Backport #58883: quincy: RHEL 9 (cgroups v2) - the pid limits ARE enforced as compared to RHEL8 (cgroup v1) added

#6 Updated by Backport Bot about 1 year ago

  • Tags set to backport_processed

#7 Updated by Adam King about 1 year ago

  • Status changed from Pending Backport to Resolved
