Bug #48975

closed

after setting certificate only one rgw pod starts

Added by Patrik Fürer over 3 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi

I am on ceph 15.2.8 with 2 rgw pods defined.

After we switched to SSL, we added the certificate/key to the config database with:
ceph config-key set rgw/cert/adcubum/enge_u227.crt -i /root/<cert_file>
ceph config-key set rgw/cert/adcubum/enge_u227.key -i /root/<key_file>
ceph config set client.rgw.<rgw_realm>.<rgw_zone> rgw_frontends "beast port=80 ssl_port=443 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key"
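
For reference, the resulting frontend setting can be read back with ceph config get (a minimal check, assuming the same client.rgw.<rgw_realm>.<rgw_zone> section as above):
ceph config get client.rgw.<rgw_realm>.<rgw_zone> rgw_frontends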

After this change, only one of the two RGWs starts, while the other complains about the certificate:
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 deferred set uid:gid to 167:167 (ceph:ceph)
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 1
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework: beast
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: port, val: 80
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: ssl_port, val: 443
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: ssl_certificate, val: config://rgw/cert/adcubum/enge_u227.crt
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: ssl_private_key, val: config://rgw/cert/adcubum/enge_u227.key
2021-01-22T15:58:31.940+0000 7f1fc4af2280 1 radosgw_Main not setting numa affinity
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 framework: beast
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 starting handler: beast
2021-01-22T15:58:32.146+0000 7f1fc4af2280 -1 ssl_private_key was not found: rgw/cert/adcubum/enge_u227.key
2021-01-22T15:58:32.147+0000 7f1fc4af2280 -1 ssl_private_key was not found: rgw/cert/adcubum/enge_u227.crt
2021-01-22T15:58:32.147+0000 7f1fc4af2280 -1 no ssl_certificate configured for ssl_port
2021-01-22T15:58:32.147+0000 7f1fc4af2280 -1 ERROR: failed initializing frontend

The keys are there, otherwise the other pod would not start, and I also rechecked:
  1. ceph config-key get rgw/cert/adcubum/enge_u227.crt
     obtained 'rgw/cert/adcubum/enge_u227.crt'
     -----BEGIN CERTIFICATE-----
     ....
  2. ceph config-key get rgw/cert/adcubum/enge_u227.key
     obtained 'rgw/cert/adcubum/enge_u227.key'
     -----BEGIN RSA PRIVATE KEY-----
     ....

Not sure what we are missing here.


Actions #1

Updated by Casey Bodley about 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Mark Kogan
Actions #2

Updated by Mark Kogan about 3 years ago

Please provide details of the setup to work on reproducing ...
(for example, is it a Rook or Podman environment?)

(a reproducer command flow would accelerate the debugging)

Thanks

Actions #3

Updated by Patrik Fürer about 3 years ago

The cluster is set up with cephadm and podman.
I used this command to deploy rgw:
ceph orch apply rgw adcubum enge_u227 --placement="2 host1 host2"

Both rgw daemons have been running fine (although not used by the users).

Then the request came to switch to secure communication, so I took a wildcard certificate (and its key) for the domain and copied it to /var/lib/ceph/<fsid>/home/ on the administration node.
I opened a cephadm shell and ran the commands:
ceph config-key set rgw/cert/adcubum/enge_u227.crt -i /root/cert.pem
ceph config-key set rgw/cert/adcubum/enge_u227.key -i /root/cert.key
ceph config set client.rgw.adcubum.enge_u227 rgw_frontends "beast port=80 ssl_port=443 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key"

Then I restarted the rgw daemons with
ceph orch restart rgw

Now only one of the two daemons starts; the failing one gives the error from the ticket description.
The other rgw is running fine.

Actions #4

Updated by Mark Kogan about 3 years ago

Could the issue be that both rgws try to bind to the same port?

Is it possible to change one of the rgws to bind to different ports?
For example:

rgw #1
rgw_frontends "beast port=80 ssl_port=443 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key" 
rgw #2
rgw_frontends "beast port=81 ssl_port=444 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key" 

provided that 81 and 444 are unused per
netstat -nap

And if this does not resolve it, please attach logs with

debug_rgw=20
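
For example, raising the log level and restarting could look like this (a sketch, reusing the client.rgw.adcubum.enge_u227 section and the ceph orch restart rgw command from earlier in this ticket):
ceph config set client.rgw.adcubum.enge_u227 debug_rgw 20
ceph orch restart rgw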

Actions #5

Updated by Patrik Fürer about 3 years ago

As the pods are running on different nodes, that should not matter, and the logs do not indicate such a thing.

I have attached the log from the node where rgw is failing.

Actions #6

Updated by Patrik Fürer about 3 years ago

I tried removing the rgw, configured only port 8080, and did an apply rgw again, but no pods are deployed anymore. So it seems the SSL setting is not the issue.
Probably best to close this case; I will check again and open a new issue if the problem with deployment persists.
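
The flow was roughly the following (a sketch; the rgw.adcubum.enge_u227 service name passed to ceph orch rm is an assumption, the actual name can be listed with ceph orch ls):
ceph orch rm rgw.adcubum.enge_u227
ceph config set client.rgw.adcubum.enge_u227 rgw_frontends "beast port=8080"
ceph orch apply rgw adcubum enge_u227 --placement="2 host1 host2"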

Actions #7

Updated by Patrik Fürer about 3 years ago

By the way, I found the issue: the health status was HEALTH_WARN instead of HEALTH_OK, and the orchestrator was not willing to deploy the rgw while the health is not OK.
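
For reference, the cluster health and the resulting daemon placement can be checked with, for example:
ceph health detail
ceph -s
ceph orch ps
Once the status is back to HEALTH_OK, ceph orch ps should show both rgw daemons again.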

Actions #8

Updated by Casey Bodley over 2 years ago

  • Status changed from Triaged to Closed

Patrik Fürer wrote:

By the way, I found the issue: the health status was HEALTH_WARN instead of HEALTH_OK, and the orchestrator was not willing to deploy the rgw while the health is not OK.

Great, thanks.
