Bug #48975

closed

after setting certificate only one rgw pod starts

Added by Patrik Fürer over 3 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi

I am on ceph 15.2.8 with 2 rgw pods defined.

After we switched to SSL, we added the certificate/key to the config database with:
ceph config-key set rgw/cert/adcubum/enge_u227.crt -i /root/<cert_file>
ceph config-key set rgw/cert/adcubum/enge_u227.key -i /root/<key_file>
ceph config set client.rgw.<rgw_realm>.<rgw_zone> rgw_frontends "beast port=80 ssl_port=443 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key"
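
For reference, the resulting frontend setting can be read back with ceph config get (a minimal check, assuming the same client.rgw.<rgw_realm>.<rgw_zone> section as above):
ceph config get client.rgw.<rgw_realm>.<rgw_zone> rgw_frontends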

After this change, only one of the two RGWs starts, while the other complains about the certificate:
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 deferred set uid:gid to 167:167 (ceph:ceph)
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 1
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework: beast
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: port, val: 80
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: ssl_port, val: 443
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: ssl_certificate, val: config://rgw/cert/adcubum/enge_u227.crt
2021-01-22T15:58:31.940+0000 7f1fc4af2280 0 framework conf key: ssl_private_key, val: config://rgw/cert/adcubum/enge_u227.key
2021-01-22T15:58:31.940+0000 7f1fc4af2280 1 radosgw_Main not setting numa affinity
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 framework: beast
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 framework conf key: ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 framework conf key: ssl_private_key, val: config://rgw/cert/$realm/$zone.key
2021-01-22T15:58:32.144+0000 7f1fc4af2280 0 starting handler: beast
2021-01-22T15:58:32.146+0000 7f1fc4af2280 -1 ssl_private_key was not found: rgw/cert/adcubum/enge_u227.key
2021-01-22T15:58:32.147+0000 7f1fc4af2280 -1 ssl_private_key was not found: rgw/cert/adcubum/enge_u227.crt
2021-01-22T15:58:32.147+0000 7f1fc4af2280 -1 no ssl_certificate configured for ssl_port
2021-01-22T15:58:32.147+0000 7f1fc4af2280 -1 ERROR: failed initializing frontend

The keys are there, otherwise the other pod would not start, and I also rechecked:
  1. ceph config-key get rgw/cert/adcubum/enge_u227.crt
     obtained 'rgw/cert/adcubum/enge_u227.crt'
     -----BEGIN CERTIFICATE-----
     ....
  2. ceph config-key get rgw/cert/adcubum/enge_u227.key
     obtained 'rgw/cert/adcubum/enge_u227.key'
     -----BEGIN RSA PRIVATE KEY-----
     ....

Not sure what we are missing here.


Actions #1

Updated by Casey Bodley about 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Mark Kogan
Actions #2

Updated by Mark Kogan about 3 years ago

Please provide details of the setup to work on reproducing ...
(for example, is it a Rook or Podman environment?)

(a reproducer command flow would accelerate the debugging)

Thanks

Actions #3

Updated by Patrik Fürer about 3 years ago

The cluster is set up with cephadm and podman.
I used this command to deploy rgw:
ceph orch apply rgw adcubum enge_u227 --placement="2 host1 host2"

Both rgw daemons have been running fine (although not used by the users).

Then the request came to switch to secure communication, so I took a wildcard certificate (and its key) for the domain and copied it to /var/lib/ceph/<fsid>/home/ on the administration node.
I opened a cephadm shell and ran the commands:
ceph config-key set rgw/cert/adcubum/enge_u227.crt -i /root/cert.pem
ceph config-key set rgw/cert/adcubum/enge_u227.key -i /root/cert.key
ceph config set client.rgw.adcubum.enge_u227 rgw_frontends "beast port=80 ssl_port=443 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key"

Then I restarted the rgw daemons with
ceph orch restart rgw

Now only one of the two daemons starts; the failing one gives the error from the ticket description.
The other rgw is running fine.

Actions #4

Updated by Mark Kogan about 3 years ago

Could the issue be that both rgws try to bind to the same port?

Is it possible to change one of the rgws to bind to different ports?
For example:

rgw #1
rgw_frontends "beast port=80 ssl_port=443 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key" 
rgw #2
rgw_frontends "beast port=81 ssl_port=444 ssl_certificate=config://rgw/cert/adcubum/enge_u227.crt ssl_private_key=config://rgw/cert/adcubum/enge_u227.key" 

provided that 81 and 444 are unused per
netstat -nap

And if this does not resolve it, please attach logs with

debug_rgw=20
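
For example, raising the log level and restarting could look like this (a sketch, reusing the client.rgw.adcubum.enge_u227 section and the ceph orch restart rgw command from earlier in this ticket):
ceph config set client.rgw.adcubum.enge_u227 debug_rgw 20
ceph orch restart rgw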

Actions #5

Updated by Patrik Fürer about 3 years ago

As the pods are running on different nodes, that should not matter, and the logs do not indicate such a thing.

I have attached the log from the node where rgw is failing.

Actions #6

Updated by Patrik Fürer about 3 years ago

I tried removing the rgw, configured only port 8080, and did an apply rgw again, but no pods are deployed anymore. So it seems the SSL setting is not the issue.
Probably best to close this case; I will check again and open a new issue if the problem with deployment persists.
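
The flow was roughly the following (a sketch; the rgw.adcubum.enge_u227 service name passed to ceph orch rm is an assumption, the actual name can be listed with ceph orch ls):
ceph orch rm rgw.adcubum.enge_u227
ceph config set client.rgw.adcubum.enge_u227 rgw_frontends "beast port=8080"
ceph orch apply rgw adcubum enge_u227 --placement="2 host1 host2"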

Actions #7

Updated by Patrik Fürer about 3 years ago

By the way, I found the issue: the health status was HEALTH_WARN instead of HEALTH_OK, and the orchestrator was not willing to deploy the rgw while the health is not OK.
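
For reference, the cluster health and the resulting daemon placement can be checked with, for example:
ceph health detail
ceph -s
ceph orch ps
Once the status is back to HEALTH_OK, ceph orch ps should show both rgw daemons again.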

Actions #8

Updated by Casey Bodley over 2 years ago

  • Status changed from Triaged to Closed

Patrik Fürer wrote:

By the way, I found the issue: the health status was HEALTH_WARN instead of HEALTH_OK, and the orchestrator was not willing to deploy the rgw while the health is not OK.

Great, thanks.
