Bug #51642
Status: closed
cephadm/rgw : RGW server is not coming up: Initialization timeout, failed to initialize
Description
I created the ceph cluster via cephadm and it looks fine, but when I tried to deploy RGW it failed, and the logs do not contain much info about the failure (it only says "failed to initialize"). Any help in debugging would be much appreciated.
- Bootstrap the ceph cluster via cephadm bootstrap --mon-ip <IP> in a Fedora 34 VM, which uses podman
- Add the VM as a host to the ceph cluster
- Add three storage disks and create OSDs via ceph orch daemon add osd <host>:<device-path>
- Deploy RGW via ceph orch apply rgw newstore1 --port=7080 --placement=1
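For reference, the steps above can be sketched as a single shell session (a sketch; <IP>, <host>, and <device-path> are placeholders for the actual values):

```shell
# Bootstrap a single-node cluster on the Fedora 34 VM (podman backend)
cephadm bootstrap --mon-ip <IP>

# Add the VM itself as a cluster host
ceph orch host add <host>

# Create one OSD per attached storage disk (repeat per device)
ceph orch daemon add osd <host>:<device-path>

# Deploy a single RGW instance listening on port 7080
ceph orch apply rgw newstore1 --port=7080 --placement=1
```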
- ceph status
  cluster:
    id:     bd183ab8-dfd8-11eb-83cf-525400bf929c
    health: HEALTH_WARN
            1 failed cephadm daemon(s)
            Reduced data availability: 33 pgs inactive
            Degraded data redundancy: 33 pgs undersized

  services:
    mon: 1 daemons, quorum fedora (age 85m)
    mgr: fedora.vrlaif(active, since 85m)
    osd: 3 osds: 3 up (since 85m), 3 in (since 5d)

  data:
    pools:   2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage:   17 MiB used, 30 GiB / 30 GiB avail
    pgs:     100.000% pgs not active
             33 undersized+peered
- ceph osd pool ls
device_health_metrics
.rgw.root
- ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE VERSION IMAGE ID CONTAINER ID
alertmanager.fedora fedora *:9093,9094 running (99m) 7m ago 5d 0.20.0 0881eb8f169f 97a9a4130594
crash.fedora fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 7eab2bef9692
grafana.fedora fedora *:3000 running (99m) 7m ago 5d 6.7.4 ae5c36c3d3cd 5e98deb7ac7a
mgr.fedora.vrlaif fedora *:9283 running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 f33d98b6edd7
mon.fedora fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 c7e1168011c6
node-exporter.fedora fedora *:9100 running (99m) 7m ago 5d 0.18.1 e5a616e4b9cf a397fbac7bb0
osd.0 fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 fc1c1f9ac6dc
osd.1 fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 5379d795c1fb
osd.2 fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 99c5d3447c44
prometheus.fedora fedora *:9095 running (99m) 7m ago 5d 2.18.1 de242295e225 e2e71c2f9bcf
rgw.newstore1.fedora.mkhhgr fedora *:7080 error 7m ago 4d <unknown> <unknown> <unknown>
Attaching the logs, the RGW pod spec, and the systemctl status of RGW as files.
Files
Updated by Sebastian Wagner almost 3 years ago
- Subject changed from cephadm/rgw : RGW server is not coming up to cephadm/rgw : RGW server is not coming up: Initialization timeout, failed to initialize
Updated by Sebastian Wagner almost 3 years ago
the rgw log looks like so:
2021-07-13T10:39:08.749+0000 7f8f5770b440  0 deferred set uid:gid to 167:167 (ceph:ceph)
2021-07-13T10:39:08.749+0000 7f8f5770b440  0 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable), process radosgw, pid 2
2021-07-13T10:39:08.749+0000 7f8f5770b440  0 framework: beast
2021-07-13T10:39:08.749+0000 7f8f5770b440  0 framework conf key: port, val: 7080
2021-07-13T10:39:08.749+0000 7f8f5770b440  1 radosgw_Main not setting numa affinity
2021-07-13T10:44:08.750+0000 7f8f4399b700 -1 Initialization timeout, failed to initialize
(repeats 5 times)
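Note that the two timestamps are exactly 300 seconds apart, which matches the default of the rgw_init_timeout option: radosgw gives up if it cannot finish initialization (including creating its pools in RADOS) within that window. If initialization is merely slow rather than stuck, the window can be widened (a sketch; assuming the option is set in the client.rgw config section of a cephadm deployment):

```shell
# Raise the RGW initialization timeout from the default 300s to 600s
ceph config set client.rgw rgw_init_timeout 600
```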
Updated by Sebastian Wagner almost 3 years ago
Try running
ceph orch daemon rm rgw.newstore1.fedora.mkhhgr
If this doesn't help, I will probably need some help from the RGW team.
Updated by Dimitri Savineau almost 3 years ago
pgs: 100.000% pgs not active 33 undersized+peered
I'm pretty sure the issue occurred because the PGs aren't active+clean.
The default RGW pools (.rgw.root, default.rgw.control, default.rgw.meta and default.rgw.log) are created when the first RGW instance starts.
But if the first RGW pool (.rgw.root) isn't in a correct state, the others aren't created.
@Jiffin Tony Thottan : Would you be able to retest this, but, since you have an all-in-one setup, set the osd_pool_default_size variable to 1 before deploying the OSDs?
ceph config set global osd_pool_default_size 1
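To confirm this diagnosis before redeploying, the PG and pool state can be inspected directly (a hypothetical session; the output depends on the cluster):

```shell
# All PGs should report active+clean before RGW can initialize
ceph pg stat

# Check the replica count of the first RGW pool
ceph osd pool get .rgw.root size

# On an all-in-one cluster, shrink the already-created pool to one replica
# (osd_pool_default_size only affects pools created afterwards)
ceph osd pool set .rgw.root size 1
```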
Updated by Dimitri Savineau almost 3 years ago
Answering to myself...
# cephadm --image quay.io/ceph/ceph:v16.2.5 bootstrap --mon-ip x.x.x.x --skip-pull --skip-dashboard --skip-monitoring-stack
# cephadm shell ceph config set global mon_warn_on_pool_no_redundancy false
# cephadm shell ceph config set global osd_pool_default_size 1
# cephadm shell ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
# cephadm shell ceph orch apply rgw newstore1 --port=7080 --placement=1
Scheduled rgw.newstore1 update...
# cephadm shell ceph orch ls --service_type rgw
NAME           PORTS   RUNNING  REFRESHED  AGE  PLACEMENT
rgw.newstore1  ?:7080      1/1  5m ago     5m   count:1
# cephadm shell ceph orch ps --service_name rgw.newstore1
NAME                          HOST     PORTS    STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
rgw.newstore1.cephaio.xcyqan  cephaio  *:7080   running (5m)  5m ago     5m   15.5M    -        16.2.5   6933c2a0b7dd  c3cf0c0aa3ad
So I think we can close this issue.
Updated by Jiffin Tony Thottan almost 3 years ago
- Description updated (diff)
Dimitri Savineau wrote:
Answering to myself...
[...]
So I think we can close this issue.
I was able to bring up the RGW server by setting osd_pool_default_size. Hence I am closing this bug.
Thanks for the help.
--
Jiffin
Updated by Jiffin Tony Thottan almost 3 years ago
- Status changed from New to Resolved
Updated by Christian Rohmann over 1 year ago
I just ran into this issue and would like to propose reopening it.
radosgw should clearly log something when it starts setting up the rados pools,
and then log the actual error if that is not successful.
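Until such logging exists, raising the RGW debug level is one way to see where initialization stalls (a sketch; adjust the config section to match your deployment):

```shell
# Verbose RGW and messenger logging for the next startup attempt
ceph config set client.rgw debug_rgw 20
ceph config set client.rgw debug_ms 1
```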