
Bug #51642

cephadm/rgw : RGW server is not coming up: Initialization timeout, failed to initialize

Added by Jiffin Tony Thottan 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: High
Assignee: -
Category: cephadm/rgw
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I created the Ceph cluster via cephadm and it looks fine, but when I tried to deploy RGW it failed, and the logs give little information about the failure (they only say "failed to initialize"). Any help with debugging would be much appreciated.

  • Bootstrap the Ceph cluster with cephadm bootstrap --mon-ip <IP> on a Fedora 34 VM that uses podman
  • Add the VM as a host to the Ceph cluster
  • Add three storage disks and create OSDs with ceph orch daemon add osd <host>:<device-path>
  • Deploy RGW with ceph orch apply rgw newstore1 --port=7080 --placement=1
  # ceph status
    cluster:
    id: bd183ab8-dfd8-11eb-83cf-525400bf929c
    health: HEALTH_WARN
    1 failed cephadm daemon(s)
    Reduced data availability: 33 pgs inactive
    Degraded data redundancy: 33 pgs undersized

    services:
    mon: 1 daemons, quorum fedora (age 85m)
    mgr: fedora.vrlaif(active, since 85m)
    osd: 3 osds: 3 up (since 85m), 3 in (since 5d)

    data:
    pools: 2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage: 17 MiB used, 30 GiB / 30 GiB avail
    pgs: 100.000% pgs not active
    33 undersized+peered

  # ceph osd pool ls
    device_health_metrics
    .rgw.root
  # ceph orch ps
    NAME HOST PORTS STATUS REFRESHED AGE VERSION IMAGE ID CONTAINER ID
    alertmanager.fedora fedora *:9093,9094 running (99m) 7m ago 5d 0.20.0 0881eb8f169f 97a9a4130594
    crash.fedora fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 7eab2bef9692
    grafana.fedora fedora *:3000 running (99m) 7m ago 5d 6.7.4 ae5c36c3d3cd 5e98deb7ac7a
    mgr.fedora.vrlaif fedora *:9283 running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 f33d98b6edd7
    mon.fedora fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 c7e1168011c6
    node-exporter.fedora fedora *:9100 running (99m) 7m ago 5d 0.18.1 e5a616e4b9cf a397fbac7bb0
    osd.0 fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 fc1c1f9ac6dc
    osd.1 fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 5379d795c1fb
    osd.2 fedora running (99m) 7m ago 5d 16.2.4 8d91d370c2b8 99c5d3447c44
    prometheus.fedora fedora *:9095 running (99m) 7m ago 5d 2.18.1 de242295e225 e2e71c2f9bcf
    rgw.newstore1.fedora.mkhhgr fedora *:7080 error 7m ago 4d <unknown> <unknown> <unknown>

Attaching the logs, the RGW pod spec and the systemctl status of the RGW daemon as files.

systemcctl_status - systemctl status (1.89 KB) Jiffin Tony Thottan, 07/13/2021 12:20 PM

latest_podspec - podspec (16.7 KB) Jiffin Tony Thottan, 07/13/2021 12:20 PM

cephadm.log - cephadmlog (38.8 KB) Jiffin Tony Thottan, 07/13/2021 12:20 PM

ceph-client.rgw.newstore1.fedora.mkhhgr.log - rgwlog (5.33 KB) Jiffin Tony Thottan, 07/13/2021 12:21 PM

ceph.tar.xz - full logs (704 KB) Jiffin Tony Thottan, 07/13/2021 12:24 PM

History

#1 Updated by Sebastian Wagner 7 months ago

  • Description updated (diff)

#2 Updated by Sebastian Wagner 7 months ago

  • Subject changed from cephadm/rgw : RGW server is not coming up to cephadm/rgw : RGW server is not coming up: Initialization timeout, failed to initialize

#3 Updated by Sebastian Wagner 7 months ago

The RGW log looks like this:

2021-07-13T10:39:08.749+0000 7f8f5770b440  0 deferred set uid:gid to 167:167 (ceph:ceph)
2021-07-13T10:39:08.749+0000 7f8f5770b440  0 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable), process radosgw, pid 2
2021-07-13T10:39:08.749+0000 7f8f5770b440  0 framework: beast
2021-07-13T10:39:08.749+0000 7f8f5770b440  0 framework conf key: port, val: 7080
2021-07-13T10:39:08.749+0000 7f8f5770b440  1 radosgw_Main not setting numa affinity
2021-07-13T10:44:08.750+0000 7f8f4399b700 -1 Initialization timeout, failed to initialize
(repeats 5 times)

#4 Updated by Sebastian Wagner 7 months ago

Try running

ceph orch daemon rm rgw.newstore1.fedora.mkhhgr

If this doesn't help, I will probably need some help from the RGW team.
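To find which RGW daemons the orchestrator currently reports in an error state (the candidates for ceph orch daemon rm), the plain-text ceph orch ps listing can be filtered. This is a minimal bash sketch, assuming the daemon name is the first column and the status shows up as a standalone "error" word, as in the listing in the description; the function name failed_rgw_daemons is hypothetical. Because rgw.newstore1 is a managed service with placement count 1, cephadm would typically redeploy a fresh daemon after the failed one is removed.

```shell
# Hypothetical helper (bash): read "ceph orch ps" text output on stdin and
# print the names of rgw daemons whose STATUS column reads "error".
failed_rgw_daemons() {
  # NR > 1 skips the header row; $1 is the NAME column
  awk 'NR > 1 && $1 ~ /^rgw\./ && / error / { print $1 }'
}

# Usage on a live cluster (assumes the ceph CLI is on PATH):
#   ceph orch ps | failed_rgw_daemons | while read -r name; do
#     ceph orch daemon rm "$name"
#   done
```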

#5 Updated by Dimitri Savineau 7 months ago

    pgs:     100.000% pgs not active
             33 undersized+peered

I'm pretty sure the issue occurred because the PGs aren't active+clean.
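One way to confirm this before deploying RGW is to check that every PG state group reported by ceph pg stat includes "active". A minimal bash sketch, assuming the one-line summary has the form "<N> pgs: <count> <state>, <count> <state>; <usage>" (the exact wording may differ between releases, and the JSON output of ceph pg stat would be a more robust source); the function name all_pgs_active is hypothetical.

```shell
# Hypothetical helper (bash): read the one-line "ceph pg stat" summary on
# stdin and succeed only if every PG state group includes "active".
all_pgs_active() {
  local line states entry state
  IFS= read -r line || return 2
  states=${line#*pgs: }      # drop the leading "<N> pgs: "
  states=${states%%;*}       # drop everything from the first ";" (usage stats)
  local IFS=','
  for entry in $states; do   # entries look like "33 undersized+peered"
    entry=${entry# }         # trim the space that follows each comma
    state=${entry#* }        # drop the count, keep the state string
    case "+$state+" in
      *+active+*) ;;         # e.g. active+clean, active+undersized
      *) return 1 ;;         # e.g. undersized+peered, creating, unknown
    esac
  done
  return 0
}

# Usage on a live cluster: ceph pg stat | all_pgs_active && echo "PGs are active"
```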

The default RGW pools (.rgw.root, default.rgw.control, default.rgw.meta and default.rgw.log) are normally created when the first RGW instance starts.

But if the first RGW pool (.rgw.root) can't become active, the others are never created.
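The pool listing in the description (only device_health_metrics and .rgw.root) is consistent with this. A small bash sketch that reports which of the four default RGW pools named above are still missing from ceph osd pool ls output; the function name missing_rgw_pools is hypothetical, and the pool names assume the default zone.

```shell
# Hypothetical helper (bash): read "ceph osd pool ls" output (one pool per
# line) on stdin and print each default RGW pool that does not exist yet.
missing_rgw_pools() {
  local existing pool
  existing=$(cat)
  for pool in .rgw.root default.rgw.control default.rgw.meta default.rgw.log; do
    # -x: whole-line match, -F: treat the dots in pool names literally
    if ! grep -qxF -- "$pool" <<< "$existing"; then
      echo "$pool"
    fi
  done
}

# Usage on a live cluster: ceph osd pool ls | missing_rgw_pools
```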

@Jiffin: Could you retest this but, since you have an all-in-one setup, set the osd_pool_default_size option to 1 before deploying the OSDs?

ceph config set global osd_pool_default_size 1
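The reasoning behind this suggestion can be modelled with a little arithmetic. With the default replicated CRUSH rule each replica goes to a different host, so a single-host cluster can hold only one copy of each PG no matter how many OSDs it has. When osd_pool_default_min_size is left at its default of 0, Ceph derives min_size as size - size/2 (2 for size 3, 1 for size 1), and a PG with fewer than min_size copies cannot go active, which matches the 33 undersized+peered PGs above. A bash sketch of that model, assuming a host failure domain and ignoring everything else (OSD failures, erasure-coded pools); pg_outlook is a hypothetical name.

```shell
# Hypothetical model (bash) of a replicated pool whose CRUSH rule places at
# most one replica per host (failure domain = host).
#   $1 = pool size, $2 = number of hosts with OSDs
pg_outlook() {
  local size=$1 hosts=$2
  # osd_pool_default_min_size = 0 means min_size = size - size/2
  local min_size=$(( size - size / 2 ))
  # At most one replica per host can be placed
  local replicas=$(( size < hosts ? size : hosts ))
  if (( replicas < min_size )); then
    echo "inactive (undersized+peered)"
  elif (( replicas < size )); then
    echo "active+undersized"
  else
    echo "active+clean"
  fi
}

# pg_outlook 3 1   -> inactive (undersized+peered)   (the reporter's setup)
# pg_outlook 1 1   -> active+clean                   (after osd_pool_default_size 1)
```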

#6 Updated by Dimitri Savineau 7 months ago

Answering to myself...

# cephadm --image quay.io/ceph/ceph:v16.2.5 bootstrap --mon-ip x.x.x.x --skip-pull --skip-dashboard --skip-monitoring-stack
# cephadm shell ceph config set global mon_warn_on_pool_no_redundancy false
# cephadm shell ceph config set global osd_pool_default_size 1
# cephadm shell ceph orch apply osd --all-available-devices
Scheduled osd.all-available-devices update...
# cephadm shell ceph orch apply rgw newstore1 --port=7080 --placement=1
Scheduled rgw.newstore1 update...
# cephadm shell ceph orch ls --service_type rgw
NAME           PORTS   RUNNING  REFRESHED  AGE  PLACEMENT  
rgw.newstore1  ?:7080      1/1  5m ago     5m   count:1
# cephadm shell ceph orch ps --service_name rgw.newstore1
NAME                          HOST     PORTS   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID  
rgw.newstore1.cephaio.xcyqan  cephaio  *:7080  running (5m)     5m ago   5m    15.5M        -  16.2.5   6933c2a0b7dd  c3cf0c0aa3ad

So I think we can close this issue.
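For repeatability, the working sequence above can be wrapped in a bash function that stops at the first failing step, so the pool defaults are always in place before the OSDs and the RGW pools are created. The function name single_host_rgw and its arguments are hypothetical; the commands and flags are the ones used in the transcript above.

```shell
# Hypothetical wrapper (bash) around the command sequence above for an
# all-in-one test cluster. Stops at the first command that fails.
#   $1 = monitor IP, $2 = container image (defaults to the one used above)
single_host_rgw() {
  local mon_ip=$1 image=${2:-quay.io/ceph/ceph:v16.2.5}
  cephadm --image "$image" bootstrap --mon-ip "$mon_ip" \
    --skip-pull --skip-dashboard --skip-monitoring-stack || return
  # Set pool defaults before any OSD exists, so pools created later
  # (including .rgw.root and the default.rgw.* pools) get size 1
  cephadm shell ceph config set global mon_warn_on_pool_no_redundancy false || return
  cephadm shell ceph config set global osd_pool_default_size 1 || return
  cephadm shell ceph orch apply osd --all-available-devices || return
  cephadm shell ceph orch apply rgw newstore1 --port=7080 --placement=1
}

# Example: single_host_rgw 192.0.2.10
```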

#7 Updated by Jiffin Tony Thottan 7 months ago

  • Description updated (diff)

Dimitri Savineau wrote:

Answering to myself...

[...]

So I think we can close this issue.

I was able to bring up the RGW server by setting osd_pool_default_size to 1. Hence closing this bug.
Thanks for the help.
--
Jiffin

#8 Updated by Jiffin Tony Thottan 7 months ago

  • Status changed from New to Resolved
