Documentation #10848


rgw: federated configuration docs run through

Added by Yehuda Sadeh about 9 years ago. Updated about 9 years ago.

Status:
Closed
Priority:
High
Assignee:
Target version:
-
% Done:

0%

Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Files

cli_output.png (67.8 KB) cli_output.png Alfredo Deza, 02/23/2015 04:21 PM
Actions #1

Updated by Yehuda Sadeh about 9 years ago

  • Tracker changed from Tasks to Documentation

Verify that the rgw sync agent federated config documentation is up to date.

Actions #2

Updated by Yehuda Sadeh about 9 years ago

  • introduce concepts (what is a zone, region, etc.)
  • when do various settings need to be used (when are settings read from ceph options vs. region and zone config)
  • which pools should be created in each zone?
Actions #3

Updated by Yehuda Sadeh about 9 years ago

  • Priority changed from Normal to High
Actions #4

Updated by Yehuda Sadeh about 9 years ago

  • Assignee set to Alfredo Deza
Actions #5

Updated by Alfredo Deza about 9 years ago

  • Status changed from New to In Progress
Actions #6

Updated by Alfredo Deza about 9 years ago

It is confusing to get started with a guide that insists on four instances (two per region), because the consumer now has to juggle all those variables
(and concepts) while reading through.

I can't see a reason why we need to insist on four gateways. It would be easier to cope with just two instances and two regions.

Created issue #10959

Actions #7

Updated by Alfredo Deza about 9 years ago

Going from top to bottom in the 'configure a master region' section, at 'create pools' (http://ceph.com/docs/master/radosgw/federated-config/#create-pools), it is unclear to me where I need to do that.

On each instance? Is there one instance per cluster? If so, do I need to create these pools in one cluster or in both?

As a new user, I have no idea what a pool is, so I follow the links to understand what they are and how to configure them. But that is a lot of information.

Detailed information is good, but to set this up, do I need to understand every single detail of how to configure a federated gateway? Can't we push the
more verbose information until later? What is the least amount of information I need to move forward?

After going through the details of pools, it does seem like I am required to have knowledge of:
  • pg-num
  • pgp-num
  • replicated vs erasure
  • erasure-code-profile
  • ruleset-name
  • ruleset-number

In the http://docs.ceph.com/docs/master/rados/operations/pools/#create-a-pool section, I am asked to "refer to the Pool, PG and CRUSH Config Reference" before creating them.

That is now three levels deep of highly detailed information I am not sure I need (or want to grasp all at once).

In the config reference for pools there are several links to yet another level of reference, for example to understand what `pg-num` is, which seems important since the description says the default is not suitable.

At this point I have five tabs open with docs and am still unsure what the minimum is that I need to get things going:

http://docs.ceph.com/docs/master/radosgw/config-ref/#pools
http://docs.ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
http://docs.ceph.com/docs/master/rados/operations/pools/#create-a-pool
http://ceph.com/docs/master/radosgw/federated-config/#create-pools

All of this when the documentation said:

See Configuration Reference - Pools for details on the default pools for gateways. See Pools for details on creating pools. Execute the following to create a pool:

This is then followed by a command that implies I need to understand all of the variables, since it instructs the user, imperatively, to run it:

ceph osd pool create {poolname} {pg-num} {pgp-num} {replicated | erasure} [{erasure-code-profile}]  {ruleset-name} {ruleset-number}
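To show what a reader is implicitly expected to produce, here is a hypothetical substitution of the template's placeholders. Every value below is illustrative only (the PG counts in particular are not recommendations):

```shell
# Hypothetical substitution of the template's placeholders.
# 64/64 PG counts are placeholders; real values depend on OSD count
# and replica size. Echoed rather than executed, since running it
# requires a live cluster.
poolname=".us-east.rgw.root"
pg_num=64
pgp_num=64
pool_type="replicated"
echo "ceph osd pool create ${poolname} ${pg_num} ${pgp_num} ${pool_type}"
```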

Issue #10991 created.

Actions #8

Updated by Alfredo Deza about 9 years ago

I can't run any ceph commands because there is no keyring on the servers where I am attempting to do so.

ubuntu@mira107:~$ sudo ceph osd pool
2015-02-13 10:44:49.186463 7f58413bf700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2015-02-13 10:44:49.186472 7f58413bf700  0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound

But up to this point there is no mention of needing to set up the keyrings. The keyring section comes after; ideally it would be the first item in the list.

There should be a `note` section for someone coming from an already-set-up cluster, where the admin key might exist already.
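A quick sanity check a reader could run before any ceph commands might look like this (a sketch only; the path is the conventional default admin keyring location and may differ on a given cluster):

```shell
# Check for the admin keyring before attempting cephx-authenticated
# commands. /etc/ceph/ceph.client.admin.keyring is the conventional
# default location; adjust for your cluster.
keyring=/etc/ceph/ceph.client.admin.keyring
if [ ! -f "$keyring" ]; then
  echo "missing $keyring: copy the admin keyring from an existing node first"
fi
```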

Issue #10992 created.

Actions #9

Updated by Alfredo Deza about 9 years ago

In the create-pools section (http://ceph.com/docs/master/radosgw/federated-config/#create-pools) a naming convention is explained and an example of that convention
is shown, but it is not clear what a user should actually create.

I understand (after reading http://docs.ceph.com/docs/master/radosgw/config-ref/#pools ) that it depends on the cluster setup, but that applies to the PGs, not to the actual pool names.

So the docs should state that the user should run the create command for every pool that appears in the example list, because otherwise they read
as optional.

Issue #10993 created.

Actions #10

Updated by Alfredo Deza about 9 years ago

Still stuck attempting to create the pools correctly and trying to find the right values; it is not clear what the docs mean by 'ruleset-number':

ceph osd pool create {poolname} {pg-num} {pgp-num} {replicated | erasure} [{erasure-code-profile}]  {ruleset-name} {ruleset-number}

Going through the two links that point to more 'details', I see no reference to 'ruleset-number', although 'crush-ruleset-name' appears in several places.

The examples on 'How to create a Pool' in http://docs.ceph.com/docs/master/rados/operations/pools/#create-a-pool don't mention anything about a 'ruleset-number'.

So 2 things here:

  • if 'ruleset-name' in the RGW docs is meant to be the same as 'crush-ruleset-name' in the Pools docs, name it the same (and possibly link them directly)
  • if 'ruleset-number' is optional, make that clear; if it is outdated, remove it.

Issue #10994 created.

Actions #11

Updated by Alfredo Deza about 9 years ago

Related to trying to get a master/slave setup running, I opened:

  • #10877 - CLI error numbers are not described anywhere
  • #10888 - `ceph osd` missing command gives incorrect suggestion
  • #10891 - get all values for a given pool
Actions #12

Updated by Alfredo Deza about 9 years ago

After checking other pools and their values in the existing cluster, I am ready to create the pools, but the naming convention explained in the 'Naming for the Master Region' section does not look like the examples.

The docs assume more than one Ceph Object Gateway per zone and give an example with one instance in each of two zones:

Finally, let’s assume that zones may have more than one Ceph Object Gateway instance per zone. For continuity, our naming convention will use {region name}-{zone name}-{instance} format, but you can use any naming convention you prefer.

    United States Region, Master Zone, Instance 1: us-east-1
    United States Region, Secondary Zone, Instance 1: us-west-1

But it ends up with final examples that a) do not mention both instances per zone and b) are prefaced by a dot:

For continuity, our naming convention will use {region name}-{zone name} format prepended to the pool name, but you can use any naming convention you prefer. For example:

    .us-east.rgw.root
    .us-east.rgw.control
    .us-east.rgw.gc
    .us-east.rgw.buckets
    .us-east.rgw.buckets.index
    .us-east.rgw.buckets.extra
    .us-east.log
    .us-east.intent-log
    .us-east.usage
    .us-east.users
    .us-east.users.email
    .us-east.users.swift
    .us-east.users.uid

It is not clear whether I am required to preface the pool names with a dot or not.

I asked Yehuda and he said that, for historical reasons, RGW pool names are required to start with a dot, otherwise it will not work.

I couldn't find this mentioned anywhere.
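As a sketch of what the section could spell out explicitly (dot prefix mandatory, one create command per pool in the example list), the commands could be generated like this. The PG counts are placeholders, and the commands are echoed rather than executed since running them needs a live cluster:

```shell
# Emit one "ceph osd pool create" call per pool in the example list.
# The leading dot is required for RGW pools; 64/64 are placeholder
# PG counts, not recommendations.
prefix=".us-east"
pools="rgw.root rgw.control rgw.gc rgw.buckets rgw.buckets.index \
rgw.buckets.extra log intent-log usage users users.email users.swift users.uid"
for suffix in $pools; do
  echo "ceph osd pool create ${prefix}.${suffix} 64 64 replicated"
done
```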

Issue #10995 created.

Actions #13

Updated by Alfredo Deza about 9 years ago

For the Apache configuration I am using RHEL, but the docs only seem to cover Debian-like Apache structures: http://ceph.com/docs/master/radosgw/federated-config/#create-a-gateway-configuration

E.g.:

For each instance, create an Ceph Object Gateway configuration file under the /etc/apache2/sites-available directory on the host(s) where you installed the Ceph Object Gateway daemon(s). 

Searching around I found this section to be a (somewhat lacking) duplicate of http://ceph.com/docs/master/radosgw/config/ which does have configuration examples for RPM-based distros.

Issue #11002 created.

Actions #14

Updated by Alfredo Deza about 9 years ago

The first step, replacing the path to the socket in the Apache config, mentions that I should have the same path in ceph.conf, but up to this point I have not touched
the ceph.conf file.

It is probably assuming implicitly that I have already set up the Gateway? But that would be equally confusing, because I went through the installation process. I had to jump back to the Gateway guide, which does show where that path should go.

Issue #11003 created.

Actions #15

Updated by Alfredo Deza about 9 years ago

The 'enable the configuration' section assumes a Debian-like setup using `a2ensite` and `a2dissite`: http://ceph.com/docs/master/radosgw/federated-config/#enable-the-configuration

For each instance, enable the gateway configuration and disable the default site.

    Enable the site for the gateway configuration.

    sudo a2ensite {rgw-conf-filename}

    Disable the default site.

    sudo a2dissite default

Note

Failure to disable the default site can lead to problems.
Actions #16

Updated by Alfredo Deza about 9 years ago

The federated config guide doesn't mention that I need to ensure that FastCgiWrapper is Off. I was only able to tell this by going side by side with the regular radosgw
guide at http://ceph.com/docs/master/radosgw/config/

Issue #11004 created.

Actions #17

Updated by Alfredo Deza about 9 years ago

Trying to add the instances to ceph.conf (in the section http://ceph.com/docs/master/radosgw/federated-config/#add-instances-to-ceph-config-file ), it uses two conventions to indicate that the user should replace values:

For example:

rgw dns name = {hostname}
rgw socket path = /var/run/ceph/$name.sock

For the socket path, I had to go back all the way to the Apache configuration example that already had the path. If the docs assume locations in one place, it would be nice to keep those assumptions throughout for consistency.

It also doesn't explain why these two settings have different variable names (implying they are in fact different?):

rgw dns name = {hostname}
host = {host-name}
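A consistent stanza that ties the two names together might look like the following. This is a hypothetical sketch: every value below is a placeholder, and the section name `client.radosgw.us-east-1` is an assumed instance name, not from the docs:

```ini
; Hypothetical ceph.conf stanza; all values are placeholders.
[client.radosgw.us-east-1]
host = gateway1                      ; short hostname of the machine (hostname -s)
rgw dns name = gateway1.example.com  ; DNS name that clients resolve
rgw socket path = /var/run/ceph/client.radosgw.us-east-1.sock
rgw zone = us-east
```

The sketch suggests why the variable names differ: `host` is where the daemon runs, while `rgw dns name` is what S3 clients use, so they are genuinely different settings even when they share a machine.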

Issue #11005 created.

Actions #18

Updated by Alfredo Deza about 9 years ago

After finishing editing the config file, out of the blue I am asked to use "ceph-deploy", which was never described: what it is or where to install it from.

Nor does it mention that to be able to "push" the configuration file I need a working SSH setup, or that I should be in the same directory as the ceph.conf file.

Issue #11006 created.

Actions #19

Updated by Alfredo Deza about 9 years ago

The new changes to ceph.conf need to be copied to every Ceph node.

I am not sure how to list every node in the cluster. I checked the glossary in the docs, and it specifies that a node is "Any single machine or server in a Ceph System."

Asking around, it looks like I need to run a few commands to get the OSD hosts and the MON hosts:

For OSDs

$ sudo ceph osd tree | grep host | awk '{print $4}'

For MONs

$ sudo ceph mon dump | grep '^[0-9]'

And for MDSs

$ sudo ceph mds dump | grep '^[0-9]'

There is no way (I am told) to list anything else, so hopefully getting the ceph.conf file to these servers will be enough.

The docs need to specify the expectation here (and whether I need to restart any services too).
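To illustrate what the OSD filter extracts, here it is run against a small fabricated `ceph osd tree` excerpt (the hostnames and weights are made up):

```shell
# Simulated 'ceph osd tree' output piped through the same
# grep/awk filter used to extract host names.
hosts=$(printf '%s\n' \
  '-2 2.72 host mira107' \
  ' 0 0.91     osd.0  up  1' \
  '-3 2.72 host mira108' \
  ' 1 0.91     osd.1  up  1' \
  | grep host | awk '{print $4}')
echo "$hosts"
```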

Opened a feature request to be able to list nodes in a cluster: issue #10904

Actions #20

Updated by Alfredo Deza about 9 years ago

When adding the json file to create the region, it is not clear whether the command is actually doing what was intended. The response from the CLI is to dump the contents of the JSON file that was used as input.

The `radosgw-admin regionmap update` command suffers from the same behavior: it just dumps JSON back to the terminal. I'm not sure if I should look at anything specific.

Created issue #10964

Actions #21

Updated by Alfredo Deza about 9 years ago

Trying to create the system users now, but I'm getting an error from the `radosgw-admin` tool that doesn't tell me much:
$ sudo radosgw-admin user create --uid="magna" --display-name="Region-Magna" --name client.radosgw.magna --system
couldn't init storage provider

The exit status is 5, but then again, I am not sure what I am doing wrong, and there is nothing in the docs pointing to what
I should check to troubleshoot:

$ echo $?
5
Actions #22

Updated by Alfredo Deza about 9 years ago

Investigating what might be wrong, I see that Apache never restarted: there is a syntax error in a configuration file. On this RHEL server
I am asked to check with `systemctl`:

$ sudo service httpd restart
Redirecting to /bin/systemctl restart  httpd.service
Job for httpd.service failed. See 'systemctl status httpd.service' and 'journalctl -xn' for details.

Which points to the exact problem:

$ sudo systemctl status -l httpd.service
httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled)
   Active: failed (Result: exit-code) since Wed 2015-02-18 08:15:50 PST; 14min ago
  Process: 13642 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 13640 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
 Main PID: 13640 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/httpd.service

Feb 18 08:15:50 magna123 httpd[13640]: httpd: Syntax error on line 353 of /etc/httpd/conf/httpd.conf: Syntax error on line 30 of /etc/httpd/conf.d/rgw-magna.conf: Expected </!--ServerAlias> but saw </VirtualHost>
Feb 18 08:15:50 magna123 systemd[1]: httpd.service: main process exited, code=exited, status=1/FAILURE
Feb 18 08:15:50 magna123 systemd[1]: Failed to start The Apache HTTP Server.
Feb 18 08:15:50 magna123 systemd[1]: Unit httpd.service entered failed state.

The "comments" in the Apache configuration example are invalid: comments in Apache configs must start with '#', but the example uses HTML-style comments instead:

FastCgiExternalServer /var/www/html/s3gw.fcgi -socket /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

<VirtualHost *:80>

    ServerName {fqdn}
    <!--Remove the comment. Add a server alias with *.{fqdn} for S3 subdomains-->
    <!--ServerAlias *.{fqdn}-->
    ServerAdmin {email.address}
    DocumentRoot /var/www/html
    RewriteEngine On
    RewriteRule  ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
...

Properly commenting those out fixed the problem and Apache could be restarted.
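For reference, the same fragment with valid Apache-style comments would read like this (a sketch; the `{fqdn}` and `{email.address}` placeholders are kept as in the docs):

```apache
<VirtualHost *:80>
    ServerName {fqdn}
    # Uncomment the line below to add a server alias with *.{fqdn} for S3 subdomains
    # ServerAlias *.{fqdn}
    ServerAdmin {email.address}
    DocumentRoot /var/www/html
    RewriteEngine On
    RewriteRule  ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
</VirtualHost>
```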

Created issue #10943

Actions #23

Updated by Alfredo Deza about 9 years ago

Attempting to start the radosgw daemon results in failure, and I am unable to tell why.

I try to look for logs for the failure, but those don't exist. Looking into the init script, it seems it
reads log locations from ceph.conf. I went to ceph.conf and verified that there is no log file defined, which is unexpected
since I used this section to configure the daemon: http://ceph.com/docs/master/radosgw/federated-config/#add-instances-to-ceph-config-file

That section doesn't mention a log location.

I add:

log file = /var/log/radosgw/client.radosgw.magna.log

And try to start the daemon again, which again results in failure. The log is unfortunately empty.

[ubuntu@magna123 configs]$ sudo service radosgw start
Redirecting to /bin/systemctl start  radosgw.service
Job for ceph-radosgw.service failed. See 'systemctl status ceph-radosgw.service' and 'journalctl -xn' for details.
[ubuntu@magna123 configs]$ echo $?
1

`systemctl` is unable to explain what is going on:

[ubuntu@magna123 configs]$ sudo systemctl status ceph-radosgw.service
ceph-radosgw.service - LSB: radosgw RESTful rados gateway
   Loaded: loaded (/etc/rc.d/init.d/ceph-radosgw)
   Active: failed (Result: exit-code) since Wed 2015-02-18 12:57:19 PST; 12s ago
  Process: 16160 ExecStart=/etc/rc.d/init.d/ceph-radosgw start (code=exited, status=1/FAILURE)

Feb 18 12:57:18 magna123 ceph-radosgw[16160]: Starting radosgw instance(s)...
Feb 18 12:57:19 magna123 ceph-radosgw[16160]: /bin/radosgw is not running.
Feb 18 12:57:19 magna123 systemd[1]: ceph-radosgw.service: control process exited, code=exited status=1
Feb 18 12:57:19 magna123 systemd[1]: Failed to start LSB: radosgw RESTful rados gateway.
Feb 18 12:57:19 magna123 systemd[1]: Unit ceph-radosgw.service entered failed state.

And the logs are empty:

[ubuntu@magna123 configs]$ tail /var/log/radosgw/client.radosgw.magna.log

Created issue #10927

Actions #24

Updated by Alfredo Deza about 9 years ago

I brought in Yehuda to give me a hand, and he had to add a `-x` to the init script (!) to actually get some output on
why the daemon isn't starting.

The init script now sent errors to `/var/log/messages` where we saw:

Feb 20 13:42:25 magna123 ceph-radosgw: ++ ceph-conf -n client.radosgw.magna host
Feb 20 13:42:25 magna123 ceph-radosgw: + host=magna123.ceph.redhat.com
Feb 20 13:42:25 magna123 ceph-radosgw: ++ hostname -s
Feb 20 13:42:25 magna123 ceph-radosgw: + hostname=magna123
Feb 20 13:42:25 magna123 ceph-radosgw: + '[' magna123.ceph.redhat.com '!=' magna123 ']'

The init script (or the logs) need to be explicit and explain why there was an error: "fatal error: expected $some-hostname, but got $other-hostname from $config-file"

We had to change `ceph.conf` to do:

host = magna123 # magna123.ceph.redhat.com

Nowhere in the docs was it mentioned that we were required to use the output of `hostname -s`, but even if it were, the user should be able to tell what is going on from an error report of some sort.
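The check the init script performs can be sketched as follows (hypothetical values mirroring the log above; the real script compares the `host` value from ceph.conf against `hostname -s`):

```shell
# Hypothetical reproduction of the init script's host check.
conf_host="magna123.ceph.redhat.com"  # 'host' value read from ceph.conf
short_host="magna123"                 # what `hostname -s` would print
if [ "$conf_host" != "$short_host" ]; then
  echo "host mismatch: ceph.conf says '$conf_host', hostname -s says '$short_host'"
fi
```

This is exactly the kind of message the init script could emit instead of failing silently.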

Issue #11007 created.

Actions #25

Updated by Alfredo Deza about 9 years ago

It looks like at some point I missed creating a zone for the region I had. This wasn't made clear anywhere (logs or CLI output). Yehuda knew to run:

# radosgw-admin -n client.radosgw.magna zone list
{ "zones": []}

I create a zone, and get odd output from the CLI (attached) which doesn't really explain what is going on.

Issue #11008 created.

Actions #26

Updated by Alfredo Deza about 9 years ago

I am completely unable to start the daemon. I have tailed the logs to no avail, and the CLI is completely silent about what is going on (other than systemctl saying it exited with status 1):

[ubuntu@magna123 ~]$ tail -f /var/log/radosgw/client.radosgw.magna.log
2015-02-24 10:30:05.587462 7f4a9d058880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 5972
2015-02-24 10:30:05.601140 7f4a9d058880 -1 Couldn't init storage provider (RADOS)
2015-02-24 10:30:58.755760 7fdfe7285880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 6089
2015-02-24 10:30:58.805145 7fdfe7285880 -1 Couldn't init storage provider (RADOS)
2015-02-24 10:31:39.841528 7f264cd32880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 6193
2015-02-24 10:31:39.871321 7f264cd32880 -1 Couldn't init storage provider (RADOS)
2015-02-24 10:32:48.810165 7fcb27e98880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 6283
2015-02-24 10:32:48.900493 7fcb27e98880 -1 Couldn't init storage provider (RADOS)
2015-02-24 11:22:48.393405 7fa6b4c9a880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 7809
2015-02-24 11:22:48.519571 7fa6b4c9a880 -1 Couldn't init storage provider (RADOS)

Starting manually gives the same results:

[ubuntu@magna123 ~]$ sudo radosgw -c /etc/ceph/ceph.conf -d -n client.radosgw.magna
2015-02-24 12:10:20.872719 7fecf52a2880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 9508
2015-02-24 12:10:20.885567 7fecf52a2880 -1 Couldn't init storage provider (RADOS)
[ubuntu@magna123 ~]$
Actions #27

Updated by Alfredo Deza about 9 years ago

After hours of trying to get the daemon to start and looking at logs (and setting `-x` on the init script), I decided to backtrack all the way to the beginning
and check whether I had missed something.

It turns out there was a configuration value that I'd missed: I had not updated it to reflect that I am not using multiple instances/zones as the configuration implies:

[client.radosgw.magna]
...
rgw zone = us-east

Had to be changed to:

rgw zone = magna

It is critical that something reports back, but the logs were empty, the init script never said anything beyond 'Couldn't init storage provider', and the docs
never specified what to look for at all.

After this change the daemon was able to start, although it had errors (that I am still unsure how to address):

$ sudo radosgw -c /etc/ceph/ceph.conf -d -n client.radosgw.magna
2015-02-25 07:51:10.729264 7f271c758880  0 ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578), process radosgw, pid 15092
2015-02-25 07:51:13.763835 7f26b6bf9700  0 ERROR: can't get key: ret=-2
2015-02-25 07:51:13.763846 7f26b6bf9700  0 ERROR: sync_all_users() returned ret=-2
2015-02-25 07:51:13.764035 7f271c758880  0 framework: fastcgi
2015-02-25 07:51:13.764043 7f271c758880  0 framework: civetweb
2015-02-25 07:51:13.764051 7f271c758880  0 framework conf key: port, val: 7480
2015-02-25 07:51:13.764066 7f271c758880  0 starting handler: civetweb
2015-02-25 07:51:13.766730 7f271c758880  0 starting handler: fastcgi
2015-02-25 07:52:13.991972 7f26b52f5700  1 ====== starting new request req=0x7f2684005760 =====
2015-02-25 07:52:14.012380 7f26b52f5700  1 ====== req done req=0x7f2684005760 http_status=200 ======
2015-02-25 07:52:14.012428 7f26b52f5700  1 civetweb: 0x7f26840008c0: 10.3.112.75 - - [25/Feb/2015:07:52:13 -0800] "GET / HTTP/1.1" -1 0 - Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:35.0) Gecko/20100101 Firefox/35.0

Created issue #10953

Actions #28

Updated by Alfredo Deza about 9 years ago

  • Status changed from In Progress to Closed
