Feature #54972

open

[Feature specification] Introduce Storage classes in usage stats

Added by Rafael Weingartner about 2 years ago. Updated 4 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%


Description

Introduce Storage classes in usage stats

This specification proposes adding storage class information to the data returned by "GET /admin/usage" and "GET /bucket". The goal is to validate the proposals presented here with the community before we implement them and open the pull requests. Any feedback is welcome.

Problem Description

In RadosGW, one can configure different storage classes to provide different levels of quality of service (QoS). For instance, HDDs can store non-critical, non-latency-sensitive data at a lower cost, while a custom storage class can place objects in NVMe storage pools to provide better response times. It is also possible to store data in erasure-coded pools, which use less storage space and may have some impact on performance. Storage classes enable many other use cases; these are just a few of them.

One example is the following configuration (obtained with "radosgw-admin zone get").

{
    ..
    (other fields omitted; they are not relevant here)
    ..
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "default.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "rgw.buckets.hdd" 
                    },
                    "STANDARD_EC": {
                        "data_pool": "rgw.buckets.hdd.data_ec" 
                    },
                    "SSD": {
                        "data_pool": "rgw.buckets.ssd" 
                    },
                    "NVME": {
                        "data_pool": "rgw.buckets.nvme" 
                    }
                },
                "data_extra_pool": "default.rgw.buckets.non-ec",
                "index_type": 0
            }
        }
    ]
}
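
For reference, the mapping from storage class to data pool can be read out of that output with a few lines of Python. This is only a minimal sketch, assuming the JSON has the shape shown above; the helper name `storage_class_pools` is hypothetical, and the sample is trimmed to the relevant keys.

```python
import json

# Trimmed sample with the same shape as the "radosgw-admin zone get" output above.
ZONE_JSON = """
{
  "placement_pools": [
    {
      "key": "default-placement",
      "val": {
        "index_pool": "default.rgw.buckets.index",
        "storage_classes": {
          "STANDARD": {"data_pool": "rgw.buckets.hdd"},
          "STANDARD_EC": {"data_pool": "rgw.buckets.hdd.data_ec"},
          "SSD": {"data_pool": "rgw.buckets.ssd"},
          "NVME": {"data_pool": "rgw.buckets.nvme"}
        },
        "data_extra_pool": "default.rgw.buckets.non-ec",
        "index_type": 0
      }
    }
  ]
}
"""


def storage_class_pools(zone):
    """Return {placement_rule: {storage_class: data_pool}} (hypothetical helper)."""
    result = {}
    for placement in zone.get("placement_pools", []):
        classes = placement.get("val", {}).get("storage_classes", {})
        result[placement["key"]] = {
            name: cfg.get("data_pool") for name, cfg in classes.items()
        }
    return result


pools = storage_class_pools(json.loads(ZONE_JSON))
for rule, classes in pools.items():
    for name, pool in classes.items():
        print(f"{rule}: {name} -> {pool}")
```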

The RadosGW APIs for uploading objects with a specific storage class work fine. However, there is no practical way to find out the volume of data and the number of objects in a bucket that use each of the storage classes configured in RadosGW. The following is an example of the output of the bucket stats API.

{
    "bucket": "bucket1",
    "num_shards": 11,
    "tenant": "",
    "zonegroup": "2a761aef-610c-4da1-9eff-f5287252c513",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": "" 
    },
    "id": "1f878f22-f69c-486b-81de-144920efcbdb.55411.2",
    "marker": "1f878f22-f69c-486b-81de-144920efcbdb.55411.2",
    "index_type": "Normal",
    "owner": "rafael",
    "ver": "0#201,1#1,2#201,3#1,4#201,5#799,6#799,7#400,8#401,9#1,10#401",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0",
    "mtime": "2021-11-26T11:23:47.375664Z",
    "creation_time": "2021-11-26T11:23:47.363417Z",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#",
    "usage": {
        "rgw.main": {
            "size": 9215803392,
            "size_actual": 9215803392,
            "size_utilized": 9215803392,
            "size_kb": 8999808,
            "size_kb_actual": 8999808,
            "size_kb_utilized": 8999808,
            "num_objects": 3
        },
        "rgw.multimeta": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 0
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

As shown above, there is no indication of the storage classes used to store the data. The same is true of the usage admin API: its entries for operations that upload/download objects to/from RadosGW carry no indication of the objects' storage classes. The following is an example of the output of that API.

{
    "entries": [
        {
            "buckets": [
                {
                    "bucket": "",
                    "categories": [
                        {
                            "bytes_received": 0,
                            "bytes_sent": 35015,
                            "category": "list_buckets",
                            "ops": 45,
                            "successful_ops": 45
                        }
                    ],
                    "epoch": 1609520400,
                    "owner": "72431a5a-4a34-4319-a7f5-5f150f370284",
                    "time": "2021-01-01 17:00:00.000000Z" 
                },
                {
                    "bucket": "-",
                    "categories": [
                        {
                            "bytes_received": 0,
                            "bytes_sent": 293,
                            "category": "get_bucket_policy",
                            "ops": 1,
                            "successful_ops": 0
                        }
                    ],
                    "epoch": 1609866000,
                    "owner": "72431a5a-4a34-4319-a7f5-5f150f370284",
                    "time": "2021-01-05 17:00:00.000000Z" 
                },
                {
                    "bucket": "bucket_test-1",
                    "categories": [
                        {
                            "bytes_received": 0,
                            "bytes_sent": 0,
                            "category": "create_bucket",
                            "ops": 1,
                            "successful_ops": 1
                        },
                        {
                            "bytes_received": 0,
                            "bytes_sent": 6669,
                            "category": "get_bucket_policy",
                            "ops": 27,
                            "successful_ops": 27
                        },
                        {
                            "bytes_received": 0,
                            "bytes_sent": 0,
                            "category": "put_bucket_policy",
                            "ops": 1,
                            "successful_ops": 1
                        }
                    ],
                    "epoch": 1609862400,
                    "owner": "72431a5a-4a34-4319-a7f5-5f150f370284",
                    "time": "2021-01-05 16:00:00.000000Z" 
                }
            ],
            "user": "e6bde5e6-718c-4137-8438-f99883388042" 
        }
    ],
    "summary": [
        {
            "categories": [
                {
                    "bytes_received": 0,
                    "bytes_sent": 8,
                    "category": "list_buckets",
                    "ops": 4,
                    "successful_ops": 4
                }
            ],
            "total": {
                "bytes_received": 0,
                "bytes_sent": 8,
                "ops": 4,
                "successful_ops": 4
            },
            "user": "e6bde5e6-718c-4137-8438-f99883388042" 
        }]
}

In summary, storage classes work in RadosGW, but RadosGW does not provide a mechanism for billing/rating stored objects, and the operations that affect them, according to their storage class. This has already been reported in https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/D7GDNGCJYJMDF5JP74MOGC6EYLKZ3S7S/, and was later registered as a feature request via: https://tracker.ceph.com/issues/47342.

Proposed Change

To address the reported problem, we propose to extend the admin usage API (/admin/usage), which is backed by usage data about operations (PUT/POST/DELETE/LIST and so on) stored in Ceph. These operations are grouped and counted per user and per bucket, and the results are persisted: every interaction with the RadosGW API generates a usage entry, which is stored in Ceph itself. To extend this part, we would need to load the storage class of the object being handled (for PUT/POST/DELETE methods; only operations on objects would be affected), extend the usage entry to hold this new attribute, and then take it into account when aggregating the data presented in the response.
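
The aggregation step described above can be sketched in Python as follows. This is an illustration of the idea only, not RadosGW code: the `UsageRecord` type, its fields, and the `aggregate` function are hypothetical, and the sample records are made up. The only point is that the storage class becomes part of the key under which counters are summed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical, simplified stand-in for one logged operation."""
    user: str
    bucket: str
    category: str
    bytes_sent: int
    bytes_received: int
    successful: bool
    # Proposed new attribute; None for operations that do not touch an object.
    storage_class: Optional[str] = None


def aggregate(records):
    """Sum counters per (user, bucket, storage_class, category)."""
    totals = defaultdict(
        lambda: {"bytes_sent": 0, "bytes_received": 0, "ops": 0, "successful_ops": 0}
    )
    for rec in records:
        counters = totals[(rec.user, rec.bucket, rec.storage_class, rec.category)]
        counters["bytes_sent"] += rec.bytes_sent
        counters["bytes_received"] += rec.bytes_received
        counters["ops"] += 1
        counters["successful_ops"] += int(rec.successful)
    return dict(totals)


# Made-up sample records, for illustration only.
records = [
    UsageRecord("alice", "bucket1", "put_obj", 0, 1024, True, "STANDARD"),
    UsageRecord("alice", "bucket1", "put_obj", 0, 2048, True, "NVME"),
    UsageRecord("alice", "bucket1", "put_obj", 0, 512, False, "NVME"),
    UsageRecord("alice", "bucket1", "get_bucket_policy", 293, 0, True),
]
totals = aggregate(records)
for key, counters in sorted(totals.items(), key=lambda kv: str(kv[0])):
    print(key, counters)
```

Bucket-level operations keep a storage class of `None` here, which mirrors the proposal that only operations on objects are affected.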

One example of a response from the API with the proposed changes is the following:


{
    "entries": [
        {
            "buckets": [
                {
                    "bucket": "",
                    "categories": [
                        {
                            "bytes_received": 0,
                            "bytes_sent": 35015,
                            "category": "list_buckets",
                            "ops": 45,
                            "successful_ops": 45
                        }
                    ],
                    "epoch": 1609520400,
                    "owner": "72431a5a-4a34-4319-a7f5-5f150f370284",
                    "time": "2021-01-01 17:00:00.000000Z" 
                },
                {
                    "bucket": "-",
                    "categories": [
                        {
                            "bytes_received": 0,
                            "bytes_sent": 293,
                            "category": "get_bucket_policy",
                            "ops": 1,
                            "successful_ops": 0
                        }
                    ],
                    "epoch": 1609866000,
                    "owner": "72431a5a-4a34-4319-a7f5-5f150f370284",
                    "time": "2021-01-05 17:00:00.000000Z" 
                },
                {
                    "bucket": "bucket_test-1",
                    "categories": [
                        {
                            "bytes_received": 0,
                            "bytes_sent": 0,
                            "category": "create_bucket",
                            "ops": 1,
                            "successful_ops": 1
                        },
                        {
                            "bytes_received": 0,
                            "bytes_sent": 6669,
                            "category": "get_bucket_policy",
                            "ops": 27,
                            "successful_ops": 27
                        },
                        {
                            "bytes_received": 0,
                            "bytes_sent": 0,
                            "category": "put_bucket_policy",
                            "ops": 1,
                            "successful_ops": 1
                        }
                    ],
                    "categories-<storage-class-name>": [
                         {
                            "bytes_received": 129836684,
                            "bytes_sent": 0,
                            "category": "put_obj",
                            "ops": 323,
                            "successful_ops": 323
                        },
                        {
                            "bytes_received": 0,
                            "bytes_sent": 119318246,
                            "category": "get_obj",
                            "ops": 17956,
                            "successful_ops": 17956
                        }
                        <Many other operations here>
                    ],
                    "epoch": 1609862400,
                    "owner": "72431a5a-4a34-4319-a7f5-5f150f370284",
                    "time": "2021-01-05 16:00:00.000000Z" 
                }
            ],
            "user": "e6bde5e6-718c-4137-8438-f99883388042" 
        }
    ],
    "summary": [
        {
            "categories": [
                {
                    "bytes_received": 0,
                    "bytes_sent": 8,
                    "category": "list_buckets",
                    "ops": 4,
                    "successful_ops": 4
                }
            ],
            "categories-<storage-class-name>": [
                 {
                    "bytes_received": 129836684,
                    "bytes_sent": 0,
                    "category": "put_obj",
                    "ops": 323,
                    "successful_ops": 323
                },
                {
                    "bytes_received": 0,
                    "bytes_sent": 119318246,
                    "category": "get_obj",
                    "ops": 17956,
                    "successful_ops": 17956
                }
              <Many other operations here>
            ],
            "total": {
                "bytes_received": 0,
                "bytes_sent": 8,
                "ops": 4,
                "successful_ops": 4
            },
            "user": "e6bde5e6-718c-4137-8438-f99883388042" 
        }]
}

As one can see, new entries would be created following the pattern "categories-<storage-class-name>", presenting the data for the operations that affected objects in the given storage class.
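
A client could pick these entries up by key prefix. The sketch below assumes the proposed "categories-<storage-class-name>" keys appear exactly as in the example response; the function name `per_storage_class` is hypothetical, and the storage class name "NVME" in the sample is chosen only for illustration.

```python
PREFIX = "categories-"


def per_storage_class(bucket_entry):
    """Map storage class name -> {category: counters} for one bucket entry."""
    result = {}
    for key, categories in bucket_entry.items():
        # The plain "categories" key has no trailing dash, so it is skipped.
        if not key.startswith(PREFIX):
            continue
        storage_class = key[len(PREFIX):]
        result[storage_class] = {c["category"]: c for c in categories}
    return result


# Trimmed bucket entry in the proposed format (storage class name assumed).
bucket_entry = {
    "bucket": "bucket_test-1",
    "categories": [
        {"bytes_received": 0, "bytes_sent": 0, "category": "create_bucket",
         "ops": 1, "successful_ops": 1},
    ],
    "categories-NVME": [
        {"bytes_received": 129836684, "bytes_sent": 0, "category": "put_obj",
         "ops": 323, "successful_ops": 323},
        {"bytes_received": 0, "bytes_sent": 119318246, "category": "get_obj",
         "ops": 17956, "successful_ops": 17956},
    ],
}

by_class = per_storage_class(bucket_entry)
for storage_class, categories in by_class.items():
    for name, counters in categories.items():
        print(storage_class, name, counters["ops"], counters["bytes_sent"])
```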

The other API (/bucket), normally used via "radosgw-admin bucket stats --bucket=bucket1", presents the total amount of resources (objects) that the bucket holds. It is implemented by checking the objects/files stored in the bucket in Ceph, and then counting/summing them up. When different storage classes are used, the objects are stored in different storage pools. Therefore, we can distinguish between them and provide more granular accounting based on the storage classes in use in a bucket. One example of a response from the API with the proposed changes is the following:


{
    "bucket": "bucket1",
    "num_shards": 11,
    "tenant": "",
    "zonegroup": "2a761aef-610c-4da1-9eff-f5287252c513",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": "" 
    },
    "id": "1f878f22-f69c-486b-81de-144920efcbdb.55411.2",
    "marker": "1f878f22-f69c-486b-81de-144920efcbdb.55411.2",
    "index_type": "Normal",
    "owner": "rafael",
    "ver": "0#201,1#1,2#201,3#1,4#201,5#799,6#799,7#400,8#401,9#1,10#401",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0",
    "mtime": "2021-11-26T11:23:47.375664Z",
    "creation_time": "2021-11-26T11:23:47.363417Z",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#",
    "usage": {
        "rgw.main": {
            "size": 9215803392,
            "size_actual": 9215803392,
            "size_utilized": 9215803392,
            "size_kb": 8999808,
            "size_kb_actual": 8999808,
            "size_kb_utilized": 8999808,
            "num_objects": 3
        },
        "rgw.multimeta": {
            "size": 0,
            "size_actual": 0,
            "size_utilized": 0,
            "size_kb": 0,
            "size_kb_actual": 0,
            "size_kb_utilized": 0,
            "num_objects": 0
        },
        "rgw.storage-classes": [
            {
                "name": "STANDARD",
                "size": 9215803392,
                "size_actual": 9215803392,
                "size_utilized": 9215803392,
                "size_kb": 8999808,
                "size_kb_actual": 8999808,
                "size_kb_utilized": 8999808,
                "num_objects": 3
            },
            {
                "name": "STANDARD_EC",
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 0
            },
            {
                "name": "SSD",
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 0
            },
            {
                "name": "NVME",
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 0
            }
        ]
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}

One would then be able to see the number of objects, their total size, and other information for each of the storage classes used by the objects stored in a bucket.
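
For billing/rating purposes, a consumer of the proposed output could read the per-class figures as in the sketch below. It assumes the "rgw.storage-classes" list has the shape shown in the example response; the function name `usage_by_storage_class` is hypothetical, and the sample is trimmed to the fields it uses.

```python
def usage_by_storage_class(stats):
    """Return {storage_class: (num_objects, size_actual)} from bucket stats."""
    classes = stats.get("usage", {}).get("rgw.storage-classes", [])
    return {c["name"]: (c["num_objects"], c["size_actual"]) for c in classes}


# Trimmed bucket stats in the proposed format, using the values from the example.
bucket_stats = {
    "bucket": "bucket1",
    "usage": {
        "rgw.storage-classes": [
            {"name": "STANDARD", "size_actual": 9215803392, "num_objects": 3},
            {"name": "STANDARD_EC", "size_actual": 0, "num_objects": 0},
            {"name": "SSD", "size_actual": 0, "num_objects": 0},
            {"name": "NVME", "size_actual": 0, "num_objects": 0},
        ]
    },
}

per_class = usage_by_storage_class(bucket_stats)
for name, (num_objects, size_actual) in per_class.items():
    print(f"{name}: {num_objects} objects, {size_actual} bytes")
```

A bucket stats response without the new key simply yields an empty mapping, so the same code would keep working against the current output.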
