Bug #50839

open

radosgw returns error after upgrade from luminous to nautilus

Added by Marcin Gibula almost 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: Or Friedmann
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I tried to upgrade a very old RGW cluster (dating back to at least firefly) from luminous to nautilus. Everything went fine until I upgraded radosgw - it immediately started returning errors.
The logs contain these messages:

2021-05-17 13:25:54.101 7f304698b700 0 req 12 0.000s NOTICE: invalid dest placement:
2021-05-17 13:25:57.009 7f304698b700 0 req 13 0.000s NOTICE: invalid dest placement:
2021-05-17 13:25:57.101 7f304698b700 0 req 14 0.000s NOTICE: invalid dest placement:
2021-05-17 13:26:00.013 7f304698b700 0 req 15 0.000s NOTICE: invalid dest placement:
2021-05-17 13:26:00.105 7f304698b700 0 req 16 0.000s NOTICE: invalid dest placement:
2021-05-17 13:26:03.017 7f304698b700 0 req 17 0.004s NOTICE: invalid dest placement:

I managed to keep production running by using luminous radosgw binaries.

I found this ticket: https://tracker.ceph.com/issues/43172 - but nothing that looks like a solution. I think this is a rather serious issue for anyone running old clusters.

Actions #1

Updated by Or Friedmann almost 3 years ago

  • Assignee set to Or Friedmann
Actions #2

Updated by Konstantin Shalygin almost 3 years ago

Marcin, please paste your
radosgw-admin zone get --rgw-zone=default
output, along with your ceph.conf for the rgws and the
ceph config dump
output.

Thanks

Actions #3

Updated by Marcin Gibula almost 3 years ago

$ radosgw-admin zone get --rgw-zone=default

{
    "id": "default",
    "name": "default",
    "domain_root": ".rgw",
    "control_pool": ".rgw.control",
    "gc_pool": ".rgw.gc",
    "lc_pool": ".log:lc",
    "log_pool": ".log",
    "intent_log_pool": ".intent-log",
    "usage_log_pool": ".usage",
    "reshard_pool": ".log:reshard",
    "user_keys_pool": ".users",
    "user_email_pool": ".users.email",
    "user_swift_pool": ".users.swift",
    "user_uid_pool": ".users.uid",
    "otp_pool": "default.rgw.otp",
    "system_key": {
        "access_key": "",
        "secret_key": "" 
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": ".rgw.buckets3",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": ".rgw.buckets3" 
                    }
                },
                "data_extra_pool": "",
                "index_type": 0
            }
        },
        {
            "key": "ec-placement",
            "val": {
                "index_pool": ".rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": ".rgw.buckets-ec" 
                    }
                },
                "data_extra_pool": ".rgw.buckets-ec.extra",
                "index_type": 0
            }
        },
        {
            "key": "flash-placement",
            "val": {
                "index_pool": ".rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": ".rgw.buckets3" 
                    }
                },
                "data_extra_pool": "",
                "index_type": 0
            }
        }
    ],
    "metadata_heap": "",
    "realm_id": "" 
}

$ ceph config dump
WHO  MASK  LEVEL     OPTION                      VALUE                   RO
mgr        advanced  mgr/prometheus/server_port  5606                    *
mgr        advanced  mgr/zabbix/identifier       mon-rados-01            *
mgr        advanced  mgr/zabbix/zabbix_host      10.9.255.104            *
mgr        advanced  mgr/zabbix/zabbix_port      10051                   *
mgr        advanced  mgr/zabbix/zabbix_sender    /usr/bin/zabbix_sender  *

Also, the problematic buckets are all old ones, with an empty placement_rule in the bucket metadata - just like in #43172.
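
For reference, one way to check this (a sketch; the bucket name is just an example from this cluster and the instance id comes from the first command's output) is to dump the bucket instance metadata and look at the placement_rule field:

$ radosgw-admin metadata get bucket:zabbix
$ radosgw-admin metadata get bucket.instance:zabbix:<bucket_id>

The affected old buckets show "placement_rule": "" in that output.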

Actions #4

Updated by Konstantin Shalygin almost 3 years ago

I think this is a problem with the default placement:

            "key": "default-placement",
            "val": {
                "index_pool": ".rgw.buckets3",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": ".rgw.buckets3" 
                    }
                },
                "data_extra_pool": "",
                "index_type": 0
            }

Your "ec-placement" and "flash-placement" have a ".rgw.buckets.index" pool for indexes, but "default-placement" use ".rgw.buckets3". You should fix this first, I think

Actions #5

Updated by Marcin Gibula almost 3 years ago

I don't want it in a different pool. And how does that change the fact that the luminous rgw works perfectly while the nautilus version does not? Backward compatibility is the issue here.

Actions #6

Updated by Konstantin Shalygin almost 3 years ago

Please try to run the nautilus rgw with debug_rgw=20.
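
For example, the debug level can be raised in ceph.conf under the rgw client section (the section name is a placeholder) and the daemon restarted:

[client.rgw.<name>]
    debug rgw = 20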

Actions #7

Updated by Marcin Gibula almost 3 years ago

Here are the logs from a failed request to a pre-jewel bucket:

2021-05-28 13:37:27.612 7f9fccadd700  2 RGWDataChangesLog::ChangesRenewThread: start
2021-05-28 13:37:32.200 7f9fb52ae700 20 HTTP_ACCEPT=*/*
2021-05-28 13:37:32.200 7f9fb52ae700 20 HTTP_HOST=127.0.0.1:8076
2021-05-28 13:37:32.200 7f9fb52ae700 20 HTTP_USER_AGENT=curl/7.47.0
2021-05-28 13:37:32.200 7f9fb52ae700 20 HTTP_VERSION=1.1
2021-05-28 13:37:32.200 7f9fb52ae700 20 REMOTE_ADDR=127.0.0.1
2021-05-28 13:37:32.200 7f9fb52ae700 20 REQUEST_METHOD=GET
2021-05-28 13:37:32.200 7f9fb52ae700 20 REQUEST_URI=/zabbix/photo.png
2021-05-28 13:37:32.200 7f9fb52ae700 20 SCRIPT_URI=/zabbix/photo.png
2021-05-28 13:37:32.200 7f9fb52ae700 20 SERVER_PORT=8076
2021-05-28 13:37:32.200 7f9fb52ae700  1 ====== starting new request req=0x7f9fb52a7750 =====
2021-05-28 13:37:32.200 7f9fb52ae700  2 req 2 0.000s initializing for trans_id = tx000000000000000000002-0060b0d5fc-25c01f7b-default
2021-05-28 13:37:32.200 7f9fb52ae700 10 rgw api priority: s3=3 s3website=-1
2021-05-28 13:37:32.200 7f9fb52ae700 10 host=127.0.0.1
2021-05-28 13:37:32.200 7f9fb52ae700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
2021-05-28 13:37:32.212 7f9fb52ae700 -1 res_query() failed
2021-05-28 13:37:32.212 7f9fb52ae700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/zabbix/photo.png
2021-05-28 13:37:32.212 7f9fb52ae700 20 get_handler handler=22RGWHandler_REST_Obj_S3
2021-05-28 13:37:32.212 7f9fb52ae700 10 handler=22RGWHandler_REST_Obj_S3
2021-05-28 13:37:32.212 7f9fb52ae700  2 req 2 0.012s getting op 0
2021-05-28 13:37:32.212 7f9fb52ae700 10 op=21RGWGetObj_ObjStore_S3
2021-05-28 13:37:32.212 7f9fb52ae700  2 req 2 0.012s s3:get_obj verifying requester
2021-05-28 13:37:32.212 7f9fb52ae700 20 req 2 0.012s s3:get_obj rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy
2021-05-28 13:37:32.212 7f9fb52ae700 20 req 2 0.012s s3:get_obj rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
2021-05-28 13:37:32.212 7f9fb52ae700 20 req 2 0.012s s3:get_obj rgw::auth::s3::S3AnonymousEngine granted access
2021-05-28 13:37:32.212 7f9fb52ae700 20 req 2 0.012s s3:get_obj rgw::auth::s3::AWSAuthStrategy granted access
2021-05-28 13:37:32.212 7f9fb52ae700  2 req 2 0.012s s3:get_obj normalizing buckets and tenants
2021-05-28 13:37:32.212 7f9fb52ae700 10 s->object=photo.png s->bucket=zabbix
2021-05-28 13:37:32.212 7f9fb52ae700  2 req 2 0.012s s3:get_obj init permissions
2021-05-28 13:37:32.212 7f9fb52ae700 20 get_system_obj_state: rctx=0x7f9fb52a5760 obj=.rgw:zabbix state=0x252d9a0 s->prefetch_data=0
2021-05-28 13:37:32.212 7f9fb52ae700 10 cache get: name=.rgw++zabbix : miss
2021-05-28 13:37:32.212 7f9fb52ae700 10 cache put: name=.rgw++zabbix info.flags=0x16
2021-05-28 13:37:32.212 7f9fb52ae700 10 adding .rgw++zabbix to cache LRU end
2021-05-28 13:37:32.212 7f9fb52ae700 10 updating xattr: name=user.rgw.acl bl.length()=379
2021-05-28 13:37:32.212 7f9fb52ae700 20 get_system_obj_state: s->obj_tag was set empty
2021-05-28 13:37:32.212 7f9fb52ae700 20 Read xattr: user.rgw.acl
2021-05-28 13:37:32.212 7f9fb52ae700 20 Read xattr: user.rgw.idtag
2021-05-28 13:37:32.212 7f9fb52ae700 10 cache get: name=.rgw++zabbix : type miss (requested=0x13, cached=0x16)
2021-05-28 13:37:32.212 7f9fb52ae700 20 rados->read ofs=0 len=0
2021-05-28 13:37:32.212 7f9fb52ae700 20 rados_obj.operate() r=0 bl.length=104
2021-05-28 13:37:32.212 7f9fb52ae700 10 cache put: name=.rgw++zabbix info.flags=0x13
2021-05-28 13:37:32.212 7f9fb52ae700 10 moving .rgw++zabbix to cache LRU end
2021-05-28 13:37:32.212 7f9fb52ae700 10 updating xattr: name=user.rgw.acl bl.length()=379
2021-05-28 13:37:32.212 7f9fb52ae700 20 rgw_get_bucket_info: old bucket info, bucket=zabbix[5377.18] owner a0e651b0-4962-46fe-b3e5-a4cff907ce99
2021-05-28 13:37:32.212 7f9fb52ae700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>a0e651b0-4962-46fe-b3e5-a4cff907ce99</ID><DisplayName>xxx</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group"><URI>http://acs.amazonaws.com/groups/global/AllUsers</URI></Grantee><Permission>READ</Permission></Grant><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group"><URI>http://acs.amazonaws.com/groups/global/AllUsers</URI></Grantee><Permission>READ_ACP</Permission></Grant><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>a0e651b0-4962-46fe-b3e5-a4cff907ce99</ID><DisplayName>xxx</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2021-05-28 13:37:32.212 7f9fb52ae700  0 req 2 0.012s NOTICE: invalid dest placement:
2021-05-28 13:37:32.212 7f9fb52ae700 10 req 2 0.012s init_permissions on zabbix[5377.18] failed, ret=-22
2021-05-28 13:37:32.212 7f9fb52ae700 20 op->ERRORHANDLER: err_no=-22 new_err_no=-22
2021-05-28 13:37:32.212 7f9fb52ae700  2 req 2 0.012s s3:get_obj op status=0
2021-05-28 13:37:32.212 7f9fb52ae700  2 req 2 0.012s s3:get_obj http status=400
2021-05-28 13:37:32.212 7f9fb52ae700  1 ====== req done req=0x7f9fb52a7750 op status=0 http_status=400 latency=0.012s ======
2021-05-28 13:37:32.212 7f9fb52ae700 20 process_request() returned -22
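
The request fails while initializing permissions: rgw_get_bucket_info reports "old bucket info" for zabbix[5377.18], and the empty placement rule is rejected with -22 (EINVAL), which produces the 400. One possible workaround (untested, and only if editing bucket metadata is acceptable) would be to write an explicit placement_rule into the bucket instance metadata; a sketch, with the bucket name and id taken from the log above and "default-placement" assumed as the intended rule:

$ radosgw-admin metadata get bucket.instance:zabbix:5377.18 > bucket.json
# edit bucket.json so that "placement_rule" is set to "default-placement"
$ radosgw-admin metadata put bucket.instance:zabbix:5377.18 < bucket.json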