Project

General

Profile

Actions

Bug #47866

closed

Object not found on healthy cluster

Added by Julien Demais over 3 years ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Urgent
Target version:
% Done:

0%

Source:
Tags:
Backport:
octopus
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
rgw
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

While using the aws cli or trying to get a file using curl, I get an error from the radosgw:

aws --endpoint=http://172.17.5.11:7480 s3 cp s3://mpd/0002/0002-dashevc-main-2s-v1.mp4 .
download failed: s3://mpd/0002/0002-dashevc-main-2s-v1.mp4 to ./0002-dashevc-main-2s-v1.mp4 An error occurred (NoSuchKey) when calling the GetObject operation: Unknown

Overall, the cluster looks healthy:

root@node1-1:~# ceph -s
  cluster:
    id:     066f558c-6789-4a93-aaf1-5af1ba01a3ad
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 5 daemons, quorum node1-1,node2-1,node3-1,node4-1,node5-1 (age 3d)
    mgr: node2-1(active, since 3d), standbys: node3-1, node1-1, node4-1, node5-1
    osd: 6 osds: 6 up (since 3d), 6 in (since 13d)
         flags noout
    rgw: 6 daemons active (radosgw.node1-1, radosgw.node2-1, radosgw.node3-1, radosgw.node4-1, radosgw.node5-1, radosgw.node6-1)

  task status:

  data:
    pools:   9 pools, 481 pgs
    objects: 338.74k objects, 1.2 TiB
    usage:   3.7 TiB used, 17 TiB / 21 TiB avail
    pgs:     481 active+clean

It also seems like the file is available in the bucket index:

root@node1-1:~# radosgw-admin bi list --bucket=mpd | grep 0002-dashevc-main-2s-v1.mp4
        "idx": "0002/0002-dashevc-main-2s-v1.mp4",
            "name": "0002/0002-dashevc-main-2s-v1.mp4",

root@node1-1:~# radosgw-admin object stat --bucket=mpd --object=0002/0002-dashevc-main-2s-v1.mp4
{
    "name": "0002/0002-dashevc-main-2s-v1.mp4",
    "size": 31958035,
    "policy": {
        "acl": {
            "acl_user_map": [
                {
                    "user": "perfo",
                    "acl": 15
                }
            ],
            "acl_group_map": [
                {
                    "group": 1,
                    "acl": 1
                }
            ],
            "grant_map": [
                {
                    "id": "",
                    "grant": {
                        "type": {
                            "type": 2
                        },
                        "id": "",
                        "email": "",
                        "permission": {
                            "flags": 1
                        },
                        "name": "",
                        "group": 1,
                        "url_spec": "" 
                    }
                },
                {
                    "id": "perfo",
                    "grant": {
                        "type": {
                            "type": 0
                        },
                        "id": "perfo",
                        "email": "",
                        "permission": {
                            "flags": 15
                        },
                        "name": "perfo user",
                        "group": 0,
                        "url_spec": "" 
                    }
                }
            ]
        },
        "owner": {
            "id": "perfo",
            "display_name": "perfo user" 
        }
    },
    "etag": "3bec83e90cbc05823f71d3ddd7e40f33",
    "tag": "64992646-777f-4214-9a7e-6de270ba3d54.350436.82",
    "manifest": {
        "objs": [],
        "obj_size": 31958035,
        "explicit_objs": "false",
        "head_size": 4194304,
        "max_head_size": 4194304,
        "prefix": ".kPQbrpASUP-I4PXgozEZsXh5gQgo44u_",
        "rules": [
            {
                "key": 0,
                "val": {
                    "start_part_num": 0,
                    "start_ofs": 4194304,
                    "part_size": 0,
                    "stripe_max_size": 4194304,
                    "override_prefix": "" 
                }
            }
        ],
        "tail_instance": "",
        "tail_placement": {
            "bucket": {
                "name": "mpd",
                "marker": "64992646-777f-4214-9a7e-6de270ba3d54.350469.3",
                "bucket_id": "64992646-777f-4214-9a7e-6de270ba3d54.350469.3",
                "tenant": "",
                "explicit_placement": {
                    "data_pool": "",
                    "data_extra_pool": "",
                    "index_pool": "" 
                }
            },
            "placement_rule": "default-placement" 
        },
        "begin_iter": {
            "part_ofs": 0,
            "stripe_ofs": 0,
            "ofs": 0,
            "stripe_size": 4194304,
            "cur_part_id": 0,
            "cur_stripe": 0,
            "cur_override_prefix": "",
            "location": {
                "placement_rule": "default-placement",
                "obj": {
                    "bucket": {
                        "name": "mpd",
                        "marker": "64992646-777f-4214-9a7e-6de270ba3d54.350469.3",
                        "bucket_id": "64992646-777f-4214-9a7e-6de270ba3d54.350469.3",
                        "tenant": "",
                        "explicit_placement": {
                            "data_pool": "",
                            "data_extra_pool": "",
                            "index_pool": "" 
                        }
                    },
                    "key": {
                        "name": "0002/0002-dashevc-main-2s-v1.mp4",
                        "instance": "",
                        "ns": "" 
                    }
                },
                "raw_obj": {
                    "pool": "",
                    "oid": "",
                    "loc": "" 
                },
                "is_raw": false
            }
        },
        "end_iter": {
            "part_ofs": 4194304,
            "stripe_ofs": 29360128,
            "ofs": 31958035,
            "stripe_size": 2597907,
            "cur_part_id": 0,
            "cur_stripe": 7,
            "cur_override_prefix": "",
            "location": {
                "placement_rule": "default-placement",
                "obj": {
                    "bucket": {
                        "name": "mpd",
                        "marker": "64992646-777f-4214-9a7e-6de270ba3d54.350469.3",
                        "bucket_id": "64992646-777f-4214-9a7e-6de270ba3d54.350469.3",
                        "tenant": "",
                        "explicit_placement": {
                            "data_pool": "",
                            "data_extra_pool": "",
                            "index_pool": "" 
                        }
                    },
                    "key": {
                        "name": ".kPQbrpASUP-I4PXgozEZsXh5gQgo44u_7",
                        "instance": "",
                        "ns": "shadow" 
                    }
                },
                "raw_obj": {
                    "pool": "",
                    "oid": "",
                    "loc": "" 
                },
                "is_raw": false
            }
        }
    },
    "attrs": {
        "user.rgw.content_type": "video/mp4",
        "user.rgw.pg_ver": "\u0018",
        "user.rgw.source_zone": "",
        "user.rgw.tail_tag": "64992646-777f-4214-9a7e-6de270ba3d54.350436.82",
        "user.rgw.x-amz-acl": "public-read",
        "user.rgw.x-amz-content-sha256": "2fb5ab1a1fdb43c8f07ce17e2c14237f3827f7ff952cb7818eee7994f7f98072",
        "user.rgw.x-amz-date": "20200929T151731Z" 
    }
}

Here is the osd mapping of this object's chunks:

root@node1-1:~# for i in $(seq 0 7); do ceph osd map default.rgw.buckets.data 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_${i}; done
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_0' -> pg 11.a6213653 (11.53) -> up ([5,2,1], p5) acting ([5,2,1], p5)
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_1' -> pg 11.7615522a (11.2a) -> up ([4,3,1], p4) acting ([4,3,1], p4)
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_2' -> pg 11.687dc40 (11.40) -> up ([0,5,1], p0) acting ([0,5,1], p0)
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_3' -> pg 11.ceadb28d (11.8d) -> up ([3,2,5], p3) acting ([3,2,5], p3)
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_4' -> pg 11.81caf2f4 (11.f4) -> up ([5,1,0], p5) acting ([5,1,0], p5)
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_5' -> pg 11.b5b8aebb (11.bb) -> up ([5,2,1], p5) acting ([5,2,1], p5)
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_6' -> pg 11.1a9d8fbc (11.bc) -> up ([3,0,2], p3) acting ([3,0,2], p3)
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_7' -> pg 11.aa597da7 (11.a7) -> up ([5,0,1], p5) acting ([5,0,1], p5)

root@node1-1:~# ceph osd map default.rgw.buckets.data 64992646-777f-4214-9a7e-6de270ba3d54.350469.3_0002-dashevc-main-2s-v1.mp4
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3_0002-dashevc-main-2s-v1.mp4' -> pg 11.774f69c0 (11.c0) -> up ([2,4,5], p2) acting ([2,4,5], p2)

root@node1-1:~# ceph osd map default.rgw.buckets.data 64992646-777f-4214-9a7e-6de270ba3d54.350469.3_0002/0002-dashevc-main-2s-v1.mp4
osdmap e2269 pool 'default.rgw.buckets.data' (11) object '64992646-777f-4214-9a7e-6de270ba3d54.350469.3_0002/0002-dashevc-main-2s-v1.mp4' -> pg 11.4015493 (11.93) -> up ([0,1,3], p0) acting ([0,1,3], p0)

However, from the radosgw logs, it seems like some objects are seen as missing on the primary osds:

2020-10-12T10:47:01.585+0000 7f9bd7f57700 20 rados->get_obj_iterate_cb oid=64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_3 obj-ofs=12582912 read_ofs=0 len=4194304
2020-10-12T10:47:01.585+0000 7f9bd7f57700  1 -- 172.17.5.11:0/1875369112 --> [v2:172.17.5.31:6800/216163,v1:172.17.5.31:6801/216163] -- osd_op(unknown.0.0:449201 11.8d 11:b14db573:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_3:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected e2263) v8 -- 0x7f9c68024720 con 0x5603cb2e4290
2020-10-12T10:47:01.585+0000 7f9bd7f57700 20 RGWObjManifest::operator++(): rule->part_size=0 rules.size()=1
2020-10-12T10:47:01.585+0000 7f9bd7f57700 20 RGWObjManifest::operator++(): result: ofs=16777216 stripe_ofs=16777216 part_ofs=4194304 rule->part_size=0
2020-10-12T10:47:01.585+0000 7f9cc614b700  1 -- 172.17.5.11:0/1875369112 <== osd.3 v2:172.17.5.31:6800/216163 69444 ==== osd_op_reply(449194 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_6 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 232+0+0 (crc 0 0 0) 0x7f9cbc0e6070 con 0x5603cb2e4290
2020-10-12T10:47:01.586+0000 7f9cc614b700  1 -- 172.17.5.11:0/1875369112 <== osd.3 v2:172.17.5.31:6800/216163 69445 ==== osd_op_reply(449201 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_3 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 232+0+0 (crc 0 0 0) 0x7f9cbc0e6070 con 0x5603cb2e4290
2020-10-12T10:47:01.587+0000 7f9cc694c700  1 -- 172.17.5.11:0/1875369112 <== osd.4 v2:172.17.5.51:6800/219435 59157 ==== osd_op_reply(449199 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_1 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 232+0+0 (crc 0 0 0) 0x7f9cb80d4040 con 0x5603cb2d9710
2020-10-12T10:47:01.587+0000 7f9cc694c700  1 -- 172.17.5.11:0/1875369112 <== osd.5 v2:172.17.5.21:6800/236481 103456 ==== osd_op_reply(449195 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_4 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 232+0+0 (crc 0 0 0) 0x7f9cb80d4040 con 0x7f9ca4010b10
2020-10-12T10:47:01.587+0000 7f9cc694c700  1 -- 172.17.5.11:0/1875369112 <== osd.5 v2:172.17.5.21:6800/236481 103457 ==== osd_op_reply(449196 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_5 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 232+0+0 (crc 0 0 0) 0x7f9cb80d4040 con 0x7f9ca4010b10
2020-10-12T10:47:01.587+0000 7f9cc694c700  1 -- 172.17.5.11:0/1875369112 <== osd.5 v2:172.17.5.21:6800/236481 103458 ==== osd_op_reply(449198 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_7 [read 0~2597907] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 232+0+0 (crc 0 0 0) 0x7f9cb80d4040 con 0x7f9ca4010b10
2020-10-12T10:47:01.588+0000 7f9bd2f4d700  2 req 48 0.010999928s s3:get_obj completing
2020-10-12T10:47:01.588+0000 7f9bd2f4d700  2 req 48 0.010999928s s3:get_obj op status=-2
2020-10-12T10:47:01.588+0000 7f9bd2f4d700  2 req 48 0.010999928s s3:get_obj http status=404
2020-10-12T10:47:01.588+0000 7f9bd2f4d700  1 ====== req done req=0x7f9cb46f57d0 op status=-2 http_status=404 latency=0.010999928s ======
2020-10-12T10:47:01.588+0000 7f9bd2f4d700  1 beast: 0x7f9cb46f57d0: 172.17.2.128 - - [2020-10-12T10:47:01.588700+0000] "GET /mpd/0002/0002-dashevc-main-2s-v1.mp4 HTTP/1.1" 404 214 - "aws-cli/1.18.156 Python/2.7.17 Linux/4.15.0-118-generic botocore/1.18.15" bytes=16777216-25165823
2020-10-12T10:47:01.588+0000 7f9bc8f39700  2 req 49 0.009999935s s3:get_obj completing
2020-10-12T10:47:01.588+0000 7f9bc8f39700  2 req 49 0.009999935s s3:get_obj op status=-2
2020-10-12T10:47:01.588+0000 7f9bc8f39700  2 req 49 0.009999935s s3:get_obj http status=404
2020-10-12T10:47:01.588+0000 7f9bc8f39700  1 ====== req done req=0x7f9cb46747d0 op status=-2 http_status=404 latency=0.009999935s ======
2020-10-12T10:47:01.588+0000 7f9bc8f39700  1 beast: 0x7f9cb46747d0: 172.17.2.128 - - [2020-10-12T10:47:01.588700+0000] "GET /mpd/0002/0002-dashevc-main-2s-v1.mp4 HTTP/1.1" 404 214 - "aws-cli/1.18.156 Python/2.7.17 Linux/4.15.0-118-generic botocore/1.18.15" bytes=25165824-
2020-10-12T10:47:01.597+0000 7f9cc694c700  1 -- 172.17.5.11:0/1875369112 <== osd.0 v2:172.17.5.41:6800/217855 3380 ==== osd_op_reply(449197 64992646-777f-4214-9a7e-6de270ba3d54.350469.3_0002/0002-dashevc-main-2s-v1.mp4 [cmpxattr (47) op 1 mode 1,read 0~4194304 out=4194304b] v0'0 uv25 ondisk = 0) v8 ==== 264+0+4194304 (crc 0 0 0) 0x7f9cb80d4040 con 0x7f9cb801d430
2020-10-12T10:47:01.597+0000 7f9cc694c700  1 -- 172.17.5.11:0/1875369112 <== osd.0 v2:172.17.5.41:6800/217855 3381 ==== osd_op_reply(449200 64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_2 [read 0~4194304] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 232+0+0 (crc 0 0 0) 0x7f9cb80d4040 con 0x7f9cb801d430
2020-10-12T10:47:01.598+0000 7f9bcd742700  2 req 47 0.020999862s s3:get_obj completing
2020-10-12T10:47:01.598+0000 7f9bcd742700  2 req 47 0.020999862s s3:get_obj op status=-2
2020-10-12T10:47:01.598+0000 7f9bcd742700  2 req 47 0.020999862s s3:get_obj http status=404
2020-10-12T10:47:01.598+0000 7f9bcd742700  1 ====== req done req=0x7f9cb47767d0 op status=-2 http_status=404 latency=0.020999862s ======
2020-10-12T10:47:01.598+0000 7f9bcd742700  1 beast: 0x7f9cb47767d0: 172.17.2.128 - - [2020-10-12T10:47:01.598700+0000] "GET /mpd/0002/0002-dashevc-main-2s-v1.mp4 HTTP/1.1" 404 214 - "aws-cli/1.18.156 Python/2.7.17 Linux/4.15.0-118-generic botocore/1.18.15" bytes=0-8388607
2020-10-12T10:47:01.598+0000 7f9bcef45700  2 req 50 0.019999869s s3:get_obj completing
2020-10-12T10:47:01.598+0000 7f9bcef45700  2 req 50 0.019999869s s3:get_obj op status=-2
2020-10-12T10:47:01.598+0000 7f9bcef45700  2 req 50 0.019999869s s3:get_obj http status=404
2020-10-12T10:47:01.598+0000 7f9bcef45700  1 ====== req done req=0x7f9cb45f37d0 op status=-2 http_status=404 latency=0.019999869s ======
2020-10-12T10:47:01.598+0000 7f9bcef45700  1 beast: 0x7f9cb45f37d0: 172.17.2.128 - - [2020-10-12T10:47:01.598700+0000] "GET /mpd/0002/0002-dashevc-main-2s-v1.mp4 HTTP/1.1" 404 214 - "aws-cli/1.18.156 Python/2.7.17 Linux/4.15.0-118-generic botocore/1.18.15" bytes=8388608-16777215

Here are logs of osd.5:

2020-10-13T06:50:53.767+0000 7f6bbe584700 20 osd.5 pg_epoch: 2269 pg[11.f4( v 2240'13384 (2132'12090,2240'13384] local-lis/les=2243/2244 n=1273 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,1,0] r=0 lpr=2243 crt=2240'13384 lcod 0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.3854431.0:580190 11.f4 11:2f4f5381:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_4:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected e2269) v8
2020-10-13T06:50:53.767+0000 7f6bbe584700 20 osd.5 pg_epoch: 2269 pg[11.f4( v 2240'13384 (2132'12090,2240'13384] local-lis/les=2243/2244 n=1273 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,1,0] r=0 lpr=2243 crt=2240'13384 lcod 0'0 mlcod 0'0 active+clean] op_has_sufficient_caps session=0x557de4a3a780 pool=11 (default.rgw.buckets.data ) pool_app_metadata={rgw={}} need_read_cap=1 need_write_cap=0 classes=[] -> yes
2020-10-13T06:50:53.767+0000 7f6bbe584700 10 osd.5 pg_epoch: 2269 pg[11.f4( v 2240'13384 (2132'12090,2240'13384] local-lis/les=2243/2244 n=1273 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,1,0] r=0 lpr=2243 crt=2240'13384 lcod 0'0 mlcod 0'0 active+clean] do_op osd_op(client.3854431.0:580190 11.f4 11:2f4f5381:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_4:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected e2269) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected
2020-10-13T06:50:53.767+0000 7f6bbe584700 10 osd.5 pg_epoch: 2269 pg[11.f4( v 2240'13384 (2132'12090,2240'13384] local-lis/les=2243/2244 n=1273 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,1,0] r=0 lpr=2243 crt=2240'13384 lcod 0'0 mlcod 0'0 active+clean] get_object_context: obc NOT found in cache: 11:2f4f5381:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_4:head
2020-10-13T06:50:53.768+0000 7f6bbe584700 10 osd.5 pg_epoch: 2269 pg[11.f4( v 2240'13384 (2132'12090,2240'13384] local-lis/les=2243/2244 n=1273 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,1,0] r=0 lpr=2243 crt=2240'13384 lcod 0'0 mlcod 0'0 active+clean] get_object_context: no obc for soid 11:2f4f5381:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_4:head and !can_create
2020-10-13T06:50:53.768+0000 7f6bbe584700 20 osd.5 pg_epoch: 2269 pg[11.f4( v 2240'13384 (2132'12090,2240'13384] local-lis/les=2243/2244 n=1273 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,1,0] r=0 lpr=2243 crt=2240'13384 lcod 0'0 mlcod 0'0 active+clean] do_op: find_object_context got error -2
2020-10-13T06:50:53.768+0000 7f6bbe584700 10 osd.5 2269 dequeue_op 0x557e5442ef20 finish
2020-10-13T06:50:53.768+0000 7f6bbe584700 20 osd.5 op_wq(4) _process empty q, waiting
2020-10-13T06:50:53.768+0000 7f6bda058700 15 osd.5 2269 enqueue_op 0x557e535794a0 prio 63 cost 0 latency 0.000014 epoch 2269 osd_op(client.3854431.0:580191 11.bb 11.b5b8aebb (undecoded) ondisk+read+known_if_redirected e2269) v8
2020-10-13T06:50:53.768+0000 7f6bda058700 20 osd.5 op_wq(3) _enqueue OpSchedulerItem(11.bb PGOpItem(op=osd_op(client.3854431.0:580191 11.bb 11.b5b8aebb (undecoded) ondisk+read+known_if_redirected e2269) v8) prio 63 cost 0 e2269)
2020-10-13T06:50:53.768+0000 7f6bbad7d700 20 osd.5 op_wq(3) _process 11.bb to_process <> waiting <> waiting_peering {}
2020-10-13T06:50:53.768+0000 7f6bbad7d700 20 osd.5 op_wq(3) _process OpSchedulerItem(11.bb PGOpItem(op=osd_op(client.3854431.0:580191 11.bb 11.b5b8aebb (undecoded) ondisk+read+known_if_redirected e2269) v8) prio 63 cost 0 e2269) queued
2020-10-13T06:50:53.768+0000 7f6bbad7d700 20 osd.5 op_wq(3) _process 11.bb to_process <OpSchedulerItem(11.bb PGOpItem(op=osd_op(client.3854431.0:580191 11.bb 11.b5b8aebb (undecoded) ondisk+read+known_if_redirected e2269) v8) prio 63 cost 0 e2269)> waiting <> waiting_peering {}
2020-10-13T06:50:53.768+0000 7f6bbad7d700 20 osd.5 op_wq(3) _process OpSchedulerItem(11.bb PGOpItem(op=osd_op(client.3854431.0:580191 11.bb 11.b5b8aebb (undecoded) ondisk+read+known_if_redirected e2269) v8) prio 63 cost 0 e2269) pg 0x557dde47c800
2020-10-13T06:50:53.768+0000 7f6bbed85700 20 osd.5 op_wq(3) _process empty q, waiting
2020-10-13T06:50:53.768+0000 7f6bbad7d700 10 osd.5 2269 dequeue_op 0x557e535794a0 prio 63 cost 0 latency 0.000194 osd_op(client.3854431.0:580191 11.bb 11.b5b8aebb (undecoded) ondisk+read+known_if_redirected e2269) v8 pg pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean]
2020-10-13T06:50:53.768+0000 7f6bbad7d700 10 osd.5 pg_epoch: 2269 pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean] _handle_message: 0x557e535794a0
2020-10-13T06:50:53.768+0000 7f6bbad7d700 20 osd.5 pg_epoch: 2269 pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean] do_op: op osd_op(client.3854431.0:580191 11.bb 11:dd751dad:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_5:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected e2269) v8
2020-10-13T06:50:53.768+0000 7f6bbad7d700 20 osd.5 pg_epoch: 2269 pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean] op_has_sufficient_caps session=0x557de4a3a780 pool=11 (default.rgw.buckets.data ) pool_app_metadata={rgw={}} need_read_cap=1 need_write_cap=0 classes=[] -> yes
2020-10-13T06:50:53.768+0000 7f6bbad7d700 10 osd.5 pg_epoch: 2269 pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean] do_op osd_op(client.3854431.0:580191 11.bb 11:dd751dad:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_5:head [read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected e2269) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected
2020-10-13T06:50:53.768+0000 7f6bbad7d700 10 osd.5 pg_epoch: 2269 pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean] get_object_context: obc NOT found in cache: 11:dd751dad:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_5:head
2020-10-13T06:50:53.769+0000 7f6bbad7d700 10 osd.5 pg_epoch: 2269 pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean] get_object_context: no obc for soid 11:dd751dad:::64992646-777f-4214-9a7e-6de270ba3d54.350469.3__shadow_.kPQbrpASUP-I4PXgozEZsXh5gQgo44u_5:head and !can_create
2020-10-13T06:50:53.769+0000 7f6bbad7d700 20 osd.5 pg_epoch: 2269 pg[11.bb( v 2153'13237 (2021'11944,2153'13237] local-lis/les=2243/2244 n=1304 ec=223/171 lis/c=2243/2243 les/c/f=2244/2244/0 sis=2243) [5,2,1] r=0 lpr=2243 crt=2153'13237 lcod 0'0 mlcod 0'0 active+clean] do_op: find_object_context got error -2

What surprises me though is that the file is indeed present on the cluster:

root@node1-1:~# rados -p default.rgw.buckets.data get 64992646-777f-4214-9a7e-6de270ba3d54.350469.3_0002/0002-dashevc-main-2s-v1.mp4 out

The out file is well retrieved and fine so it looks like the file is still there on the cluster.
The issue seems to be with the "shadow" files.


Files

radosgw-admin-stat.json.gz (95.2 KB) radosgw-admin-stat.json.gz Janek Bevendorff, 11/12/2020 05:04 PM
ceph.conf (14 KB) ceph.conf Janek Bevendorff, 11/12/2020 05:04 PM
radosgw-admin-stat.json.gz (18.6 KB) radosgw-admin-stat.json.gz Janek Bevendorff, 11/13/2020 08:17 AM

Related issues 3 (1 open2 closed)

Related to rgw - Feature #49063: rgw: tooling to locate rgw objects with missing rados componentsResolvedJ. Eric Ivancich

Actions
Precedes rgw - Bug #48330: rgw: fix GC defer process and clean up from immediate fixNew

Actions
Copied to rgw - Backport #48331: octopus: Object not found on healthy clusterResolvedJ. Eric IvancichActions
Actions

Also available in: Atom PDF