
Bug #18260

When constantly uploading a large number of objects, the bucket's object count is not correct!

Added by wenjun jing almost 2 years ago. Updated 6 days ago.

Status:
In Progress
Priority:
High
Assignee:
Target version:
-
Start date:
12/15/2016
Due date:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
rgw

Description

After constantly uploading a large number of objects with cosbench, the object count of the bucket reported by radosgw-admin bucket stats --bucket=<bucket_name> is not consistent with the real number of objects, as verified with the head_object API.

cosbench itself may not guarantee that the number of objects uploaded matches the number we configured. However, there also exist objects that cannot be listed via the S3 or Swift API but can still be queried via the head_object API.

So it can be preliminarily ascertained that there may be a bug in the bucket index. When constantly uploading a large number of objects through multiple RGWs, the phenomenon described above can appear.
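The cross-check described above can be scripted. A minimal sketch in Python, assuming the `usage`/`rgw.main`/`num_objects` JSON shape that `radosgw-admin bucket stats` prints in the comments below (the HEAD-based count would come from a separate probe over the expected object names):

```python
import json

def stats_num_objects(stats_json: str) -> int:
    """Sum num_objects across all usage categories (e.g. rgw.main)
    in `radosgw-admin bucket stats` JSON output."""
    stats = json.loads(stats_json)
    return sum(cat.get("num_objects", 0) for cat in stats.get("usage", {}).values())

def head_count_discrepancy(stats_json: str, head_count: int) -> int:
    """Difference between the index-reported count and the count observed
    via head_object probes; 0 means the index is consistent."""
    return stats_num_objects(stats_json) - head_count
```

A negative result would mean the bucket index reports fewer objects than actually exist, which is the symptom reported here.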


Related issues

Related to rgw - Bug #22838: slave zone, `bucket stats`, the result of 'num_objects' is incorrect. Duplicate 01/31/2018

History

#1 Updated by Sage Weil almost 2 years ago

  • Project changed from Ceph to rgw
  • Category deleted (22)

#2 Updated by Yehuda Sadeh almost 2 years ago

  • Assignee set to Casey Bodley

#3 Updated by zhang sw almost 2 years ago

Adding the test result:
Each zone has one RGW with config
'rgw_num_rados_handles=2'. I used cosbench to upload 50,000 objects,
each object 4 MB, with 10 workers.
After the data sync finished (I used the commands 'radosgw-admin
bucket sync status --bucket=<name>' and 'radosgw-admin sync status' to
check that).
Below is the bucket stats result:

Master zone:
[root@ceph36 ~]# radosgw-admin bucket stats --bucket=shard23 {
"bucket": "shard23",
"pool": "master.rgw.buckets.data",
"index_pool": "master.rgw.buckets.index",
"id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
"marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
"owner": "zsw-test",
"ver": "0#50039,1#49964",
"master_ver": "0#0,1#0",
"mtime": "2016-12-16 10:58:56.174049",
"max_marker": "0#00000050038.56144.3,1#00000049963.56109.3",
"usage": {
"rgw.main": {
"size_kb": 195300782,
"size_kb_actual": 195388276,
"num_objects": 50000
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}

Slave zone:
[root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23 {
"bucket": "shard23",
"pool": "slave.rgw.buckets.data",
"index_pool": "slave.rgw.buckets.index",
"id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
"marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
"owner": "zsw-test",
"ver": "0#51172,1#51070",
"master_ver": "0#0,1#0",
"mtime": "2016-12-16 10:58:56.174049",
"max_marker": "0#00000051171.112193.3,1#00000051069.79607.3",
"usage": {
"rgw.main": {
"size_kb": 194769532,
"size_kb_actual": 194856788,
"num_objects": 49861
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}

We can see that in the slave zone, the object count in bucket stats is less
than in the master. But if I use s3cmd to list the bucket in the slave zone, the
result is correct:
[root@ceph05 ~]# s3cmd ls s3://shard23 | wc -l
50000

And after listing the bucket with s3cmd, I ran bucket stats in the
slave zone again:
[root@ceph05 ~]# radosgw-admin bucket stats --bucket=shard23 {
"bucket": "shard23",
"pool": "slave.rgw.buckets.data",
"index_pool": "slave.rgw.buckets.index",
"id": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
"marker": "cc3594b6-6282-421a-a3d5-3f7f3fa7efd0.702243.1",
"owner": "zsw-test",
"ver": "0#51182,1#51079",
"master_ver": "0#0,1#0",
"mtime": "2016-12-16 10:58:56.174049",
"max_marker": "0#00000051181.112203.9,1#00000051078.79616.9",
"usage": {
"rgw.main": {
"size_kb": 194769532,
"size_kb_actual": 194856788,
"num_objects": 50000
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}

We can see that num_objects is correct now. (According to the code,
listing the bucket sends 'dir_suggest_changes' requests to the OSDs; I
think this is why the number is correct now.)
If each zone has two RGWs with config 'rgw_num_rados_handles=1', the
difference between the bucket stats is smaller, from 10 to 40.
If each zone has one RGW with config 'rgw_num_rados_handles=1', the
bucket stats are the same.
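The master/slave comparison done by hand above can be reduced to a small helper; a sketch assuming the same `bucket stats` JSON shape shown in this comment:

```python
import json

def num_objects(stats_json: str) -> int:
    """Extract rgw.main num_objects from `radosgw-admin bucket stats` output."""
    return json.loads(stats_json)["usage"]["rgw.main"]["num_objects"]

def zone_delta(master_stats: str, slave_stats: str) -> int:
    """Objects the slave index is missing relative to the master;
    expected to drop to 0 after a bucket listing triggers dir_suggest repairs."""
    return num_objects(master_stats) - num_objects(slave_stats)
```

With the numbers from this comment, the delta before listing is 139 and goes to 0 after the s3cmd listing.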

#4 Updated by Yehuda Sadeh over 1 year ago

  • Priority changed from Urgent to High

#5 Updated by Zhandong Guo over 1 year ago

Do we have plan to fix this?

#6 Updated by Casey Bodley over 1 year ago

Have not had a chance to reproduce. I will investigate and propose a fix.

#7 Updated by Orit Wasserman about 1 year ago

  • Status changed from New to In Progress

#8 Updated by Yehuda Sadeh about 1 year ago

Could be related to the quota stats issue (#20661); we need to retest with the fixes for that in place.

#9 Updated by Casey Bodley 11 months ago

  • Assignee changed from Casey Bodley to Eric Ivancich

#10 Updated by Mark Kogan 10 months ago

  • Assignee changed from Eric Ivancich to Mark Kogan

#11 Updated by Matt Benjamin 9 months ago

  • Assignee changed from Mark Kogan to Eric Ivancich

@eric, could you try to evaluate this? looks novemberish

#12 Updated by Eric Ivancich 9 months ago

I looked through the code and I do not believe this issue is related to https://bugzilla.redhat.com/show_bug.cgi?id=1526792 , which had bad per-pool object counts, because the sources of information are different. (And therefore the "mitigation" as described in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/014045.html and verified in aforementioned bz would not be applicable.)

I think it would be interesting to pursue Yehuda's suggestion (#8 above -- https://tracker.ceph.com/issues/18260#note-8) and see if we can still reproduce.

#13 Updated by Eric Ivancich 8 months ago

Mark Kogan reported on Jan 24 that he's trying to reproduce the issue. He tried to reproduce the issue with rgw_num_rados_handles=1 and it did not reproduce. He's now trying with rgw_num_rados_handles=2.

#14 Updated by Mark Kogan 8 months ago

The issue does not reproduce with current upstream Ceph luminous.

Over several iterations of running a cosbench workload of 50,000 * 4 MB objects with 10 workers,
on a dev test cluster created with mstart, consisting of
2 zones, 1 RGW in each zone, rgw_num_rados_handles=2.

cosbench generates objects on c1 and c2 synchronizes.
Both zones report the same number of objects (50000) in the cosbench bucket (s3testqwer011),
and sync status is caught up.

./src/mrun c1 radosgw-admin bucket stats --bucket=s3testqwer011
...
  "usage": {
    "rgw.main": {
      "size": 200000000000,
      "size_actual": 200089600000,
      "size_utilized": 200000000000,
      "size_kb": 195312500,
      "size_kb_actual": 195400000,
      "size_kb_utilized": 195312500,
      "num_objects": 50000
    }
...

../src/mrun c2 radosgw-admin bucket stats --bucket=s3testqwer011
...
    "usage": {
        "rgw.main": {
            "size": 200000000000,
            "size_actual": 200089600000,
            "size_utilized": 200000000000,
            "size_kb": 195312500,
            "size_kb_actual": 195400000,
            "size_kb_utilized": 195312500,
            "num_objects": 50000
        }
...        

../src/mrun c2 radosgw-admin sync status 2>/dev/null
          realm f941dcdf-bacf-4086-89f3-a30855d5c517 (gold)
      zonegroup b4e907fb-31df-4899-b600-73aba18266b7 (us)
           zone c1d43af2-6440-402d-9d6f-9c75320c2d64 (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 02fb444b-357b-4aaf-996d-8627977ea06b (us-east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

Detailed description of the reproduction flow:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MON=3 OSD=1 MDS=0 MGR=1 RGW=0 ../src/mstart.sh c1 -n --bluestore -o bluestore_block_size=536870912000
../src/mrun c1 radosgw-admin realm create --rgw-realm=gold --default
../src/mrun c1 radosgw-admin zonegroup create --rgw-zonegroup=us --endpoints=http://localhost:8000 --master --default
../src/mrun c1 radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-east --endpoints=http://localhost:8000 --access-key a2345678901234567890 --secret a234567890123456789012345678901234567890 --master --default
../src/mrun c1 radosgw-admin user create --uid=realm.admin --display-name=RealmAdmin --access-key a2345678901234567890 --secret a234567890123456789012345678901234567890 --system

../src/mrun c1 radosgw-admin period update --commit

### cosbench user
../src/mrun c1 radosgw-admin user create --display-name="Test Id" --uid=testid --access-key b2345678901234567890 --secret b234567890123456789012345678901234567890

vim ./run/c1/ceph.conf
    . . .
    rgw_num_rados_handles=2
    . . .

../src/mrgw.sh c1 8000 --debug-rgw=20 --debug-ms=1 --rgw-zone=us-east

cat ./run/c1/out/radosgw.8000.log|grep handles
    ... num_handles=2 ...

MON=3 OSD=1 MDS=0 MGR=1 RGW=0 ../src/mstart.sh c2 -n --bluestore -o bluestore_block_size=536870912000
../src/mrun c2 radosgw-admin realm pull --url=http://localhost:8000 --access-key a2345678901234567890 --secret a234567890123456789012345678901234567890 --default
../src/mrun c2 radosgw-admin period pull --url=http://localhost:8000 --access-key a2345678901234567890 --secret a234567890123456789012345678901234567890 --default
../src/mrun c2 radosgw-admin zone create --rgw-zonegroup=us --rgw-zone=us-west  --endpoints=http://localhost:8001 --access-key=a2345678901234567890 --secret=a234567890123456789012345678901234567890 --default

../src/mrun c2 radosgw-admin period update --commit

vim ./run/c2/ceph.conf
    . . .
    rgw_num_rados_handles=2
    . . .

../src/mrgw.sh c2 8001 --debug-rgw=20 --debug-ms=1 --rgw-zone=us-west

cat ./run/c2/out/radosgw.8001.log|grep handles
    ... num_handles=2 ...

### run cosbench

~/cosbench/0.4.2.c4 $ cat ./conf/s3-config-50Kobj_4M-T18260.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="s3-config-50Kobj_4M-T18260.xml" description="sample benchmark for s3">
  <storage type="s3" config="accesskey=b2345678901234567890;secretkey=b234567890123456789012345678901234567890;endpoint=http://192.168.39.252:8000;path_style_access=true" />
  <workflow>
    <workstage name="init">
      <work type="init" workers="1" config="cprefix=s3testqwer01;containers=r(1,1)" />
    </workstage>
    <workstage name="prepare">
      <work type="prepare" workers="10" config="cprefix=s3testqwer01;containers=r(1,1);objects=r(1,50000);sizes=c(4)MB" />
    </workstage>
<!--
    <workstage name="main">
      <work name="main" workers="8" runtime="600">
              <operation type="read" ratio="100" config="cprefix=s3testqwer02;containers=u(1,2);objects=u(1,50000)" />
      </work>
    </workstage>
    <workstage name="cleanup">
      <work type="cleanup" workers="1" config="cprefix=s3testqwer02;containers=r(1,2);objects=r(1,50000)" />
    </workstage>
    <workstage name="dispose">
      <work type="dispose" workers="1" config="cprefix=s3testqwer02;containers=r(1,2)" />
    </workstage>
-->
  </workflow>
</workload>

~/cosbench/0.4.2.c4 $ ./cli.sh submit ./conf/s3-config-50Kobj_4M-T18260.xml

### check sync

../src/mrun c2 radosgw-admin sync status

../src/mrun c1 radosgw-admin bucket stats --bucket=s3testqwer011 2>/dev/null
{
  "bucket": "s3testqwer011",
  "zonegroup": "b4e907fb-31df-4899-b600-73aba18266b7",
  "placement_rule": "default-placement",
  "explicit_placement": {
    "data_pool": "",
    "data_extra_pool": "",
    "index_pool": "" 
  },
  "id": "02fb444b-357b-4aaf-996d-8627977ea06b.4302.1",
  "marker": "02fb444b-357b-4aaf-996d-8627977ea06b.4302.1",
  "index_type": "Normal",
  "owner": "testid",
  "ver": "0#100005",
  "master_ver": "0#0",
  "mtime": "2018-01-25 03:45:48.121300",
  "max_marker": "0#00000100004.100005.1",
  "usage": {
    "rgw.main": {
      "size": 200000000000,
      "size_actual": 200089600000,
      "size_utilized": 200000000000,
      "size_kb": 195312500,
      "size_kb_actual": 195400000,
      "size_kb_utilized": 195312500,
      "num_objects": 50000
    }
  },
  "bucket_quota": {
    "enabled": false,
    "check_on_raw": false,
    "max_size": -1,
    "max_size_kb": 0,
    "max_objects": -1
  }

watch -d "../src/mrun c2 radosgw-admin bucket stats --bucket=s3testqwer011 2>/dev/null" 
{
    "bucket": "s3testqwer011",
    "zonegroup": "b4e907fb-31df-4899-b600-73aba18266b7",
    "placement_rule": "default-placement",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": "" 
    },
    "id": "02fb444b-357b-4aaf-996d-8627977ea06b.4302.1",
    "marker": "02fb444b-357b-4aaf-996d-8627977ea06b.4302.1",
    "index_type": "Normal",
    "owner": "testid",
    "ver": "0#100001",
    "master_ver": "0#0",
    "mtime": "2018-01-25 03:45:48.121300",
    "max_marker": "0#00000100000.100000.1",
    "usage": {
        "rgw.main": {
            "size": 200000000000,
            "size_actual": 200089600000,
            "size_utilized": 200000000000,
            "size_kb": 195312500,
            "size_kb_actual": 195400000,
            "size_kb_utilized": 195312500,
            "num_objects": 50000
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }

../src/mrun c2 radosgw-admin sync status 2>/dev/null
          realm f941dcdf-bacf-4086-89f3-a30855d5c517 (gold)
      zonegroup b4e907fb-31df-4899-b600-73aba18266b7 (us)
           zone c1d43af2-6440-402d-9d6f-9c75320c2d64 (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 02fb444b-357b-4aaf-996d-8627977ea06b (us-east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

#15 Updated by Mark Kogan 8 months ago

Checked on the current jewel branch; the issue did not reproduce.
num_objects is 50000 in the second zone.

$ git branch
* jewel
  master

$ ./src/mrun c2 radosgw-admin bucket stats --bucket=s3testqwer011 2>/dev/null
{
    "bucket": "s3testqwer011",
    "pool": "us-west.rgw.buckets.data",
    "index_pool": "us-west.rgw.buckets.index",
    "id": "c0caaea8-986c-426b-ae16-cabf57862f7e.4125.2",
    "marker": "c0caaea8-986c-426b-ae16-cabf57862f7e.4125.2",
    "owner": "testid",
    "ver": "0#988137",
    "master_ver": "0#0",
    "mtime": "2018-01-29 04:06:37.887860",
    "max_marker": "0#00000988136.1284181.3",
    "usage": {
        "rgw.main": {
            "size_kb": 195250000,
            "size_kb_actual": 195337472,
            "num_objects": 50000
        }
    },
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

#16 Updated by Casey Bodley 8 months ago

  • Related to Bug #22838: slave zone, `bucket stats`,the result of 'num_objects' is incorrect. added

#17 Updated by Yehuda Sadeh 8 months ago

I still suspect that it relates to incomplete bucket index changes. These eventually get fixed through the dir_suggest mechanism. I suggest we create some tooling in radosgw-admin to provide info about currently pending bucket index operations (it will need to go over the bucket index entries and find the ones that are incomplete, but will need to avoid the dir_suggest calls, so it should look similar to the radosgw-admin bi list command). Another option would be to use radosgw-admin bi list for that (make a script that dumps all entries and selects the incomplete ones).
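The second option could start from a filter like the following. A minimal sketch in Python over a `radosgw-admin bi list` dump; note the `type` and `entry.pending_map` field names are assumptions about the bi list JSON shape, not taken from this report:

```python
import json

def incomplete_entries(bi_list_json: str) -> list:
    """From a `radosgw-admin bi list` dump, select plain index entries that
    still carry pending (incomplete) operations in their pending_map."""
    entries = json.loads(bi_list_json)
    return [e for e in entries
            if e.get("type") == "plain" and e.get("entry", {}).get("pending_map")]
```

A non-empty result long after uploads have finished would point at index entries stuck in the prepare phase, which would explain the stats discrepancy without touching the dir_suggest path.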

#18 Updated by Mark Kogan 8 months ago

I am able to reproduce this on git checkout tag/v10.2.7.

The second zone, c2, which is syncing, has sporadic coredumps
(the coredumps were fixed in tag/v10.2.9 and above).

Following the procedure in comment #14 with a modification,
the c2 zone was started as below so that it restarts after a coredump:
while sleep 1 ; do ./src/mrgw.sh c2 8001 --debug-rgw=20 --debug-ms=1 --rgw-zone=us-west -f ; done

The result is:

$ date
Sun Feb  4 10:21:05 EST 2018

$ ./src/mrun c2 radosgw-admin bucket sync status --bucket=s3testqwer011 --source-zone=us-east
  {
    "key": 0,
    "val": {
      "status": "incremental-sync",
      "full_marker": {
        "position": {
          "name": "myobjects9",
          "instance": "null" 
        },
        "count": 102
      },
      "inc_marker": {
        "position": "00000100001.100001.3" 
      }
    }
  }
]

$ ./src/mrun c2 radosgw-admin bucket stats --bucket=s3testqwer011
{
  "bucket": "s3testqwer011",
  "pool": "us-west.rgw.buckets.data",
  "index_pool": "us-west.rgw.buckets.index",
  "id": "fb6f98ac-13dd-4407-8245-ae48c63d56d8.4115.1",
  "marker": "fb6f98ac-13dd-4407-8245-ae48c63d56d8.4115.1",
  "owner": "testid",
  "ver": "0#140620",
  "master_ver": "0#0",
  "mtime": "2018-02-04 07:31:20.830098",
  "max_marker": "0#00000140619.154111.3",
  "usage": {
    "rgw.main": {
      "size_kb": 195148438,
      "size_kb_actual": 195235864,
      "num_objects": 49958
    }
  },
  "bucket_quota": {
    "enabled": false,
    "max_size_kb": -1,
    "max_objects": -1
  }
}

num_objects is 49958 although 50000 were synced

#19 Updated by Mark Kogan 8 months ago

Reproduces on the current jewel branch
by sending an abort signal to the synchronizing zone c2 during the sync process:

while sleep 360; do /usr/bin/kill --verbose -6 $(ps -ef | grep lt-radosgw | grep '[8]001' | awk '{ print $2 }') ; done

#20 Updated by Eric Ivancich 8 months ago

This is interesting and helpful.

Getting to Yehuda's point -- are the objects there but not in the bucket index, or not yet transferred? If you don't have that answer yet, would you be able to use the rados command interface to see what the object differences are?

The related bug https://tracker.ceph.com/issues/22838 claims to be on v12.2.2, which presumably has the fix(es) that v10.2.9 has. So our only speculation at this point would be that one or more radosgw processes fail. Any additional thoughts on that, Mark?

#21 Updated by Mark Kogan 8 months ago

Reproduces on the current master branch.

Listed below is sync information from both zones; if there is additional info
that may assist, please let me know and I will add it. I will also investigate "dir_suggest_changes".

"c1" is the zone that cosbench is writing to and "c2" is the synchronizing zone.

Both zones have the same correct number of objects (50000) in the data pool;
the number of objects in the log pool is different, though.

../src/mrun c1 ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    500G      309G         191G         38.16
POOLS:
    NAME                          ID     USED      %USED     MAX AVAIL     OBJECTS
    .rgw.root                     1      4.78K         0          304G          17
    us-east.rgw.control           2          0         0          304G           8
    us-east.rgw.meta              3      1.15K         0          304G           7
    us-east.rgw.log               4      5.14K         0          304G         630
    us-east.rgw.buckets.index     5          0         0          304G           1
    us-east.rgw.buckets.data      6       186G     37.97          304G       50000

../src/mrun c2 ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    500G      309G         191G         38.16
POOLS:
    NAME                          ID     USED      %USED     MAX AVAIL     OBJECTS
    .rgw.root                     1      4.93K         0          304G          17
    us-west.rgw.control           2          0         0          304G           8
    us-west.rgw.meta              3      1.15K         0          304G           7
    us-west.rgw.log               4      7.92K         0          304G         585
    us-west.rgw.buckets.index     5          0         0          304G           1
    us-west.rgw.buckets.data      6       186G     37.98          304G       50000

$ ./src/mrun c1 radosgw-admin bucket stats --bucket=s3testqwer011 | jq '.usage[].num_objects'
50000

$ ./src/mrun c2 radosgw-admin bucket stats --bucket=s3testqwer011 | jq '.usage[].num_objects'
49921

./src/mrun c2 radosgw-admin sync status
          realm f959e2d2-08d2-487f-bb3a-3d7eb3e56b58 (gold)
      zonegroup 4e65cc4f-f933-42d2-a0c4-32c162d0c68f (us)
           zone 5c720a0e-ce0d-4e52-8d7f-63c520898e7b (us-west)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: df9c8255-e7e6-46f5-a267-c58fde4b80eb (us-east)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source

./src/mrun c2 radosgw-admin bucket sync status --bucket=s3testqwer011 --source-zone=us-east
[
    {
        "key": 0,
        "val": {
            "status": "incremental-sync",
            "full_marker": {
                "position": {
                    "name": "myobjects9",
                    "instance": "null",
                    "ns": "" 
                },
                "count": 260
            },
            "inc_marker": {
                "position": "00000100001.100002.5" 
                             ^^^^^^^^^^^^^^^^^^^^
            }
        }
    }
]

Contrary to the original report, on current master "s3cmd ls ..." does not correct the num_objects;
possibly there were changes to dir_suggest_changes.

$ s3cmd -c s3cfg-c1 ls s3://s3testqwer011 | wc -l
50000

$ s3cmd -c s3cfg-c2 ls s3://s3testqwer011 | wc -l
50000

./src/mrun c1 radosgw-admin bucket stats --bucket=s3testqwer011 | jq
{
  "bucket": "s3testqwer011",
  "zonegroup": "4e65cc4f-f933-42d2-a0c4-32c162d0c68f",
  "placement_rule": "default-placement",
  "explicit_placement": {
    "data_pool": "",
    "data_extra_pool": "",
    "index_pool": "" 
  },
  "id": "df9c8255-e7e6-46f5-a267-c58fde4b80eb.4138.1",
  "marker": "df9c8255-e7e6-46f5-a267-c58fde4b80eb.4138.1",
  "index_type": "Normal",
  "owner": "testid",
  "ver": "0#100002",
  "master_ver": "0#0",
  "mtime": "2018-02-05 10:22:11.049884",
  "max_marker": "0#00000100001.100002.5",
  "usage": {
    "rgw.main": {
      "size": 200000000000,
      "size_actual": 200089600000,
      "size_utilized": 200000000000,
      "size_kb": 195312500,
      "size_kb_actual": 195400000,
      "size_kb_utilized": 195312500,
      "num_objects": 50000
    }
  },
  "bucket_quota": {
    "enabled": false,
    "check_on_raw": false,
    "max_size": -1,
    "max_size_kb": 0,
    "max_objects": -1
  }
}

./src/mrun c2 radosgw-admin bucket stats --bucket=s3testqwer011 | jq
{
  "bucket": "s3testqwer011",
  "zonegroup": "4e65cc4f-f933-42d2-a0c4-32c162d0c68f",
  "placement_rule": "default-placement",
  "explicit_placement": {
    "data_pool": "",
    "data_extra_pool": "",
    "index_pool": "" 
  },
  "id": "df9c8255-e7e6-46f5-a267-c58fde4b80eb.4138.1",
  "marker": "df9c8255-e7e6-46f5-a267-c58fde4b80eb.4138.1",
  "index_type": "Normal",
  "owner": "testid",
  "ver": "0#100069",
  "master_ver": "0#0",
  "mtime": "2018-02-05 10:22:11.049884",
  "max_marker": "0#00000100068.100075.9",
  "usage": {
    "rgw.main": {
      "size": 199684000000,
      "size_actual": 199773458432,
      "size_utilized": 199684000000,
      "size_kb": 195003907,
      "size_kb_actual": 195091268,
      "size_kb_utilized": 195003907,
      "num_objects": 49921
    }
  },
  "bucket_quota": {
    "enabled": false,
    "check_on_raw": false,
    "max_size": -1,
    "max_size_kb": 0,
    "max_objects": -1
  }
}

Will continue to investigate RGWRados::cls_bucket_list and CEPH_RGW_TAG_TIMEOUT

#22 Updated by Orit Wasserman 7 months ago

  • Assignee changed from Eric Ivancich to Mark Kogan

#23 Updated by Yehuda Sadeh 4 months ago

Mark, is there any new information on this one?

#24 Updated by Mark Kogan 3 months ago

Not yet; I was busy with other issues.
I will be resuming work on this bug now.

#25 Updated by Matt Benjamin 24 days ago

Mark, is this one resolved?

Matt

#26 Updated by Mark Kogan 6 days ago

I am still debugging this case.

As a reminder: the stats discrepancy occurs when, during multisite sync, the replicating site is repeatedly killed with:

while sleep 360; do /usr/bin/kill --verbose -6 $(ps -ef | grep lt-radosgw | grep '[8]001' | awk '{ print $2 }') ; done

The last progress update is that the suspicion that the cause was related to dir_suggest_changes did not seem to be correct;
I will continue debugging from a different angle.
