Bug #52776

the bucket resharding time is too long, and putting objects fails

Added by Huber ming over 2 years ago. Updated over 2 years ago.

Status:
Need More Info
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Tags:
reshard
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

There are 50 million objects in the bucket, and the bucket index needs to be resharded to 1024 shards, but the resharding takes too long and putting objects fails while it is in progress.
ceph version: 14.2.8
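
For reference, the requested shard count can be sanity-checked against the object count; a rough sketch, using a placeholder bucket name and assuming the default rgw_max_objs_per_shard of 100000:

radosgw-admin bucket stats --bucket=<bucket> | grep -E 'num_shards|num_objects'
# 50,000,000 objects / 1024 shards ~= 48,828 objects per shard after the reshard,
# comfortably below the ~100k objects-per-shard guideline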

Actions #1

Updated by Casey Bodley over 2 years ago

  • Status changed from New to Need More Info
  • Tags set to reshard

This sounds like a relatively small bucket to have such performance issues. Are the index pools on SSD/NVMe? How long does the reshard take to complete?
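
For reference, one way to check which device class backs the index pool (the pool name below is the zone default and may differ):

ceph osd pool get default.rgw.buckets.index crush_rule
ceph osd crush rule dump                 # check the device class referenced by that rule
ceph osd df tree                         # the CLASS column shows hdd/ssd/nvme per OSD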

Actions #2

Updated by Mark Kogan over 2 years ago

Posting performance results of resharding a 50M-object bucket from 1 to 1024 shards in a vstart environment.

Summary of performance in the example environments:
elapsed time was ~6 minutes on the SSD system and ~20 minutes on the HDD system (the HDD system also reported BlueFS spillover, which was not reported on the SSD system).

(* Performance in other environments may vary depending on cluster load, (deep) scrub, backfill, etc.)
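
# (optional) reshard progress can also be polled from a second shell while the admin
# command runs; this reuses the vstart paths and bucket name from the runs below:
watch -n 10 "sudo ./bin/radosgw-admin reshard status --bucket=b01b000000000000"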

# objects were written with:
numactl -N 1 -m 1 -- ~/go/bin/hsbench -a b2345678901234567890 -s b234567890123456789012345678901234567890 -u http://127.0.0.1:8000 -z 4K -d -1 -t  $(numactl -N 1 -- nproc) -b 1 -n 50000000 -m cxip -bp b01b |& tee hsbench.log

# silvertip - hdd & 14.2.8

git branch -vv
* (no branch) 2d095e947a0 14.2.8

sudo ./bin/ceph status
  cluster:
    id:     bac3a36d-eed8-4460-8619-5951395eb416
    health: HEALTH_WARN
            noscrub,nodeep-scrub flag(s) set
            BlueFS spillover detected on 1 OSD(s)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  services:
    mon: 1 daemons, quorum a (age 43h)
    mgr: x(active, since 43h)
    osd: 1 osds: 1 up (since 43h), 1 in (since 43h)
         flags noscrub,nodeep-scrub
    rgw: 1 daemon active (8000)

  data:
    pools:   6 pools, 768 pgs
    objects: 50.00M objects, 191 GiB
    usage:   296 GiB used, 235 GiB / 531 GiB avail
    pgs:     768 active+clean

watch -cd "timeout 4s sudo ./bin/ceph df 2>/dev/null" 

RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       531 GiB     235 GiB     221 GiB      296 GiB         55.77
    TOTAL     531 GiB     235 GiB     221 GiB      296 GiB         55.77

POOLS:
    POOL                          ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    .rgw.root                      1     1.2 KiB           4      16 KiB         0       229 GiB
    default.rgw.control            2         0 B           8         0 B         0       229 GiB
    default.rgw.meta               3     4.0 KiB          22      84 KiB         0       229 GiB
    default.rgw.log                4         0 B         191         0 B         0       229 GiB
    default.rgw.buckets.index      5         0 B           2         0 B         0       229 GiB
    default.rgw.buckets.data       6     191 GiB      50.00M     191 GiB     45.40       229 GiB

sudo ./bin/radosgw-admin bucket stats --bucket=b01b000000000000 2>/dev/null | grep num
    "num_shards": 1,
            "num_objects": 50000000

fallocate -l 1M ./1M.dat

sudo time ./bin/radosgw-admin bucket reshard --bucket=b01b000000000000 --num-shards=1024 --yes-i-really-mean-it
...
50000000 50000000
2021-10-03 11:25:43.064 7fffbffff700 -1 RGWWatcher::handle_error cookie 93825003807488 err (107) Transport endpoint is not connected
2021-10-03 11:25:53.393 7fffedff5840  1 execute INFO: reshard of bucket "b01b000000000000" from "b01b000000000000:801862d0-a8fe-4809-9a76-68df48767f90.4173.2" to "b01b000000000000:801862d0-a8fe-4809-9a76-68df48767f90.183023.1" completed successfully
415.24user 52.77system 20:22.05elapsed 38%CPU (0avgtext+0avgdata 157136maxresident)k
                       ^^^^^
6032inputs+8outputs (4major+186345minor)pagefaults 0swaps

time s3cmd put 1M.dat s3://b01b000000000000
 upload: '1M.dat' -> 's3://b01b000000000000/1M.dat'  [1 of 1]
 1048576 of 1048576   100% in    0s     8.12 MB/s  failed
WARNING: Upload failed: /1M.dat (timed out)
WARNING: Waiting 3 sec...
upload: '1M.dat' -> 's3://b01b000000000000/1M.dat'  [1 of 1]
 1048576 of 1048576   100% in    0s    83.06 MB/s  failed
WARNING: Upload failed: /1M.dat (timed out)
WARNING: Waiting 6 sec...
upload: '1M.dat' -> 's3://b01b000000000000/1M.dat'  [1 of 1]
 1048576 of 1048576   100% in    0s    82.34 MB/s  failed
WARNING: Upload failed: /1M.dat (timed out)
WARNING: Waiting 9 sec...
upload: '1M.dat' -> 's3://b01b000000000000/1M.dat'  [1 of 1]
 1048576 of 1048576   100% in    0s    47.62 MB/s  failed
WARNING: Upload failed: /1M.dat (timed out)
WARNING: Waiting 12 sec...
upload: '1M.dat' -> 's3://b01b000000000000/1M.dat'  [1 of 1]
 1048576 of 1048576   100% in    0s    21.36 MB/s  done
s3cmd put 1M.dat s3://b01b000000000000  0.20s user 0.05s system 0% cpu 20:30.73 total
                                                                       ^^^^^

sudo ./bin/radosgw-admin bucket stats --bucket=b01b000000000000 2>/dev/null | grep num
    "num_shards": 1024,
            "num_objects": 50000001

# sepia o07 - ssd & master

git branch -vv
* master 6939ea034a2 [origin/master: behind 12] Merge PR #43323 into master

sudo ./bin/radosgw-admin bucket stats --bucket=b01b000000000000 2>/dev/null | grep num
    "num_shards": 1,
            "num_objects": 50000000

sudo time ./bin/radosgw-admin bucket reshard --bucket=b01b000000000000 --num-shards=1024 --yes-i-really-mean-it
...
 49979000 49980000 49981000 49982000 49983000 49984000 49985000 49986000 49987000 49988000 49989000 49990000 49991000 49992000 49993000 49994000 49995000 49996000 49997000 49998000 49999000
50000000 50000000
2021-10-04T07:50:43.476+0000 7ffff7e3ca80  1 execute INFO: reshard of bucket "b01b000000000000" from "b01b000000000000:e6369cbb-16f7-45e2-a904-dd2c5e69b8ed.4169.2" to "b01b000000000000:e6369cbb-16f7-45e2-a904-dd2c5e69b8ed.79935.1" completed successfully
245.17user 47.52system 6:26.06elapsed 75%CPU (0avgtext+0avgdata 173156maxresident)k
                       ^^^^
25848inputs+16outputs (34major+153351minor)pagefaults 0swaps

time s3cmd put 1M.dat s3://b01b000000000000
upload: '1M.dat' -> 's3://b01b000000000000/1M.dat'  [1 of 1]
 1048576 of 1048576   100% in    0s   165.84 MB/s  failed
WARNING: Upload failed: /1M.dat (timed out)
WARNING: Waiting 3 sec...
upload: '1M.dat' -> 's3://b01b000000000000/1M.dat'  [1 of 1]
 1048576 of 1048576   100% in   85s    12.03 KB/s  done
s3cmd put 1M.dat s3://b01b000000000000  0.12s user 0.02s system 0% cpu 6:28.30 total
                                                                       ^^^^

sudo ./bin/radosgw-admin bucket stats --bucket=b01b000000000000 2>/dev/null | grep num
    "num_shards": 1024,
            "num_objects": 50000001

sudo ./bin/ceph status
  cluster:
    id:     37cc56db-397f-498c-ad99-18521fde7c26
    health: HEALTH_WARN
            12 mgr modules have failed dependencies
            noscrub,nodeep-scrub flag(s) set
            6 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum a (age -499183281)
    mgr: x(active, since 23h)
    osd: 1 osds: 1 up (since 23h), 1 in (since 23h)
         flags noscrub,nodeep-scrub
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    pools:   6 pools, 768 pgs
    objects: 50.00M objects, 191 GiB
    usage:   537 GiB used, 1.5 TiB / 2.0 TiB avail
    pgs:     768 active+clean

Actions #3

Updated by Casey Bodley over 2 years ago

  • Assignee set to J. Eric Ivancich
Actions #4

Updated by J. Eric Ivancich over 2 years ago

@Huber Ming -- Are you able to provide the "need more info"?

Actions #5

Updated by Huber ming over 2 years ago

As @Mark Kogan said, the elapsed time was ~6 minutes on the SSD system (resharding a 50M-object bucket from 1 to 1024 shards), and all object puts failed during this time.
Is there any method to put objects into a bucket while it is being resharded?
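
(For what it's worth, the runs above show that puts which time out during the reshard succeed once it completes, so the only workaround visible in this thread is client-side retry/backoff, which s3cmd already does automatically. A minimal manual equivalent, reusing the same test object and bucket:)

until s3cmd put 1M.dat s3://b01b000000000000; do sleep 30; done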
