Project

General

Profile

Bug #59104

All access to rgw took over 100 seconds for a long time

Added by Yuma Ogami 11 months ago. Updated 6 months ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

All access to rgw took over 100 seconds for more than 10 minutes. I have encountered this problem several times.
At that time, `ceph status` looked as follows:

```
cybozu@stage0-boot-0:~$ kubectl exec -n ceph-object-store deploy/rook-ceph-tools -- ceph -s
cluster:
id: b52d5f3d-ba14-442e-a089-0bca47b83758
health: HEALTH_WARN
80 pgs not deep-scrubbed in time
79 pgs not scrubbed in time
17 slow ops, oldest one blocked for 856 sec, daemons [osd.0,osd.1,osd.17,osd.2] have slow ops.

services:
mon: 3 daemons, quorum b,f,g (age 70m)
mgr: a(active, since 11h), standbys: b
osd: 24 osds: 24 up (since 12m), 24 in (since 14m); 15 remapped pgs
rgw: 3 daemons active (3 hosts, 1 zones)
data:
pools: 9 pools, 336 pgs
objects: 104.09M objects, 7.9 TiB
usage: 25 TiB used, 114 TiB / 139 TiB avail
pgs: 10043255/312255237 objects misplaced (3.216%)
291 active+clean
26 active+recovering
15 active+remapped+backfilling
3 active+recovering+laggy
1 active+clean+laggy
io:
client: 70 KiB/s rd, 19 KiB/s wr, 66 op/s rd, 16 op/s wr
recovery: 2.5 MiB/s, 29.44k keys/s, 27 objects/s
```

Here is the latency of the healthcheck (i.e., the GET /swift/healthcheck request):

```
...
2023-03-15 12:49:22
debug 2023-03-15T03:49:22.364+0000 7fed946e0700 1 beast: 0x7fee4416a710: 10.69.3.9 - - [15/Mar/2023:03:49:22.363 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.000000000s
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 1 beast: 0x7ff83c2ed710: 10.69.1.134 - - [15/Mar/2023:03:49:23.400 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.000000000s
2023-03-15 12:49:31
debug 2023-03-15T03:49:31.934+0000 7f5cdfe4f700 1 beast: 0x7f5dd85f3710: 10.69.2.83 - - [15/Mar/2023:03:49:31.934 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.000000000s
2023-03-15 12:52:20
debug 2023-03-15T03:52:20.396+0000 7ff7b8f61700 1 beast: 0x7ff8305f3710: 10.69.1.134 - - [15/Mar/2023:03:49:33.402 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=166.993789673s
2023-03-15 12:52:20
debug 2023-03-15T03:52:20.396+0000 7ff757e9f700 1 beast: 0x7ff818597710: 10.69.1.134 - - [15/Mar/2023:03:49:54.402 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=145.994049072s
2023-03-15 12:52:20
debug 2023-03-15T03:52:20.396+0000 7ff751692700 1 beast: 0x7ff8240e9710: 10.69.1.134 - - [15/Mar/2023:03:49:53.401 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=146.994079590s
2023-03-15 12:52:20
debug 2023-03-15T03:52:20.396+0000 7ff7cc788700 1 beast: 0x7ff818414710: 10.69.1.134 - - [15/Mar/2023:03:50:13.400 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=126.995399475s
...
```
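The degradation is easy to spot by pulling the `latency=` field out of the beast access lines. A minimal sketch (the regex is our own, matched against one of the log lines above):

```python
import re

# Extract (debug timestamp, latency in seconds) from a beast access-log line.
LATENCY_RE = re.compile(r'^debug (\S+) .* latency=([0-9.]+)s$')

def parse_latency(line):
    m = LATENCY_RE.match(line)
    if not m:
        return None
    return m.group(1), float(m.group(2))

line = ('debug 2023-03-15T03:52:20.396+0000 7ff7b8f61700 1 beast: 0x7ff8305f3710: '
        '10.69.1.134 - - [15/Mar/2023:03:49:33.402 +0000] '
        '"GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=166.993789673s')
ts, latency = parse_latency(line)
print(ts, latency)  # 2023-03-15T03:52:20.396+0000 166.993789673
```

Filtering for `latency > 1.0` over the combined log makes the 03:52:20 burst obvious.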

The latency looks fine before 03:52:20, but after that it got very bad. Other requests, such as PUT, also took a long time.

When the healthcheck took a long time, the log looked different from the normal case.

  • normal case

```
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 1 ====== starting new request req=0x7ff83c2ed710 =====
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s initializing for trans_id = tx00000973bdb8c58a8c2c8-0064114043-663ba9a-ceph-object-store-0
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s getting op 0
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check verifying requester
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check normalizing buckets and tenants
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check init permissions
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check recalculating target
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check reading permissions
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check init op
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check verifying op mask
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check verifying op permissions
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check verifying op params
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check pre-executing
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check check rate limiting
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check executing
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check completing
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check op status=0
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 2 req 10897545119182406344 0.000000000s swift:get_health_check http status=200
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 1 ====== req done req=0x7ff83c2ed710 op status=0 http_status=200 latency=0.000000000s ======
2023-03-15 12:49:23
debug 2023-03-15T03:49:23.401+0000 7ff73a664700 1 beast: 0x7ff83c2ed710: 10.69.1.134 - - [15/Mar/2023:03:49:23.400 +0000] "GET /swift/healthcheck HTTP/1.1" 200 0 - "kube-probe/1.24" - latency=0.000000000s
```

  • problematic case

```
2023-03-15 12:50:33
debug 2023-03-15T03:50:33.402+0000 7ff70be07700 1 ====== starting new request req=0x7ff8d80d1710 =====
2023-03-15 12:50:33
debug 2023-03-15T03:50:33.402+0000 7ff70be07700 2 req 10795295110652223112 0.000000000s initializing for trans_id = tx0000095d097b50befde88-0064114089-663ba9a-ceph-object-store-0
2023-03-15 12:50:33
debug 2023-03-15T03:50:33.402+0000 7ff70be07700 2 req 10795295110652223112 0.000000000s getting op 0
2023-03-15 12:50:37
debug 2023-03-15T03:50:37.801+0000 7ff8dbfe7700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: start
2023-03-15 12:50:37
debug 2023-03-15T03:50:37.801+0000 7ff8dbfe7700 2 rgw data changes log: RGWDataChangesLog::ChangesRenewThread: pruning old generations
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying requester
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check normalizing buckets and tenants
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check init permissions
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff7c0770700 2 req 1034626586565520860 63.837211609s s3:get_obj verifying requester
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check recalculating target
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check reading permissions
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check init op
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying op mask
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying op permissions
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying op params
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check pre-executing
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check check rate limiting
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check executing
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check completing
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff788700700 2 req 14745705746987820772 63.338191986s s3:get_obj verifying requester
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff753e97700 2 req 6474228846771742968 62.464164734s s3:get_obj verifying requester
...
```

  • environment information:

- ceph: 17.2.5
- rook: 1.10.7

  • attached files

- rgw-all.log.xz: the combined log of all rgw daemons (12:40:00 - 11:53:00)
- rgw-{0,1,2}.log.xz: the log of each rgw daemon

rgw-all.log.xz (501 KB) Yuma Ogami, 03/20/2023 06:47 AM

rgw-0.log.xz (137 KB) Yuma Ogami, 03/20/2023 06:51 AM

rgw-1.log.xz (171 KB) Yuma Ogami, 03/20/2023 06:51 AM

rgw-2.log.xz (137 KB) Yuma Ogami, 03/20/2023 06:51 AM

History

#1 Updated by Yuma Ogami 11 months ago

Additional information follows.

A request log (during latency degradation):

```
2023-03-15 12:49:33
debug 2023-03-15T03:49:33.402+0000 7ff75eead700 2 req 14124238834200714628 0.000000000s initializing for trans_id = tx00000c4036062e34f9d84-006411404d-663ba9a-ceph-object-store-0
2023-03-15 12:49:33
debug 2023-03-15T03:49:33.402+0000 7ff75eead700 2 req 14124238834200714628 0.000000000s getting op 0
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying requester
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check normalizing buckets and tenants
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check init permissions
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check recalculating target
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check reading permissions
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check init op
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying op mask
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying op permissions
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check verifying op params
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check pre-executing
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check check rate limiting
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check executing
2023-03-15 12:50:38
debug 2023-03-15T03:50:38.282+0000 7ff737e5f700 2 req 14124238834200714628 64.880249023s swift:get_health_check completing
2023-03-15 12:52:20
debug 2023-03-15T03:52:20.396+0000 7ff7b8f61700 2 req 14124238834200714628 166.993789673s swift:get_health_check op status=0
2023-03-15 12:52:20
debug 2023-03-15T03:52:20.396+0000 7ff7b8f61700 2 req 14124238834200714628 166.993789673s swift:get_health_check http status=200
```

There is a delay of about 60 seconds between "getting op 0" and "swift:get_health_check verifying requester", and another delay of about 100 seconds between "swift:get_health_check completing" and "swift:get_health_check op status=0".
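Both gaps can be computed directly from the `debug` timestamps of the req lines above; a small sketch:

```python
from datetime import datetime

FMT = '%Y-%m-%dT%H:%M:%S.%f%z'

def gap_seconds(ts_a, ts_b):
    """Seconds elapsed between two rgw debug-line timestamps."""
    return (datetime.strptime(ts_b, FMT) - datetime.strptime(ts_a, FMT)).total_seconds()

# req 14124238834200714628: "getting op 0" -> "verifying requester"
print(gap_seconds('2023-03-15T03:49:33.402+0000', '2023-03-15T03:50:38.282+0000'))  # 64.88
# "completing" -> "op status=0"
print(gap_seconds('2023-03-15T03:50:38.282+0000', '2023-03-15T03:52:20.396+0000'))  # 102.114
```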

I suspect the first delay is related to this code:
https://github.com/ceph/ceph/blob/98318ae89f1a893a6ded3a640405cdbb33e08757/src/rgw/rgw_process.cc#L333-L358

During the latency degradation, not all requests were delayed; some requests completed without any delay.

#2 Updated by Satoru Takeuchi 10 months ago

Hi, I took over Yuma's work.

We succeeded in reproducing this problem. Here is some additional input.

problem

Accessing /swift/healthcheck took a very long time after restarting an index OSD.

Rook regards an rgw daemon as unreachable if accessing /swift/healthcheck fails, or takes more than 1 second, three times in a row. In our case, all three rgw daemons were regarded as down.
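The probe logic described above (mark a daemon down after three consecutive probes that fail or exceed 1 second) amounts to something like the following sketch; the function and parameter names are ours, not Rook's:

```python
def daemon_down(probe_latencies, timeout=1.0, failure_threshold=3):
    """True if the last `failure_threshold` probes all failed (None) or timed out."""
    recent = probe_latencies[-failure_threshold:]
    if len(recent) < failure_threshold:
        return False
    return all(lat is None or lat > timeout for lat in recent)

# Healthy daemon: sub-millisecond healthchecks.
print(daemon_down([0.0, 0.0, 0.001]))              # False
# During the incident: ~167s, ~146s, ~127s latencies -> marked down.
print(daemon_down([0.0, 166.99, 145.99, 126.99]))  # True
```

With healthcheck latencies in the 100+ second range, every daemon trips this threshold almost immediately.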

Our considerations

We suspect that the part where RGW stats the lua script is taking a long time.
We found that RGW accesses the rgw pool while loading the lua script, even though we don't use any lua scripts of our own.
Since the trigger of this phenomenon is the restart of an index OSD, we guess that the response to the above stat request is delayed by some other work done after the OSD enters the UP state.
At this time, we don't know which task in the OSD is taking so long.
Note that CPU, memory, disk, and network all have sufficient capacity.

how to reproduce

Restart an index OSD.

cluster configuration

- ceph -s

  cluster:
    id:     b52d5f3d-ba14-442e-a089-0bca47b83758
    health: HEALTH_WARN
            797 large omap objects
            504 pgs not deep-scrubbed in time
            504 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum f,g,i (age 3d)
    mgr: a(active, since 3d), standbys: b
    osd: 24 osds: 24 up (since 3d), 24 in (since 4d); 71 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    pools:   9 pools, 712 pgs
    objects: 139.46M objects, 12 TiB
    usage:   40 TiB used, 100 TiB / 139 TiB avail
    pgs:     21257886/418382853 objects misplaced (5.081%)
             641 active+clean
             71  active+remapped+backfilling

  io:
    client:   39 MiB/s rd, 358 KiB/s wr, 5.03k op/s rd, 32 op/s wr
    recovery: 2.6 MiB/s, 32 objects/s

- ceph osd tree

ID    CLASS  WEIGHT     TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
  -1         139.32303  root default                                       
  -8          44.16434      zone rack0                                     
 -19           0.25000          host 10-69-0-19                            
  10    ssd    0.25000              osd.10            up   1.00000  1.00000
 -73           7.27739          host 10-69-0-24                            
  23    hdd    7.27739              osd.23            up   1.00000  1.00000
 -34           7.27739          host 10-69-0-26                            
   7    hdd    7.27739              osd.7             up   1.00000  1.00000
 -22           7.27739          host 10-69-0-27                            
   3    hdd    7.27739              osd.3             up   1.00000  1.00000
 -58           7.27739          host 10-69-0-28                            
  12    hdd    7.27739              osd.12            up   1.00000  1.00000
 -61           7.27739          host 10-69-0-29                            
  15    hdd    7.27739              osd.15            up   1.00000  1.00000
 -49           7.27739          host 10-69-0-30                            
  13    hdd    7.27739              osd.13            up   1.00000  1.00000
  -7           0.25000          host 10-69-0-8                             
   1    ssd    0.25000              osd.1             up   1.00000  1.00000
  -4          44.16434      zone rack1                                     
  -3           0.25000          host 10-69-0-200                           
   0    ssd    0.25000              osd.0             up   1.00000  1.00000
 -11           0.25000          host 10-69-0-202                           
   2    ssd    0.25000              osd.2             up   1.00000  1.00000
 -46           7.27739          host 10-69-0-214                           
   9    hdd    7.27739              osd.9             up   1.00000  1.00000
 -67           7.27739          host 10-69-0-215                           
  19    hdd    7.27739              osd.19            up   1.00000  1.00000
 -40           7.27739          host 10-69-0-216                           
   4    hdd    7.27739              osd.4             up   1.00000  1.00000
 -55           7.27739          host 10-69-0-218                           
   8    hdd    7.27739              osd.8             up   1.00000  1.00000
 -79           7.27739          host 10-69-0-221                           
  22    hdd    7.27739              osd.22            up   1.00000  1.00000
 -37           7.27739          host 10-69-0-223                           
   6    hdd    7.27739              osd.6             up   1.00000  1.00000
 -26          29.60956      zone rack2                                     
 -25           0.25000          host 10-69-1-133                           
  17    ssd    0.25000              osd.17            up   1.00000  1.00000
 -85           0.25000          host 10-69-1-134                           
  18    ssd    0.25000              osd.18            up   1.00000  1.00000
 -31                 0          host 10-69-1-147                           
 -76                 0          host 10-69-1-151                           
 -82                 0          host 10-69-1-152                           
 -64           7.27739          host 10-69-1-154                           
  14    hdd    7.27739              osd.14            up   1.00000  1.00000
 -88           7.27739          host 10-69-1-155                           
  16    hdd    7.27739              osd.16            up   1.00000  1.00000
 -52           7.27739          host 10-69-1-156                           
  11    hdd    7.27739              osd.11            up   1.00000  1.00000
 -70                 0          host 10-69-1-157                           
 -43           7.27739          host 10-69-1-158                           
   5    hdd    7.27739              osd.5             up   1.00000  1.00000
 -98          10.69240      zone rack3                                     
 -97          10.69240          host 10-69-2-94                            
  21    hdd   10.69240              osd.21            up   1.00000  1.00000
 -92          10.69240      zone rack4                                     
 -91                 0          host 10-69-3-23                            
-103          10.69240          host 10-69-3-31                            
  20    hdd   10.69240              osd.20            up   1.00000  1.00000

osd.0, 1, 2, 10, 17, and 18 (ssd) are index OSDs. The log pool also uses these OSDs. The other OSDs (hdd) are data OSDs.

There is a large omap warning because we haven't resharded the bucket indexes, in order to avoid the following bug:

https://tracker.ceph.com/issues/57770

software environment

- ceph: 17.2.5
- rook: 1.10.7

reproduce operation

1. Restart OSD.0 around 2023 May 11, 04:06 ~ 04:09
2. Restart OSD.1 around 2023 May 11, 04:18 ~ 04:20
3. Restart OSD.1 around 2023 May 11, 04:20 ~ 04:21
(We restarted OSD.1 twice by mistake.)

This problem didn't happen after restarting OSD.0.
It happened after restarting OSD.1.

log samples

- index osd's logs

debug 2023-05-11T04:22:02.936+0000 7f6543198700 10 osd.0 94781 dequeue_op 0x55cf90711080 finish
debug 2023-05-11T04:22:02.889+0000 7f6543198700  7 osd.0 pg_epoch: 94781 pg[5.63( v 94778'10303508 (94715'10300462,94778'10303508] local-lis/les=94780/94781 n=15 ec=24663/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [1,0,17] r=1 lpr=94780 pi=[94770,94780)/1 luod=0'0 crt=94778'10303508 lcod 94778'10303507 mlcod 0'0 active mbc={}] build_push_op 5:c6fb0402:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.728:head v 94778'10303490 size 0 recovery_info: ObjectRecoveryInfo(5:c6fb0402:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.728:head@94778'10303490, size: 0, copy_subset: [], clone_subset: {}, snapset: 0=[]:{}, object_exist: 1)
debug 2023-05-11T04:22:02.889+0000 7f6543198700 10 osd.0 pg_epoch: 94781 pg[5.63( v 94778'10303508 (94715'10300462,94778'10303508] local-lis/les=94780/94781 n=15 ec=24663/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [1,0,17] r=1 lpr=94780 pi=[94770,94780)/1 luod=0'0 crt=94778'10303508 lcod 94778'10303507 mlcod 0'0 active mbc={}] _handle_message: 0x55cf90711080
debug 2023-05-11T04:22:02.889+0000 7f6543198700 10 osd.0 94781 dequeue_op 0x55cf90711080 prio 196 cost 8389608 latency 0.000049 MOSDPGPull(5.63 e94781/94780 cost 8389608) v3 pg pg[5.63( v 94778'10303508 (94715'10300462,94778'10303508] local-lis/les=94780/94781 n=15 ec=24663/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [1,0,17] r=1 lpr=94780 pi=[94770,94780)/1 luod=0'0 crt=94778'10303508 lcod 94778'10303507 mlcod 0'0 active mbc={}]
debug 2023-05-11T04:22:02.889+0000 7f656445a700 15 osd.0 94781 enqueue_op 0x55cf90711080 prio 196 type 106 cost 8389608 latency 0.000015 epoch 94781 MOSDPGPull(5.63 e94781/94780 cost 8389608) v3
debug 2023-05-11T04:22:02.888+0000 7f6541995700 10 log is not dirty
debug 2023-05-11T04:22:02.888+0000 7f6541995700 10 osd.0 pg_epoch: 94781 pg[3.6( v 94766'4152 (49145'1437,94766'4152] local-lis/les=94780/94781 n=27 ec=62/62 lis/c=94780/94780 les/c/f=94781/94781/0 sis=94780) [1,17,0] r=2 lpr=94780 luod=0'0 crt=94766'4152 lcod 0'0 mlcod 0'0 active mbc={}] proc_lease pg_lease(ru 84.334770203s ub 93.439544678s int 16.000000000s)
debug 2023-05-11T04:22:02.888+0000 7f6541995700 10 osd.0 pg_epoch: 94781 pg[3.6( v 94766'4152 (49145'1437,94766'4152] local-lis/les=94780/94781 n=27 ec=62/62 lis/c=94780/94780 les/c/f=94781/94781/0 sis=94780) [1,17,0] r=2 lpr=94780 luod=0'0 crt=94766'4152 lcod 0'0 mlcod 0'0 active mbc={}] do_peering_event: epoch_sent: 94781 epoch_requested: 94781 MLease epoch 94781 from osd.1 pg_lease(ru 84.334770203s ub 93.439544678s int 16.000000000s)
debug 2023-05-11T04:22:02.888+0000 7f656445a700 15 osd.0 94781 enqueue_peering_evt 3.6 epoch_sent: 94781 epoch_requested: 94781 MLease epoch 94781 from osd.1 pg_lease(ru 84.334770203s ub 93.439544678s int 16.000000000s)
debug 2023-05-11T04:22:02.886+0000 7f65479a1700  7 osd.0 pg_epoch: 94781 pg[5.6a( v 94778'20615431 (94718'20612409,94778'20615431] local-lis/les=94780/94781 n=12 ec=24663/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [1,0,18] r=1 lpr=94780 pi=[94770,94780)/1 luod=0'0 crt=94778'20615431 lcod 94778'20615430 mlcod 0'0 active mbc={}] build_push_op 5:56d41e18:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.169:head v 94773'20615413 size 0 recovery_info: ObjectRecoveryInfo(5:56d41e18:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.169:head@94773'20615413, size: 0, copy_subset: [], clone_subset: {}, snapset: 0=[]:{}, object_exist: 1)
debug 2023-05-11T04:22:02.886+0000 7f6543999700  7 osd.0 pg_epoch: 94781 pg[5.2( v 94778'19603630 (94668'19600598,94778'19603630] local-lis/les=94780/94781 n=6 ec=74/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [0,18,1] r=0 lpr=94780 pi=[94770,94780)/1 crt=94778'19603630 lcod 94778'19603629 mlcod 0'0 active+recovering+degraded rops=2 mbc={255={(2+0)=3}}] build_push_op 5:40b5d906:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.495:head v 94773'19603618 size 0 recovery_info: ObjectRecoveryInfo(5:40b5d906:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.495:head@94773'19603618, size: 0, copy_subset: [], clone_subset: {}, snapset: 0=[]:{}, object_exist: 1)
debug 2023-05-11T04:22:02.886+0000 7f6543999700 10 osd.0 pg_epoch: 94781 pg[5.2( v 94778'19603630 (94668'19600598,94778'19603630] local-lis/les=94780/94781 n=6 ec=74/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [0,18,1] r=0 lpr=94780 pi=[94770,94780)/1 crt=94778'19603630 lcod 94778'19603629 mlcod 0'0 active+recovering+degraded rops=2 mbc={255={(2+0)=3}}]  pushing more from, 0 of []
debug 2023-05-11T04:22:02.886+0000 7f65479a1700 10 osd.0 pg_epoch: 94781 pg[5.6a( v 94778'20615431 (94718'20612409,94778'20615431] local-lis/les=94780/94781 n=12 ec=24663/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [1,0,18] r=1 lpr=94780 pi=[94770,94780)/1 luod=0'0 crt=94778'20615431 lcod 94778'20615430 mlcod 0'0 active mbc={}] _handle_message: 0x55cf90ddfa20
debug 2023-05-11T04:22:02.886+0000 7f6543999700 10 osd.0 pg_epoch: 94781 pg[5.2( v 94778'19603630 (94668'19600598,94778'19603630] local-lis/les=94780/94781 n=6 ec=74/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [0,18,1] r=0 lpr=94780 pi=[94770,94780)/1 crt=94778'19603630 lcod 94778'19603629 mlcod 0'0 active+recovering+degraded rops=2 mbc={255={(2+0)=3}}] _handle_message: 0x55cf892ce000
debug 2023-05-11T04:22:02.886+0000 7f65479a1700 10 osd.0 94781 dequeue_op 0x55cf90ddfa20 prio 3 cost 8389608 latency 0.000028 MOSDPGPull(5.6a e94781/94780 cost 8389608) v3 pg pg[5.6a( v 94778'20615431 (94718'20612409,94778'20615431] local-lis/les=94780/94781 n=12 ec=24663/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [1,0,18] r=1 lpr=94780 pi=[94770,94780)/1 luod=0'0 crt=94778'20615431 lcod 94778'20615430 mlcod 0'0 active mbc={}]
debug 2023-05-11T04:22:02.886+0000 7f6543999700 10 osd.0 94781 dequeue_op 0x55cf892ce000 prio 196 cost 8389608 latency 0.000078 MOSDPGPushReply(5.2 94781/94780 [PushReplyOp(5:40b5d906:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.495:head)]) v3 pg pg[5.2( v 94778'19603630 (94668'19600598,94778'19603630] local-lis/les=94780/94781 n=6 ec=74/74 lis/c=94780/94770 les/c/f=94781/94771/0 sis=94780) [0,18,1] r=0 lpr=94780 pi=[94770,94780)/1 crt=94778'19603630 lcod 94778'19603629 mlcod 0'0 active+recovering+degraded rops=2 mbc={255={(2+0)=3}}]
debug 2023-05-11T04:22:02.886+0000 7f656445a700 15 osd.0 94781 enqueue_op 0x55cf90ddfa20 prio 3 type 106 cost 8389608 latency 0.000014 epoch 94781 MOSDPGPull(5.6a e94781/94780 cost 8389608) v3
debug 2023-05-11T04:22:02.886+0000 7f656445a700 15 osd.0 94781 enqueue_op 0x55cf892ce000 prio 196 type 107 cost 8389608 latency 0.000024 epoch 94781 MOSDPGPushReply(5.2 94781/94780 [PushReplyOp(5:40b5d906:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.495:head)]) v3
debug 2023-05-11T04:22:02.846+0000 7f655c1d9700  0 log_channel(cluster) log [WRN] : 1 slow requests (by type [ 'delayed' : 1 ] most affected pool [ 'ceph-object-store-0.rgw.buckets.index' : 1 ])
debug 2023-05-11T04:22:02.846+0000 7f655c1d9700 -1 osd.0 94781 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.116406765.0:115378 5.15 5:a91b7908:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.194:head [call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=349b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e94781)
debug 2023-05-11T04:22:02.846+0000 7f655c1d9700 10 osd.0 94781 promote_throttle_recalibrate  actual 0, actual/prob ratio 1, adjusted new_prob 1000, prob 1000 -> 1000
debug 2023-05-11T04:22:02.846+0000 7f655c1d9700 10 osd.0 94781 promote_throttle_recalibrate 0 attempts, promoted 0 objects and 0 B; target 25 obj/sec or 5 MiB/sec
debug 2023-05-11T04:22:02.846+0000 7f655c1d9700 10 osd.0 94781 tick_without_osd_lock
debug 2023-05-11T04:22:02.726+0000 7f65579d0700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1683778922726846, "cf_name": "p-0", "job": 419, "event": "table_file_creation", "file_number": 167071, "file_size": 69614045, "file_checksum": "", "file_checksum_func_name": "Unknown", "table_properties": {"data_size": 67112593, "index_size": 1016223, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 1, "index_value_is_delta_encoded": 1, "filter_size": 1484229, "raw_key_size": 58857540, "raw_average_key_size": 99, "raw_value_size": 31256491, "raw_average_value_size": 52, "num_data_blocks": 17041, "num_entries": 593654, "num_deletions": 469256, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "p-0", "column_family_id": 4, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1681352923, "oldest_key_time": 0, "file_creation_time": 1683778919, "db_id": "feb9ef26-123a-44a9-bef7-17fc4d0f6538", "db_session_id": "I4V7FERQ3A7SUFU4JCOW"}}
debug 2023-05-11T04:22:02.726+0000 7f65579d0700  4 rocksdb: [db/compaction/compaction_job.cc:1516] [p-0] [JOB 419] Generated table #167071: 593654 keys, 69614045 bytes
debug 2023-05-11T04:22:02.685+0000 7f65609e2700 10 osd.0 94781 do_waiters -- finish
debug 2023-05-11T04:22:02.685+0000 7f65609e2700 10 osd.0 94781 do_waiters -- start
debug 2023-05-11T04:22:02.685+0000 7f65609e2700 10 osd.0 94781 tick
debug 2023-05-11T04:22:02.159+0000 7f6542196700 10 log is not dirty
debug 2023-05-11T04:22:02.159+0000 7f6542196700 10 osd.0 pg_epoch: 94781 pg[5.55( v 94778'11153516 (94710'11150497,94778'11153516] local-lis/les=94770/94771 n=18 ec=24663/74 lis/c=94770/94770 les/c/f=94771/94771/0 sis=94770) [18,0,10] r=1 lpr=94770 luod=0'0 crt=94778'11153514 mlcod 94778'11153503 active mbc={}] proc_lease pg_lease(ru 1296569.875000000s ub 1296577.875000000s int 16.000000000s)
debug 2023-05-11T04:22:02.159+0000 7f6542196700 10 osd.0 pg_epoch: 94781 pg[5.55( v 94778'11153516 (94710'11150497,94778'11153516] local-lis/les=94770/94771 n=18 ec=24663/74 lis/c=94770/94770 les/c/f=94771/94771/0 sis=94770) [18,0,10] r=1 lpr=94770 luod=0'0 crt=94778'11153514 mlcod 94778'11153503 active mbc={}] do_peering_event: epoch_sent: 94781 epoch_requested: 94781 MLease epoch 94781 from osd.18 pg_lease(ru 1296569.875000000s ub 1296577.875000000s int 16.000000000s)
debug 2023-05-11T04:22:02.159+0000 7f656445a700 15 osd.0 94781 enqueue_peering_evt 5.55 epoch_sent: 94781 epoch_requested: 94781 MLease epoch 94781 from osd.18 pg_lease(ru 1296569.875000000s ub 1296577.875000000s int 16.000000000s)
debug 2023-05-11T04:22:02.116+0000 7f654619e700 10 log is not dirty
debug 2023-05-11T04:22:02.116+0000 7f654619e700 10 osd.0 pg_epoch: 94781 pg[5.4d( v 94778'12874484 (94746'12871482,94778'12874484] local-lis/les=94770/94771 n=16 ec=24663/74 lis/c=94770/94770 les/c/f=94771/94771/0 sis=94770) [17,10,0] r=2 lpr=94770 luod=0'0 lua=94778'12874480 crt=94778'12874484 lcod 94778'12874483 mlcod 94778'12874482 active mbc={}] proc_lease pg_lease(ru 1297597.875000000s ub 1297606.000000000s int 16.000000000s)
debug 2023-05-11T04:22:02.116+0000 7f654619e700 10 osd.0 pg_epoch: 94781 pg[5.4d( v 94778'12874484 (94746'12871482,94778'12874484] local-lis/les=94770/94771 n=16 ec=24663/74 lis/c=94770/94770 les/c/f=94771/94771/0 sis=94770) [17,10,0] r=2 lpr=94770 luod=0'0 lua=94778'12874480 crt=94778'12874484 lcod 94778'12874483 mlcod 94778'12874482 active mbc={}] do_peering_event: epoch_sent: 94781 epoch_requested: 94781 MLease epoch 94781 from osd.17 pg_lease(ru 1297597.875000000s ub 1297606.000000000s int 16.000000000s)
debug 2023-05-11T04:22:02.116+0000 7f6564c5b700 15 osd.0 94781 enqueue_peering_evt 5.4d epoch_sent: 94781 epoch_requested: 94781 MLease epoch 94781 from osd.17 pg_lease(ru 1297597.875000000s ub 1297606.000000000s int 16.000000000s)
debug 2023-05-11T04:22:02.111+0000 7f654619e700 10 log is not dirty

- rgw's log

debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700  2 req 17177950262429030953 0.000000000s initializing for trans_id = tx00000ee6456c041691a29-00645c6d7c-6efb411-ceph-object-store-0
debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700 10 req 17177950262429030953 0.000000000s rgw api priority: s3=8 s3website=7
debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700 10 req 17177950262429030953 0.000000000s host=10.64.19.41
debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700 20 req 17177950262429030953 0.000000000s subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700 20 req 17177950262429030953 0.000000000s final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/swift/healthcheck
debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700 10 req 17177950262429030953 0.000000000s handler=28RGWHandler_SWIFT_HealthCheck
debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700  2 req 17177950262429030953 0.000000000s getting op 0
debug 2023-05-11T04:22:20.831+0000 7f4b76ec1700 20 req 17177950262429030953 0.000000000s get_system_obj_state: rctx=0x7f4cf04f0770 obj=ceph-object-store-0.rgw.log:script.prerequest. state=0x7f48d4002670 s->prefetch_data=0
#### Took long time here ####
debug 2023-05-11T04:25:18.303+0000 7f4b035da700 10 req 17177950262429030953 177.471832275s swift:get_health_check scheduling with throttler client=3 cost=1
debug 2023-05-11T04:25:18.303+0000 7f4b035da700 10 req 17177950262429030953 177.471832275s swift:get_health_check op=32RGWGetHealthCheck_ObjStore_SWIFT
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check verifying requester
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check normalizing buckets and tenants
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check init permissions
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check recalculating target
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check reading permissions
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check init op
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check verifying op mask
debug 2023-05-11T04:25:18.303+0000 7f4b035da700 20 req 17177950262429030953 177.471832275s swift:get_health_check required_mask= 1 user.op_mask=7
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check verifying op permissions
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check verifying op params
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check pre-executing
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check check rate limiting
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check executing
debug 2023-05-11T04:25:18.303+0000 7f4b035da700  2 req 17177950262429030953 177.471832275s swift:get_health_check completing
debug 2023-05-11T04:25:18.303+0000 7f4b035da700 20 req 17177950262429030953 177.471832275s get_system_obj_state: rctx=0x7f4cf04f0770 obj=ceph-object-store-0.rgw.log:script.postrequest. state=0x7f48d4002670 s->prefetch_data=0
#### Took long time here ####
debug 2023-05-11T04:25:44.271+0000 7f4b59686700  2 req 17177950262429030953 203.440002441s swift:get_health_check op status=0
debug 2023-05-11T04:25:44.271+0000 7f4b59686700  2 req 17177950262429030953 203.440002441s swift:get_health_check http status=200

- ops logs: we ran `ceph daemon osd.X 'dump_historic_ops dump_ops_in_flight ops'` once every 30 seconds during the slow healthcheck.

        {
            "description": "osd_op(client.116406765.0:93517 5.16 5:69274e6f:::.dir.67ade7e7-b37b-486e-a969-5d39c507d06d.64431775.1.186:head [call rgw.guard_bucket_resharding in=36b,call rgw.bucket_prepare_op in=349b] snapc 0=[] ondisk+write+known_if_redirected+supports_pool_eio e94776)",
            "initiated_at": "2023-05-11T04:19:23.626475+0000",
            "age": 200.544154045,
            "duration": 16.333650203000001,
            "type_data": {
                "flag_point": "commit sent; apply or cleanup",
                "client_info": {
                    "client": "client.116406765",
                    "client_addr": "10.64.14.90:0/1590675739",
                    "tid": 93517
                },
                "events": [
                    {
                        "event": "initiated",
                        "time": "2023-05-11T04:19:23.626475+0000",
                        "duration": 0
                    },
                    {
                        "event": "throttled",
                        "time": "2023-05-11T04:19:23.626475+0000",
                        "duration": 0
                    },
                    {
                        "event": "header_read",
                        "time": "2023-05-11T04:19:23.626475+0000",
                        "duration": 4294967295.9999995
                    },
                    {
                        "event": "all_read",
                        "time": "2023-05-11T04:19:23.626479+0000",
                        "duration": 3.5779999999999999e-06
                    },
                    {
                        "event": "dispatched",
                        "time": "2023-05-11T04:19:23.626482+0000",
                        "duration": 3.4319999999999999e-06
                    },
                    {
                        "event": "queued_for_pg",
                        "time": "2023-05-11T04:19:23.626487+0000",
                        "duration": 5.361e-06
                    },
                    {
                        "event": "reached_pg",
                        "time": "2023-05-11T04:19:23.626547+0000",
                        "duration": 5.9861000000000002e-05
                    },
                    {
                        "event": "waiting for degraded object",
                        "time": "2023-05-11T04:19:23.626603+0000",
                        "duration": 5.5455999999999999e-05
                    },
                    {
                        "event": "reached_pg",
                        "time": "2023-05-11T04:19:38.942878+0000",
                        "duration": 15.316275408999999
                    },
                    {
                        "event": "waiting for peered",
                        "time": "2023-05-11T04:19:38.942885+0000",
                        "duration": 6.404e-06
                    },
                    {
                        "event": "reached_pg",
                        "time": "2023-05-11T04:19:39.949108+0000",
                        "duration": 1.00622308
                    },
                    {
                        "event": "started",
                        "time": "2023-05-11T04:19:39.949196+0000",
                        "duration": 8.8764000000000006e-05
                    },
                    {
                        "event": "waiting for subops from 18",
                        "time": "2023-05-11T04:19:39.951455+0000",
                        "duration": 0.0022586799999999999
                    },
                    {
                        "event": "op_commit",
                        "time": "2023-05-11T04:19:39.952883+0000",
                        "duration": 0.0014283939999999999
                    },
                    {
                        "event": "sub_op_commit_rec",
                        "time": "2023-05-11T04:19:39.960050+0000",
                        "duration": 0.0071662039999999998
                    },
                    {
                        "event": "commit_sent",
                        "time": "2023-05-11T04:19:39.960093+0000",
                        "duration": 4.3247000000000001e-05
                    },
                    {
                        "event": "done",
                        "time": "2023-05-11T04:19:39.960126+0000",
                        "duration": 3.2756999999999998e-05
                    }
                ]
            }
        },
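One caveat when reading these dumps: the `header_read` duration above is about 2^32 seconds, which looks like an unsigned-wraparound artifact rather than a real wait, so the per-event `duration` fields can't always be trusted. A small, hypothetical helper (the function name and structure are mine, not part of Ceph) that recomputes the gaps from the event timestamps and reports the dominant one:

```python
from datetime import datetime

def longest_gap(op):
    """Given one entry from `dump_historic_ops`, recompute the gap between
    consecutive events from their timestamps and return the largest one
    as (event_name, gap_seconds)."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
    events = op["type_data"]["events"]
    times = [datetime.strptime(e["time"], fmt) for e in events]
    gaps = [
        (events[i]["event"], (times[i] - times[i - 1]).total_seconds())
        for i in range(1, len(events))
    ]
    return max(gaps, key=lambda g: g[1])
```

For the op above this reports the second `reached_pg` event with a gap of about 15.3 s, i.e. nearly all of the op's 16.3 s duration was spent in "waiting for degraded object".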

In the above logs, the log level is 15 for the index OSD daemons and 20 for the RGW daemons. We couldn't set the OSD daemons' log level to 20 because these daemons crash after an index OSD restart, due to a limitation of our system.

We have captured many other logs as follows.

- All index OSD's full log.
- All RGW's full log.
- Complete result of `ceph daemon osd.X 'dump_historic_ops dump_ops_in_flight ops'`.

However, we haven't attached them to this message since they are large. Please let me know if they are needed.

#3 Updated by Satoru Takeuchi 9 months ago

I succeeded in creating a reproducer. I found that increasing the number of index OSDs reduces the latency, so I'll increase the number of index OSDs in the environment where I hit the original problem.

In addition, I captured a perf record while reproducing this problem. However, I couldn't find the root cause.

Could you tell me the root cause if you can identify it from my logs and perf records (see below)? I'd also be glad to hear any recommendation about the number of index OSDs.

Here are the details.

Additional information

  • Bucket index resharding (before adding index OSDs) didn't resolve this problem.
    • The number of shards was increased from 307 to 509.
  • Increasing the number of OSDs from 6 to 12 reduced the latency of all requests.
    • The number of PGs was also increased from 128 to 512 by the pg-autoscaler, so it's unclear whether the latency improvement came from the additional OSDs, the additional PGs, or both.
  • Increasing the number of OSDs from 12 to 18, and then from 18 to 24, had no effect.
  • I captured the log of the restarted index OSD1 and the perf record2 while reproducing this problem (attached to this message). However, I didn't find the root cause.

reproducer

1. Create a Ceph cluster for RGW. The number of index OSDs (SSD) is 6 and the number of data OSDs (HDD) is 33.
2. Create a test bucket.
3. PUT about 47M objects into the test bucket as follows.

create-objects.sh:

warp get --host=${BUCKET_HOST}:${BUCKET_PORT} --bucket=${BUCKET_NAME} \
                --benchdata /tmp/warp-benchdata --analyze.v \
                --access-key=${AWS_ACCESS_KEY_ID} --secret-key=${AWS_SECRET_ACCESS_KEY} \
                --concurrent=50 --objects=46708840 --obj.size=76800 --noprefix \
                --noclear --duration 10s

4. Increase the number of shards of test bucket to 509
5. Run the following three scripts in the background.

list.sh:

ENDPOINT=<rgw's endpoint>
BUCKET=<bucket name>
while :
  do
    date
    aws s3 ls --endpoint-url ${ENDPOINT} s3://${BUCKET}/111111 > /dev/null
    sleep 15
  done

get.sh:

ENDPOINT=<rgw's endpoint>
BUCKET=<bucket name>
OBJS=$(aws s3 ls --endpoint-url ${ENDPOINT} s3://${BUCKET}/111111 | awk '{ print $4 }')

mkdir -p ./work
download() {
aws s3 cp --endpoint-url ${ENDPOINT} s3://${BUCKET}/$1 ./work/$1
rm ./work/$1
}
while :
 do
  for OBJ in ${OBJS}; do
    download ${OBJ} &
    sleep 0.05
  done
 done

put.sh:

ENDPOINT=<rgw's endpoint>
BUCKET=<bucket name>
mkdir -p ./work
OBJ=test

upload() {
    aws s3 cp --endpoint-url ${ENDPOINT} ./work/$1 s3://${BUCKET}/$1-$((RANDOM % 10000000 + 10000000))
}

dd if=/dev/urandom of=work/${OBJ} count=1 bs=256k

while :
do
  upload ${OBJ} &
  sleep 0.33
done

6. Restart an index OSD pod.

The reproduction probability seems to differ per OSD. In our environment, restarting one particular index OSD reproduces the problem almost 100% of the time, but restarting the other OSDs does not.
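To confirm the reproduction, it helps to measure the healthcheck latency from outside RGW, like the kube-probe requests in the original log. A minimal sketch of such a probe (the endpoint URL in the usage note is a placeholder; the function names are mine):

```python
import time
import urllib.request

def probe(url, timeout=300):
    """Time a single GET against the given URL, as kube-probe would."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status, time.monotonic() - start

def monitor(url, interval=1.0):
    """Print one status/latency line per probe, forever."""
    while True:
        status, latency = probe(url)
        print(f"{time.strftime('%F %T')} status={status} latency={latency:.3f}s")
        time.sleep(interval)
```

Running something like `monitor("http://<rgw-endpoint>/swift/healthcheck")` while restarting an index OSD should show the latency jumping from near zero to hundreds of seconds when the problem reproduces.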

#4 Updated by Satoru Takeuchi 8 months ago

> I found that increasing the number of index OSD reduces the latency. So I'll increase the number of index OSDs in the environment where I hit the original problem.

I succeeded in reducing the latency by increasing the number of OSDs in my production environment.

Please close this ticket.

#5 Updated by Konstantin Shalygin 6 months ago

  • Tracker changed from Support to Bug
  • Status changed from New to Can't reproduce
  • Regression set to No
  • Severity set to 3 - minor
