
Bug #57033

rgw/multisite: Secondary data sync speed is very slow compared to master write speed

Added by yite gu over 1 year ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm using rgw multisite. Users put objects to the master site; the master site speed is as below:

  data:
    pools:   7 pools, 6592 pgs
    objects: 1.02M objects, 2.5 TiB
    usage:   3.4 TiB used, 3.4 PiB / 3.4 PiB avail
    pgs:     6592 active+clean

  io:
    client:   157 MiB/s rd, 472 MiB/s wr, 1.02k op/s rd, 1.54k op/s wr

The write speed is 472 MiB/s and the read speed is 157 MiB/s; the read traffic comes from the secondary site syncing.
The secondary site speed is as below:
  data:
    pools:   6 pools, 6560 pgs
    objects: 235.10k objects, 824 GiB
    usage:   1.3 TiB used, 3.4 PiB / 3.4 PiB avail
    pgs:     6560 active+clean

  io:
    client:   53 KiB/s rd, 153 MiB/s wr, 57 op/s rd, 109 op/s wr

What are good ways to improve the data sync speed?
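(For reference, one way to see where sync is falling behind is to check the sync status on the secondary zone. The commands below are only a sketch; the --source-zone value is a placeholder for the actual master zone name:)

$ radosgw-admin sync status
$ radosgw-admin data sync status --source-zone=<master-zone-name>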

History

#1 Updated by yite gu over 1 year ago

ceph version: 14.2.22

#2 Updated by yite gu over 1 year ago

    "poll_latency": {
      "avgcount": 11,
      "sum": 12630.703249944,
      "avgtime": 1148.245749994
    },
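(For context, this counter was presumably read from the radosgw admin socket; a hedged example of how such counters are usually pulled, with a placeholder socket path:)

$ ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok perf dump | grep -A 3 poll_latency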

#3 Updated by yite gu over 1 year ago

The CPU utilization of the radosgw http_manager thread is very high:

$ top -H -p 3886649

top - 21:30:37 up 7 days,  3:23,  1 user,  load average: 1.17, 1.37, 1.51
Threads: 602 total,   1 running, 601 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.9 us,  0.8 sy,  0.0 ni, 92.9 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem : 13153520+total,  1203900 free, 10176180+used, 28569496 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 28044388 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                 
3886856 work      20   0 6192788 910512  20832 R 99.9  0.7   5722:15 http_manager

http_manager thread stack:
Thread 594 (Thread 0x7f1cdc0d6700 (LWP 3650760)):
#0  0x00007f1cf1e244ea in multi_addtimeout () from /lib64/libcurl.so.4
#1  0x00007f1cf1e25800 in Curl_expire () from /lib64/libcurl.so.4
#2  0x00007f1cf1e172a3 in Curl_speedcheck () from /lib64/libcurl.so.4
#3  0x00007f1cf1e1c619 in Curl_readwrite () from /lib64/libcurl.so.4
#4  0x00007f1cf1e266c7 in multi_runsingle () from /lib64/libcurl.so.4
#5  0x00007f1cf1e27031 in curl_multi_perform () from /lib64/libcurl.so.4
#6  0x000055ef6f6c7441 in RGWHTTPManager::reqs_thread_entry() ()
#7  0x000055ef6f6c7edd in RGWHTTPManager::ReqsThread::entry() ()
#8  0x00007f1ce586fdc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f1ce4d7e73d in clone () from /lib64/libc.so.6
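(A per-thread backtrace like the one above can be captured by briefly attaching gdb to the radosgw process; the PID below is the one from the top output and is shown only as an example. Installing the radosgw debuginfo package gives more readable frames:)

$ gdb -p 3886649 -batch -ex "thread apply all bt"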

There is a libcurl patch that fixes this problem: https://github.com/curl/curl/commit/cacdc27f52ba7b0bf08aa57886bfbd18bc82ebfb
So I upgraded libcurl from 7.29.0 to 7.84.0, and the CPU utilization of http_manager dropped to 6.2%:
top - 12:06:06 up 8 days, 17:58,  1 user,  load average: 0.41, 0.70, 1.01
Threads: 602 total,   2 running, 600 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.4 us,  0.6 sy,  0.0 ni, 94.3 id,  0.4 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem : 13153520+total,  6629456 free, 10238659+used, 22519160 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 27425424 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                                             
 790383 work      20   0 6302944 824564  27588 R  6.2  0.6  12:25.29 http_manager 


The poll_latency decreased dramatically:
    "poll_latency": {
      "avgcount": 872,
      "sum": 8.924769164,
      "avgtime": 0.010234827
    },
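(To confirm which libcurl the radosgw process actually loads after such an upgrade, something like the following can be used; the binary path is the usual packaged location and may differ on your system:)

$ ldd /usr/bin/radosgw | grep libcurl
$ curl --version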
