Bug #57033
rgw/multisite: Secondary data sync speed is very slow compared to master write speed
Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Description
I'm using RGW multisite. Users put objects to the master site, and the master site throughput is shown below:
  data:
    pools:   7 pools, 6592 pgs
    objects: 1.02M objects, 2.5 TiB
    usage:   3.4 TiB used, 3.4 PiB / 3.4 PiB avail
    pgs:     6592 active+clean

  io:
    client:   157 MiB/s rd, 472 MiB/s wr, 1.02k op/s rd, 1.54k op/s wr
The write speed is 472 MiB/s.
The read speed is 157 MiB/s; this read traffic is the secondary site syncing.
The secondary site throughput is shown below:
  data:
    pools:   6 pools, 6560 pgs
    objects: 235.10k objects, 824 GiB
    usage:   1.3 TiB used, 3.4 PiB / 3.4 PiB avail
    pgs:     6560 active+clean

  io:
    client:   53 KiB/s rd, 153 MiB/s wr, 57 op/s rd, 109 op/s wr
What are good ways to improve the data sync speed?
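To put a number on the gap, here is a rough calculation using only the `io: client:` figures quoted above. It is a sketch under two assumptions that the report does not confirm: that every byte written to the master must be replicated, and that both rates stay steady. The variable names are made up for illustration.

```python
# Throughput figures copied from the two `io: client:` lines above, in MiB/s.
master_write = 472.0     # client writes landing on the master site
master_read = 157.0      # reads on the master, attributed to the secondary syncing
secondary_write = 153.0  # writes landing on the secondary site

# Fraction of the master's ingest rate that the secondary keeps up with.
sync_vs_ingest = secondary_write / master_write

# Rate at which unsynced data accumulates, ASSUMING all master writes
# are replicated and both rates stay constant.
backlog_growth = master_write - secondary_write

print(f"secondary keeps up with {sync_vs_ingest:.1%} of master writes")
print(f"backlog grows by about {backlog_growth:.0f} MiB/s")
```

The master read rate (157 MiB/s) and the secondary write rate (153 MiB/s) are close to each other, which fits the reading that the master-side reads are the sync traffic.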
History
#1 Updated by yite gu over 1 year ago
ceph version: 14.2.22
#2 Updated by yite gu over 1 year ago
"poll_latency": { "avgcount": 11, "sum": 12630.703249944, "avgtime": 1148.245749994 },
#3 Updated by yite gu over 1 year ago
The http_manager thread of radosgw has very high CPU utilization:
$ top -H -p 3886649
top - 21:30:37 up 7 days,  3:23,  1 user,  load average: 1.17, 1.37, 1.51
Threads: 602 total,   1 running, 601 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.9 us,  0.8 sy,  0.0 ni, 92.9 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem : 13153520+total,  1203900 free, 10176180+used, 28569496 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 28044388 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
3886856 work      20   0 6192788 910512  20832 R 99.9  0.7   5722:15 http_manager
http_manager thread stack:
Thread 594 (Thread 0x7f1cdc0d6700 (LWP 3650760)):
#0  0x00007f1cf1e244ea in multi_addtimeout () from /lib64/libcurl.so.4
#1  0x00007f1cf1e25800 in Curl_expire () from /lib64/libcurl.so.4
#2  0x00007f1cf1e172a3 in Curl_speedcheck () from /lib64/libcurl.so.4
#3  0x00007f1cf1e1c619 in Curl_readwrite () from /lib64/libcurl.so.4
#4  0x00007f1cf1e266c7 in multi_runsingle () from /lib64/libcurl.so.4
#5  0x00007f1cf1e27031 in curl_multi_perform () from /lib64/libcurl.so.4
#6  0x000055ef6f6c7441 in RGWHTTPManager::reqs_thread_entry() ()
#7  0x000055ef6f6c7edd in RGWHTTPManager::ReqsThread::entry() ()
#8  0x00007f1ce586fdc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f1ce4d7e73d in clone () from /lib64/libc.so.6
There is a libcurl patch that fixes this problem: https://github.com/curl/curl/commit/cacdc27f52ba7b0bf08aa57886bfbd18bc82ebfb
So I upgraded libcurl from version 7.29.0 to 7.84.0. The CPU utilization of http_manager decreased to 6.2%:
top - 12:06:06 up 8 days, 17:58,  1 user,  load average: 0.41, 0.70, 1.01
Threads: 602 total,   2 running, 600 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.4 us,  0.6 sy,  0.0 ni, 94.3 id,  0.4 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem : 13153520+total,  6629456 free, 10238659+used, 22519160 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 28044388 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
790383 work      20   0 6302944 824564  27588 R  6.2  0.6  12:25.29 http_manager
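For anyone wanting to check which libcurl a host resolves by default, a minimal sketch follows. It calls libcurl's own `curl_version()` through Python's `ctypes`. The helper name `libcurl_version` is made up for illustration, and the result describes the library that this Python process finds, which may differ from the `/lib64/libcurl.so.4` that a running radosgw has already loaded.

```python
import ctypes
import ctypes.util


def libcurl_version():
    """Return libcurl's version string, or None if the library is not found."""
    name = ctypes.util.find_library("curl")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # curl_version() returns a pointer to a static, NUL-terminated string.
    lib.curl_version.restype = ctypes.c_char_p
    return lib.curl_version().decode("ascii", "replace")


print(libcurl_version())
```

The string begins with `libcurl/` followed by the version number, so a 7.29.0 install can be told apart from a 7.84.0 one at a glance.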
The poll_latency decreased substantially:
"poll_latency": { "avgcount": 872, "sum": 8.924769164, "avgtime": 0.010234827 },
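As a cross-check on the two `poll_latency` samples in this thread, the sketch below recomputes `avgtime` as `sum / avgcount` and compares the values before and after the libcurl upgrade. The numbers are copied from the comments above; the unit is assumed to be seconds, which the report does not state, and the helper name `avg_latency` is made up for illustration.

```python
import json

# Counter values copied from comment #2 (before) and comment #3 (after).
before = json.loads(
    '{"avgcount": 11, "sum": 12630.703249944, "avgtime": 1148.245749994}'
)
after = json.loads(
    '{"avgcount": 872, "sum": 8.924769164, "avgtime": 0.010234827}'
)


def avg_latency(counter):
    """Average time per sample: accumulated time divided by sample count."""
    return counter["sum"] / counter["avgcount"]


# The recomputed averages agree with the reported avgtime fields.
assert abs(avg_latency(before) - before["avgtime"]) < 1e-6
assert abs(avg_latency(after) - after["avgtime"]) < 1e-6

speedup = avg_latency(before) / avg_latency(after)
print(f"average poll latency dropped by a factor of about {speedup:,.0f}")
```

The two samples cover different numbers of polls (11 versus 872), so the factor is only a rough indication rather than a controlled measurement.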