Bug #23853

Inefficient implementation - very long query time for "rbd ls -l" queries

Added by Marc Schöchlin almost 6 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Low
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
snapshots
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We are trying to integrate a storage repository into XenServer.

Summary:

The slowness is a real pain for us, because it prevents the Xen storage repository from working efficiently.
Gathering information for Xen pools with hundreds of virtual machines (using "--format json") would be a real pain...
The high user-time consumption and the huge number of threads suggest that there is something really inefficient in the "rbd" utility.

So what can I do to make "rbd ls -l" faster, or to get comparable snapshot hierarchy information by other means?
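
For illustration, the hierarchy information we are after could also be collected per image with "rbd info" (a rough sketch only; it is not verified to be any faster, since every image still has to be opened):

POOL=RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c
for img in $(rbd ls -p "$POOL"); do
    # "rbd info" prints a "parent: pool/image@snap" line for cloned images
    parent=$(rbd info -p "$POOL" "$img" | awk '/parent:/ {print $2}')
    echo "$img ${parent:--}"
done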

We have a strange problem when listing the images of an SSD pool.

# time rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c
NAME                                            SIZE PARENT                                                                                            FMT PROT LOCK
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE   2          
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE   2 yes      
..
__srlock__                                         0                                                                                                     2          

real    0m8.726s
user    0m8.506s
sys    0m0.591s

===> This is incredibly slow for outputting 105 lines.

Without "-l" ist pretty fast (unfortunately i need this information):

# time rbd ls  --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81
...
__srlock__

real    0m0.024s
user    0m0.015s
sys    0m0.008s

===> This is incredibly fast for outputting 71 lines.

The "@BASE" snapshots are created by the following procedure:

rbd snap create
rbd snap protect
rbd clone
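
For illustration, a concrete instance of that procedure (the image names RBD-parent and RBD-child are placeholders; only the pool and the @BASE snapshot name match our setup):

POOL=RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c
# create and protect the base snapshot, then clone a child image from it
rbd snap create "$POOL/RBD-parent@BASE"
rbd snap protect "$POOL/RBD-parent@BASE"
rbd clone "$POOL/RBD-parent@BASE" "$POOL/RBD-child"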

It seems that lookups in RBD pools are performed using an object named "rbd_directory" which resides in the pool...

Querying this object with 142 entries needs 0.024 seconds.

# time rados -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c listomapvals rbd_directory

id_12da262ae8944a
value (14 bytes) :
00000000  0a 00 00 00 5f 5f 73 72  6c 6f 63 6b 5f 5f        |....__srlock__|
0000000e

id_12f2943d1b58ba
value (44 bytes) :
00000000  28 00 00 00 52 42 44 2d  31 62 34 32 31 38 39 31  |(...RBD-1b421891|
00000010  2d 34 34 31 63 2d 34 35  33 30 2d 62 64 66 33 2d  |-441c-4530-bdf3-|
00000020  61 64 32 62 31 31 34 61  36 33 66 63              |ad2b114a63fc|
0000002c
...

real    0m0.024s
user    0m0.023s
sys    0m0.000s

I also analyzed the state of the OSD holding this object:

# ceph osd map RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c rbd_directory
osdmap e7400 pool 'RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c' (14) object 'rbd_directory' -> pg 14.30a98c1c (14.1c) -> up ([36,0,38], p36) acting ([36,0,38], p36)

Repeating queries resulted in 3.6% CPU usage - the logs do not provide any useful information.

Analyzing this command with strace suggests that something is wrong with the "rbd" command implementation.

# strace -f -c rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c > /dev/null
strace: Process 50286 attached
strace: Process 50287 attached
strace: Process 50288 attached
strace: Process 50289 attached
strace: Process 50290 attached
strace: Process 50291 attached
strace: Process 50292 attached
....
!!! 2086 threads !!!
....
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.42  219.207328        2020    108534     20241 futex
  1.47    3.265517         162     20099           epoll_wait
  0.06    0.131473           3     46377     22894 read
  0.02    0.053740           4     14627           sendmsg
  0.01    0.017225           2     10020           write
  0.00    0.008975           4      2001           munmap
  0.00    0.007260           3      2170           epoll_ctl
  0.00    0.007171           3      2088           madvise
  0.00    0.003139           1      4166           rt_sigprocmask
  0.00    0.002670           1      2086           prctl
  0.00    0.002494           1      3381           mprotect
  0.00    0.002315           1      2087           mmap
  0.00    0.002120           1      2087           set_robust_list
  0.00    0.002098           1      2084           gettid
  0.00    0.001833           1      2086           clone
  0.00    0.001152           8       136        87 connect
  0.00    0.000739           7       102           close
  0.00    0.000623          13        49           shutdown
  0.00    0.000622           6       110           fcntl
  0.00    0.000469          10        49           socket
  0.00    0.000466           6        73        29 open
  0.00    0.000456           9        49           setsockopt
  0.00    0.000166           6        27        27 access
  0.00    0.000164           4        37           fstat
  0.00    0.000070           4        17           brk
  0.00    0.000047           5        10         5 stat
  0.00    0.000013           4         3           pipe
  0.00    0.000012           4         3           epoll_create
  0.00    0.000010           1        11           rt_sigaction
  0.00    0.000007           7         1           lseek
  0.00    0.000007           7         1           execve
  0.00    0.000006           2         3           uname
  0.00    0.000005           3         2           sysinfo
  0.00    0.000005           5         1           arch_prctl
  0.00    0.000005           5         1           membarrier
  0.00    0.000003           3         1         1 ioctl
  0.00    0.000003           3         1           getrlimit
  0.00    0.000003           3         1           set_tid_address
  0.00    0.000000           0         1           getuid
------ ----------- ----------- --------- --------- ----------------
100.00  222.720411                224582     43284 total

Our System:

  • Luminous 12.2.5
  • Ubuntu 16.04
  • 5 OSD nodes (24 × 8 TB HDD OSDs, 48 × 1 TB SSD OSDs, Bluestore, 6 GB cache size per OSD, 192 GB RAM, 56 HT CPUs)
  • 3 mons (64 GB RAM, 200 GB SSD, 4 visible CPUs)
  • 2 × 10 Gbit SFP+, bonded (xmit_hash_policy layer3+4)
  • the cluster is currently in pre-production state; there is no significant load

out.txt View - JSON Output (31.7 KB) Marc Schöchlin, 04/25/2018 02:01 PM

perf.data.txt.gz - perf report -n --stdio > perf.data.txt (39.8 KB) Marc Schöchlin, 04/26/2018 10:35 AM

perf.data.gz - perf record -F 99 -g rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --id xen_test (294 KB) Marc Schöchlin, 04/26/2018 10:35 AM

out.png View (121 KB) Jason Dillaman, 04/26/2018 04:21 PM

perf.data (84.3 KB) Marc Schöchlin, 06/01/2018 03:21 PM

perf.data.txt View (108 KB) Marc Schöchlin, 06/01/2018 03:21 PM

out.script.gz (16.3 KB) Marc Schöchlin, 06/20/2018 07:37 AM

perf.data.gz (174 KB) Marc Schöchlin, 06/20/2018 07:37 AM

out.svg View (271 KB) Jason Dillaman, 06/20/2018 03:25 PM

History

#1 Updated by Marc Schöchlin almost 6 years ago

If I invoke this without parallel thread execution, I get the following result:

# time rbd ls -l --pool RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1
NAME                                            SIZE PARENT                                                                                            FMT PROT LOCK
..
RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE   2           
__srlock__                                         0                                                                                                     2           

real    0m18.562s
user    0m12.513s
sys    0m0.793s

I attached the complete json output.

#2 Updated by Marc Schöchlin almost 6 years ago

According to the following, latency between the client and the OSD should not be the problem
(and given the high amount of user time in the measurements above, network communication should not be the bottleneck either).

Finding the involved OSD:


# ceph osd map RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c rbd_directory
osdmap e7570 pool 'RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c' (14) object 'rbd_directory' -> pg 14.30a98c1c (14.1c) -> up ([36,0,38], p36) acting ([36,0,38], p36)

# ceph osd find osd.36
{
    "osd": 36,
    "ip": "10.23.27.149:6826/7195",
    "crush_location": {
        "host": "ceph-ssd-s39",
        "root": "default" 
    }
}

Checking latencies:
ssh ceph-ssd-s39

# nuttcp -w1m ceph-mon-s43
11186.3391 MB /  10.00 sec = 9381.8890 Mbps 12 %TX 32 %RX 0 retrans 0.15 msRTT

# time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
NAME                                            SIZE PARENT                                                                                            FMT PROT LOCK
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE   2          
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE   2 yes      
...
RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE   2          
__srlock__                                         0                                                                                                     2          

real    0m23.667s
user    0m15.949s
sys    0m1.276s

# time rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --id xen_test
NAME                                            SIZE PARENT                                                                                            FMT PROT LOCK
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE   2         
RBD-0192938e-cb4b-4ee1-9988-b8145704ac81@BASE 20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-8b2cfe76-44b7-4393-b376-f675366831c3@BASE   2 yes     
...
RBD-feb32ab0-a5ee-44e6-9089-486e91ee8af3      20480M RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c/RBD-bbbc2ce0-4ad3-44ae-a52f-e57df0441e27@BASE   2

....
__srlock__                                         0                                                                                                     2          

real    0m13.937s
user    0m14.404s
sys    0m1.089s
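
As a complementary check, the per-OSD commit/apply latencies could be inspected as well (a sketch; the row for osd.36 found above would be the interesting one):

# ceph osd perf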

#3 Updated by Jason Dillaman almost 6 years ago

  • Priority changed from Normal to Low

#4 Updated by Marc Schöchlin almost 6 years ago

#5 Updated by Marc Schöchlin almost 6 years ago

Added perf data, created by:

apt-get install linux-tools-4.13.0-39-generic linux-cloud-tools-4.13.0-39-generic linux-tools-generic linux-cloud-tools-generic
perf record -F 99 -g rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --id xen_test
perf report -n --stdio > perf.data.txt

#6 Updated by Jason Dillaman almost 6 years ago

  • Status changed from New to Need More Info

@Marc: can you re-run this using the following steps (making sure you have the debuginfo packages installed as well)?

perf record -F 99 -g --call-graph dwarf rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --id xen_test
perf script > out.script

Please attach the perf.data and out.script results.
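
(For reference, one common way to visualize such a capture, sketched here assuming Brendan Gregg's FlameGraph scripts are checked out locally:)

git clone https://github.com/brendangregg/FlameGraph
# fold the captured stacks from out.script and render an SVG flame graph
./FlameGraph/stackcollapse-perf.pl out.script | ./FlameGraph/flamegraph.pl > out.svg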

#7 Updated by Marc Schöchlin almost 6 years ago

Sorry, the results exceed the maximum file upload limit, so I uploaded the file to my personal server:

https://www.256bit.org/nextcloud/index.php/s/npi7ppNpojbAj9k (will be removed after 25-07-2018)

Password: oChoo4Ey%

#8 Updated by Jason Dillaman almost 6 years ago

@Marc: can you double-check that you have the debuginfo packages installed? The perf graph (attached) shows that the vast majority of the time is spent invoking the scheduler. Perhaps repeat the test, but also append "--rbd-cache=false" to the command line.

#9 Updated by Jason Dillaman almost 6 years ago

@Marc: any update?

#10 Updated by Marc Schöchlin almost 6 years ago

Sorry, I missed that mail.
I am currently on vacation until June 9, 2018.

Nevertheless, I created an additional trace for you:

apt-get install linux-tools-4.13.0-39-generic linux-cloud-tools-4.13.0-39-generic linux-tools-generic linux-cloud-tools-generic ceph-common-dbg librbd1-dbg rbd-nbd-dbg librados2-dbg
perf record -F 99 -g rbd ls -l --pool RBD_XenStorage-TEST-SSD-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd_concurrent_management_ops=1 --rbd-cache=false
perf report -n --stdio > perf.data.txt

#11 Updated by Jason Dillaman almost 6 years ago

@Marc: was it much faster with the RBD cache disabled? If so, this issue has been addressed in Mimic, since we now automatically disable the cache on "rbd ls" (so you shouldn't see thousands of thread creations).
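
(For reference, the same effect as the per-command flag can also be set client-wide in ceph.conf; a sketch of the equivalent setting:)

[client]
# disable the librbd client-side cache for all images opened by this client
rbd cache = false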

Also, make sure you run this to collect the perf:

perf record -F 99 -g --call-graph dwarf rbd ls -l -p RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c --id xen_test
perf script > out.script

#12 Updated by Marc Schöchlin almost 6 years ago

No, enabling or disabling the cache makes no difference - the execution times are identical.
I added the perf results.


# for i in `seq 1 10`; do time rbd ls -l -p RBD_XenStorage-TEST-SSD-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd-cache=false >/dev/null; done 

real    0m3.827s
user    0m3.697s
sys    0m0.143s

real    0m3.844s
user    0m3.649s
sys    0m0.238s

real    0m3.824s
user    0m3.677s
sys    0m0.220s

real    0m3.910s
user    0m3.764s
sys    0m0.190s

real    0m3.818s
user    0m3.604s
sys    0m0.236s

real    0m3.800s
user    0m3.639s
sys    0m0.196s

real    0m3.802s
user    0m3.672s
sys    0m0.147s

real    0m3.822s
user    0m3.660s
sys    0m0.185s

real    0m3.875s
user    0m3.660s
sys    0m0.234s

real    0m3.856s
user    0m3.704s
sys    0m0.214s

# for i in `seq 1 10`; do time rbd ls -l -p RBD_XenStorage-TEST-SSD-07449252-bf96-4daa-b0a6-687b7f1c369c --rbd-cache=true >/dev/null; done 

real    0m3.878s
user    0m3.722s
sys    0m0.239s

real    0m3.908s
user    0m3.766s
sys    0m0.256s

real    0m3.908s
user    0m3.716s
sys    0m0.263s

real    0m3.871s
user    0m3.719s
sys    0m0.246s

real    0m3.886s
user    0m3.658s
sys    0m0.294s

real    0m3.877s
user    0m3.697s
sys    0m0.266s

real    0m3.926s
user    0m3.720s
sys    0m0.303s

real    0m3.972s
user    0m3.691s
sys    0m0.272s

real    0m3.834s
user    0m3.703s
sys    0m0.239s

real    0m3.851s
user    0m3.727s
sys    0m0.229s

#13 Updated by Jason Dillaman almost 6 years ago

I don't see any long CPU touch-time hotspots in your perf result. I can only imagine you are seeing latency artifacts since each image open requires numerous back-and-forth ops to the OSDs.
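
A rough way to sanity-check that (a sketch; the image name is taken from the listing in the description) would be to time a single image open and extrapolate over the ~71 images in the pool:

POOL=RBD_XenStorage-07449252-bf96-4daa-b0a6-687b7f1c369c
# "rbd info" opens the image once; multiplying the wall time by the image count
# gives an estimate of the per-image open overhead behind "rbd ls -l"
time rbd info -p "$POOL" RBD-0192938e-cb4b-4ee1-9988-b8145704ac81 >/dev/null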

#14 Updated by Jason Dillaman over 5 years ago

  • Status changed from Need More Info to Resolved
