Bug #644
rsync can be sloooow
Status: Closed (0% done)
Description
This is probably due to metadata ops being fairly expensive, but we should inspect an rsync run over Ceph to make sure, and look for optimizations.
I rsynced a copy of my ceph-client tree from inside UML: it took 3:40 onto a local filesystem and 31:40 onto Ceph.
Updated by Greg Farnum over 13 years ago
Copy of my notes file:
FS->FS
gregf@kai:~/ceph/src$ time rsync -r /btrfs/gregf/ceph-client/ mnt/ceph-client
skipping non-regular file "arch/microblaze/boot/dts/system.dts"
real 1m22.007s
user 0m30.946s
sys 0m19.701s
FS->uclient --- gave up
gregf@kai:~/ceph/src$ sudo time rsync -r /btrfs/gregf/ceph-client/ mnt/ceph-client
skipping non-regular file "arch/microblaze/boot/dts/system.dts"
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(543) [sender=3.0.7]
rsync: writefd_unbuffered failed to write 97 bytes to socket [generator]: Broken pipe (32)
Command exited with non-zero status 20
13.69user 7.27system 47:04.56elapsed 0%CPU (0avgtext+0avgdata 29696maxresident)k
7526528inputs+0outputs (3major+2224minor)pagefaults 0swaps
FS (via UML networking)->UML local
uml:~# time rsync -r gregf@kai:/btrfs/gregf/ceph-client /host/home/gregf/ceph-client/
gregf@kai's password:
skipping non-regular file "ceph-client/arch/microblaze/boot/dts/system.dts"
real 3m39.621s
user 1m17.250s
sys 0m33.940s
FS (via UML networking)->UML ceph
uml:~# time rsync -r gregf@kai:/btrfs/gregf/ceph-client mnt/ceph-client/
gregf@kai's password:
[ 1306.190000] libceph: tid 9586 timed out on osd0, will reset osd
skipping non-regular file "ceph-client/arch/microblaze/boot/dts/system.dts"
[ 1473.000000] libceph: tid 12934 timed out on osd0, will reset osd
[ 1619.590000] libceph: tid 15582 timed out on osd0, will reset osd
real 31m39.795s
user 2m1.700s
sys 1m13.860s
Updated by Wido den Hollander over 13 years ago
Test hardware:
- Dual-Core Intel E5310 1.6Ghz
- 6GB DDR2
- 4x WD Greenpower 2TB
In my first test, I simply made a btrfs filesystem on one of the Greenpower disks and ran rsync, this gave me:
Number of files: 179899
Number of files transferred: 149100
Total file size: 77759728862 bytes
Total transferred file size: 77759727466 bytes
Literal data: 77759972268 bytes
Matched data: 0 bytes
File list size: 4486135
File list generation time: 19.884 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 3465014
Total bytes received: 77780895925

sent 3465014 bytes  received 77780895925 bytes  3746747.96 bytes/sec
total size is 77759728862  speedup is 1.00

real 346m0.698s
user 31m55.460s
sys 12m13.790s
I then used my Ceph filesystem (RC branch, commit: 89d5c91e7d207d646651f8959ee37a15ea199d1b), with the 2.6.37-rc5 kernel. My journal was on a tmpfs and my replication level was set to 3 for both the 'data' and 'metadata' pool.
Number of files: 180014
Number of files transferred: 149213
Total file size: 78254366970 bytes
Total transferred file size: 78254365574 bytes
Literal data: 78255506613 bytes
Matched data: 0 bytes
File list size: 4490275
File list generation time: 23.995 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 3467512
Total bytes received: 78276499964

sent 3467512 bytes  received 78276499964 bytes  3313928.73 bytes/sec
total size is 78254366970  speedup is 1.00

real 394m31.248s
user 31m54.510s
sys 12m31.080s
As you can see, the rsync to Ceph wasn't dramatically slower, though that may be masked by the large dataset and by my download speed at the office, which tops out at 100Mbit.
I then tried removing the files. On the btrfs filesystem this took 23 seconds; on the Ceph filesystem the rm has already been running for 15 minutes and looks like it will need a lot more time to finish.
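The delete comparison is easy to reproduce; here is a sketch (the tree size and paths are mine, not Wido's). Run the same rm against a tree on a Ceph mount to see the per-unlink metadata cost:

```shell
# Build a small throwaway tree and time its removal. On a local
# filesystem this is nearly instant; on a Ceph mount each unlink
# involves an MDS round trip.
d=$(mktemp -d)
for i in $(seq 1 50); do
  mkdir "$d/dir$i"
  touch "$d/dir$i/file"
done
time rm -rf "$d"
```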
Updated by Greg Farnum over 13 years ago
- Assignee set to Greg Farnum
Started taking a look at this again.
Updated by Greg Farnum about 13 years ago
Discovered that the kernel client is clearing its I_COMPLETE flag on every newly-created directory due to how the caps code handles the addition of the shared bit. I have a patch but want to get it reviewed.
Will have to re-run the tests to see how much improvement this provides, since I think my test data has changed since I originally ran them.
Updated by Greg Farnum about 13 years ago
All right, ran into a wall unfortunately. :(
Because rsync does a transfer-to-temp-file-and-rename, and because renames necessarily clear the I_COMPLETE flag in the kclient (due to obscure ordering issues caused by d_move's behavior), this is going to stay fairly slow: the directory can never retain the I_COMPLETE flag. Sigh.
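For context, rsync's install step follows roughly this pattern (a sketch of the pattern, not rsync's actual code): each file is written under a dotted temporary name in the destination directory and then renamed into place, and it is that rename which forces the kclient to drop I_COMPLETE.

```shell
# rsync-style install: write to a temporary name, then rename into
# place. The temp name below is illustrative.
dest=$(mktemp -d)
printf 'payload' > "$dest/.file.abc123"   # write to a dotted temp name
mv "$dest/.file.abc123" "$dest/file"      # rename(2) into place
```

(rsync's real --inplace option writes directly to the final name and skips the rename, at the cost of leaving partially written files behind on interruption.)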
I may put some effort into testing rsyncs after the initial one to see if there are other behaviors that slow it down which we can modify, but it's not going to be a priority.
Updated by Sage Weil about 13 years ago
Try this?
From 28257f973e309b4e0989cbefb0245a70a1aa4d4b Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@newdream.net>
Date: Fri, 4 Feb 2011 21:38:47 -0800
Subject: [PATCH] ceph: preserve I_COMPLETE across rename

d_move puts the renamed dentry at the end of d_subdirs, screwing with
our cached dentry directory offsets. We were just clearing I_COMPLETE
to avoid any possibility of trouble. However, assigning the renamed
dentry an offset at the end of the directory (to match its new
d_subdirs position) is sufficient to maintain correct behavior and
hold onto I_COMPLETE.

This is especially important for workloads like rsync, which renames
files into place. Before, we would lose I_COMPLETE and do MDS lookups
for each file. With this patch we only talk to the MDS on create and
rename.

Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/ceph/inode.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 50001de..83f65f9 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1021,9 +1021,6 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
 		dout("fill_trace doing d_move %p -> %p\n",
 		     req->r_old_dentry, dn);
-		/* d_move screws up d_subdirs order */
-		ceph_i_clear(dir, CEPH_I_COMPLETE);
-
 		d_move(req->r_old_dentry, dn);
 		dout(" src %p '%.*s' dst %p '%.*s'\n",
 		     req->r_old_dentry,
@@ -1035,12 +1032,15 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
 		   rehashing bug in vfs_rename_dir */
 		ceph_invalidate_dentry_lease(dn);
-		/* take overwritten dentry's readdir offset */
-		dout("dn %p gets %p offset %lld (old offset %lld)\n",
-		     req->r_old_dentry, dn, ceph_dentry(dn)->offset,
+		/*
+		 * d_move() puts the renamed dentry at the end of
+		 * d_subdirs.  We need to assign it an appropriate
+		 * directory offset so we can behave when holding
+		 * I_COMPLETE.
+		 */
+		ceph_set_dentry_offset(req->r_old_dentry);
+		dout("dn %p gets new offset %lld\n", req->r_old_dentry,
 		     ceph_dentry(req->r_old_dentry)->offset);
-		ceph_dentry(req->r_old_dentry)->offset =
-			ceph_dentry(dn)->offset;
 		dn = req->r_old_dentry;  /* use old_dentry */
 		in = dn->d_inode;
--
1.7.0
Updated by Greg Farnum about 13 years ago
- Target version changed from v0.25 to 12
Well, Sage's above patch makes it some better, but there are still issues with losing the I_COMPLETE flag due to preemptively dropping caps we aren't using. Since that's something we want to continue doing in general, there's not a lot we can do about it.
I've considered the idea of holding on to caps when we created the node and would like to test it out, but a change like that should wait until we have a suite of performance tests since it could pretty easily hurt other kinds of workloads.
So, putting this on hold for now.
Updated by Sage Weil about 13 years ago
- Story points set to 3
- Position set to 1
- Position changed from 1 to 561