Bug #1114: NFS export extreme slowdown
Status: Closed
Description
Attached is debug mds 20 output.
Below is ceph -w output for a corresponding period.
Time synchronization is < 0.1s.
Master branch as of today 5/26.
The use case is copying a 1 GB+ source build tree to a Ceph filesystem (kernel-client mounted) via an NFS export.
The slowdown consistently occurs after some roughly deterministic, but not yet pinned down, amount of data/metadata operations into the copy.
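A small helper makes the slowdown easy to quantify when reproducing. This is a sketch, not from the original report; `copy_rate` is a hypothetical function name and the paths are whatever source tree and destination you test with:

```shell
# copy_rate: copy SRC into DST and report elapsed time and throughput.
# Hypothetical helper for quantifying the observed slowdown (GNU du/date).
copy_rate() {
    src=$1 dst=$2
    bytes=$(du -sb "$src" | awk '{print $1}')
    start=$(date +%s%N)
    cp -a "$src" "$dst"
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))
    [ "$ms" -eq 0 ] && ms=1   # avoid divide-by-zero on tiny trees
    echo "$ms ms, $(( bytes * 1000 / ms / 1048576 )) MB/s"
}

# e.g.: copy_rate /path/to/source-tree /mnt/buildshare
```

Running it once against the NFS-mounted CephFS path and once against a local export would put a number on the gap described below.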
- ceph -w
2011-05-26 22:02:02.672197 pg v63: 2376 pgs: 2376 active+clean; 1218 KB data, 31224 KB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:02.684161 mds e6: 1/1/1 up {0=scale-192-168-98-110=up:active}, 2 up:standby
2011-05-26 22:02:02.684233 osd e14: 12 osds: 12 up, 12 in
2011-05-26 22:02:02.684444 log 2011-05-26 21:56:32.882616 mon2 192.168.98.111:6789/0 25 : [WRN] message from mon0 was stamped 0.013284s in the future, clocks not synchronized
2011-05-26 22:02:02.684995 mon e1: 3 mons at {0=192.168.98.109:6789/0,1=192.168.98.110:6789/0,2=192.168.98.111:6789/0}
2011-05-26 22:02:04.921748 pg v64: 2376 pgs: 2376 active+clean; 64692 KB data, 107 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:05.984357 pg v65: 2376 pgs: 2376 active+clean; 155 MB data, 194 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:09.535567 pg v66: 2376 pgs: 2376 active+clean; 159 MB data, 219 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:11.052284 pg v67: 2376 pgs: 2376 active+clean; 165 MB data, 251 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:14.489935 pg v68: 2376 pgs: 2376 active+clean; 165 MB data, 251 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:16.112182 pg v69: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:19.476662 pg v70: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:24.578577 pg v71: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:26.277742 pg v72: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:29.656958 pg v73: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:34.767601 pg v74: 2376 pgs: 2376 active+clean; 166 MB data, 287 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:36.344764 pg v75: 2376 pgs: 2376 active+clean; 166 MB data, 297 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:39.503245 pg v76: 2376 pgs: 2376 active+clean; 166 MB data, 297 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:44.899181 pg v77: 2376 pgs: 2376 active+clean; 166 MB data, 297 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:46.123144 pg v78: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:49.457926 pg v79: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:54.571233 pg v80: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:56.245927 pg v81: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:02:59.885295 pg v82: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:00.871756 pg v83: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:04.952345 pg v84: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:05.801877 pg v85: 2376 pgs: 2376 active+clean; 166 MB data, 325 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:09.827020 pg v86: 2376 pgs: 2376 active+clean; 166 MB data, 326 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:14.510075 pg v87: 2376 pgs: 2376 active+clean; 167 MB data, 333 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:15.576846 pg v88: 2376 pgs: 2376 active+clean; 167 MB data, 349 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:19.883004 pg v89: 2376 pgs: 2376 active+clean; 167 MB data, 349 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:25.129790 pg v90: 2376 pgs: 2376 active+clean; 167 MB data, 349 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:25.966880 pg v91: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:29.806945 pg v92: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:35.147184 pg v93: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:36.022737 pg v94: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:39.726807 pg v95: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:44.744657 pg v96: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:45.879049 pg v97: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:49.756106 pg v98: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:54.523294 pg v99: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:55.648484 pg v100: 2376 pgs: 2376 active+clean; 168 MB data, 372 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:03:59.963840 pg v101: 2376 pgs: 2376 active+clean; 168 MB data, 372 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:00.838651 pg v102: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:05.071747 pg v103: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:05.825393 pg v104: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:10.235816 pg v105: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:15.224835 pg v106: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:16.160427 pg v107: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:20.171603 pg v108: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:20.943961 pg v109: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:25.211361 pg v110: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:25.932742 pg v111: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:29.591203 pg v112: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:35.053130 pg v113: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:36.558676 pg v114: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:39.625175 pg v115: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:45.228658 pg v116: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:46.104386 pg v117: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:50.041891 pg v118: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:54.680541 pg v119: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:55.706833 pg v120: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:04:59.685786 pg v121: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:01.258347 pg v122: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:04.961594 pg v123: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:05.924606 pg v124: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:10.130142 pg v125: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:14.674699 pg v126: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:15.663701 pg v127: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:20.208742 pg v128: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:25.133026 pg v129: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:35.009655 pg v130: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:35.956974 pg v131: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:55.059282 pg v132: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:05:55.850950 pg v133: 2376 pgs: 2376 active+clean; 170 MB data, 377 MB used, 18240 GB / 18265 GB avail
2011-05-26 22:06:04.488704 pg v134: 2376 pgs: 2376 active+clean; 170 MB data, 376 MB used, 18240 GB / 18265 GB avail
Files
Updated by Brian Chrisman almost 13 years ago
I've verified this problem persists with the 2.6.39 kernel, using today's (5/26/11) build from the ceph-client git repo (master) and today's ceph repo master branch.
What am I looking for in the logs here?
[root@buildvm-e2bf11 ~]# du -sh /mnt/buildshare ; sleep 600 ; du -sh /mnt/buildshare
166M /mnt/buildshare
167M /mnt/buildshare
[root@buildvm-e2bf11 ~]#
Updated by Brian Chrisman almost 13 years ago
- File CephBDIstats.pdf CephBDIstats.pdf added
Here's a 20-second-interval time evolution of /sys/kernel/debug/ceph/*/bdi/stats.
I can go ahead and graph these.
Updated by Brian Chrisman almost 13 years ago
- File Screen_shot_2011-06-15_at_4.29.37_PM.png Screen_shot_2011-06-15_at_4.29.37_PM.png added
- File Screen_shot_2011-06-15_at_4.29.01_PM.png Screen_shot_2011-06-15_at_4.29.01_PM.png added
- File Screen_shot_2011-06-15_at_4.27.38_PM.png Screen_shot_2011-06-15_at_4.27.38_PM.png added
Here's the time sequence with each x-axis increment being 20 seconds.
Plotted separately to avoid scaling axes and all that crap.
Google charts doesn't plot much better.
Updated by Brian Chrisman almost 13 years ago
I think I've figured out the more general symptom.
File creation via NFS-on-top-of-Ceph takes 1-3 seconds per file.
My copy appeared to be working because it initially copies some very large files.
I interrupted the copy and manually performed a few more copies; each took 5-10 s even though the files were only 100 or 500 bytes.
Performing a 'touch' of a new file on an NFS mounted ceph filesystem takes 1-3+ seconds.
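That per-file latency is easy to measure directly. A sketch; `time_touch` is a hypothetical helper (GNU date), and the filename is arbitrary:

```shell
# time_touch: create FILE with touch and report elapsed milliseconds.
time_touch() {
    start=$(date +%s%N)
    touch "$1"
    end=$(date +%s%N)
    echo "$(( (end - start) / 1000000 )) ms"
}

# e.g.: time_touch /mnt/buildshare/newfile
# (the report above sees ~1000-3000 ms here vs. <100 ms on an ext3 export)
```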
Here's the end of an strace for a touch; what's not clear from the trace alone is that the 2-3 second pause happens in the middle of the utimensat() call, after the open and close.
For comparison, I exported an ext3 filesystem from the same server, and the same touch takes < 0.1 s.
open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbd89d58000
close(3)                                = 0
open("/mnt/buildshare/file7", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
utimensat(0, NULL, NULL, 0)             = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
Updated by Sage Weil over 12 years ago
- Target version set to v0.34
Updated by Sage Weil over 12 years ago
Updated by Sage Weil over 12 years ago
- Status changed from New to 4
Updated by Sage Weil over 12 years ago
- Target version changed from v0.34 to v0.35
Updated by Sage Weil over 12 years ago
- Target version changed from v0.35 to v0.36
Updated by Sage Weil over 12 years ago
Updated by Sage Weil over 12 years ago
- Target version changed from v0.36 to v0.37
Updated by Sage Weil over 12 years ago
- Target version changed from v0.37 to v0.38
Updated by Sage Weil over 12 years ago
- Status changed from 4 to Need More Info
- Target version deleted (v0.38)
Need to reproduce this on the current trunk and fully characterize what is going on.
- is the NFS server in sync or async mode?
- which ceph operation(s) are slow
I suspect the ceph flushes aren't being prioritized or something.
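On the first question, the export mode can be read off the options string in /etc/exports (or from `exportfs -v` output). A minimal classifier, noting that nfs-utils defaults exports to sync when neither option is given; `export_mode` is a hypothetical helper:

```shell
# export_mode: given an /etc/exports options string, report sync or async.
# NFS exports default to sync when neither option is specified.
export_mode() {
    case "$1" in
        *async*) echo async ;;          # check async first: it contains "sync"
        *sync*)  echo sync ;;
        *)       echo 'sync (default)' ;;
    esac
}

# e.g.: export_mode 'rw,async,no_root_squash'   # -> async
```

A sync export forces the NFS server to commit each metadata operation before replying, which would amplify any slow CephFS flush into per-file multi-second latency.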
Updated by Sage Weil about 12 years ago
- Project changed from Ceph to Linux kernel client
Updated by Ian Colle about 11 years ago
- Project changed from Linux kernel client to CephFS
Updated by Patrick Donnelly about 5 years ago
- Status changed from Need More Info to Rejected
NFS over the kernel client is not a recommended configuration.