Bug #1114

NFS export extreme slowdown

Added by Brian Chrisman almost 8 years ago. Updated 3 months ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
NFS (Linux Kernel)
Target version:
-
Start date:
05/26/2011
Due date:
% Done:

0%


Description

Attached is debug mds 20 output.
Below is the ceph -w output for the corresponding period.
Time synchronization is < 0.1s.
Running the master branch as of today (5/26).
The use case is copying a 1 GB+ source build tree to a ceph filesystem (kernel-client mounted) exported via NFS.
The slowdown consistently occurs after a roughly deterministic but unknown amount of data/metadata operations in the copy.

  1. ceph -w
    2011-05-26 22:02:02.672197 pg v63: 2376 pgs: 2376 active+clean; 1218 KB data, 31224 KB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:02.684161 mds e6: 1/1/1 up {0=scale-192-168-98-110=up:active}, 2 up:standby
    2011-05-26 22:02:02.684233 osd e14: 12 osds: 12 up, 12 in
    2011-05-26 22:02:02.684444 log 2011-05-26 21:56:32.882616 mon2 192.168.98.111:6789/0 25 : [WRN] message from mon0 was stamped 0.013284s in the future, clocks not synchronized
    2011-05-26 22:02:02.684995 mon e1: 3 mons at {0=192.168.98.109:6789/0,1=192.168.98.110:6789/0,2=192.168.98.111:6789/0}
    2011-05-26 22:02:04.921748 pg v64: 2376 pgs: 2376 active+clean; 64692 KB data, 107 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:05.984357 pg v65: 2376 pgs: 2376 active+clean; 155 MB data, 194 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:09.535567 pg v66: 2376 pgs: 2376 active+clean; 159 MB data, 219 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:11.052284 pg v67: 2376 pgs: 2376 active+clean; 165 MB data, 251 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:14.489935 pg v68: 2376 pgs: 2376 active+clean; 165 MB data, 251 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:16.112182 pg v69: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:19.476662 pg v70: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:24.578577 pg v71: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:26.277742 pg v72: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:29.656958 pg v73: 2376 pgs: 2376 active+clean; 165 MB data, 287 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:34.767601 pg v74: 2376 pgs: 2376 active+clean; 166 MB data, 287 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:36.344764 pg v75: 2376 pgs: 2376 active+clean; 166 MB data, 297 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:39.503245 pg v76: 2376 pgs: 2376 active+clean; 166 MB data, 297 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:44.899181 pg v77: 2376 pgs: 2376 active+clean; 166 MB data, 297 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:46.123144 pg v78: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:49.457926 pg v79: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:54.571233 pg v80: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:56.245927 pg v81: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:02:59.885295 pg v82: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:00.871756 pg v83: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:04.952345 pg v84: 2376 pgs: 2376 active+clean; 166 MB data, 313 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:05.801877 pg v85: 2376 pgs: 2376 active+clean; 166 MB data, 325 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:09.827020 pg v86: 2376 pgs: 2376 active+clean; 166 MB data, 326 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:14.510075 pg v87: 2376 pgs: 2376 active+clean; 167 MB data, 333 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:15.576846 pg v88: 2376 pgs: 2376 active+clean; 167 MB data, 349 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:19.883004 pg v89: 2376 pgs: 2376 active+clean; 167 MB data, 349 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:25.129790 pg v90: 2376 pgs: 2376 active+clean; 167 MB data, 349 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:25.966880 pg v91: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:29.806945 pg v92: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:35.147184 pg v93: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:36.022737 pg v94: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:39.726807 pg v95: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:44.744657 pg v96: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:45.879049 pg v97: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:49.756106 pg v98: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:54.523294 pg v99: 2376 pgs: 2376 active+clean; 167 MB data, 369 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:55.648484 pg v100: 2376 pgs: 2376 active+clean; 168 MB data, 372 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:03:59.963840 pg v101: 2376 pgs: 2376 active+clean; 168 MB data, 372 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:00.838651 pg v102: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:05.071747 pg v103: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:05.825393 pg v104: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:10.235816 pg v105: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:15.224835 pg v106: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:16.160427 pg v107: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:20.171603 pg v108: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:20.943961 pg v109: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:25.211361 pg v110: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:25.932742 pg v111: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:29.591203 pg v112: 2376 pgs: 2376 active+clean; 169 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:35.053130 pg v113: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:36.558676 pg v114: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:39.625175 pg v115: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:45.228658 pg v116: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:46.104386 pg v117: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:50.041891 pg v118: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:54.680541 pg v119: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:55.706833 pg v120: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:04:59.685786 pg v121: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:01.258347 pg v122: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:04.961594 pg v123: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:05.924606 pg v124: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:10.130142 pg v125: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:14.674699 pg v126: 2376 pgs: 2376 active+clean; 170 MB data, 374 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:15.663701 pg v127: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:20.208742 pg v128: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:25.133026 pg v129: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:35.009655 pg v130: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:35.956974 pg v131: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:55.059282 pg v132: 2376 pgs: 2376 active+clean; 170 MB data, 375 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:05:55.850950 pg v133: 2376 pgs: 2376 active+clean; 170 MB data, 377 MB used, 18240 GB / 18265 GB avail
    2011-05-26 22:06:04.488704 pg v134: 2376 pgs: 2376 active+clean; 170 MB data, 376 MB used, 18240 GB / 18265 GB avail
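For reference, the setup described above (a ceph kernel-client mount re-exported through knfsd) can be sketched roughly as follows. This is a hypothetical reproduction outline: the mount points, export options, and the `server`/`build-tree` names are illustrative, not taken from the report; only /mnt/buildshare and the monitor IP appear in it.

```
# Hypothetical reproduction sketch; requires root and a running ceph cluster.
mount -t ceph 192.168.98.110:6789:/ /srv/ceph    # ceph kernel client mount
echo '/srv/ceph *(rw,no_root_squash,fsid=1)' >> /etc/exports
exportfs -ra                                     # re-export over NFS
# on the NFS client:
mount -t nfs server:/srv/ceph /mnt/buildshare
cp -a /path/to/build-tree /mnt/buildshare/       # slowdown appears mid-copy
```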

mds.slownfs.log.bz2 (1.5 MB) Brian Chrisman, 05/26/2011 03:29 PM

CephBDIstats.pdf (22.2 KB) Brian Chrisman, 06/15/2011 04:16 PM

Screen_shot_2011-06-15_at_4.29.37_PM.png (19.6 KB) Brian Chrisman, 06/15/2011 04:32 PM

Screen_shot_2011-06-15_at_4.29.01_PM.png (34.6 KB) Brian Chrisman, 06/15/2011 04:32 PM

Screen_shot_2011-06-15_at_4.27.38_PM.png (24 KB) Brian Chrisman, 06/15/2011 04:32 PM

History

#1 Updated by Brian Chrisman almost 8 years ago

I've verified this problem persists with the 2.6.39 kernel, using today's (5/26/11) build from the ceph-client git repo (master) and today's ceph repo master branch.
What am I looking for in the logs here?

[root@buildvm-e2bf11 ~]# du -sh /mnt/buildshare ; sleep 600 ; du -sh /mnt/buildshare
166M /mnt/buildshare
167M /mnt/buildshare
[root@buildvm-e2bf11 ~]#

#2 Updated by Brian Chrisman almost 8 years ago

Here's the time evolution of /sys/kernel/debug/ceph/*/bdi/stats, sampled at 20-second intervals.
I can go ahead and graph these.
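The sampling itself can be done with a small shell helper along these lines. `sample_stats` is a hypothetical name; the debugfs path is the one mentioned above, and the sketch assumes debugfs is mounted and a single kernel-client instance is present.

```shell
#!/bin/sh
# Hypothetical sampler: append a timestamped snapshot of a stats file
# to a log every INTERVAL seconds, COUNT times.
sample_stats() {
    src=$1; log=$2; interval=$3; count=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        { date '+%Y-%m-%d %H:%M:%S'; cat "$src"; echo; } >> "$log"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
}

# e.g.: sample_stats /sys/kernel/debug/ceph/*/bdi/stats bdi-samples.log 20 30
```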

#3 Updated by Brian Chrisman almost 8 years ago

Here's the time sequence, with each x-axis increment being 20 seconds.
Plotted separately to avoid axis-scaling issues.
Google charts doesn't plot it much better.

#4 Updated by Brian Chrisman almost 8 years ago

I think I've figured out the more general symptom.
File creation via NFS-on-top-of-ceph takes 1-3 seconds per file.
My copy appeared to be working because it initially copies some very large files.
I interrupted the copy and manually performed a few more copies, each taking 5-10s even though they were 100-byte or 500-byte files.

Performing a 'touch' of a new file on an NFS-mounted ceph filesystem takes 1-3+ seconds.

Here's the end of an strace for a touch. What the trace doesn't show is that the 2-3 second pause occurs in the middle of the utimensat() call, after the open and close.
I exported an ext3 filesystem from the same server, and the same touch takes < 0.1s.

open("/usr/lib/locale/locale-archive", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=99158752, ...}) = 0
mmap(NULL, 99158752, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbd89d58000
close(3)                                = 0
open("/mnt/buildshare/file7", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
utimensat(0, NULL, NULL, 0)             = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
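To quantify the per-file latency without strace, a loop like the following could time each create. `time_creates` is a hypothetical helper (it assumes GNU date for the %N nanosecond format); /mnt/buildshare is the mount from this report, but the function takes any directory.

```shell
#!/bin/sh
# Hypothetical helper: time creation of COUNT empty files under DIR and
# print per-file wall time in milliseconds. Assumes GNU date (%N).
time_creates() {
    dir=$1; count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s%N)
        touch "$dir/timing-test-$i"
        end=$(date +%s%N)
        echo "file $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}

# e.g.: time_creates /mnt/buildshare 10
```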

#5 Updated by Sage Weil almost 8 years ago

  • Target version set to v0.34
  • Position set to 790

#6 Updated by Sage Weil almost 8 years ago

  • Position deleted (798)
  • Position set to 1
  • Position changed from 1 to 805

#7 Updated by Sage Weil almost 8 years ago

  • Status changed from New to Feedback
  • Position deleted (807)
  • Position set to 808

#8 Updated by Sage Weil almost 8 years ago

  • Target version changed from v0.34 to v0.35

#9 Updated by Sage Weil over 7 years ago

  • Assignee set to Brian Chrisman

#10 Updated by Sage Weil over 7 years ago

  • Target version changed from v0.35 to v0.36

#11 Updated by Sage Weil over 7 years ago

  • Position deleted (847)
  • Position set to 1
  • Position changed from 1 to 884

#12 Updated by Sage Weil over 7 years ago

  • Target version changed from v0.36 to v0.37

#13 Updated by Sage Weil over 7 years ago

  • Target version changed from v0.37 to v0.38

#14 Updated by Sage Weil over 7 years ago

  • Status changed from Feedback to Need More Info
  • Target version deleted (v0.38)

Need to reproduce this on the current trunk and fully characterize what is going on:

- Is the NFS server in sync or async mode?
- Which ceph operation(s) are slow?

I suspect the ceph flushes aren't being prioritized, or something similar.
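The sync/async question matters because it controls when the server may reply. An illustrative /etc/exports pair (the path and network are assumptions for the example, not taken from this report):

```
# "sync": knfsd must commit each operation (including the setattr behind
# the utimensat() in the touch trace) before replying, so any slow ceph
# flush becomes per-operation latency.
/srv/ceph  192.168.98.0/24(rw,sync,no_root_squash)

# "async": the server may reply before the commit completes.
/srv/ceph  192.168.98.0/24(rw,async,no_root_squash)
```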

#15 Updated by Sage Weil about 7 years ago

  • Project changed from Ceph to Linux kernel client

#16 Updated by Sage Weil over 6 years ago

  • Assignee deleted (Brian Chrisman)

#17 Updated by Ian Colle about 6 years ago

  • Project changed from Linux kernel client to fs

#18 Updated by Greg Farnum almost 3 years ago

  • Category set to NFS (Linux Kernel)

#19 Updated by Patrick Donnelly 3 months ago

  • Status changed from Need More Info to Rejected

NFS over the kernel client is not a recommended configuration.
