Bug #17297

open

high cpu usage for ceph-fuse (>150%)

Added by Donatas Abraitis over 7 years ago. Updated over 7 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Performance/Resource Usage
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

We noticed that our CephFS deployment is very slow. For instance, when we extract a kernel source tarball onto a CephFS mounted with ceph-fuse, the ceph-fuse process uses almost two cores and the extraction takes ~5 minutes.

Below is the output from sysdig, filtered to ceph-fuse events with a latency above 10 ms. It shows that almost every fifth of these futex() calls times out after 5 s. Any suggestions on what we should look at first to debug this slowness?

# sysdig 'proc.name=ceph-fuse and evt.latency > 10000000' -p  '%evt.type --> %evt.latency.human)'
futex --> 28ms)
futex --> 5.00s)
futex --> 14ms)
futex --> 15ms)
futex --> 11ms)
futex --> 11ms)
futex --> 5.00s)
futex --> 11ms)
futex --> 15ms)
futex --> 14ms)
futex --> 14ms)
futex --> 16ms)
futex --> 19ms)
futex --> 5.00s)
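
The "every fifth" estimate can be checked against the sample above. This is a minimal Python sketch, assuming the output has been saved as text in the `type --> latency)` form shown and that sysdig's human-readable latencies use the `ns`/`us`/`ms`/`s` suffixes. The helper names `parse_latency` and `timeout_fraction` are hypothetical and not part of sysdig.

```python
import re

# Sample copied from the sysdig output above, one "type --> latency)" per line.
SAMPLE = """\
futex --> 28ms)
futex --> 5.00s)
futex --> 14ms)
futex --> 15ms)
futex --> 11ms)
futex --> 11ms)
futex --> 5.00s)
futex --> 11ms)
futex --> 15ms)
futex --> 14ms)
futex --> 14ms)
futex --> 16ms)
futex --> 19ms)
futex --> 5.00s)
"""

# Assumed unit suffixes of the human-readable latency field.
_UNITS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_LINE = re.compile(r"^(\w+) --> ([\d.]+)(ns|us|ms|s)\)?$")


def parse_latency(line):
    """Return (event_type, latency_in_seconds), or None if the line does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    name, value, unit = match.groups()
    return name, float(value) * _UNITS[unit]


def timeout_fraction(text, threshold_s=5.0):
    """Fraction of futex events in `text` whose latency is at least threshold_s."""
    events = [e for e in map(parse_latency, text.splitlines()) if e is not None]
    futexes = [latency for name, latency in events if name == "futex"]
    if not futexes:
        return 0.0
    slow = sum(1 for latency in futexes if latency >= threshold_s)
    return slow / len(futexes)


print(f"{timeout_fraction(SAMPLE):.0%}")  # 3 of 14 events, prints "21%"
```

Because of the `evt.latency > 10000000` filter, the sample only contains events slower than 10 ms, so the ratio describes slow futex calls rather than all futex calls.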

Thank you.

Actions #1

Updated by Greg Farnum over 7 years ago

What version of ceph-fuse are you currently running? What config options have you set?

Do you have any evidence it's the futexes in particular which are taking up CPU time? Since futexes are meant to be "fast", I wouldn't assume the time is being wasted there unless you have some evidence of it. :)

I think by default we are using kernel-enforced permissions; changing that may improve things (fuse_default_permissions = false) and is probably okay in Jewel (but isn't default, so your mileage may vary).

Actions #2

Updated by Donatas Abraitis over 7 years ago

ceph-fuse version:

# rpm -qa | grep ceph-fuse
ceph-fuse-10.2.2-0.el7.x86_64

ceph-fuse process:

root      6793  2.0  0.1 2017400 377908 ?      Sl   14:58   5:18 ceph-fuse --name=client.cephfuse-client.xxx.io /home -o nonempty,rw

/etc/ceph/ceph.conf (client section):

[client]
fuse default permissions = 0
client acl type = posix_acl

0 == false?
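
On the `0 == false?` question, the thread does not give a confirmed answer. The sketch below shows one plausible reading, under two assumptions that should be verified against the running client: that Ceph treats spaces and underscores in option names interchangeably, and that it accepts integers for boolean options. `read_client_options`, `normalize` and `as_bool` are hypothetical helpers written in Python for illustration; they are not Ceph's actual parser.

```python
import configparser

# The [client] section quoted above.
CONF = """\
[client]
fuse default permissions = 0
client acl type = posix_acl
"""


def normalize(name):
    """Assumption: 'fuse default permissions' names the same option as
    'fuse_default_permissions'."""
    return "_".join(name.strip().split())


def as_bool(value):
    """Assumption: boolean options accept true/false or an integer (0 is false)."""
    text = value.strip().lower()
    if text in ("true", "false"):
        return text == "true"
    return int(text) != 0  # raises ValueError for anything else


def read_client_options(text):
    """Return the [client] section as a dict keyed by normalized option name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {normalize(key): value for key, value in parser["client"].items()}


options = read_client_options(CONF)
print(as_bool(options["fuse_default_permissions"]))  # False under these assumptions
```

Under those assumptions the setting above would be equivalent to `fuse_default_permissions = false`.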

Actions #3

Updated by Donatas Abraitis over 7 years ago

Greg Farnum, nothing mentioning "slow" shows up in the OSD logs and the cluster status is HEALTH_OK, but the slowness is still disappointing. What would you recommend looking at first?

Actions #4

Updated by Donatas Abraitis over 7 years ago

I just tried disabling quotas, but https://github.com/ceph/ceph/blob/a033dc6f5b4cef357db6f5951062d680e880ba0e/src/client/Client.cc#L12470 is still being hit on every read/write. Or maybe ceph-fuse ignores the [client] section of /etc/ceph/ceph.conf and needs run-time parameters?

Actions #5

Updated by Greg Farnum over 7 years ago

Well, that function returns early if quota is disabled; it still gets called.

Anyway I tried it locally with linux-4.0.5.tar.xz and it took me 8 minutes on a vstart instance. I think that's just how long that many metadata queries take right now.
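
To compare runs like the ~5 minutes and 8 minutes reported in this thread, the extraction can be timed and normalized by the number of archive members, since the cost is attributed to metadata queries. This is a minimal Python sketch; `time_extraction` is a hypothetical helper, and the paths in the usage comment are placeholders for a real tarball and a ceph-fuse mount point.

```python
import tarfile
import time


def time_extraction(tar_path, dest_dir):
    """Extract tar_path into dest_dir and return (member_count, elapsed_seconds)."""
    # Use the safer "data" extraction filter where this Python version supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    start = time.monotonic()
    with tarfile.open(tar_path) as archive:  # compression is auto-detected
        members = archive.getmembers()
        archive.extractall(dest_dir, members=members, **extra)
    elapsed = time.monotonic() - start
    return len(members), elapsed


# Example call (placeholder paths):
#   count, seconds = time_extraction("linux-4.0.5.tar.xz", "/home/testas")
#   print(f"{count} members in {seconds:.1f} s, {count / seconds:.0f} members/s")
```

Running the same archive against a local disk and against the CephFS mount gives a members-per-second figure for each, which is easier to compare across machines than the total wall-clock time.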

Actions #6

Updated by Donatas Abraitis over 7 years ago

# dd if=/dev/zero of=/home/testas/1G bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 26.3153 s, 40.8 MB/s
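
The rate dd prints is in decimal megabytes, which can be confirmed from the byte count and elapsed time in the output above. A small Python check follows; the `throughput_mb_s` name is only illustrative.

```python
def throughput_mb_s(num_bytes, seconds):
    """Return throughput in decimal megabytes (10**6 bytes) per second."""
    return num_bytes / seconds / 1e6


rate = throughput_mb_s(1073741824, 26.3153)
print(f"{rate:.1f} MB/s")  # prints "40.8 MB/s", matching the dd output
```

This was a single 1 GiB direct write, so it reflects large sequential data throughput and likely says little about the many small metadata operations involved in extracting a source tree.
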

Actions #7

Updated by Loïc Dachary over 7 years ago

  • Target version deleted (v10.2.3)