Bug #57898

ceph client extremely slow on kernel versions between 5.15 and 6.0

Added by Minjong Kim 3 months ago. Updated about 1 month ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
fs/ceph
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
performance
Crash signature (v1):
Crash signature (v2):

Description

Hello. I am very new to Ceph, so thank you for taking that into consideration while reading.

I recently changed my kernel to take advantage of newer Ceph client code. At first I installed a newly built kernel in a VM for isolation. (The same problem later occurred with Ubuntu's official mainline kernel, so I will skip the details of my rudimentary kernel build process.) Compared to running on the host (in Docker), there was a huge performance drop, and I suspected this was due to my rudimentary VM network setup.

So I gave up on the VM and replaced the host kernel on another machine with my custom kernel, but I got the same performance degradation. Still suspecting my own build, I built the 5.15 kernel (the same version as the Ubuntu 22.04 kernel) in the same way, and it worked fine. To rule out my build process, I also downloaded and installed Ubuntu's mainline kernel (https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.0/), and it was equally slow.

To present the performance drop quantitatively, I am limiting it to one use case. There is an ImageNet dataset already saved (1000 directories with a total of 1.2 million files, each averaging 100KB). Below is the average of the logged ceph_mds_request values while deleting it (rm /mnt/ceph/imagenet -r).

Kernel 5.15
mean of ceph_mds_requests: 6053
Kernel 6.0
mean of ceph_mds_requests: 131 (not a typo)
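
It is not stated above how the ceph_mds_request values were logged. As a minimal sketch, assuming the metric is a cumulative counter sampled at known times, a mean request rate could be derived as follows. The function name and the sample values are hypothetical and are not measurements from this cluster.

```python
from typing import List, Tuple


def mean_rate(samples: List[Tuple[float, float]]) -> float:
    """Average requests per second between the first and last sample.

    Each sample is (timestamp_seconds, cumulative_request_count).
    Assumption: the counter only grows during the sampled window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be increasing")
    return (c1 - c0) / (t1 - t0)


# Hypothetical samples, for illustration only.
example = [(0.0, 0.0), (10.0, 500.0), (20.0, 1200.0)]
print(mean_rate(example))  # 60.0
```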

To follow the standard troubleshooting steps suggested in various issues, I ran the same test with a ceph-fuse mount. This confused me further: performance is restored when the deletion is performed in the following order.

1. kernel mount ceph (mount -t ceph ... /mnt/ceph)
2. fuse mount ceph (ceph-fuse ... /mnt/ceph-fuse)
3. delete via fuse mount (rm /mnt/ceph-fuse/imagenet -r)
4. Interrupt after a while
5. delete via kernel mount (rm /mnt/ceph/imagenet -r)

This was reproduced over several repeated tests in a Docker container.

To summarize, the points of this issue are:

- Performance degradation occurs with kernel versions 5.15 < x <= 6.0 (roughly 40 times slower).
- This was tested several times on two hosts.
- The ceph-fuse procedure above was also repeated several times in a container.
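
The "roughly 40 times" figure can be checked against the two means reported above (6053 and 131). This is only the arithmetic, written as a small sketch; the function name is made up for illustration.

```python
def slowdown_factor(fast_rate: float, slow_rate: float) -> float:
    """Ratio between the faster and the slower mean request rate."""
    if slow_rate <= 0:
        raise ValueError("slow_rate must be positive")
    return fast_rate / slow_rate


# The two means reported in the description.
factor = slowdown_factor(6053, 131)
print(f"{factor:.1f}x")  # 46.2x
```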

I'm new to both Ceph and the kernel, so this may be caused by a mistake I have not noticed. However, I tried my best to control the variables.

Here is the experimental environment.

- 3 nodes and 4 OSDs per node
- 1 MDS globally
- OSDs place both block and metadata storage on ramdisks
- No special options except bluefs_buffered_io = False
- Replication is not set (for testing)
- ceph version v17.2.0

History

#1 Updated by Minjong Kim 3 months ago

Even with the ceph-fuse procedure described above, it gets slow again over time.

#2 Updated by Minjong Kim 3 months ago

Hello again,
I don't know if anyone is interested, but when tested with prebuilt kernels (https://kernel.ubuntu.com/~kernel-ppa/mainline/), this slowdown does not occur in v5.19.17 but does occur in v6.0.0-rc3. If I have time later, I'll build and test commit by commit. No promises.
Thanks
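
Testing commit by commit is usually done as a binary search between a known-good and a known-bad build (this is what `git bisect` automates). A minimal sketch of that search, assuming the candidates are ordered oldest to newest and the slowdown persists once it appears; the build names and the `is_slow` check are hypothetical stand-ins for "boot this kernel and run the rm test".

```python
from typing import Callable, Sequence


def first_bad(candidates: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad candidate.

    Assumptions: candidates[0] is known good, candidates[-1] is known bad,
    and every candidate after the first bad one is also bad.
    """
    if len(candidates) < 2:
        raise ValueError("need at least a good and a bad candidate")
    lo, hi = 0, len(candidates) - 1  # lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]


# Hypothetical builds between a good and a bad kernel.
builds = ["known-good", "c1", "c2", "c3", "c4", "known-bad"]
slow_builds = {"c3", "c4", "known-bad"}  # made-up test outcome
print(first_bad(builds, lambda b: b in slow_builds))  # c3
```

Each step halves the range, so even a few thousand commits need only around a dozen kernel builds.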

#3 Updated by Xiubo Li 3 months ago

Could you upload your test script?
Do you mean you can also reproduce this with the ceph-fuse mount?

#4 Updated by Minjong Kim 3 months ago

I used the Ceph kernel mount. With the FUSE mount it works fine.

The test script is nothing special. I just ran the cp and rm commands on the ImageNet dataset. It's very fast on kernel 5.19.17 and very slow on 6.0.0-rc3.

The ImageNet dataset has 1,200,000 images in 1000 directories, with a total size of about 127GB, but the size doesn't seem to matter (it is slow from the start).

It seems that it can be reproduced by copying or deleting many files of less than 100KB each in one directory. Should I write a simple script?
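
A minimal sketch of what such a reproducer might look like. It is not the script used for the measurements in this issue; the function names, directory layout, and file counts are assumptions chosen for illustration. It writes many small files into a few directories, deletes them again, and times both phases.

```python
import os
import tempfile
import time


def create_files(root: str, n_dirs: int, files_per_dir: int, size_bytes: int) -> float:
    """Create n_dirs directories of small files under root; return elapsed seconds."""
    payload = os.urandom(size_bytes)
    start = time.perf_counter()
    for d in range(n_dirs):
        dir_path = os.path.join(root, f"dir_{d:04d}")
        os.makedirs(dir_path, exist_ok=True)
        for f in range(files_per_dir):
            with open(os.path.join(dir_path, f"file_{f:06d}.bin"), "wb") as fh:
                fh.write(payload)
    return time.perf_counter() - start


def delete_tree(root: str) -> float:
    """Delete everything below root (root itself is kept); return elapsed seconds."""
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
        for name in dirnames:
            os.rmdir(os.path.join(dirpath, name))
    return time.perf_counter() - start


def count_files(root: str) -> int:
    """Number of regular files below root."""
    return sum(len(files) for _, _, files in os.walk(root))


# Small local demo in a temporary directory. For a real test, root would be
# a directory on the CephFS kernel mount and the counts would be much larger.
with tempfile.TemporaryDirectory() as root:
    t_create = create_files(root, n_dirs=4, files_per_dir=50, size_bytes=100 * 1024)
    n = count_files(root)
    t_delete = delete_tree(root)
    print(f"created {n} files in {t_create:.3f}s, deleted in {t_delete:.3f}s")
```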

#5 Updated by Xiubo Li 3 months ago

Yeah, please.

BTW, have you ever tested the latest ceph code in the testing branch of the ceph-client [1] repo?

[1] https://github.com/ceph/ceph-client

#6 Updated by Minjong Kim 3 months ago

I've never built and tested the most recent ceph-client repo myself. However, I have tried 6.1.0-rc1 from the Ubuntu mainline kernels, and the performance was the same. As far as I can tell, the 6.1.0-rc1 tag of torvalds/linux and the for-linus branch of ceph/ceph-client are the same version.

#7 Updated by Minjong Kim 3 months ago

But I haven't checked the testing branch yet. (I'll check.)

#8 Updated by Minjong Kim 3 months ago

https://gist.github.com/caffeinism/dbfd974374d620911a6c0c3dd1daadfb

I am not good at writing files from a shell script (when I tested it, it didn't seem to write blocks properly...), so I wrote a Python script instead. Sorry. By default, this script creates a folder structure similar to ImageNet. I've run it on 6.0.3 and 5.19.17, and it reproduces the problem in the same way (although it is faster than cp).

python3 test.py /mnt/cephfs/some-directory

#9 Updated by Minjong Kim 3 months ago

I've tried building and running the testing branch and it seems to be equally slow.

#10 Updated by Minjong Kim 3 months ago

Also, it seems that requests to the MDS are slowed down much more than block writes. When I run the rm command, it sends an average of 5,000-6,000 requests with the 5.15 kernel, an average of 10,000 requests with the 5.19 kernel, and an average of 120 requests with the 6.0 kernel. Copying, however, does not degrade nearly as much.

And a question unrelated to this issue: what improvements did ceph-client get from 5.15 to 5.19? It feels 1.5 to 2 times faster. It seems that the writeback mechanism has changed, along with a speed improvement in the MDS client. With 5.15 I could not saturate the maximum bandwidth of the network, and I spent a lot of time trying to solve that.

#11 Updated by Xiubo Li 3 months ago

  • Assignee set to Xiubo Li

#12 Updated by Xiubo Li 3 months ago

Thanks MinJong, I will take a look.

#13 Updated by Xiubo Li 3 months ago

I saw the same issue when testing the `6.1.0-rc1` upstream code:

[xfstests-dev]# sudo ./check -g quick -E ./ceph.exclude
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 6.1.0-rc1+ #174 SMP PREEMPT_DYNAMIC Mon Nov 14 09:56:19 CST 2022
MKFS_OPTIONS  -- 10.72.47.117:40864:/testB
MOUNT_OPTIONS -- -o name=admin,nowsync,copyfrom,rasize=4096 -o context=system_u:object_r:root_t:s0 10.72.47.117:40864:/testB /mnt/kcephfs.B

ceph/001       [expunged]
ceph/002 62s ...  169s
ceph/003 75s ...  184s
ceph/004 82s ...  171s
ceph/005 63s ...  166s
generic/001 98s ...  327s
generic/002 60s ...  174s
generic/003       [expunged]
generic/004       [not run] O_TMPFILE is not supported
generic/005 64s ...  167s
generic/006 79s ...  211s
generic/007 114s ...  280s
generic/008       [not run] xfs_io fzero  failed (old kernel/wrong fs?)
generic/009       [not run] xfs_io fzero  failed (old kernel/wrong fs?)
generic/011 83s ...  212s
generic/012       [not run] xfs_io falloc  failed (old kernel/wrong fs?)
generic/013 242s ...  421s
generic/014 442s ...  764s
generic/015       [not run] Filesystem ceph not supported in _scratch_mkfs_sized
generic/016       [not run] xfs_io falloc  failed (old kernel/wrong fs?)
generic/018       [not run] defragmentation not supported for fstype "ceph" 
generic/020 97s ...  271s
generic/021       [not run] xfs_io falloc  failed (old kernel/wrong fs?)
generic/022       [not run] xfs_io falloc  failed (old kernel/wrong fs?)
generic/023 66s ...  225s
generic/024       [not run] kernel doesn't support renameat2 syscall
generic/025       [not run] kernel doesn't support renameat2 syscall
generic/026       [not run] ceph does not define maximum ACL count
....

Then I switched back to `5.19.0`; it is still slow, but a little better:

[xfstests-dev]# sudo ./check -g quick -E ./ceph.exclude
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 lxbceph1 5.19.0+ #159 SMP PREEMPT_DYNAMIC Wed Aug 3 17:09:18 CST 2022
MKFS_OPTIONS  -- 10.72.47.117:40643:/testB
MOUNT_OPTIONS -- -o name=admin,nowsync,copyfrom,rasize=4096 -o context=system_u:object_r:root_t:s0 10.72.47.117:40643:/testB /mnt/kcephfs.B

ceph/001       [expunged]
ceph/002 169s ...  152s
ceph/003 184s ...  167s
ceph/004 171s ...  148s
ceph/005 166s ...  148s
generic/001 327s ...  263s
generic/002 174s ...  152s
generic/003       [expunged]
generic/004       [not run] O_TMPFILE is not supported
generic/005 167s ...  150s

#14 Updated by Xiubo Li 3 months ago

  • Status changed from New to In Progress

#15 Updated by Xiubo Li about 2 months ago

[root@lxbceph1 xfstests-dev]# ./check -g quick -E ./ceph.exclude
FSTYP -- ceph
PLATFORM -- Linux/x86_64 lxbceph1 6.1.0+ #186 SMP PREEMPT_DYNAMIC Tue Dec 13 10:45:30 CST 2022
MKFS_OPTIONS -- 10.72.47.117:40986:/testB
MOUNT_OPTIONS -- -o name=admin,nowsync,copyfrom,rasize=4096 -o context=system_u:object_r:root_t:s0 10.72.47.117:40986:/testB /mnt/kcephfs.B

ceph/001 [expunged]
ceph/002 180s ... 9s
ceph/003 198s ... 23s
ceph/004 169s ... 7s
ceph/005 170s ... 7s
generic/001 332s ... 157s
generic/002 178s ... 12s
generic/003 [expunged]
generic/004 [not run] O_TMPFILE is not supported
generic/005 178s ... 9s
generic/006 218s ... 49s
generic/007 318s ... 130s
generic/008 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/009 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/011 231s ... 148s
generic/012 [not run] xfs_io falloc failed (old kernel/wrong fs?)
generic/013 449s ...

After rebasing the testing branch onto Linux 6.1, I see that it became much faster, without any ceph changes compared with the last test on Linux 6.1.0-rc1.

So this is most likely not a kceph issue.
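
For reading the xfstests results above: on each line the first time is the runtime recorded for the previous run and the time after `...` is the current run. A small sketch for turning such lines into per-test speedups; the regular expression and function name are assumptions for illustration, and the sample is copied from the 6.1.0 output above.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "ceph/002 180s ... 9s"; skipped or unfinished tests do not match.
LINE_RE = re.compile(r"^(\S+/\d+)\s+(\d+)s\s+\.\.\.\s+(\d+)s$")


def parse_runtimes(output: str) -> Dict[str, Tuple[int, int]]:
    """Map test name to (previous_seconds, current_seconds)."""
    results: Dict[str, Tuple[int, int]] = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return results


sample = """\
ceph/001 [expunged]
ceph/002 180s ... 9s
ceph/003 198s ... 23s
generic/001 332s ... 157s
generic/004 [not run] O_TMPFILE is not supported
generic/013 449s ...
"""

for name, (prev, cur) in parse_runtimes(sample).items():
    if cur > 0:  # avoid dividing by zero for sub-second tests
        print(f"{name}: {prev}s -> {cur}s ({prev / cur:.1f}x faster)")
```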

#16 Updated by Minjong Kim about 1 month ago

It seems so to me too.
