Bug #20329
Ceph file system hang on Jewel
Status: Closed
Description
We are running Ceph 10.2.7, and after adding a new multi-threaded writer application we are seeing hangs accessing metadata from kernel-mounted Ceph file system clients. I have a "du -ah /cephfs" process that has been stuck for over 36 hours on one cephfs client system. We started seeing hung "du -ah" processes three days ago, so two days ago we upgraded the whole cluster from v10.2.5 to v10.2.7, but the problem occurred again. Rebooting the client fixes the problem. The ceph -s command is showing HEALTH_OK.
We have four ceph file system clients, each kernel-mounting our one ceph file system at /cephfs. The "du -ah /cephfs" runs hourly within a cron-controlled test script. If the du -ah /cephfs does not complete within an hour, emails are sent to the admin group as part of our monitoring process. This command normally takes less than a minute to run, and we have just over 3.6M files in this file system. The du -ah is hanging while accessing sub-directories where the new multi-threaded writer application is writing.
About the application: on one ceph client we are downloading external data via the network and writing it as files into the ceph file system with a Python program. The Python script can write up to 100 files in parallel. The metadata hangs we are seeing can occur on one or more client systems, but right now only one system is hung, and it is not the system writing the data.
System info:
ceph -s
    cluster ba0c94fc-1168-11e6-aaea-000c290cc2d4
     health HEALTH_OK
     monmap e1: 3 mons at {mon01=10.16.51.21:6789/0,mon02=10.16.51.22:6789/0,mon03=10.16.51.23:6789/0}
            election epoch 138, quorum 0,1,2 mon01,mon02,mon03
      fsmap e3274: 1/1/1 up {0=mds03=up:active}, 2 up:standby
     osdmap e33064: 85 osds: 85 up, 85 in
            flags sortbitwise,require_jewel_osds
      pgmap v27766384: 16192 pgs, 12 pools, 7744 GB data, 6686 kobjects
            24651 GB used, 216 TB / 241 TB avail
                16192 active+clean
  client io 8901 kB/s rd, 883 kB/s wr, 26 op/s rd, 5 op/s wr
On the hung client node, we are seeing an entry in mdsc:
cat /sys/kernel/debug/ceph/*/mdsc
163925513   mds0    readdir  #100003be2b1 kplr009658474_dr25_window.fits
This entry has remained for 36 hours.
I am not seeing this in the mdsc file on the other 3 client nodes.
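The tid in the first column of the mdsc file is what matters for diagnosis (see the discussion below); a small helper like the following makes it easy to sample repeatedly. This is a sketch based on the debugfs path shown above, and the `mdsc_tid` name is hypothetical, not part of any Ceph tooling.

```shell
# Hedged sketch: read the request tid (first column of the first line)
# from a kernel-client mdsc file. The default path globs the first
# CephFS mount under debugfs, as in this report; pass an explicit file
# path to override.
mdsc_tid() {
    local f="${1:-$(ls /sys/kernel/debug/ceph/*/mdsc 2>/dev/null | head -n1)}"
    awk 'NR==1 { print $1 }' "$f" 2>/dev/null
}

# Example: sample once a second to watch whether the tid moves.
# while sleep 1; do mdsc_tid; done
```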
On the active metadata server, I ran:
ceph daemon mds.mds02 dump_ops_in_flight
every 2 seconds, as it kept changing. Output is at:
ftp://ftp.keepertech.com/outgoing/eric/cephfs_hang/dump_ops_in_flight.txt.gz
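The repeated sampling described above can be sketched as a small loop; `sample_every` is a hypothetical helper name, and the mds name in the example (`mds.mds02`) is the one from this report and will differ on other clusters.

```shell
# Sketch of the 2-second sampling loop: run a command every SECS
# seconds, COUNT times, prefixing each sample with a Unix timestamp.
sample_every() {
    local secs=$1 count=$2 i
    shift 2
    for ((i = 0; i < count; i++)); do
        date +%s
        "$@"
        sleep "$secs"
    done
}

# e.g. sample_every 2 30 ceph daemon mds.mds02 dump_ops_in_flight > ops.txt
```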
The output of ceph mds tell 0 dumpcache /tmp/dump.txt is at
ftp://ftp.keepertech.com/outgoing/eric/cephfs_hang/dumpcache.txt.gz
The contents of all the files under /sys/kernel/debug/ceph on the hung system, created with:
find /sys/kernel/debug/ceph -type f -print -exec cat {} \; > debug.ceph.all.files.txt
are at:
ftp://ftp.keepertech.com/outgoing/eric/cephfs_hang/debug.ceph.all.files.txt.gz
Info about the system
OS: Ubuntu Trusty
Cephfs snapshots are turned on and being created hourly
Ceph Version
ceph -v
ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
Kernel (Ceph servers):
uname -a
Linux mon01 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22
15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Kernel (CephFS clients):
uname -a
Linux dfgw02 4.9.21-040921-generic #201704080434 SMP Sat Apr 8
08:35:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
This was originally reported on the user email list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018613.html
Updated by Zheng Yan almost 7 years ago
- Status changed from New to 12
cat /sys/kernel/debug/ceph/*/mdsc
163925513   mds0    readdir  #100003be2b1 kplr009658474_dr25_window.fits

Does the first entry of the line keep changing? If it does, the bug should be fixed by https://github.com/ceph/ceph-client/commit/b50c2de51e611da90cf3cf04c058f7e9bbe79e93
Updated by Eric Eastman almost 7 years ago
The number in the first column changes. Here is the output running the command in a while loop, once a second. Every once in a while the cat command comes back blank.
1587904205  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587909058  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587917575  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587921343  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587925937  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587930758  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587935502  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587939518  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587943537  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
1587947959  mds0    readdir #100003be2b1 kplr009658474_dr25_window.fits
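The check described here (whether the tid keeps changing between samples, indicating the client is repeatedly resending the readdir) can be automated with a small filter; `tids_change` is a hypothetical helper name, not Ceph tooling.

```shell
# Sketch: given sampled mdsc lines on stdin (one per sample), report
# whether the request tid in the first column changes between samples.
# "changing" suggests the client keeps resending the request.
tids_change() {
    awk '{ print $1 }' | sort -u |
        awk 'END { if (NR > 1) print "changing"; else print "static" }'
}
```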
Is this fix going to be pushed back to any earlier kernels?
Updated by John Spray almost 7 years ago
Eric: that's a conversation to have with whoever is providing your kernel -- the kernel bits of Ceph are not part of what we release + version on ceph.com.
Updated by Zheng Yan almost 7 years ago
Eric Eastman wrote:
The number in the first column changes. Here is the output running the command in a while loop, once a second. Every once in a while the cat command comes back blank.
[...] Is this fix going to be pushed back to any earlier kernels?
I will backport this to the long-term kernels.
Updated by John Spray almost 7 years ago
- Status changed from 12 to Resolved
Resolving; the patch will show up in stable releases as and when.