Bug #10208


libceph: intermittent hangs under memory pressure

Added by Ilya Dryomov over 9 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Low
Assignee: -
Category: libceph
Target version: -
% Done: 0%
Source: other
Tags: -
Backport: -
Regression: -
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Crash signature (v1): -
Crash signature (v2): -

Files

kern.log (223 KB), uploaded by Andrei Mikhailovsky, 11/30/2014 01:35 PM
#1

Updated by Andrei Mikhailovsky over 9 years ago

The attached kern.log contains data captured shortly after running the following command:

time dd if=/dev/zero of=4G00 bs=4M count=5K oflag=direct &
time dd if=/dev/zero of=4G11 bs=4M count=5K oflag=direct &
time dd if=/dev/zero of=4G22 bs=4M count=5K oflag=direct &
time dd if=/dev/zero of=4G33 bs=4M count=5K oflag=direct &
time dd if=/dev/zero of=4G44 bs=4M count=5K oflag=direct &
time dd if=/dev/zero of=4G55 bs=4M count=5K oflag=direct &
time dd if=/dev/zero of=4G66 bs=4M count=5K oflag=direct &
time dd if=/dev/zero of=4G77 bs=4M count=5K oflag=direct &

The output is written to an NFS mount point backed by CephFS, which is mounted with mount -t ceph ... ...

Andrei
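For reference, a minimal sketch of a setup like the one described above, assuming hypothetical hostnames, paths, and export options that are not taken from this report: a CephFS kernel mount on the NFS server is exported over NFS, and the dd workload runs on the NFS client.

# On the NFS server (also the CephFS kernel client in this setup; names/paths are examples):
mount -t ceph mon-host:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
echo '/mnt/cephfs  client-host(rw,no_root_squash,sync)' >> /etc/exports
exportfs -ra

# On the NFS client (the hypervisor host):
mount -t nfs nfs-server:/mnt/cephfs /mnt/nfs
cd /mnt/nfs    # then run the dd commands from comment #1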

#2

Updated by Zheng Yan over 9 years ago

Are the NFS mount and the CephFS mount on the same machine?

#3

Updated by Ilya Dryomov over 9 years ago

  • Status changed from 12 to Need More Info

Andrei, does the quote below mean you had OSDs and a cephfs mount on the same box? I missed this completely because the problem looked very similar to an rbd problem I was debugging at the time, and I just assumed it was a libceph problem.

I had hung task warnings from the nfsd process on the server side, and no hung tasks on the client side.

Here is my setup:

(osd server + cephfs kernel mountpoint + nfs server) ---- IPoIB link ----- (hypervisor host + nfs client)

So, when I was running dd tests on the mount point from the NFS client, it produced hung tasks in the nfsd process on the NFS server side. I have not seen any hung tasks on the client itself.
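One way to answer the co-location question and confirm the hung tasks is a quick check on the OSD/NFS server box; this is a sketch using standard tools, and the output locations are the usual defaults rather than anything taken from the attached log.

pgrep -a ceph-osd                           # are OSD daemons running locally?
grep -w ceph /proc/mounts                   # is a kernel CephFS mount present on the same host?
dmesg | grep -A20 'blocked for more than'   # hung task reports like those in kern.log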
#4

Updated by Ilya Dryomov over 9 years ago

  • Priority changed from Urgent to High

A similar problem with krbd that I was debugging offline with a user went away with 3.18, as far as they can tell.
I assume it was fixed by the memory-reclaim flags on waitqueues and sockets patches that went into 3.18.
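If retesting, it may be worth confirming the node runs a kernel that includes those changes; a small sketch, with 3.18 being the version cited above:

# Check the running kernel is at least 3.18 before re-running the dd workload
uname -r
[ "$(printf '%s\n' 3.18 "$(uname -r | cut -d- -f1)" | sort -V | head -n1)" = "3.18" ] \
  && echo "kernel >= 3.18" || echo "kernel older than 3.18"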

#5

Updated by Ilya Dryomov almost 8 years ago

  • Priority changed from High to Low
#6

Updated by Ilya Dryomov about 5 years ago

  • Status changed from Need More Info to Resolved
#7

Updated by Ilya Dryomov about 5 years ago

  • Category set to libceph