Bug #7206

closed

Ceph MDS Hang on hadoop workloads

Added by Greg Bowyer over 10 years ago. Updated about 5 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, Common/Protocol, Hadoop/Java, MDS
Labels (FS):
Java/Hadoop
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

With blatant disregard for posted advice about the stability of ceph-fs, I attempted to use it for testing out some hadoop workloads.

When the Hadoop job gets to roughly 30% complete, the MDS appears to go out to lunch, locking up.

This is with Ubuntu 13.10, kernel 3.11.0, Ceph Emperor, and Hadoop CDH 4.x.

I had to update the Ceph filesystem hook for Hadoop 2.x; that might be the cause, but I think that even if my updates are invalid, a buggy client should not really be able to freak out the MDS.

The machines are on AWS, so Xen might be part of the issue.

CephFS is mounted on the machines as well.

I could not coax a perf report out of the machine (that also locks up).

I am more than willing to help track this down.


Files

ceph-mds.parlimentarian-10-2-38-5.log (8.71 MB) ceph-mds.parlimentarian-10-2-38-5.log Greg Bowyer, 01/22/2014 12:30 PM
client.dmesg (2.42 KB) client.dmesg Greg Bowyer, 01/22/2014 12:30 PM
Actions #1

Updated by Greg Farnum over 10 years ago

This could be an issue with time sync on the nodes; check your clock drift. (That's the only issue I know of that we've run into with Hadoop.) If you're using the Hadoop/CephFS Filesystem you don't need to mount CephFS on the client nodes, btw.

If that's not the issue, you'll need to reproduce with mds logging turned on (probably debug mds = 20, debug ms = 1) and client logging (debug client = 20, debug ms = 1) and the admin socket enabled on the Hadoop nodes.
Once it hangs, see if you can get the mds to dump its cache ("ceph tell mds.0 dumpcache", I think) and gather the "mds_requests" and "dump_cache" output from the client admin sockets.
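For reference, a sketch of what the logging settings suggested above might look like in ceph.conf. The section placement and the admin socket path are assumptions for illustration, not taken from this thread:

```ini
# Sketch only: debug levels suggested above, placed in ceph.conf.
# Section names and the admin socket path are assumptions.
[mds]
    debug mds = 20
    debug ms = 1

[client]
    debug client = 20
    debug ms = 1
    # enable the admin socket on the Hadoop client nodes
    admin socket = /var/run/ceph/$name.$pid.asok
```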

Actions #2

Updated by Greg Bowyer over 10 years ago

Greg Farnum wrote:

This could be an issue with time sync on the nodes; check your clock drift. (That's the only issue I know of that we've run into with Hadoop.) If you're using the Hadoop/CephFS Filesystem you don't need to mount CephFS on the client nodes, btw.

Double-checked; all the clocks are in sync (NTP and all).

If that's not the issue, you'll need to reproduce with mds logging turned on (probably debug mds = 20, debug ms = 1) and client logging (debug client = 20, debug ms = 1) and the admin socket enabled on the Hadoop nodes.
Once it hangs, see if you can get the mds to dump its cache ("ceph tell mds.0 dumpcache", I think) and gather the "mds_requests" and "dump_cache" output from the client admin sockets.

I will do this tonight when the cluster is quiet. Is there anything else I can grab at the same time?

Actions #3

Updated by Greg Farnum over 10 years ago

That should be enough to either diagnose it or realize we need to reproduce it locally.

Actions #4

Updated by Greg Bowyer over 10 years ago

Greg Farnum already knows this, but for reference

I spent a large part of today with debug logging on, trying to catch this occurring, without much success. I think the debug logging slows everything down to the point where it masks the timing junk.

I am going to see if I can grab an MDS dump when/if it happens again.


Actions #5

Updated by Zheng Yan over 10 years ago

Just enable the admin socket (without enabling client or MDS debug). When the hang occurs, dump the client requests and the MDS cache (ceph --admin-daemon /var/run/ceph/ceph-fuse.asok mds_requests, ceph mds tell 0 dumpcache /tmp/cachedump.mds0)
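A minimal sketch of enabling only the admin socket, per the suggestion above, with debug logging left at its defaults. The socket path is an assumption; the thread itself refers to /var/run/ceph/ceph-fuse.asok:

```ini
# Sketch only: enable just the admin socket on the client nodes,
# without raising debug levels. Path is an assumption.
[client]
    admin socket = /var/run/ceph/$name.$pid.asok

# Then, when the hang occurs, run (paths/IDs as in the comment above):
#   ceph --admin-daemon /var/run/ceph/ceph-fuse.asok mds_requests
#   ceph mds tell 0 dumpcache /tmp/cachedump.mds0
```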

Actions #6

Updated by Greg Farnum about 10 years ago

  • Status changed from New to Need More Info
Actions #7

Updated by Greg Farnum almost 8 years ago

  • Status changed from Need More Info to Can't reproduce

If this was a time-sync issue, we fixed a bunch of weird stuff in the switch to solely client-directed mtime updates.

Actions #8

Updated by Greg Farnum almost 8 years ago

  • Category changed from 47 to 48
  • Component(FS) Client, Common/Protocol, Hadoop/Java, MDS added
Actions #9

Updated by Patrick Donnelly about 5 years ago

  • Category deleted (48)
  • Labels (FS) Java/Hadoop added