Bug #3267

closed

Multiple active MDSes stall when listing freshly created files

Added by Stan Schwertly over 11 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Low
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Common/Protocol, MDS
Labels (FS):
multimds
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The output from ceph-debugpack is available at http://cumulonim.biz/mds.tar.gz. We were running the following version (and also tested with HEAD, ad97bbb0a1e985b91ab0ffe9ae5b15cfce465211):

ceph version 0.52 (commit:e48859474c4944d4ff201ddc9f5fd400e8898173)

We created a Ceph cluster on a single box with 3 monitors, 9 MDSes (6 active, 3 standby), and 5 OSDs. Note: we've seen the same issue with the daemons spread across several machines, but kept everything on one box for this report for simplicity. After creating the cluster, we mounted CephFS from another server and ran the following line:

for D in {0..99}; do  mkdir -p /mnt/ceph/$D; for F in {0..99}; do echo "Hello $F in $D" > /mnt/ceph/$D/$F; done; done

In the run captured in the logs provided, this command completed, but the MDSes "stalled" as soon as we ran `ls /mnt/ceph/*` (two directories were returned before the command hung). Note: in other runs of this test, the command that creates the files and directories also stalled. We ran these tests repeatedly with similar outcomes.
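For anyone re-running this, the steps above can be wrapped in a small script so that a stall shows up as an exit status rather than a hung terminal. This is only a sketch: the function names `populate` and `list_with_timeout` are made up for illustration, and it assumes GNU coreutils `timeout` is installed on the client. A process blocked in uninterruptible I/O on a hung mount may not respond to the signal that `timeout` sends, so the wrapper itself could still hang in that case.

```shell
# populate ROOT COUNT
# Create COUNT directories under ROOT, each holding COUNT small files,
# mirroring the one-liner from the report. (Hypothetical helper name.)
populate() {
    pop_root=$1
    pop_count=$2
    pop_d=0
    while [ "$pop_d" -lt "$pop_count" ]; do
        mkdir -p "$pop_root/$pop_d"
        pop_f=0
        while [ "$pop_f" -lt "$pop_count" ]; do
            echo "Hello $pop_f in $pop_d" > "$pop_root/$pop_d/$pop_f"
            pop_f=$((pop_f + 1))
        done
        pop_d=$((pop_d + 1))
    done
}

# list_with_timeout ROOT SECONDS
# List every directory under ROOT, giving up after SECONDS.
# GNU timeout exits with status 124 when the time limit is reached.
# (Hypothetical helper name.)
list_with_timeout() {
    timeout "$2" ls "$1"/* > /dev/null
}
```

For the setup in this report that would be `populate /mnt/ceph 100` followed by `list_with_timeout /mnt/ceph 60`; the 60-second limit is an arbitrary choice, and an exit status of 124 from the second call would correspond to the stall described above.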

Actions #1

Updated by Greg Farnum over 11 years ago

I'll try and take a look at this, but multi-MDS setups are known to be pretty unstable at this point. Have you tried just using one active with some standbys?

Actions #2

Updated by Stan Schwertly over 11 years ago

Greg Farnum wrote:

I'll try and take a look at this, but multi-MDS setups are known to be pretty unstable at this point. Have you tried just using one active with some standbys?

We are aware of the issue with multi-MDS setups, hence the report :)! For good measure, we reissued the command against a cluster with a single MDS and two standbys and it worked like a charm.

Let us know if there's any more detail we can provide.

Actions #3

Updated by Greg Farnum over 11 years ago

  • Priority changed from Normal to Low

Currently de-prioritizing multi-MDS bugs.

Actions #4

Updated by Greg Farnum almost 8 years ago

  • Category changed from 47 to 90
  • Component(FS) Common/Protocol, MDS added

Actions #5

Updated by John Spray over 7 years ago

  • Status changed from New to Closed

This ticket is old and the use case seems like something we will pick up on from the multimds suite if it's still broken.

Actions #6

Updated by Patrick Donnelly about 5 years ago

  • Category deleted (90)
  • Labels (FS) multimds added