Bug #4537

mds: hang on rmdir, unlink

Added by Sage Weil about 11 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ubuntu@teuthology:/a/teuthology-2013-03-24_08:46:04-nfs-master-testing-basic/2541$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 06fb6a9f87bb1377a6549602fff230d4b352afe9
nuke-on-error: true
overrides:
  ceph:
    log-whitelist:
    - slow request
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
  s3tests:
    branch: master
  workunit:
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
- - client.1
tasks:
- chef: null
- clock: null
- install: null
- ceph: null
- kclient:
  - client.0
- knfsd:
  - client.0
- nfs:
    client.1:
      options:
      - rw
      - hard
      - intr
      - nfsvers=3
      server: client.0
- workunit:
    clients:
      client.1:
      - suites/dbench-short.sh

On the kclient, the rmdir request is still pending:
root@plana52:~# cat /sys/kernel/debug/ceph/01e69bf5-da78-43b2-8a10-481af36d8bf9.client4111/mdsc
3736    mds0    rmdir    #10000000004/ACCESS (client.1/tmp/clients/client0/~dmtmp/ACCESS)

The MDS cache dump shows the request holding locks on a few items, but the request itself is blocked:
ubuntu@plana24:~$ grep 0x295c000 /tmp/foo
  [dentry #100/stray5/10000000040 [2,head] auth NULL (dn xlock x=1 by 0x295c000) (dversion lock w=1 last_client=4111) pv=0 v=1052 ap=2+0 inode=0 state=new | request=1 lock=2 authpin=1 0x2c1eb00]
  [dentry #1/client.1/tmp/clients/client0/~dmtmp/ACCESS [2,head] auth (dn xlock x=1 by 0x295c000) (dversion lock w=1 last_client=4111) v=4208 ap=2+1 inode=0x2875140 | request=1 lock=2 inodepin=1 dirty=1 authpin=1 clientlease=0 0x285a4d0]

Associated revisions

Revision 14cef276 (diff)
Added by Sam Lang about 11 years ago

mds: CInode::build_backtrace() always incr iter

Always increment the iterator when adding old pools
to the backtrace. This fixes a bug on files where
the layout had been set to a different pool and then
back to the same pool, causing continuous looping in
the build_backtrace() function.

Fixes #4537.
Signed-off-by: Sam Lang <>
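
The looping described in this commit message can be modeled in a few lines. This is only an illustrative sketch in Python, not the Ceph C++ code: the function names, the list-based `old_pools`, and the `max_steps` guard are all hypothetical, and the exact shape of the original loop is an assumption based on the message above. It shows how a skip branch that forgets to advance the iterator spins forever once the current pool also appears among the old pools (layout set to another pool and then back).

```python
def build_old_pools_buggy(old_pools, current_pool, max_steps=1000):
    """Hypothetical model of the pre-fix loop.

    The skip branch does not advance the index, so the loop never
    terminates when current_pool is present in old_pools. max_steps
    exists only so this demo raises instead of hanging.
    """
    result = []
    i = 0
    steps = 0
    while i < len(old_pools):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("loop did not terminate")
        if old_pools[i] == current_pool:
            continue  # bug: i is not advanced on this path
        result.append(old_pools[i])
        i += 1
    return result


def build_old_pools_fixed(old_pools, current_pool):
    """Hypothetical model of the fix: the index advances on every pass."""
    result = []
    i = 0
    while i < len(old_pools):
        if old_pools[i] != current_pool:
            result.append(old_pools[i])
        i += 1  # always increment, whether or not the pool was added
    return result
```

With `old_pools = [0, 1]` and `current_pool = 0`, the buggy variant trips its guard while the fixed variant returns `[1]`. When the current pool is not among the old pools, both variants behave the same, which fits the hang only showing up on files whose layout had been moved away and back.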

History

#1 Updated by Sage Weil about 11 years ago

  • Subject changed from mds: hang on rmdir on dbench-short + nfs to mds: hang on rmdir, unlink
  • Priority changed from Normal to Urgent

Similar hang, this time on unlink from a ceph-fuse client:

root@plana20:~# ceph --admin-daemon /var/run/ceph/ceph-client.0.25094.asok mds_requests
{ "tid": 137,
  "op": "unlink",
  "path": "#10000000012\/foo.25287",
  "path2": "",
  "ino": "10000000012",
  "other_ino": "10000000013",
  "dentry": "foo.25287",
  "hint_ino": "0",
  "sent_stamp": "2013-03-24 08:32:18.548847",
  "mds": 0,
  "resend_mds": -1,
  "send_to_auth": 0,
  "sent_on_mseq": 0,
  "retry_attempt": 0,
  "got_safe": 0,
  "got_unsafe": 1,
  "uid": 1000,
  "gid": 1000,
  "oldest_client_tid": 135,
  "mdsmap_epoch": 0,
  "flags": 0,
  "num_retry": 0,
  "num_fwd": 0,
  "num_releases": 0}{ "tid": 138,
  "op": "readdir",
  "path": "#10000000012",
  "path2": "",
  "ino": "10000000012",
  "hint_ino": "0",
  "sent_stamp": "2013-03-24 08:32:18.559149",
  "mds": 0,
  "resend_mds": -1,
  "send_to_auth": 0,
  "sent_on_mseq": 0,
  "retry_attempt": 0,
  "got_safe": 0,
  "got_unsafe": 0,
  "readdir_frag": "*",
  "readdir_start": "",
  "readdir_offset": 2,
  "uid": 0,
  "gid": 0,
  "oldest_client_tid": 135,
  "mdsmap_epoch": 0,
  "flags": 0,
  "num_retry": 0,
  "num_fwd": 0,
  "num_releases": 0}

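The `mds_requests` output above is a run of JSON objects printed one after another rather than a single JSON document, so `json.loads` on the whole text fails. A minimal sketch for splitting it, assuming the output keeps this shape: `iter_json_objects` and `SAMPLE` are hypothetical names, and the sample is abridged from the dump above.

```python
import json

# Abridged from the dump above: two objects with no separator between them.
SAMPLE = (
    r'{ "tid": 137, "op": "unlink", "path": "#10000000012\/foo.25287",'
    r' "got_safe": 0, "got_unsafe": 1}'
    r'{ "tid": 138, "op": "readdir", "path": "#10000000012",'
    r' "got_safe": 0, "got_unsafe": 0}'
)


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def requests_without_safe_reply(text):
    """Return (tid, op, path) for requests that have not received a safe reply."""
    return [
        (req["tid"], req["op"], req["path"])
        for req in iter_json_objects(text)
        if req.get("got_safe") == 0
    ]
```

On the abridged sample this lists both tid 137 (unlink, unsafe reply received but no safe reply) and tid 138 (readdir, no reply at all).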

This was seen with the following job:
ubuntu@teuthology:/a/sage-2013-03-24_08:29:36-fs-master-testing-basic/2394$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 06fb6a9f87bb1377a6549602fff230d4b352afe9
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
      osd:
        osd op thread timeout: 60
    fs: btrfs
    log-whitelist:
    - slow request
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
  s3tests:
    branch: master
  workunit:
    sha1: 8befbca77aa50a1188969892aabedaf11d8f8ce7
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
tasks:
- chef: null
- install: null
- ceph: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - misc

#2 Updated by Sage Weil about 11 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-24_08:45:56-kernel-master-testing-basic/2503

root@plana46:~# cat /sys/kernel/debug/ceph/3a30f280-b4a5-43c4-8eb4-1d193bb58349.client4137/mdsc
81      mds0    unlink  (unsafe) #10000000012/foo.30400 (client.0/tmp/foo.30400)
...

#3 Updated by Sage Weil about 11 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-03-24_08:45:56-kernel-master-testing-basic/2501

root@plana54:~# cat /sys/kernel/debug/ceph/3a4ddc20-c264-453d-a2ae-002533ce9446.client4119/mdsc
31 mds0 create (unsafe) #10000000001/temp (client.0/tmp/temp)
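
The `mdsc` listings pasted in this ticket share one layout: tid, target MDS, operation, an optional `(unsafe)` marker, the inode-relative path, and the full path in parentheses. A minimal sketch for pulling those fields out when comparing hangs across runs, assuming that layout holds. The regex and the `parse_mdsc_line` name are hypothetical and derived only from the lines shown here, not from the kernel source.

```python
import re

# Layout inferred from the mdsc lines pasted in this ticket (an assumption).
MDSC_LINE = re.compile(
    r"^(?P<tid>\d+)\s+(?P<mds>mds\d+)\s+(?P<op>\S+)\s+"
    r"(?P<unsafe>\(unsafe\)\s+)?(?P<path>\S+)"
    r"(?:\s+\((?P<full_path>.*)\))?\s*$"
)


def parse_mdsc_line(line):
    """Parse one pending-request line; return None if it does not match."""
    match = MDSC_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "tid": int(match.group("tid")),
        "mds": match.group("mds"),
        "op": match.group("op"),
        # True when the line carries the "(unsafe)" marker.
        "unsafe": match.group("unsafe") is not None,
        "path": match.group("path"),
        "full_path": match.group("full_path"),
    }
```

Lines that do not fit the layout, such as the `...` elision in comment #2, return `None` instead of raising.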

#4 Updated by Sam Lang about 11 years ago

  • Status changed from New to In Progress
  • Assignee set to Sam Lang

#5 Updated by Sam Lang about 11 years ago

  • Status changed from In Progress to Fix Under Review

Fix pushed to wip-4537.

#6 Updated by Sam Lang about 11 years ago

  • Status changed from Fix Under Review to Resolved

#7 Updated by Greg Farnum over 7 years ago

  • Component(FS) MDS added
