Bug #4850

ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing

Added by Sage Weil almost 11 years ago. Updated over 10 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2013-04-28 08:48:35.479385 7f03e249d780  1 client.4144 dump_cache
2013-04-28 08:48:35.479404 7f03e249d780  1 client.4144 dump_inode: DISCONNECTED inode 10000000004 #10000000004 ref 010000000004.head(ref=0 cap_refs={} open={2=0} mode=100666 size=0 mtime=2013-04-27 23:11:30.544590 caps=- objectset[10000000004 ts 0/0 objects 0 dirty_or_tx 0] 0x2ae8900)
2013-04-28 08:48:35.479425 7f03e249d780  1 client.4144 dump_inode: DISCONNECTED inode 10000000037 #10000000037 ref 010000000037.head(ref=0 cap_refs={1024=0,4096=0,8192=0} open={2=0} mode=100666 size=478995 mtime=2013-04-27 23:11:34.561768 caps=- objectset[10000000037 ts 2/478995 objects 0 dirty_or_tx 0] 0x34d2000)
2013-04-28 08:48:35.479439 7f03e249d780  1 client.4144 dump_inode: DISCONNECTED inode 10000000014 #10000000014 ref 010000000014.head(ref=0 cap_refs={} open={} mode=40777 size=0 mtime=2013-04-27 23:11:30.622570 caps=- COMPLETE 0x2b2b000)

The job was:
ubuntu@teuthology:/a/teuthology-2013-04-27_20:55:21-fs-next-testing-basic/2193$ cat orig.config.yaml 
kernel:
  kdb: true
  sha1: 42c6070519ad45965762de9b20f5bd280a6eef68
machine_type: plana
nuke-on-error: true
overrides:
  ceph:
    conf:
      client:
        debug client: 10
      mds:
        debug mds: 20
        debug ms: 1
      mon:
        debug mon: 20
        debug ms: 20
        debug paxos: 20
    log-whitelist:
    - slow request
    - wrongly marked me down
    sha1: 5327d06275972dc49150505b113051743a87f8b5
  s3tests:
    branch: next
  workunit:
    sha1: 5327d06275972dc49150505b113051743a87f8b5
roles:
- - mon.a
  - mon.c
  - osd.0
  - osd.1
  - osd.2
- - mon.b
  - mds.a
  - osd.3
  - osd.4
  - osd.5
- - client.0
  - mds.b-s-a
tasks:
- chef: null
- clock.check: null
- install: null
- ceph: null
- mds_thrash: null
- ceph-fuse: null
- workunit:
    clients:
      all:
      - suites/fsstress.sh

ceph-client.0.28605.log.gz (18.5 MB) Sam Lang, 04/30/2013 01:26 PM

ceph-mds.b-s-a.log.04.gz (13.8 KB) Sam Lang, 04/30/2013 01:26 PM

ceph-mds.a.log.04.gz (17.5 KB) Sam Lang, 04/30/2013 01:26 PM


Related issues

Related to CephFS - Bug #5381: ceph-fuse: stuck with disconnected inodes on shutdown (Resolved, 06/17/2013)

History

#1 Updated by Sage Weil almost 11 years ago

  • Project changed from Ceph to CephFS
  • Subject changed from ceph-fuse: to ceph-fuse: disconnected inode on shutdown with fsstress + mds thrashing
  • Category deleted (1)

#2 Updated by Sage Weil almost 11 years ago

Have the full log; put a copy in the run dir.

#3 Updated by Greg Farnum almost 11 years ago

/a/teuthology-2013-04-28_21:32:40-fs-next-testing-basic/2662

That's an fsstress run that hung; I copied the client log into that dir.

#4 Updated by Sage Weil almost 11 years ago

  • Assignee set to Sam Lang

#5 Updated by Sam Lang almost 11 years ago

  • Status changed from 12 to In Progress

#6 Updated by Sam Lang almost 11 years ago

This looks like the client creates a file, then unlinks it, but never removes it from its cache, because it still holds caps for the inode (caps=pAsXsFs). The mds then goes through a series of failures, with the standby mds taking over, and each time the inode in question (10000000004) gets replayed (both the openc and the unlink). Eventually the log gets trimmed and the replay no longer includes those ops. That is when the mds sends back cap export messages for the inode, because the inode isn't in the mds cache from journal replay or from the client replaying unsafe requests.

The mds should be sending back a cap revoke for the remaining caps on unlink, I think. It's not clear yet why that's not happening...
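
A rough sketch of that timeline as a toy Python model (the class and field names are invented for illustration; this is not Ceph code): the client keeps the unlinked inode pinned by its caps, each failover replays the openc/unlink from the journal, and once the journal is trimmed the replacement mds no longer knows the inode and can only export the cap back.

class MDS:
    def __init__(self, journal):
        self.journal = list(journal)   # ops a newly-active mds can replay
        self.cache = set()             # inodes reconstructed from that replay

    def replay(self):
        for op, ino in self.journal:
            if op == "openc":
                self.cache.add(ino)
            # the unlinked inode stays around while a client still holds caps

    def reconnect(self, client_caps):
        # per reported cap: keep it if the inode is known, otherwise export it back
        return {ino: ("keep" if ino in self.cache else "export") for ino in client_caps}

client_caps = {"10000000004"}   # pAsXsFs held on the created-then-unlinked file
journal = [("openc", "10000000004"), ("unlink", "10000000004")]

mds = MDS(journal)              # early failovers: ops still in the journal
mds.replay()
print(mds.reconnect(client_caps))   # {'10000000004': 'keep'}

mds = MDS([])                   # after the journal is trimmed, the ops are gone
mds.replay()
print(mds.reconnect(client_caps))   # {'10000000004': 'export'} -> disconnected inode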

#7 Updated by Greg Farnum almost 11 years ago

We can't revoke on unlink because the file might still be held open with something accessing it. :)
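
That pattern is ordinary POSIX behaviour; a small local example in Python (nothing Ceph-specific) showing why the inode, and hence its caps, has to stay alive after the unlink:

import os

fd = os.open("scratch.tmp", os.O_CREAT | os.O_RDWR, 0o666)
os.unlink("scratch.tmp")           # the name is gone, but the open fd keeps the inode alive
os.write(fd, b"still writable")    # I/O on the unlinked file keeps working
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 32))             # b'still writable'
os.close(fd)                       # only now can the inode (and its caps) be dropped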

#8 Updated by Sage Weil almost 11 years ago

  • Priority changed from Urgent to High

#9 Updated by Sam Lang almost 11 years ago

The attached files include the complete client log, along with the mds logs that include 10000000004 (one of the inodes that is disconnected at the client). The ceph-mds.a.log is from the mds that sends the cap exports. The full mds logs are available on teuthology at:

/a/teuthology-2013-04-27_20:55:/log/ceph-mds.b-s-a.log.gz
/a/teuthology-2013-04-27_20:55:/log/ceph-mds.a.log.gz

(6GB each)

#10 Updated by Zheng Yan almost 11 years ago

I think this is a general issue. When handling MClientReconnect, if an inode is not in the cache, the MDS tries fetching the missing inode by the path the client provides. But the path the client provides isn't always accurate (because the MDS does not send notifications to clients when handling rename/unlink).
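
A toy illustration of that mismatch (invented structures, not MDS code): the path carried in the client's reconnect message can be stale because the namespace changed underneath it.

# what the on-disk/journaled namespace can still resolve
namespace = {"/dir/other": "10000000014"}

# what the client remembers: it opened the inode under /dir/f, but the file was
# later renamed or unlinked without the client being told the new path
reconnect = {"ino": "10000000004", "path": "/dir/f", "caps": "pAsXsFs"}

if namespace.get(reconnect["path"]) != reconnect["ino"]:
    # path-based lookup misses; the MDS exports the cap and the client is left
    # holding a disconnected inode, as in the dump_cache output above
    print("reconnect path lookup failed for", reconnect["ino"])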

#11 Updated by Sage Weil almost 11 years ago

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2013-05-01_01:00:37-fs-next-testing-basic/4534

#12 Updated by Greg Farnum almost 11 years ago

  • Status changed from In Progress to 12
  • Assignee deleted (Sam Lang)

Hmm, I thought we handled renames properly since they involve changing the caps state. But maybe we don't propagate the new path out to clients. :(
We certainly don't do so on unlinks, and it is indeed a problem. The clean solution would be to fix it in the protocol, although Sage suggests we might be able to do something with pinning deleted-but-referenced nodes. (I'm not at all sure we want to commit to that as a long-term solution, though.)

Sam, you're not actively working on this any more, right?

#13 Updated by Zheng Yan almost 11 years ago

  • Assignee set to Zheng Yan

FYI: I have code that finds the missing inode by using the backtrace. The code is under test; I will send it out soon.
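
For reference, the backtrace here is the inode's stored ancestor chain (kept with the inode's data objects), which lets the MDS locate the inode without trusting the client-supplied path. A toy Python sketch of that idea (not the actual patch; all names are invented):

# ino -> ancestor chain, leaf-most entry first: (parent ino, dentry name)
backtraces = {"10000000004": [("10000000000", "f.renamed"), ("1", "dir")]}

def resolve_by_backtrace(ino):
    bt = backtraces.get(ino)
    if bt is None:
        return None
    # walk the ancestors root-ward instead of trusting the reconnect path,
    # which may be stale after renames/unlinks the client never saw
    return "/" + "/".join(name for _, name in reversed(bt))

print(resolve_by_backtrace("10000000004"))   # /dir/f.renamed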

#14 Updated by Zheng Yan almost 11 years ago

  • Assignee deleted (Zheng Yan)

#15 Updated by Sage Weil over 10 years ago

  • Status changed from 12 to Resolved

I think this is resolved now...
