
Bug #529

closed

Cfuse: Software caused connection abort

Added by Ed Burnette over 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%


Description

After using Ceph for a few minutes, it gets into a state where I can no longer access the cfuse mount point. It also seems to corrupt the file system, so I have to recreate it.

I don't have a specific reproducible sequence, but it has happened several times. This morning I was testing Ceph by copying a few hundred files into a Ceph directory mounted with cfuse. Several copies worked fine. I did some mv's and chmods with no problems. Then I cd'd into the directory I had just chmod'd (chmod 777) and tried to run an ls command:

> ls
ls: reading directory .: Software caused connection abort

> ls /mnt/ceph
ls: /mnt/ceph: Transport endpoint is not connected

On another machine, I can ls any directory under /mnt/ceph except the one I was using above:

> ls /mnt/ceph/rtolap
20090906
> ls /mnt/ceph/rtolap/2009*/nosuchfile
ls: /mnt/ceph/rtolap/2009*/nosuchfile: No such file or directory
> ls /mnt/ceph/rtolap/2009*
ls: reading directory /mnt/ceph/rtolap/20090906: Software caused connection abort
> ls /mnt/ceph/rtolap
ls: /mnt/ceph/rtolap: Transport endpoint is not connected
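The two error messages above are the Linux strings for errno ECONNABORTED ("Software caused connection abort") and ENOTCONN ("Transport endpoint is not connected"). The first is what a FUSE request in flight gets when the userspace daemon goes away; the second is what every later access gets once the mount is dead. The manual check above (ls each directory until one breaks the mount) can be automated with a small walker. This is a minimal sketch, not a Ceph tool: the function name `probe` is hypothetical, and it only assumes a standard POSIX mount point such as /mnt/ceph.

```python
import errno
import os

# Linux errno values behind the two messages in the transcripts:
#   ECONNABORTED: "Software caused connection abort"
#   ENOTCONN:     "Transport endpoint is not connected"
def probe(root):
    """Walk root depth-first and try to list each directory.

    Returns a list of (path, errno_name) pairs for directories whose
    listing failed. Stops early on ENOTCONN, because once the FUSE
    daemon is gone every further access fails the same way.
    """
    failures = []
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            entries = os.listdir(path)
        except OSError as e:
            failures.append((path, errno.errorcode.get(e.errno, str(e.errno))))
            if e.errno == errno.ENOTCONN:
                break  # mount is dead; nothing else can be read
            continue
        for name in sorted(entries):
            child = os.path.join(path, name)
            # isdir() returns False instead of raising if stat() fails
            if os.path.isdir(child) and not os.path.islink(child):
                stack.append(child)
    return failures
```

Called as `probe("/mnt/ceph")` on an affected client, one would expect an ECONNABORTED entry for the bad directory (e.g. rtolap/20090906), followed by a single ENOTCONN entry once cfuse has died; on a healthy mount it returns an empty list.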

At this point, even if I restart all the servers, anyone who accesses that directory crashes cfuse. I can't even remove the bad directory without crashing cfuse. The only way to recover is to recreate the file system, which clobbers all data.

I'm using RHEL5, Linux 2.6.18, an ext3 file system (not using xattr), and ceph-0.22.1.
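Since there is no reliable reproducer, the workload described above (copy a few hundred files in, mv some, chmod the directory 777, then ls it) can at least be replayed mechanically. The helper below is a hypothetical sketch, not part of Ceph: it assumes `mount_dir` is a cfuse mount such as /mnt/ceph and `src_dir` holds the files to copy, and the subdirectory name and the "rename every tenth file" pattern are illustrative choices rather than details from the report.

```python
import os
import shutil

def replay_workload(src_dir, mount_dir, subdir="rtolap/20090906", move_every=10):
    """Hypothetical replay of the report's steps: copy files into a
    directory on the mount, rename some (mv), chmod the directory 777,
    then list it (ls). Returns the final sorted listing.
    """
    target = os.path.join(mount_dir, subdir)
    os.makedirs(target, exist_ok=True)

    # Copy step: every regular file in src_dir goes into target.
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, target)
            copied.append(name)

    # mv step: rename every move_every-th copied file in place.
    for name in copied[::move_every]:
        os.rename(os.path.join(target, name),
                  os.path.join(target, name + ".moved"))

    # chmod 777 step, then the listing that failed in the report.
    os.chmod(target, 0o777)
    return sorted(os.listdir(target))
```

On a healthy mount this just returns the listing; on an affected client the final `os.listdir` would be expected to raise `OSError` with errno ECONNABORTED, matching the failed ls in the report.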

#1

Updated by Sage Weil over 13 years ago

  • Category set to 11
  • Assignee set to Greg Farnum

Hey Greg, this looks like the client truncation stuff again. It was biting me today, almost immediately. These two patches at commit:2444f2982cc3d2e0adbfe177de549ede667def17 (uclient_trunc branch) seemed to fix it for me, but I haven't been paying attention to the whole saga. Can you take a look?

Thanks!

#2

Updated by Ed Burnette over 13 years ago

I was going to apply the patch to my version, but I noticed that line 516 of my src/client/Client.h already says "truncate_size(-1)". (I'm using ceph-0.22.1; I tried to download 0.22.2, but the download link was broken.)

Can you tell me whether my setup is supposed to work, or whether it is contributing to this instability? I'm on RHEL5.5; uname -a returns "2.6.18-194.11.3.el5 #1 SMP Mon Aug 23 15:51:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux", and the file system is ext3 (mount returns "/dev/sda5 on /primary type ext3 (rw)"). For testing purposes, I can't change the file system or the mount options.

#3

Updated by Greg Farnum over 13 years ago

  • Status changed from New to Resolved

There was a sequence of commits involved in this, some of which were one step forward and two steps back. The testing branch fixes this problem and shouldn't add any new issues. :)

Sage's commits are fine, and I pulled them in along with a few others. I had generated similar ones a few days ago that somehow didn't get pushed out!

#4

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (11)

Bulk updating project=ceph category=ceph-fuse issues to move them to the fs project, so that we can remove the ceph-fuse category from the ceph project.
