Bug #13714

closed

Segmentation fault accessing file using fuse mount

Added by Eric Eastman over 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: infernalis
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

On a test Ceph cluster running Ceph v9.1.0 with the 4.3.0 kernel on Trusty, using the Ceph File System with snapshots enabled, I was attempting to read a file within a snapshot when I received a read error, followed by a "Transport endpoint is not connected" error, as shown:

root: /cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# md5sum -c data_file.md5 
md5sum: data_file.md5: read error
root:/cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# ls -lrta
ls: cannot open directory .: Transport endpoint is not connected
root:/cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# df
df: ‘/cephfs’: Transport endpoint is not connected

In ceph-client.cephfs.log.1:

2015-11-06 17:03:36.713548 7f96857fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f96857fa700

 ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)
 1: (()+0x25aa1a) [0x55fcb870ca1a]
 2: (()+0x10340) [0x7f96ae901340]
 3: (Client::check_pool_perm(Inode*, int)+0x335) [0x55fcb864bac5]
 4: (Client::get_caps(Inode*, int, int, int*, long)+0x2f) [0x55fcb864cc1f]
 5: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0x205) [0x55fcb8663f25]
 6: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x8f) [0x55fcb86648ff]
 7: (()+0x17455b) [0x55fcb862655b]
 8: (()+0x1481e) [0x7f96aef8381e]
 9: (()+0x1522b) [0x7f96aef8422b]
 10: (()+0x11e49) [0x7f96aef80e49]
 11: (()+0x8182) [0x7f96ae8f9182]
 12: (clone()+0x6d) [0x7f96ad48347d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2015-11-06 17:03:11.936372 7f96867fc700  3 client.5076 ll_forget 1000003028f 1
 -9999> 2015-11-06 17:03:11.936376 7f96867fc700  3 client.5076 ll_getattr 10000030291.14
 -9998> 2015-11-06 17:03:11.936378 7f96867fc700  3 client.5076 ll_getattr 10000030291.14 = 0
 -9997> 2015-11-06 17:03:11.936382 7f96867fc700  3 client.5076 ll_forget 10000030291 1
 -9996> 2015-11-06 17:03:11.936385 7f96867fc700  3 client.5076 ll_lookup 0x7f96940f2d20 2009
 -9995> 2015-11-06 17:03:11.936387 7f96867fc700  3 client.5076 ll_lookup 0x7f96940f2d20 2009 -> 0 (10000030294)
...
  -13> 2015-11-06 17:03:36.712522 7f96867fc700  3 client.5076 ll_getattr 10000032da8.14 = 0
   -12> 2015-11-06 17:03:36.712525 7f96867fc700  3 client.5076 ll_forget 10000032da8 1
   -11> 2015-11-06 17:03:36.712531 7f96867fc700  3 client.5076 ll_open 10000032da8.14 32768
   -10> 2015-11-06 17:03:36.712540 7f96867fc700  5 client.5076 open success, fh is 0x7f9678053aa0 combined IMMUTABLE SNAP caps pAsLsXsFscr
    -9> 2015-11-06 17:03:36.712580 7f96867fc700  3 client.5076 ll_open 10000032da8.14 32768 = 0 (0x7f9678053aa0)
    -8> 2015-11-06 17:03:36.712587 7f96867fc700  3 client.5076 ll_forget 10000032da8 1
    -7> 2015-11-06 17:03:36.712599 7f9685ffb700  3 client.5076 ll_getattr 10000032da8.14
    -6> 2015-11-06 17:03:36.712605 7f9685ffb700  3 client.5076 ll_getattr 10000032da8.14 = 0
    -5> 2015-11-06 17:03:36.712609 7f9685ffb700  3 client.5076 ll_forget 10000032da8 1
    -4> 2015-11-06 17:03:36.712624 7f9684ff9700  3 client.5076 ll_getattr 10000032da8.14
    -3> 2015-11-06 17:03:36.712633 7f9684ff9700  3 client.5076 ll_getattr 10000032da8.14 = 0
    -2> 2015-11-06 17:03:36.712638 7f9684ff9700  3 client.5076 ll_forget 10000032da8 1
    -1> 2015-11-06 17:03:36.712650 7f96857fa700  3 client.5076 ll_read 0x7f9678053aa0 10000032da8  0~4096
     0> 2015-11-06 17:03:36.713548 7f96857fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f96857fa700

 ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)
 1: (()+0x25aa1a) [0x55fcb870ca1a]
 2: (()+0x10340) [0x7f96ae901340]
 3: (Client::check_pool_perm(Inode*, int)+0x335) [0x55fcb864bac5]
 4: (Client::get_caps(Inode*, int, int, int*, long)+0x2f) [0x55fcb864cc1f]
 5: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0x205) [0x55fcb8663f25]
 6: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x8f) [0x55fcb86648ff]
 7: (()+0x17455b) [0x55fcb862655b]
 8: (()+0x1481e) [0x7f96aef8381e]
 9: (()+0x1522b) [0x7f96aef8422b]
 10: (()+0x11e49) [0x7f96aef80e49]
 11: (()+0x8182) [0x7f96ae8f9182]
 12: (clone()+0x6d) [0x7f96ad48347d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

System info

ceph -v
ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)

dpkg -l | grep fuse
ii  ceph-fuse                             9.1.0-1trusty                    amd64        FUSE-based client for the Ceph distributed file system
ii  fuse                                  2.9.2-4ubuntu4.14.04.1           amd64        Filesystem in Userspace
ii  libfuse2:amd64                        2.9.2-4ubuntu4.14.04.1           amd64        Filesystem in Userspace (library)

uname -a
Linux dfadm01 4.3.0-040300-generic #201511020949 SMP Mon Nov 2 14:50:44 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

grep ceph /etc/fstab
id=cephfs,keyring=/etc/ceph/client.cephfs.keyring /cephfs fuse.ceph noatime,_netdev,noauto 0 0

I have attached the whole log file.


Files

ceph-client.cephfs.log.1.gz (88.4 KB), Eric Eastman, 11/06/2015 11:23 PM
client-snapc.patch (1.05 KB), Zheng Yan, 11/09/2015 02:40 AM

Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #13889: infernalis: Segmentation fault accessing file using fuse mount (Resolved, Abhishek Varshney)
Actions #1

Updated by John Spray over 8 years ago

Was the snapshotted file's layout pointing to a different data pool than other files accessed by that client? Not clear to me which part of the fn is potentially crashing, but would be useful to know the context.

Actions #2

Updated by Eric Eastman over 8 years ago

The file system is very simple, with two pools: one data, one metadata. It was created with the command:

ceph fs new cephfs cephfs_metadata cephfs_data

The pools look like:

 ceph df detail
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED     OBJECTS 
    241T      188T       54076G         21.85       4797k 
POOLS:
    NAME                ID     CATEGORY     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE  
    rbd                 0      -                 0         0        55819G           0         0          0          0 
    cephfs_data         1      -            17962G      7.26        55819G     4897561     4782k     17800k     12542k 
    cephfs_metadata     2      -              104M         0        55819G       15511     15511      24554      3452k 
    kSAFEbackup         3      -             1463M         0        55819G           8         8          0        376 

Are there any commands that I could run to get you additional information?

Actions #3

Updated by Zheng Yan over 8 years ago

        objecter->mutate(oid, OSDMap::file_to_object_locator(in->layout), wr_op,
                 in->snaprealm->get_snap_context(), ceph_clock_now(cct), 0,
                 &wr_cond, NULL);

Maybe in->snaprealm is NULL. I think it's OK to use a null snapc in this case.
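
To illustrate the idea in the comment above, here is a minimal, self-contained sketch of guarding against a null snaprealm before asking it for a snap context. It is not the attached patch and not the real Ceph code: the `SnapContext`, `SnapRealm` and `Inode` structs are simplified hypothetical stand-ins, and `snap_context_or_null` is a helper name made up for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the Ceph client types.
struct SnapContext {
    std::uint64_t seq = 0;             // 0 with no snaps acts as the "null" context here
    std::vector<std::uint64_t> snaps;
};

struct SnapRealm {
    SnapContext ctx;
    const SnapContext& get_snap_context() const { return ctx; }
};

struct Inode {
    // Assumed to be possibly null, as the comment above suspects.
    SnapRealm* snaprealm = nullptr;
};

// Hypothetical helper: return the realm's snap context when a realm is
// attached, otherwise an empty context, instead of dereferencing null.
SnapContext snap_context_or_null(const Inode& in) {
    if (in.snaprealm == nullptr) {
        return SnapContext{};
    }
    return in.snaprealm->get_snap_context();
}
```

A caller in this sketch would pass `snap_context_or_null(*in)` where the snippet above passes `in->snaprealm->get_snap_context()`, so an inode without a realm no longer triggers a null dereference.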

Actions #4

Updated by Zheng Yan over 8 years ago

please try the attached patch

Actions #5

Updated by Zheng Yan over 8 years ago

  • Status changed from New to Fix Under Review
Actions #6

Updated by Zheng Yan over 8 years ago

  • Status changed from Fix Under Review to Resolved
Actions #7

Updated by Eric Eastman over 8 years ago

Sorry for the delay. It took me a bit of time to create a build environment.

The patch fixed my problem. Thanks!

Can this fix be included in v9.2.1?

will do

Actions #8

Updated by Greg Farnum over 8 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to infernalis
Actions #9

Updated by Abhishek Varshney over 8 years ago

  • Copied to Backport #13889: infernalis: Segmentation fault accessing file using fuse mount added
Actions #10

Updated by Loïc Dachary about 8 years ago

  • Status changed from Pending Backport to Resolved