Bug #13714

Segmentation fault accessing file using fuse mount

Added by Eric Eastman over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
11/06/2015
Due date:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
infernalis
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):

Description

On a test Ceph cluster running Ceph v9.1.0 with the 4.3.0 kernel on Trusty, running the Ceph File System with snapshots enabled, I was attempting to read a file within a snapshot when I received a read error, followed by a "Transport endpoint is not connected" error, as shown:

root: /cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# md5sum -c data_file.md5 
md5sum: data_file.md5: read error
root:/cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# ls -lrta
ls: cannot open directory .: Transport endpoint is not connected
root:/cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# df
df: ‘/cephfs’: Transport endpoint is not connected

In ceph-client.cephfs.log.1:

2015-11-06 17:03:36.713548 7f96857fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f96857fa700

 ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)
 1: (()+0x25aa1a) [0x55fcb870ca1a]
 2: (()+0x10340) [0x7f96ae901340]
 3: (Client::check_pool_perm(Inode*, int)+0x335) [0x55fcb864bac5]
 4: (Client::get_caps(Inode*, int, int, int*, long)+0x2f) [0x55fcb864cc1f]
 5: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0x205) [0x55fcb8663f25]
 6: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x8f) [0x55fcb86648ff]
 7: (()+0x17455b) [0x55fcb862655b]
 8: (()+0x1481e) [0x7f96aef8381e]
 9: (()+0x1522b) [0x7f96aef8422b]
 10: (()+0x11e49) [0x7f96aef80e49]
 11: (()+0x8182) [0x7f96ae8f9182]
 12: (clone()+0x6d) [0x7f96ad48347d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2015-11-06 17:03:11.936372 7f96867fc700  3 client.5076 ll_forget 1000003028f 1
 -9999> 2015-11-06 17:03:11.936376 7f96867fc700  3 client.5076 ll_getattr 10000030291.14
 -9998> 2015-11-06 17:03:11.936378 7f96867fc700  3 client.5076 ll_getattr 10000030291.14 = 0
 -9997> 2015-11-06 17:03:11.936382 7f96867fc700  3 client.5076 ll_forget 10000030291 1
 -9996> 2015-11-06 17:03:11.936385 7f96867fc700  3 client.5076 ll_lookup 0x7f96940f2d20 2009
 -9995> 2015-11-06 17:03:11.936387 7f96867fc700  3 client.5076 ll_lookup 0x7f96940f2d20 2009 -> 0 (10000030294)
...
  -13> 2015-11-06 17:03:36.712522 7f96867fc700  3 client.5076 ll_getattr 10000032da8.14 = 0
   -12> 2015-11-06 17:03:36.712525 7f96867fc700  3 client.5076 ll_forget 10000032da8 1
   -11> 2015-11-06 17:03:36.712531 7f96867fc700  3 client.5076 ll_open 10000032da8.14 32768
   -10> 2015-11-06 17:03:36.712540 7f96867fc700  5 client.5076 open success, fh is 0x7f9678053aa0 combined IMMUTABLE SNAP caps pAsLsXsFscr
    -9> 2015-11-06 17:03:36.712580 7f96867fc700  3 client.5076 ll_open 10000032da8.14 32768 = 0 (0x7f9678053aa0)
    -8> 2015-11-06 17:03:36.712587 7f96867fc700  3 client.5076 ll_forget 10000032da8 1
    -7> 2015-11-06 17:03:36.712599 7f9685ffb700  3 client.5076 ll_getattr 10000032da8.14
    -6> 2015-11-06 17:03:36.712605 7f9685ffb700  3 client.5076 ll_getattr 10000032da8.14 = 0
    -5> 2015-11-06 17:03:36.712609 7f9685ffb700  3 client.5076 ll_forget 10000032da8 1
    -4> 2015-11-06 17:03:36.712624 7f9684ff9700  3 client.5076 ll_getattr 10000032da8.14
    -3> 2015-11-06 17:03:36.712633 7f9684ff9700  3 client.5076 ll_getattr 10000032da8.14 = 0
    -2> 2015-11-06 17:03:36.712638 7f9684ff9700  3 client.5076 ll_forget 10000032da8 1
    -1> 2015-11-06 17:03:36.712650 7f96857fa700  3 client.5076 ll_read 0x7f9678053aa0 10000032da8  0~4096
     0> 2015-11-06 17:03:36.713548 7f96857fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f96857fa700

 ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)
 1: (()+0x25aa1a) [0x55fcb870ca1a]
 2: (()+0x10340) [0x7f96ae901340]
 3: (Client::check_pool_perm(Inode*, int)+0x335) [0x55fcb864bac5]
 4: (Client::get_caps(Inode*, int, int, int*, long)+0x2f) [0x55fcb864cc1f]
 5: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0x205) [0x55fcb8663f25]
 6: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x8f) [0x55fcb86648ff]
 7: (()+0x17455b) [0x55fcb862655b]
 8: (()+0x1481e) [0x7f96aef8381e]
 9: (()+0x1522b) [0x7f96aef8422b]
 10: (()+0x11e49) [0x7f96aef80e49]
 11: (()+0x8182) [0x7f96ae8f9182]
 12: (clone()+0x6d) [0x7f96ad48347d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

System info

ceph -v
ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)

dpkg -l | grep fuse
ii  ceph-fuse                             9.1.0-1trusty                    amd64        FUSE-based client for the Ceph distributed file system
ii  fuse                                  2.9.2-4ubuntu4.14.04.1           amd64        Filesystem in Userspace
ii  libfuse2:amd64                        2.9.2-4ubuntu4.14.04.1           amd64        Filesystem in Userspace (library)

uname -a
Linux dfadm01 4.3.0-040300-generic #201511020949 SMP Mon Nov 2 14:50:44 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

grep ceph /etc/fstab
id=cephfs,keyring=/etc/ceph/client.cephfs.keyring /cephfs fuse.ceph noatime,_netdev,noauto 0 0

I have attached the whole log file.

ceph-client.cephfs.log.1.gz (88.4 KB) Eric Eastman, 11/06/2015 11:23 PM

client-snapc.patch (1.05 KB) Zheng Yan, 11/09/2015 02:40 AM


Related issues

Copied to fs - Backport #13889: infernalis: Segmentation fault accessing file using fuse mount Resolved

Associated revisions

Revision fad3772f (diff)
Added by Yan, Zheng over 2 years ago

client: use null snapc to check pool permission

snap inodes' ->snaprealm can be NULL, so dereferencing it in
check_pool_perm() can cause segment fault. The pool permission
check does not write any data, so it's safe to use null snapc.

Fixes: #13714
Signed-off-by: Yan, Zheng <>

Revision a2644ed5 (diff)
Added by Yan, Zheng over 2 years ago

client: use null snapc to check pool permission

snap inodes' ->snaprealm can be NULL, so dereferencing it in
check_pool_perm() can cause segment fault. The pool permission
check does not write any data, so it's safe to use null snapc.

Fixes: #13714
Signed-off-by: Yan, Zheng <>
(cherry picked from commit fad3772fb7731272d47cbfd9e81f22f5df3701a2)

History

#1 Updated by John Spray over 2 years ago

Was the snapshotted file's layout pointing to a different data pool than other files accessed by that client? Not clear to me which part of the fn is potentially crashing, but would be useful to know the context.

#2 Updated by Eric Eastman over 2 years ago

The file system is very simple, with two pools: one data, one metadata. It was created with the command:

ceph fs new cephfs cephfs_metadata cephfs_data

The pools look like:

 ceph df detail
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED     OBJECTS 
    241T      188T       54076G         21.85       4797k 
POOLS:
    NAME                ID     CATEGORY     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE  
    rbd                 0      -                 0         0        55819G           0         0          0          0 
    cephfs_data         1      -            17962G      7.26        55819G     4897561     4782k     17800k     12542k 
    cephfs_metadata     2      -              104M         0        55819G       15511     15511      24554      3452k 
    kSAFEbackup         3      -             1463M         0        55819G           8         8          0        376 

Are there any commands that I could run to get you additional information?

#3 Updated by Zheng Yan over 2 years ago

        objecter->mutate(oid, OSDMap::file_to_object_locator(in->layout), wr_op,
                 in->snaprealm->get_snap_context(), ceph_clock_now(cct), 0,
                 &wr_cond, NULL);

Maybe in->snaprealm is NULL here. I think it's OK to use a null snapc in this case.
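
For readers unfamiliar with the client code, the idea can be sketched in isolation. This is a minimal illustration, not the actual patch: SnapContext, SnapRealm and Inode below are hypothetical, heavily simplified stand-ins for the real Ceph types, and snapc_unchecked(), snapc_for_perm_check() and demo() are names made up for this example. It assumes, as suggested above, that an inode inside a snapshot can have a null snaprealm, and that a permission probe which writes no data can be given an empty snap context.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the Ceph client types.
struct SnapContext {
  uint64_t seq = 0;
  std::vector<uint64_t> snaps;
  bool empty() const { return seq == 0 && snaps.empty(); }
};

struct SnapRealm {
  SnapContext ctx;
  const SnapContext& get_snap_context() const { return ctx; }
};

struct Inode {
  SnapRealm* snaprealm = nullptr;  // assumed to stay null for a snapshot inode
};

// Same shape as the quoted call: dereferences snaprealm unconditionally,
// which is undefined behaviour (typically a segfault) when it is null.
inline const SnapContext& snapc_unchecked(const Inode& in) {
  return in.snaprealm->get_snap_context();
}

// Sketch of the proposed approach: the probe ignores the realm entirely
// and always uses an empty ("null") snap context.
inline SnapContext snapc_for_perm_check(const Inode& /*in*/) {
  return SnapContext();
}

// Exercises a head inode (realm set) and a snapshot inode (realm null).
inline bool demo() {
  SnapRealm realm;
  realm.ctx.seq = 7;
  realm.ctx.snaps = {7, 3};

  Inode head;
  head.snaprealm = &realm;
  Inode snap;  // snaprealm left null

  // The unchecked path is only safe when a realm is present.
  assert(snapc_unchecked(head).seq == 7);

  // The probe path never touches the realm, so both inodes are safe.
  return snapc_for_perm_check(head).empty() &&
         snapc_for_perm_check(snap).empty();
}
```

Calling demo() from a small main() runs both paths. Calling snapc_unchecked() on the snapshot inode instead would reproduce the kind of null dereference suspected here.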

#4 Updated by Zheng Yan over 2 years ago

please try the attached patch

#5 Updated by Zheng Yan over 2 years ago

  • Status changed from New to Need Review

#6 Updated by Zheng Yan over 2 years ago

  • Status changed from Need Review to Resolved

#7 Updated by Eric Eastman over 2 years ago

Sorry for the delay. It took me a bit of time to create a build environment.

The patch fixed my problem. Thanks!

Can this fix be included in v9.2.1?

will do

#8 Updated by Greg Farnum over 2 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to infernalis

#9 Updated by Abhishek Varshney over 2 years ago

  • Copied to Backport #13889: infernalis: Segmentation fault accessing file using fuse mount added

#10 Updated by Loic Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved
