Bug #13714
closed
Segmentation fault accessing file using fuse mount
Description
On a test Ceph cluster running Ceph v9.1.0 with the 4.3.0 kernel on Trusty, with the Ceph File System and snapshots enabled, I was attempting to read a file within a snapshot when I received a read error, followed by a "Transport endpoint is not connected" error, as shown:
root: /cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# md5sum -c data_file.md5
md5sum: data_file.md5: read error
root:/cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# ls -lrta
ls: cannot open directory .: Transport endpoint is not connected
root:/cephfs/.snap/snapshot.2015-11-06_14_17_01-1446837421/top/dfgw02/2015-11-06_14_08_38/2/2009/month_4/day_3/hour_2# df
df: ‘/cephfs’: Transport endpoint is not connected

In ceph-client.cephfs.log.1:

2015-11-06 17:03:36.713548 7f96857fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f96857fa700

 ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)
 1: (()+0x25aa1a) [0x55fcb870ca1a]
 2: (()+0x10340) [0x7f96ae901340]
 3: (Client::check_pool_perm(Inode*, int)+0x335) [0x55fcb864bac5]
 4: (Client::get_caps(Inode*, int, int, int*, long)+0x2f) [0x55fcb864cc1f]
 5: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0x205) [0x55fcb8663f25]
 6: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x8f) [0x55fcb86648ff]
 7: (()+0x17455b) [0x55fcb862655b]
 8: (()+0x1481e) [0x7f96aef8381e]
 9: (()+0x1522b) [0x7f96aef8422b]
 10: (()+0x11e49) [0x7f96aef80e49]
 11: (()+0x8182) [0x7f96ae8f9182]
 12: (clone()+0x6d) [0x7f96ad48347d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2015-11-06 17:03:11.936372 7f96867fc700 3 client.5076 ll_forget 1000003028f 1
 -9999> 2015-11-06 17:03:11.936376 7f96867fc700 3 client.5076 ll_getattr 10000030291.14
 -9998> 2015-11-06 17:03:11.936378 7f96867fc700 3 client.5076 ll_getattr 10000030291.14 = 0
 -9997> 2015-11-06 17:03:11.936382 7f96867fc700 3 client.5076 ll_forget 10000030291 1
 -9996> 2015-11-06 17:03:11.936385 7f96867fc700 3 client.5076 ll_lookup 0x7f96940f2d20 2009
 -9995> 2015-11-06 17:03:11.936387 7f96867fc700 3 client.5076 ll_lookup 0x7f96940f2d20 2009 -> 0 (10000030294)
...
   -13> 2015-11-06 17:03:36.712522 7f96867fc700 3 client.5076 ll_getattr 10000032da8.14 = 0
   -12> 2015-11-06 17:03:36.712525 7f96867fc700 3 client.5076 ll_forget 10000032da8 1
   -11> 2015-11-06 17:03:36.712531 7f96867fc700 3 client.5076 ll_open 10000032da8.14 32768
   -10> 2015-11-06 17:03:36.712540 7f96867fc700 5 client.5076 open success, fh is 0x7f9678053aa0 combined IMMUTABLE SNAP caps pAsLsXsFscr
    -9> 2015-11-06 17:03:36.712580 7f96867fc700 3 client.5076 ll_open 10000032da8.14 32768 = 0 (0x7f9678053aa0)
    -8> 2015-11-06 17:03:36.712587 7f96867fc700 3 client.5076 ll_forget 10000032da8 1
    -7> 2015-11-06 17:03:36.712599 7f9685ffb700 3 client.5076 ll_getattr 10000032da8.14
    -6> 2015-11-06 17:03:36.712605 7f9685ffb700 3 client.5076 ll_getattr 10000032da8.14 = 0
    -5> 2015-11-06 17:03:36.712609 7f9685ffb700 3 client.5076 ll_forget 10000032da8 1
    -4> 2015-11-06 17:03:36.712624 7f9684ff9700 3 client.5076 ll_getattr 10000032da8.14
    -3> 2015-11-06 17:03:36.712633 7f9684ff9700 3 client.5076 ll_getattr 10000032da8.14 = 0
    -2> 2015-11-06 17:03:36.712638 7f9684ff9700 3 client.5076 ll_forget 10000032da8 1
    -1> 2015-11-06 17:03:36.712650 7f96857fa700 3 client.5076 ll_read 0x7f9678053aa0 10000032da8 0~4096
     0> 2015-11-06 17:03:36.713548 7f96857fa700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f96857fa700

 ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)
 1: (()+0x25aa1a) [0x55fcb870ca1a]
 2: (()+0x10340) [0x7f96ae901340]
 3: (Client::check_pool_perm(Inode*, int)+0x335) [0x55fcb864bac5]
 4: (Client::get_caps(Inode*, int, int, int*, long)+0x2f) [0x55fcb864cc1f]
 5: (Client::_read(Fh*, long, unsigned long, ceph::buffer::list*)+0x205) [0x55fcb8663f25]
 6: (Client::ll_read(Fh*, long, long, ceph::buffer::list*)+0x8f) [0x55fcb86648ff]
 7: (()+0x17455b) [0x55fcb862655b]
 8: (()+0x1481e) [0x7f96aef8381e]
 9: (()+0x1522b) [0x7f96aef8422b]
 10: (()+0x11e49) [0x7f96aef80e49]
 11: (()+0x8182) [0x7f96ae8f9182]
 12: (clone()+0x6d) [0x7f96ad48347d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
System info
ceph -v
ceph version 9.1.0 (3be81ae6cf17fcf689cd6f187c4615249fea4f61)

dpkg -l | grep fuse
ii  ceph-fuse       9.1.0-1trusty           amd64  FUSE-based client for the Ceph distributed file system
ii  fuse            2.9.2-4ubuntu4.14.04.1  amd64  Filesystem in Userspace
ii  libfuse2:amd64  2.9.2-4ubuntu4.14.04.1  amd64  Filesystem in Userspace (library)

uname -a
Linux dfadm01 4.3.0-040300-generic #201511020949 SMP Mon Nov 2 14:50:44 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

grep ceph /etc/fstab
id=cephfs,keyring=/etc/ceph/client.cephfs.keyring /cephfs fuse.ceph noatime,_netdev,noauto 0 0
I have attached the whole log file.
Files
Updated by John Spray over 8 years ago
Was the snapshotted file's layout pointing to a different data pool than other files accessed by that client? Not clear to me which part of the fn is potentially crashing, but would be useful to know the context.
Updated by Eric Eastman over 8 years ago
The file system is very simple, with two pools: one data, one metadata. It was created with the command:
ceph fs new cephfs cephfs_metadata cephfs_data
The pools look like:
ceph df detail
GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED  OBJECTS
    241T  188T   54076G    21.85      4797k
POOLS:
    NAME             ID  CATEGORY  USED    %USED  MAX AVAIL  OBJECTS  DIRTY  READ    WRITE
    rbd              0   -         0       0      55819G     0        0      0       0
    cephfs_data      1   -         17962G  7.26   55819G     4897561  4782k  17800k  12542k
    cephfs_metadata  2   -         104M    0      55819G     15511    15511  24554   3452k
    kSAFEbackup      3   -         1463M   0      55819G     8        8      0       376
Are there any commands that I could run to get you additional information?
Updated by Zheng Yan over 8 years ago
objecter->mutate(oid, OSDMap::file_to_object_locator(in->layout), wr_op, in->snaprealm->get_snap_context(), ceph_clock_now(cct), 0, &wr_cond, NULL);
Maybe in->snaprealm is NULL here. I think it's OK to use a null snapc in this case.
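The idea in the comment above can be illustrated with a minimal sketch: check the pointer before dereferencing it, and fall back to an empty snap context when no realm is attached. All types and the helper name `snapc_for_write` below are hypothetical, simplified stand-ins written for illustration; they are not the Ceph client classes and not the contents of the attached patch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the client-side types.
// The real Ceph classes carry far more state than this.
struct SnapContext {
  std::uint64_t seq = 0;             // newest snapshot sequence number
  std::vector<std::uint64_t> snaps;  // existing snapshot ids
};

struct SnapRealm {
  SnapContext cached;
  const SnapContext& get_snap_context() const { return cached; }
};

struct Inode {
  // May be null; that is the hypothesis raised in the comment above.
  SnapRealm* snaprealm = nullptr;
};

// Hypothetical helper: return the realm's snap context when the inode has
// a realm, otherwise an empty ("null") context, so the caller never
// dereferences a null snaprealm pointer.
SnapContext snapc_for_write(const Inode& in) {
  if (in.snaprealm != nullptr) {
    return in.snaprealm->get_snap_context();
  }
  return SnapContext{};
}
```

With a guard of this shape, a call such as the `objecter->mutate(...)` quoted above would receive `snapc_for_write(*in)` rather than `in->snaprealm->get_snap_context()`. Whether an empty context is acceptable for that particular operation is the judgment made in the comment, not something this sketch establishes.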
Updated by Zheng Yan over 8 years ago
- File client-snapc.patch added
please try the attached patch
Updated by Zheng Yan over 8 years ago
- Status changed from New to Fix Under Review
Updated by Zheng Yan over 8 years ago
- Status changed from Fix Under Review to Resolved
Updated by Eric Eastman over 8 years ago
Sorry for the delay. It took me a bit of time to create a build environment.
The patch fixed my problem. Thanks!
Can this fix be included in v9.2.1?
will do
Updated by Greg Farnum over 8 years ago
- Status changed from Resolved to Pending Backport
- Backport set to infernalis
Updated by Abhishek Varshney over 8 years ago
- Copied to Backport #13889: infernalis: Segmentation fault accessing file using fuse mount added
Updated by Loïc Dachary about 8 years ago
- Status changed from Pending Backport to Resolved