Bug #12354

ceph-fuse crash in ll_fsync, during TestClientRecovery.test_fsync

Added by John Spray over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

teuthology-2015-07-13_23:04:03-fs-master---basic-multi/972688/

ceph-client.0.4375.log

   -20> 2015-07-15 15:03:26.815210 7f22f1ffb700 20 client.4210 _lookup have dn subdir mds.0 ttl 2015-07-15 15:03:56.764008 seq 2
   -19> 2015-07-15 15:03:26.815217 7f22f1ffb700 10 client.4210 _lookup 1.head(ref=3 ll_ref=12 cap_refs={} open={} mode=41777 size=0/0 mtime=2015-07-15 15:03:26.763942 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x7f22fc00be50) subdir = 10000000000.head(ref=4 ll_ref=2 cap_refs={} open={} mode=40755 size=0/0 mtime=2015-07-15 15:03:26.808625 caps=pAsLsXsFsx(0=pAsLsXsFsx) COMPLETE parents=0x7f22d4001350 0x7f22fc010fe0)
   -18> 2015-07-15 15:03:26.815244 7f22f1ffb700 10 client.4210 fill_stat on 10000000000 snap/devhead mode 040755 mtime 2015-07-15 15:03:26.808625 ctime 2015-07-15 15:03:26.808625
   -17> 2015-07-15 15:03:26.815251 7f22f1ffb700 20 client.4210 _ll_get 0x7f22fc010fe0 10000000000 -> 3
   -16> 2015-07-15 15:03:26.815252 7f22f1ffb700  3 client.4210 ll_lookup 0x7f22fc00be50 subdir -> 0 (10000000000)
   -15> 2015-07-15 15:03:26.815265 7f22f1ffb700  3 client.4210 ll_forget 1 1
   -14> 2015-07-15 15:03:26.815278 7f22f1ffb700 20 client.4210 _ll_get 0x7f22fc010fe0 10000000000 -> 4
   -13> 2015-07-15 15:03:26.815283 7f22f1ffb700  3 client.4210 ll_getattr 10000000000.head
   -12> 2015-07-15 15:03:26.815292 7f22f1ffb700 10 client.4210 _getattr mask pAsLsXsFs issued=1
   -11> 2015-07-15 15:03:26.815297 7f22f1ffb700 10 client.4210 fill_stat on 10000000000 snap/devhead mode 040755 mtime 2015-07-15 15:03:26.808625 ctime 2015-07-15 15:03:26.808625
   -10> 2015-07-15 15:03:26.815304 7f22f1ffb700  3 client.4210 ll_getattr 10000000000.head = 0
    -9> 2015-07-15 15:03:26.815313 7f22f1ffb700  3 client.4210 ll_forget 10000000000 1
    -8> 2015-07-15 15:03:26.815317 7f22f1ffb700 20 client.4210 _ll_put 0x7f22fc010fe0 10000000000 1 -> 3
    -7> 2015-07-15 15:03:26.815326 7f22f1ffb700 20 client.4210 _ll_get 0x7f22fc010fe0 10000000000 -> 4
    -6> 2015-07-15 15:03:26.815330 7f22f1ffb700  3 client.4210 ll_opendir 10000000000.head
    -5> 2015-07-15 15:03:26.815332 7f22f1ffb700 10 client.4210 _opendir 10000000000, our cache says the first dirfrag is *
    -4> 2015-07-15 15:03:26.815334 7f22f1ffb700  3 client.4210 _opendir(10000000000) = 0 (0x7f22e80086c0)
    -3> 2015-07-15 15:03:26.815336 7f22f1ffb700  3 client.4210 ll_opendir 10000000000.head = 0 (0x7f22e80086c0)
    -2> 2015-07-15 15:03:26.815346 7f22f1ffb700  3 client.4210 ll_forget 10000000000 1
    -1> 2015-07-15 15:03:26.815350 7f22f1ffb700 20 client.4210 _ll_put 0x7f22fc010fe0 10000000000 1 -> 3
     0> 2015-07-15 15:03:26.829080 7f22f27fc700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f22f27fc700

 ceph version 9.0.1-1475-g970195e (970195e86a01503921f248d469a73f3611747197)
 1: ceph-fuse() [0x62d635]
 2: (()+0xfcb0) [0x7f231a192cb0]
 3: (Client::ll_fsync(Fh*, bool)+0x15d) [0x57705d]
 4: ceph-fuse() [0x54e515]
 5: (()+0x124d5) [0x7f231a5cb4d5]
 6: (()+0x110e6) [0x7f231a5ca0e6]
 7: (()+0x7e9a) [0x7f231a18ae9a]
 8: (clone()+0x6d) [0x7f2318d523fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Associated revisions

Revision 613f5481 (diff)
Added by Yan, Zheng over 8 years ago

client: fix directory fsync

When opening a regular file, fuse assigns a 'struct Fh' pointer to
fuse_file_info::fh, but when opening a directory, fuse assigns a
'struct dir_result_t' pointer to fuse_file_info::fh. So we need a separate
function for fsyncdir (one that casts fuse_file_info::fh to a struct
dir_result_t pointer).

Fixes: #12354
Signed-off-by: Yan, Zheng <>
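
The mismatch described in the commit message can be illustrated with a minimal sketch. It is not the actual Ceph patch: `Fh` and `dir_result_t` are named in the report, but their members here are simplified assumptions, and `file_info`, `on_open`, `on_opendir`, `on_fsync` and `on_fsyncdir` are hypothetical stand-ins for `fuse_file_info` and the fuse callbacks. The point is only that the opaque `fh` slot must be cast back to the same type that was stored in it, so fsync on a directory needs its own handler.

```cpp
#include <cerrno>
#include <cstdint>

// Simplified, hypothetical stand-ins; the real Ceph structs differ.
struct Inode { int dirty_ops = 0; };

struct Fh {                 // handle stored for an open regular file
  Inode *inode = nullptr;
};

struct dir_result_t {       // handle stored for an open directory
  uint64_t offset = 0;
  Inode *inode = nullptr;
};

// Stand-in for fuse_file_info: fh is an untyped 64-bit slot.
struct file_info { uint64_t fh = 0; };

// open stores an Fh*, opendir stores a dir_result_t*.
void on_open(file_info &fi, Fh *f) {
  fi.fh = reinterpret_cast<uintptr_t>(f);
}
void on_opendir(file_info &fi, dir_result_t *d) {
  fi.fh = reinterpret_cast<uintptr_t>(d);
}

// fsync on a regular file: fh must be read back as an Fh*.
int on_fsync(const file_info &fi) {
  Fh *f = reinterpret_cast<Fh *>(static_cast<uintptr_t>(fi.fh));
  if (!f || !f->inode) return -EBADF;
  f->inode->dirty_ops = 0;  // pretend the flush happened
  return 0;
}

// fsync on a directory: separate handler, fh read back as dir_result_t*.
// Routing this through on_fsync would reinterpret the dir_result_t as an Fh.
int on_fsyncdir(const file_info &fi) {
  dir_result_t *d =
      reinterpret_cast<dir_result_t *>(static_cast<uintptr_t>(fi.fh));
  if (!d || !d->inode) return -EBADF;
  d->inode->dirty_ops = 0;  // pretend the flush happened
  return 0;
}
```

In this sketch, passing a directory's `file_info` to `on_fsync` would read the memory of a `dir_result_t` as if it were an `Fh`, which corresponds to the type confusion the commit message describes.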

History

#1 Updated by Greg Farnum over 8 years ago

  • Priority changed from Normal to Urgent

#2 Updated by Greg Farnum over 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Greg Farnum

#3 Updated by Greg Farnum over 8 years ago

Looking at the core dump, it's crashing on dereferencing a NULL Inode* contained in the passed-in Fh structure. But the Fh in use is 0x7f22e80086c0, which, if you look at the log snippet above, is used in a different thread as a dir_result_t!
It's definitely a dir_result_t (at least now!), if you look at that region of memory in the core dump. Since both of these functions are using values that are passed in by the calling code, it kind of looks like an error in the stack above us...?

#4 Updated by Greg Farnum over 8 years ago

Or more likely we're freeing the Fh inappropriately and reusing the memory for a dir_result_t, since we control the lifetimes of both?

#5 Updated by Greg Farnum over 8 years ago

  • Assignee changed from Greg Farnum to Zheng Yan

I'm not finding where it's gone wrong, but I think this must be an issue with the new refcounting. Please take a look, Zheng.

#6 Updated by Greg Farnum over 8 years ago

I've got the core file and appropriate packages on vpm119 if you want an environment to look at it with.

#7 Updated by Zheng Yan over 8 years ago

  • Status changed from In Progress to Fix Under Review

#8 Updated by Greg Farnum over 8 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to hammer

I guess this is a long-standing issue, but given CephFS' support state, just backporting to hammer should be fine.

#9 Updated by Zheng Yan over 8 years ago

  • Status changed from Pending Backport to Resolved

The fuse fsyncdir callback was added recently.
