Bug #11136

closed

cfuse: 0.80.7 client segfault

Added by David Matson about 9 years ago. Updated about 9 years ago.

Status: Won't Fix
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I don't know what other information I can provide, other than that some of our clients occasionally segfault with this trace:

Mar 17 13:37:38 api-slc-01-01 ceph-fuse:     -4> 2015-03-17 13:37:37.822311 7f1f970947c0  1 -- 10.1.1.210:0/821 --> 10.1.1.30:6800/43743 -- client_request(client.9222:2684 readdir #10000000b9e) v1 -- ?+0 0x98578f0 con 0x247ec50
Mar 17 13:37:38 api-slc-01-01 ceph-fuse:     -3> 2015-03-17 13:37:37.823452 7f1f8c87f700  1 -- 10.1.1.210:0/821 <== mds.0 10.1.1.30:6800/43743 10762 ==== client_reply(???:2684 = 0 (0) Success) v1 ==== 3946+0+0 (1120753239 0 0) 0x7f1f64004280 con 0x247ec50
Mar 17 13:37:38 api-slc-01-01 ceph-fuse:     -2> 2015-03-17 13:37:37.828084 7f1f8a174700  2 -- 10.1.1.210:0/821 >> 10.1.1.54:6872/17125 pipe(0x7f1f6c0b75c0 sd=2 :34782 s=2 pgs=615 cs=1 l=1 c=0x7f1f6c162570).reader couldn't read tag, (0) Success
Mar 17 13:37:38 api-slc-01-01 ceph-fuse:     -1> 2015-03-17 13:37:37.828136 7f1f8a174700  2 -- 10.1.1.210:0/821 >> 10.1.1.54:6872/17125 pipe(0x7f1f6c0b75c0 sd=2 :34782 s=2 pgs=615 cs=1 l=1 c=0x7f1f6c162570).fault (0) Success
Mar 17 13:37:38 api-slc-01-01 ceph-fuse:      0> 2015-03-17 13:37:37.837862 7f1f8c87f700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f1f8c87f700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: ceph-fuse() [0x5e239f]
 2: (()+0x10340) [0x7f1f96a3f340]
 3: (std::tr1::_Hashtable<std::string, std::pair<std::string const, Dentry*>, std::allocator<std::pair<std::string const, Dentry*> >, std::_Select1st<std::pair<std::string const, Dentry*> >, std::equal_to<std::string>, std::tr1::hash<std::string>, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, false, true>::erase(std::string const&)+0x48) [0x5814f8]
 4: (Client::unlink(Dentry*, bool)+0x219) [0x539199]
 5: (Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)+0xe62) [0x555872]
 6: (Client::insert_trace(MetaRequest*, MetaSession*)+0xac5) [0x556a15]
 7: (Client::handle_client_reply(MClientReply*)+0x239) [0x557539]
 8: (Client::ms_dispatch(Message*)+0x5cb) [0x560d1b]
 9: (DispatchQueue::entry()+0x57a) [0x7998da]
 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x6a96cd]
 11: (()+0x8182) [0x7f1f96a37182]
 12: (clone()+0x6d) [0x7f1f953ae47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Files

ceph-fuse.log-20150318.gz (730 KB), David Matson, 03/19/2015 04:52 PM
Actions #1

Updated by Greg Farnum about 9 years ago

If you have log files on these nodes, they should contain a bit more information following the crash. Can you upload that log file?

Actions #2

Updated by Greg Farnum about 9 years ago

Note: I assume it's the dn->dir member that's uninitialized here or something.
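
To make that guess concrete, here is a minimal sketch of the suspected failure mode, not Ceph's actual code. The `Dir` and `Dentry` structs and the `link_dentry` / `unlink_dentry` helpers are hypothetical stand-ins for the client's dentry cache. The backtrace shows `Client::unlink` calling `erase()` on a name-to-dentry hash table; if that table is reached through a `dn->dir` pointer that was never set, the erase touches invalid memory. The sketch assumes the pointer is always initialized to `nullptr`, which is what makes the guard meaningful. A genuinely uninitialized pointer holds garbage and no null check would catch it.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the client's dentry cache types.
// These are NOT Ceph's real definitions.
struct Dentry;

struct Dir {
    std::unordered_map<std::string, Dentry*> dentries;  // name -> dentry
};

struct Dentry {
    std::string name;
    Dir* dir = nullptr;  // always initialized, so "not linked" is detectable
};

// Attach dn to dir under its name.
inline void link_dentry(Dir& dir, Dentry& dn) {
    assert(dn.dir == nullptr && "dentry is already linked");
    dir.dentries[dn.name] = &dn;
    dn.dir = &dir;
}

// Detach dn from its directory. Returns false instead of dereferencing
// a null dn.dir, which is where an unguarded erase() would crash.
inline bool unlink_dentry(Dentry& dn) {
    if (dn.dir == nullptr) {
        return false;  // nothing to erase from
    }
    dn.dir->dentries.erase(dn.name);
    dn.dir = nullptr;
    return true;
}
```

With these definitions, calling `unlink_dentry` on a dentry that was never linked, or calling it twice on the same dentry, returns `false` rather than faulting.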

Actions #4

Updated by Greg Farnum about 9 years ago

  • Status changed from New to Won't Fix

Hmm, the relevant code here is very different in the Hammer release and I don't think it will be susceptible to this failure mode.

Unfortunately with what we've got I can't come up with anything else, and given CephFS' support state I don't think this is likely to get fixed for firefly. :(

Actions #5

Updated by David Matson about 9 years ago

Not a deal breaker as we have a cron job to fix the mount if it dies. Just thought I'd throw this up here in case it was worthwhile.
