Bug #11136: cfuse: 0.80.7 client segfault
Status: Closed
Description
I don't know what other information I can provide other than to say that occasionally some of our clients segfault with this trace:
Mar 17 13:37:38 api-slc-01-01 ceph-fuse: -4> 2015-03-17 13:37:37.822311 7f1f970947c0 1 -- 10.1.1.210:0/821 --> 10.1.1.30:6800/43743 -- client_request(client.9222:2684 readdir #10000000b9e) v1 -- ?+0 0x98578f0 con 0x247ec50 Mar 17 13:37:38 api-slc-01-01 ceph-fuse: -3> 2015-03-17 13:37:37.823452 7f1f8c87f700 1 -- 10.1.1.210:0/821 <== mds.0 10.1.1.30:6800/43743 10762 ==== client_reply(???:2684 = 0 (0) Success) v1 ==== 3946+0+0 (1120753239 0 0) 0x7f1f64004280 con 0x247ec50 Mar 17 13:37:38 api-slc-01-01 ceph-fuse: -2> 2015-03-17 13:37:37.828084 7f1f8a174700 2 -- 10.1.1.210:0/821 >> 10.1.1.54:6872/17125 pipe(0x7f1f6c0b75c0 sd=2 :34782 s=2 pgs=615 cs=1 l=1 c=0x7f1f6c162570).reader couldn't read tag, (0) Success Mar 17 13:37:38 api-slc-01-01 ceph-fuse: -1> 2015-03-17 13:37:37.828136 7f1f8a174700 2 -- 10.1.1.210:0/821 >> 10.1.1.54:6872/17125 pipe(0x7f1f6c0b75c0 sd=2 :34782 s=2 pgs=615 cs=1 l=1 c=0x7f1f6c162570).fault (0) Success Mar 17 13:37:38 api-slc-01-01 ceph-fuse: 0> 2015-03-17 13:37:37.837862 7f1f8c87f700 -1 *** Caught signal (Segmentation fault) ** in thread 7f1f8c87f700 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3) 1: ceph-fuse() [0x5e239f] 2: (()+0x10340) [0x7f1f96a3f340] 3: (std::tr1::_Hashtable<std::string, std::pair<std::string const, Dentry*>, std::allocator<std::pair<std::string const, Dentry*> >, std::_Select1st<std::pair<std::string const, Dentry*> >, std::equal_to<std::string>, std::tr1::hash<std::string>, std::tr1::__detail::_Mod_range_hashing, std::tr1::__detail::_Default_ranged_hash, std::tr1::__detail::_Prime_rehash_policy, false, false, true>::erase(std::string const&)+0x48) [0x5814f8] 4: (Client::unlink(Dentry*, bool)+0x219) [0x539199] 5: (Client::insert_readdir_results(MetaRequest*, MetaSession*, Inode*)+0xe62) [0x555872] 6: (Client::insert_trace(MetaRequest*, MetaSession*)+0xac5) [0x556a15] 7: (Client::handle_client_reply(MClientReply*)+0x239) [0x557539] 8: (Client::ms_dispatch(Message*)+0x5cb) [0x560d1b] 9: 
(DispatchQueue::entry()+0x57a) [0x7998da] 10: (DispatchQueue::DispatchThread::entry()+0xd) [0x6a96cd] 11: (()+0x8182) [0x7f1f96a37182] 12: (clone()+0x6d) [0x7f1f953ae47d] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Greg Farnum about 9 years ago
If you have log files on these nodes, they should contain a bit more information following the crash. Can you upload that log file?
Updated by Greg Farnum about 9 years ago
Note: my guess is that the dn->dir member is uninitialized (or stale) here.
Updated by David Matson about 9 years ago
Attached.
Updated by Greg Farnum about 9 years ago
- Status changed from New to Won't Fix
Hmm, the relevant code here is very different in the Hammer release, and I don't think Hammer is susceptible to this failure mode.
Unfortunately, with what we've got I can't come up with anything else, and given CephFS' support state I don't think this is likely to get fixed for Firefly. :(
Updated by David Matson about 9 years ago
Not a deal breaker, as we have a cron job to fix the mount if it dies. Just thought I'd throw this up here in case it was worthwhile.