Bug #1391

closed

client: crash on std::string in insert_trace()

Added by Sam Lang over 12 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Random cfuse client crash. Sorry I don't have a core file for this. It only happened on

  • Caught signal (Segmentation fault) *
    in thread 0x7fd7041ed700
    ceph version (commit:)
    1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x71e9a5]
    2: /usr/ceph/bin/cfuse() [0x82c26b]
    3: (()+0xfc60) [0x7fd7084b9c60]
    4: (std::string::length() const+0x3) [0x7fd7076db093]
    5: (Client::insert_trace(MetaRequest*, utime_t, int)+0x1460) [0x66225c]
    6: (Client::handle_client_reply(MClientReply*)+0xba1) [0x667d25]
    7: (Client::ms_dispatch(Message*)+0x167) [0x668381]
    8: (Messenger::ms_deliver_dispatch(Message*)+0x70) [0x70f880]
    9: (SimpleMessenger::dispatch_entry()+0x810) [0x6f9024]
    10: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x64e468]
    11: (Thread::_entry_func(void*)+0x23) [0x6ecbd9]
    12: (()+0x6d8c) [0x7fd7084b0d8c]
    13: (clone()+0x6d) [0x7fd706ef604d]
    2011-08-11 14:38:55.275921 7fb621bbf760 ceph version .commit: . process: cfuse. pid: 9461
Actions #1

Updated by Sage Weil over 12 years ago

  • Status changed from New to Can't reproduce

It's not clear from code inspection where this might be coming from, unless there is general heap corruption. If you see this again, a core will help!

Actions #2

Updated by Sam Lang over 12 years ago

I've been seeing a segfault in a similar spot regularly, but it's been hard to reproduce. The segfault is always in the same place (see below) and is clearly coming from the condition of this while loop:

while (pd != dir->dentry_map.end() && pd->first <= dname) {

I think the pd iterator is somehow getting munged so that its first field is not a valid string, which causes the segfault. I've been unable to generate a core file or reproduce it reliably; it just seems to happen after running the clients for a while. Anything I can try to trigger this? I've tried creating a bunch of directory entries on one client and then doing ls on another. That seems like one of the few ways to get the response to include any extra buffers...
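For illustration only, here is a minimal sketch of a loop with the same shape as the condition quoted above. It assumes dentry_map behaves like a std::map keyed by std::string; Dir, Dentry and trim_up_to are hypothetical stand-ins, not the actual Client code. One general way an iterator such as pd can end up referring to a destroyed key is when an entry is erased during the walk and the stale iterator is then dereferenced. Whether that is what happens in this bug is not established here. The sketch shows the safe form, where the iterator is reassigned from erase():

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the client's directory structures.
struct Dentry { int ino; };
struct Dir { std::map<std::string, Dentry> dentry_map; };

// Walk the entries whose name sorts <= dname and drop those with ino == 0
// (an arbitrary "stale" marker chosen for this sketch).
// Returns the number of entries erased.
int trim_up_to(Dir* dir, const std::string& dname) {
  int erased = 0;
  auto pd = dir->dentry_map.begin();
  while (pd != dir->dentry_map.end() && pd->first <= dname) {
    if (pd->second.ino == 0) {
      // Since C++11, erase() returns the iterator following the removed
      // element. Reusing pd after erasing it, without this reassignment,
      // would be undefined behavior: pd->first would no longer be a
      // valid std::string.
      pd = dir->dentry_map.erase(pd);
      ++erased;
    } else {
      ++pd;
    }
  }
  return erased;
}
```

If the loop body (or another thread) removed an entry without refreshing pd, the next evaluation of pd->first <= dname would read freed memory, which is the kind of failure that tends to show up inside std::string::compare() or std::string::length().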

  • Caught signal (Segmentation fault) *
    in thread 0x7f2a7add8700
    ceph version (commit:)
    1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x71eca1]
    2: /usr/ceph/bin/cfuse() [0x82cffb]
    3: (()+0xfc60) [0x7f2a7f0a4c60]
    4: (std::string::compare(std::string const&) const+0x9) [0x7f2a7e2c6a79]
    5: (bool std::operator<=<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x23) [0x6a9b83]
    6: (Client::insert_trace(MetaRequest*, utime_t, int)+0x130f) [0x6622eb]
    7: (Client::handle_client_reply(MClientReply*)+0xba1) [0x668013]
    8: (Client::ms_dispatch(Message*)+0x167) [0x66866f]
    9: (Messenger::ms_deliver_dispatch(Message*)+0x70) [0x70fb7e]
    10: (SimpleMessenger::dispatch_entry()+0x810) [0x6f9314]
    11: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x64e648]
    12: (Thread::_entry_func(void*)+0x23) [0x6ecec9]
    13: (()+0x6d8c) [0x7f2a7f09bd8c]
    14: (clone()+0x6d) [0x7f2a7dae104d]
Actions #3

Updated by Sage Weil over 12 years ago

  • Category set to 24
  • Status changed from Can't reproduce to New
  • Target version set to v0.35

Reopening this...

Actions #4

Updated by Sage Weil over 12 years ago

  • Subject changed from client crash on ceph stable branch to client: crash on std::string in insert_trace()
Actions #5

Updated by Sage Weil over 12 years ago

Hmm, any other hints on what workloads might trigger this? I'm not getting anything from valgrind or my workloads.

If you can tolerate the slowness, maybe you can run cfuse through valgrind in your environment?

Actions #6

Updated by Sage Weil over 12 years ago

  • Status changed from New to 7
  • Assignee set to Sage Weil
Actions #7

Updated by Sage Weil over 12 years ago

  • Status changed from 7 to Resolved
Actions #8

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (24)
  • Target version deleted (v0.35)