Bug #1391
closed
client: crash on std::string in insert_trace()
Added by Sam Lang almost 13 years ago.
Updated almost 7 years ago.
Description
Random cfuse client crash. Sorry I don't have a core file for this. It only happened on
- Caught signal (Segmentation fault) *
in thread 0x7fd7041ed700
ceph version (commit:)
1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x71e9a5]
2: /usr/ceph/bin/cfuse() [0x82c26b]
3: (()+0xfc60) [0x7fd7084b9c60]
4: (std::string::length() const+0x3) [0x7fd7076db093]
5: (Client::insert_trace(MetaRequest, utime_t, int)+0x1460) [0x66225c]
6: (Client::handle_client_reply(MClientReply*)+0xba1) [0x667d25]
7: (Client::ms_dispatch(Message*)+0x167) [0x668381]
8: (Messenger::ms_deliver_dispatch(Message*)+0x70) [0x70f880]
9: (SimpleMessenger::dispatch_entry()+0x810) [0x6f9024]
10: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x64e468]
11: (Thread::_entry_func(void*)+0x23) [0x6ecbd9]
12: (()+0x6d8c) [0x7fd7084b0d8c]
13: (clone()+0x6d) [0x7fd706ef604d]
2011-08-11 14:38:55.275921 7fb621bbf760 ceph version .commit: . process: cfuse. pid: 9461
- Status changed from New to Can't reproduce
It's not clear from code inspection where this might be coming from, unless there is general heap corruption. If you see this again, a core will help!
I've been seeing a segfault in a similar spot regularly, but it's been hard to reproduce. The segfault is always in the same place (see below), and is clearly coming from the condition of this while loop:
while (pd != dir->dentry_map.end() && pd->first <= dname) {
I think that the pd iterator is somehow getting munged so that the first field is not a valid string, causing the segfault. I've been unable to generate a core file or reproduce it reliably; it just seems to happen after running the clients for a while. Anything I can try to trigger this? I've tried creating a bunch of directory entries on one client, and then doing ls on another. That seems like one of the few ways to get the response to include any extra buffers...
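For illustration only, here is a minimal sketch of one way an iterator like pd can end up pointing at a dead string: the entry it refers to is erased while the walk is still in progress. Whether insert_trace() actually does this is not established in this report. DentryMap and trim_up_to are hypothetical stand-ins, not the Ceph types, and the sketch assumes C++11, where map::erase returns the next valid iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for dir->dentry_map: name -> placeholder value.
using DentryMap = std::map<std::string, int>;

// Erase every entry from pd onward whose key is <= dname and return
// how many entries were removed.
//
// map::erase invalidates only the iterator to the erased element, so
// the loop continues from the iterator that erase() returns. Calling
// m.erase(pd) and then reading pd->first would touch a destroyed
// std::string, which is undefined behaviour.
std::size_t trim_up_to(DentryMap& m, DentryMap::iterator pd,
                       const std::string& dname) {
    std::size_t erased = 0;
    while (pd != m.end() && pd->first <= dname) {
        pd = m.erase(pd);  // safe: pd now refers to the next live entry
        ++erased;
    }
    // Postcondition: the walk stopped at the end or at a key past dname.
    assert(pd == m.end() || pd->first > dname);
    return erased;
}
```

With entries "a", "b", "c", "d", calling trim_up_to(m, m.begin(), "b") removes "a" and "b" and leaves "c" and "d". On a pre-C++11 compiler, where map::erase returns void, the equivalent idiom is m.erase(pd++).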
- Caught signal (Segmentation fault) *
in thread 0x7f2a7add8700
ceph version (commit:)
1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x71eca1]
2: /usr/ceph/bin/cfuse() [0x82cffb]
3: (()+0xfc60) [0x7f2a7f0a4c60]
4: (std::string::compare(std::string const&) const+0x9) [0x7f2a7e2c6a79]
5: (bool std::operator<=<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x23) [0x6a9b83]
6: (Client::insert_trace(MetaRequest, utime_t, int)+0x130f) [0x6622eb]
7: (Client::handle_client_reply(MClientReply*)+0xba1) [0x668013]
8: (Client::ms_dispatch(Message*)+0x167) [0x66866f]
9: (Messenger::ms_deliver_dispatch(Message*)+0x70) [0x70fb7e]
10: (SimpleMessenger::dispatch_entry()+0x810) [0x6f9314]
11: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x64e648]
12: (Thread::_entry_func(void*)+0x23) [0x6ecec9]
13: (()+0x6d8c) [0x7f2a7f09bd8c]
14: (clone()+0x6d) [0x7f2a7dae104d]
- Category set to 24
- Status changed from Can't reproduce to New
- Target version set to v0.35
- Subject changed from client crash on ceph stable branch to client: crash on std::string in insert_trace()
Hmm, any other hints on what workloads might trigger this? I'm not getting anything from valgrind or my workloads.
If you can tolerate the slowness, maybe you can run cfuse through valgrind in your environment?
- Status changed from New to 7
- Assignee set to Sage Weil
- Status changed from 7 to Resolved
- Project changed from Ceph to CephFS
- Category deleted (24)
- Target version deleted (v0.35)