Bug #10124

closed

monitor receives bus error signal

Added by Noah Watkins over 9 years ago. Updated over 9 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

This happened in the latest giant version. A bus error seems like something wrong with the hardware, but the issue suspiciously occurred while we were running a huge map-reduce job. I have the full log if it's helpful.

   -10> 2014-11-17 14:01:58.776614 7f20f47ef700  2 -- 10.16.153.102:6789/0 >> 10.16.153.96:0/2636934960 pipe(0x6c239c0 sd=45 :6789 s=2 pgs=1 cs=1 l=1 c=0x53c3440).fault (0) Success
    -9> 2014-11-17 14:01:58.777470 7f20f47ef700  1 -- 10.16.153.102:6789/0 >> :/0 pipe(0xa670dc0 sd=40 :6789 s=0 pgs=0 cs=0 l=0 c=0x53c1600).accept sd=40 10.16.153.96:55301/0
    -8> 2014-11-17 14:01:59.444334 7f20f2e19700  1 -- 10.16.153.102:6789/0 >> :/0 pipe(0xa673180 sd=41 :6789 s=0 pgs=0 cs=0 l=0 c=0x53c2aa0).accept sd=41 10.16.153.96:55302/0
    -7> 2014-11-17 14:02:00.162767 7f20f398e700  1 -- 10.16.153.102:6789/0 >> :/0 pipe(0xa672940 sd=45 :6789 s=0 pgs=0 cs=0 l=0 c=0x53c4620).accept sd=45 10.16.153.96:55305/0
    -6> 2014-11-17 14:02:00.852355 7f20f2f1a700  2 -- 10.16.153.102:6789/0 >> 10.16.153.99:0/2291418206 pipe(0x6c27380 sd=39 :6789 s=2 pgs=1 cs=1 l=1 c=0x53c2520).reader couldn't read tag, (0) Success
    -5> 2014-11-17 14:02:00.852435 7f20f2f1a700  2 -- 10.16.153.102:6789/0 >> 10.16.153.99:0/2291418206 pipe(0x6c27380 sd=39 :6789 s=2 pgs=1 cs=1 l=1 c=0x53c2520).fault (0) Success
    -4> 2014-11-17 14:02:00.853129 7f20f2f1a700  1 -- 10.16.153.102:6789/0 >> :/0 pipe(0xa676040 sd=39 :6789 s=0 pgs=0 cs=0 l=0 c=0x53c32e0).accept sd=39 10.16.153.99:51056/0
    -3> 2014-11-17 14:02:01.066194 7f20f4348700  1 -- 10.16.153.102:6789/0 >> :/0 pipe(0xa671340 sd=46 :6789 s=0 pgs=0 cs=0 l=0 c=0x53c39c0).accept sd=46 10.16.153.99:51057/0
    -2> 2014-11-17 14:02:01.457031 7f20f5f93700  2 -- 10.16.153.102:6789/0 >> 10.16.153.99:0/4163978185 pipe(0x6c27bc0 sd=42 :6789 s=2 pgs=1 cs=1 l=1 c=0x53c5960).reader couldn't read tag, (0) Success
    -1> 2014-11-17 14:02:01.457115 7f20f5f93700  2 -- 10.16.153.102:6789/0 >> 10.16.153.99:0/4163978185 pipe(0x6c27bc0 sd=42 :6789 s=2 pgs=1 cs=1 l=1 c=0x53c5960).fault (0) Success
     0> 2014-11-17 14:02:01.759029 7f20fda08700 -1 *** Caught signal (Bus error) **
 in thread 7f20fda08700

 ceph version 0.87-27-gccfd241 (ccfd2414c68afda55bf4cefa2441ea6d53d87cc6)
 1: /usr/bin/ceph-mon() [0x975d72]
 2: (()+0xf130) [0x7f2102b07130]
 3: (()+0x1489ab) [0x7f21016349ab]
 4: (()+0x3d5dd) [0x7f21034db5dd]
 5: (leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType, char const*, unsigned long)+0xe4) [0x7f21034c1dd4]
 6: (leveldb::log::Writer::AddRecord(leveldb::Slice const&)+0x86) [0x7f21034c1f66]
 7: (leveldb::DBImpl::Write(leveldb::WriteOptions const&, leveldb::WriteBatch*)+0x319) [0x7f21034b98c9]
 8: (LevelDBStore::submit_transaction_sync(std::tr1::shared_ptr<KeyValueDB::TransactionImpl>)+0x3d) [0x89914d]
 9: (MonitorDBStore::apply_transaction(std::tr1::shared_ptr<MonitorDBStore::Transaction>)+0x225) [0x588ad5]
 10: (Paxos::begin(ceph::buffer::list&)+0x54e) [0x5ef3de]
 11: (Paxos::propose_queued()+0xf8) [0x5efdf8]
 12: (Paxos::finish_round()+0x242) [0x5f0342]
 13: (Paxos::commit_finish()+0x5c8) [0x5f0b78]
 14: (C_Committed::finish(int)+0x2b) [0x5f539b]
 15: (Context::complete(int)+0x9) [0x5c6f39]
 16: (MonitorDBStore::C_DoTransaction::finish(int)+0x6a) [0x5f470a]
 17: (Context::complete(int)+0x9) [0x5c6f39]
 18: (Finisher::finisher_thread_entry()+0x168) [0x70d208]
 19: (()+0x7df3) [0x7f2102affdf3]
 20: (clone()+0x6d) [0x7f21015e23dd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Actions #1

Updated by Noah Watkins over 9 years ago

Oh, it looks like this has been reported by Joao before to the leveldb list: https://groups.google.com/forum/#!topic/leveldb/9VaANZvbYlk

Actions #2

Updated by Noah Watkins over 9 years ago

If reproducible, joao suggests enabling the monitor transaction dump:

<joao> 'mon_debug_dump_transactions = true' and 'mon_debug_dump_location = /path'
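
For reference, a minimal sketch of how those two options might look in ceph.conf. Placing them under a [mon] section is an assumption (the comment above only names the options), and /path is the placeholder from that comment, not a real location.

```ini
[mon]
    # Dump monitor store transactions so the failing write can be inspected later.
    mon_debug_dump_transactions = true
    # Placeholder taken from the comment above; replace with a writable location.
    mon_debug_dump_location = /path
```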

Actions #3

Updated by Sage Weil over 9 years ago

  • Status changed from New to Rejected

leveldb bug
