Bug #1503

closed

monitor failure

Added by Sam Lang over 12 years ago. Updated over 12 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%

Description

I had a monitor crash with the following assertion. I didn't notice the crash right away. I have logrotate set up, and for some reason the log ended up truncated and the core file got overwritten. If it happens again I'll try to get more info.

2011-09-01 22:17:55.831531 7ff366244700 mon.alpha@0(leader).osd e242 e242: 38 osds: 38 up, 38 in
2011-09-01 22:18:20.823183 7ff366244700 mon.alpha@0(leader).osd e243 e243: 38 osds: 38 up, 38 in
../../src/mon/MonitorStore.cc: In function 'int MonitorStore::write_bl_ss(ceph::bufferlist&, const char*, const char*, bool)', in thread '0x7ff366244700'
../../src/mon/MonitorStore.cc: 339: FAILED assert(!err)
ceph version (commit:)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x89) [0x7c6b11]
2: (MonitorStore::write_bl_ss(ceph::buffer::list&, char const*, char const*, bool)+0x65) [0x6f713f]
3: (MonitorStore::put_bl_ss(ceph::buffer::list&, char const*, char const*)+0x36) [0x624c4a]
4: (Paxos::stash_latest(unsigned long, ceph::buffer::list&)+0x2e9) [0x65fc45]
5: (PGMonitor::update_from_paxos()+0x5b0) [0x6b8a86]
6: (PaxosService::_commit()+0xe2) [0x665652]
7: (PaxosService::C_Commit::finish(int)+0x25) [0x66605b]
8: (Context::complete(int)+0x2b) [0x63cc63]
9: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x1c3) [0x66069d]
10: (Paxos::handle_accept(MMonPaxos*)+0x865) [0x65b8d1]
11: (Paxos::dispatch(PaxosServiceMessage*)+0x271) [0x65ea1f]
12: (Monitor::_ms_dispatch(Message*)+0xd23) [0x638bd5]
13: (Monitor::ms_dispatch(Message*)+0x3a) [0x63ffc6]
14: (Messenger::ms_deliver_dispatch(Message*)+0x70) [0x788c24]
15: (SimpleMessenger::dispatch_entry()+0x810) [0x77294c]
16: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x625afa]
17: (Thread::_entry_func(void*)+0x23) [0x6f9e61]
18: (()+0x6d8c) [0x7ff369b02d8c]
19: (clone()+0x6d) [0x7ff36854804d]
*** Caught signal (Aborted) **
in thread 0x7ff366244700
ceph version (commit:)
1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x7acd2d]
2: /usr/ceph/bin/cmon() [0x7c7183]
3: (()+0xfc60) [0x7ff369b0bc60]
4: (gsignal()+0x35) [0x7ff368495d05]
5: (abort()+0x186) [0x7ff368499ab6]
6: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7ff368d4c6dd]
7: (()+0xb9926) [0x7ff368d4a926]
8: (()+0xb9953) [0x7ff368d4a953]
9: (()+0xb9a5e) [0x7ff368d4aa5e]
10: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1f3) [0x7c6c7b]
11: (MonitorStore::write_bl_ss(ceph::buffer::list&, char const*, char const*, bool)+0x65) [0x6f713f]
12: (MonitorStore::put_bl_ss(ceph::buffer::list&, char const*, char const*)+0x36) [0x624c4a]
13: (Paxos::stash_latest(unsigned long, ceph::buffer::list&)+0x2e9) [0x65fc45]
14: (PGMonitor::update_from_paxos()+0x5b0) [0x6b8a86]
15: (PaxosService::_commit()+0xe2) [0x665652]
16: (PaxosService::C_Commit::finish(int)+0x25) [0x66605b]
17: (Context::complete(int)+0x2b) [0x63cc63]
18: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x1c3) [0x66069d]
19: (Paxos::handle_accept(MMonPaxos*)+0x865) [0x65b8d1]
20: (Paxos::dispatch(PaxosSe
#1

Updated by Sam Lang over 12 years ago

Not sure if this is useful, but when I restart the monitor that crashed, it starts OK and calls for new elections, but its memory use keeps growing until the cmon process has consumed all the memory on the node. Below is the log from the beginning of startup, followed by the stack traces of all threads, taken while the cmon process was using up all the memory.

2011-09-06 09:18:30.853600 7fc541601700 mon.alpha@0(leader).pg v134565 ignoring stats from non-active osd
2011-09-06 09:18:30.853686 7fc541601700 mon.alpha@0(leader).pg v134565 ignoring stats from non-active osd
2011-09-06 09:18:30.853790 7fc541601700 mon.alpha@0(leader).pg v134565 ignoring stats from non-active osd
2011-09-06 09:18:36.355401 7fc541601700 mon.alpha@0(leader).osd e256 e256: 38 osds: 30 up, 38 in
2011-09-06 09:18:47.484618 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,2
2011-09-06 09:18:57.491035 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:19:51.046710 7fc53f7f1700 -- 192.168.101.112:6789/0 >> 192.168.101.115:6789/0 pipe(0x2c1076500 sd=18 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state 1
2011-09-06 09:19:57.509165 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:20:07.516255 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:20:32.538447 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,2,3
2011-09-06 09:20:42.551029 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:21:02.572272 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:21:12.580606 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:22:02.606753 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:22:12.616051 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:22:17.633192 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,2,3
2011-09-06 09:22:27.639589 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:23:02.657726 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:23:12.665062 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:24:01.791790 7fc541601700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,2,3
2011-09-06 09:24:11.799365 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:25:07.175392 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:25:17.477793 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:25:54.903581 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,2,3
2011-09-06 09:26:05.388912 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:26:13.667456 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:26:23.973332 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:27:32.402916 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:27:48.511307 7fc53eff8700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e52e7a00 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:27:53.403454 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,2,3
2011-09-06 09:28:03.922795 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:28:42.789767 7fc53f4ee700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x14cbc3e280 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:28:42.790766 7fc53e507700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x14cbc3e000 sd=19 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:28:46.201246 7fc53dafd700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x173e328c80 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:28:50.358983 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:29:02.898485 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:29:46.419182 7fc53ebd9700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x1775eaac80 sd=26 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:29:52.029127 7fc53eef7700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e5b4aa00 sd=19 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:29:52.163756 7fc53dc01700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e5b4a780 sd=17 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:29:52.163875 7fc53e507700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x122cb0a780 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:29:52.163966 7fc53dafd700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x122cb0a500 sd=21 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:29:52.191086 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,2,3
2011-09-06 09:30:04.286561 7fc53f4ee700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e5b4a500 sd=17 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:30:09.345795 7fc53dc01700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x178932da00 sd=21 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:30:19.783298 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:30:32.336843 7fc53eff8700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17a6268a00 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:30:43.793388 7fc53dd02700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17a6268500 sd=18 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:30:45.166304 7fc53eff8700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17a6268a00 sd=19 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:30:45.166348 7fc53f194700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17a6268280 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:30:45.166445 7fc53ead8700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17a6268c80 sd=17 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:30:45.166477 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:31:04.159085 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:31:23.201418 7fc53e996700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17acd90280 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:31:24.082659 7fc53eff8700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17a6268c80 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:31:24.082776 7fc53dc01700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17acd90780 sd=19 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:31:24.082895 7fc53eef7700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17a6268a00 sd=21 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:31:49.506668 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,2,3
2011-09-06 09:32:06.313469 7fc53f5ef700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17cf131500 sd=17 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:32:07.869351 7fc53f194700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17e5026000 sd=18 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:32:29.110587 7fc540e00700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,3
2011-09-06 09:32:39.664224 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:32:49.203076 7fc53f194700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x18044e5a00 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:32:49.203310 7fc53e996700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x18044e5500 sd=23 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:32:49.203340 7fc53dc01700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x18044e5780 sd=21 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:32:49.204017 7fc53eef7700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17e5026000 sd=24 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:32:49.204065 7fc53ebd9700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x18044e5000 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:32:49.204106 7fc53e507700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x17e5026280 sd=25 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:33:21.776392 7fc53f4ee700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x181de4e000 sd=17 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:34:18.183818 7fc53ead8700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x13fe6b2a00 sd=30 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:34:18.254057 7fc53ece5700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbeb000 sd=35 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:34:18.254149 7fc53eef7700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x115c3cd780 sd=19 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:34:18.387025 7fc541601700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,2,3
2011-09-06 09:34:18.466764 7fc53cf84700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbeb280 sd=25 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:34:28.714274 7fc540e00700 log [INF] : mon.alpha calling new monitor election
2011-09-06 09:34:35.397677 7fc53ede6700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x1858955c80 sd=21 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.084766 7fc53f7f1700 -- 192.168.101.112:6789/0 >> 192.168.101.115:6789/0 pipe(0x1f7ec80 sd=11 pgs=56 cs=1 l=0).fault with nothing to send, going to standby
2011-09-06 09:50:32.091452 7fc53c870700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185caf2780 sd=39 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.091489 7fc53ebd9700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185caef500 sd=34 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.091563 7fc53f5ef700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185caefc80 sd=31 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.091668 7fc53cb73700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185caef000 sd=36 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.091822 7fc5403fd700 -- 192.168.101.112:6789/0 >> 192.168.101.114:6789/0 pipe(0x1f7e280 sd=10 pgs=228164 cs=1 l=0).fault with nothing to send, going to standby
2011-09-06 09:50:32.091850 7fc53c36b700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb0aa00 sd=44 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.091900 7fc53bd65700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb12a00 sd=50 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.092015 7fc53b55d700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb1f500 sd=58 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.092097 7fc53b058700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb3d780 sd=63 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.092142 7fc53ad55700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb3d000 sd=66 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.092216 7fc53ac54700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb78c80 sd=67 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.092282 7fc53a850700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb78280 sd=71 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.092345 7fc5402fc700 -- 192.168.101.112:6789/0 >> 192.168.101.113:6789/0 pipe(0x1f7e500 sd=9 pgs=317909 cs=1 l=0).fault initiating reconnect
2011-09-06 09:50:32.099024 7fc53ab53700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb78000 sd=72 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.099659 7fc53aa52700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb78a00 sd=47 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.099679 7fc53ba62700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x1858955000 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.099738 7fc53f4ee700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x1858955c80 sd=21 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.099839 7fc53e996700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb0a500 sd=28 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.099916 7fc53dafd700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb12780 sd=74 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.100247 7fc53a44c700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185caffa00 sd=78 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.100312 7fc53c46c700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb78780 sd=46 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.100975 7fc53c169700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cac4c80 sd=68 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.101206 7fc53a34b700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185caff780 sd=79 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.149103 7fc53ede6700 -- 192.168.101.112:6789/0 >> 192.168.101.113:6789/0 pipe(0x1f7e500 sd=9 pgs=319971 cs=3 l=0).reader got old message 1551226 <= 1551226 0x172ca50580 forward(pg_stats(53 pgs v 255) v1) to leader v1, discarding
2011-09-06 09:50:32.155867 7fc53bd65700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184065e000 sd=19 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.155946 7fc53f9f3700 -- 192.168.101.112:6789/0 >> :/0 pipe(0xbddeadc80 sd=16 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.156032 7fc53cf84700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185caffa00 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.156105 7fc53f6f0700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x1f9b500 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.169003 7fc53a54d700 -- 192.168.101.112:6789/0 >> 192.168.101.114:6789/0 pipe(0x1f7e280 sd=25 pgs=229773 cs=3 l=0).reader got old message 1403896 <= 1403899 0x172d0d32c0 forward(pg_stats(38 pgs v 255) v1) to leader v1, discarding
2011-09-06 09:50:32.169659 7fc53a54d700 -- 192.168.101.112:6789/0 >> 192.168.101.114:6789/0 pipe(0x1f7e280 sd=25 pgs=229773 cs=3 l=0).reader got old message 1403897 <= 1403899 0x172d0d32c0 forward(pg_stats(49 pgs v 255) v1) to leader v1, discarding
2011-09-06 09:50:32.170194 7fc53a54d700 -- 192.168.101.112:6789/0 >> 192.168.101.114:6789/0 pipe(0x1f7e280 sd=25 pgs=229773 cs=3 l=0).reader got old message 1403898 <= 1403899 0x172d0d32c0 forward(pg_stats(45 pgs v 255) v1) to leader v1, discarding
2011-09-06 09:50:32.170740 7fc53a54d700 -- 192.168.101.112:6789/0 >> 192.168.101.114:6789/0 pipe(0x1f7e280 sd=25 pgs=229773 cs=3 l=0).reader got old message 1403899 <= 1403899 0x172d0d32c0 forward(pg_stats(48 pgs v 255) v1) to leader v1, discarding
2011-09-06 09:50:32.174706 7fc53f9f3700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbfea00 sd=15 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.175509 7fc53a34b700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb89a00 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.175715 7fc53b961700 -- 192.168.101.112:6789/0 >> :/0 pipe(0xbddeadc80 sd=24 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.175838 7fc53faf4700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184065e000 sd=30 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.177926 7fc53cf84700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e52e8000 sd=12 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.334029 7fc53cf84700 -- 192.168.101.112:6789/0 >> :/0 pipe(0xbddead500 sd=10 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:32.945381 7fc53cf84700 -- 192.168.101.112:6789/0 >> :/0 pipe(0xbddeada00 sd=10 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:33.622606 7fc541601700 log [INF] : mon.alpha@0 won leader election with quorum 0,1,2,3
2011-09-06 09:50:41.853233 7fc53c068700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x183ee55280 sd=30 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884086 7fc53f194700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x183ee55000 sd=29 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884137 7fc53dc01700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb4e780 sd=32 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884240 7fc53cd82700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb4e500 sd=31 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884317 7fc53c26a700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbe6a00 sd=23 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884355 7fc53fff9700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x185cb4ea00 sd=33 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884393 7fc53fdf7700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x18428d5280 sd=27 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884490 7fc53ac54700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbe6000 sd=19 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884596 7fc53bd65700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x50294f500 sd=13 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884646 7fc5403fd700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbe6280 sd=20 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884740 7fc53faf4700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbfea00 sd=15 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884793 7fc53a34b700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x18428d5500 sd=28 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884839 7fc53f9f3700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e52e8280 sd=12 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884902 7fc53cf84700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e52e8000 sd=18 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.884938 7fc53b058700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbe6780 sd=22 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.885037 7fc53af57700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x16e52e8500 sd=10 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.885130 7fc53f6f0700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbe6500 sd=21 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.885173 7fc53f4ee700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x18428d5000 sd=26 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.885275 7fc53b961700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x115c3cd780 sd=24 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.885357 7fc53fbf5700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbfe000 sd=17 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
2011-09-06 09:50:41.885478 7fc53a74f700 -- 192.168.101.112:6789/0 >> :/0 pipe(0x184dbfe280 sd=16 pgs=0 cs=0 l=0).accept failed to getpeername 107 Transport endpoint is not connected
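
For anyone triaging a log like the one above, a small tally script may make the pattern easier to see than reading line by line. The following is only a sketch, not part of Ceph: the names `summarize`, `PATTERNS`, and `TIMESTAMP` are hypothetical, and the message formats are assumed from the lines in this excerpt. It counts the three recurring message kinds, counts how often each quorum set won, and measures the delay in seconds between each "won leader election" line and the next "calling new monitor election" line.

```python
import re
from collections import Counter
from datetime import datetime

# Message kinds seen in the excerpt above (formats assumed from that log only).
PATTERNS = {
    "won_election": re.compile(r"won leader election with quorum ([\d,]+)"),
    "called_election": re.compile(r"calling new monitor election"),
    "getpeername_failed": re.compile(r"accept failed to getpeername"),
}

# Leading timestamp, e.g. "2011-09-06 09:18:47.484618".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ")


def summarize(lines):
    """Return (counts per kind, counts per quorum, win-to-recall delays)."""
    counts = Counter()
    quorums = Counter()
    delays = []
    last_win = None
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue  # skip blank lines, gdb output, anything without a timestamp
        when = datetime.strptime(stamp.group(1), "%Y-%m-%d %H:%M:%S.%f")
        for kind, pattern in PATTERNS.items():
            hit = pattern.search(line)
            if not hit:
                continue
            counts[kind] += 1
            if kind == "won_election":
                quorums[hit.group(1)] += 1
                last_win = when
            elif kind == "called_election" and last_win is not None:
                # Seconds between winning and calling the next election.
                delays.append((when - last_win).total_seconds())
                last_win = None
            break
    return counts, quorums, delays


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < mon.alpha.log   (file name is a placeholder)
    counts, quorums, delays = summarize(sys.stdin)
    print("message counts:", dict(counts))
    print("winning quorums:", dict(quorums))
    print("win -> new election delays (s):", [round(d, 3) for d in delays])
```

The script only reports what the log already contains. It makes no claim about why the elections keep being called or why memory grows.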

(gdb) thread apply all bt

Thread 31 (Thread 0x7fc543605700 (LWP 15247)):
#0 sem_timedwait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_timedwait.S:106
#1 0x00000000007c868a in CephContextServiceThread::entry (this=0x1f600c0) at ../../src/common/ceph_context.cc:53
#2 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f600c0) at ../../src/common/Thread.cc:45
#3 0x00007fc544ebfd8c in start_thread (arg=0x7fc543605700) at pthread_create.c:304
#4 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()

Thread 30 (Thread 0x7fc542e04700 (LWP 15248)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000000007d2e2a in AdminSocket::entry (this=0x1f4e1e0) at ../../src/common/admin_socket.cc:212
#2 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f4e1e0) at ../../src/common/Thread.cc:45
#3 0x00007fc544ebfd8c in start_thread (arg=0x7fc542e04700) at pthread_create.c:304
#4 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()

Thread 29 (Thread 0x7fc542603700 (LWP 15249)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x00000000007715a0 in SimpleMessenger::Accepter::entry (this=0x1f7f040) at ../../src/msg/SimpleMessenger.cc:202
#2 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7f040) at ../../src/common/Thread.cc:45
#3 0x00007fc544ebfd8c in start_thread (arg=0x7fc542603700) at pthread_create.c:304
#4 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()

Thread 28 (Thread 0x7fc541e02700 (LWP 15250)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f7f450, mutex=...) at ../../src/common/Cond.h:46
#2 0x00000000007839d2 in SimpleMessenger::reaper_entry (this=0x1f7f000) at ../../src/msg/SimpleMessenger.cc:2287
#3 0x0000000000625a80 in SimpleMessenger::ReaperThread::entry (this=0x1f7f430) at ../../src/msg/SimpleMessenger.h:497
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7f430) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc541e02700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 27 (Thread 0x7fc541601700 (LWP 15251)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f7f0a0, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000772ad6 in SimpleMessenger::dispatch_entry (this=0x1f7f000) at ../../src/msg/SimpleMessenger.cc:366
#3 0x0000000000625afa in SimpleMessenger::DispatchThread::entry (this=0x1f7f488) at ../../src/msg/SimpleMessenger.h:546
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7f488) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc541601700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 26 (Thread 0x7fc540e00700 (LWP 15252)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:216
#1 0x00000000007885a6 in Cond::WaitUntil (this=0x1f7fb00, mutex=..., when=...) at ../../src/common/Cond.h:59
#2 0x000000000079d28a in SafeTimer::timer_thread (this=0x1f7faf0) at ../../src/common/Timer.cc:110
#3 0x000000000079e0d0 in SafeTimerThread::entry (this=0x1f93660) at ../../src/common/Timer.cc:38
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f93660) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc540e00700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 25 (Thread 0x7fc5452e2700 (LWP 15253)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f7e6d0, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1f7e500) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x1f7e748) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7e748) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc5452e2700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 24 (Thread 0x7fc5405ff700 (LWP 15254)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f7e450, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1f7e280) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x1f7e4c8) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7e4c8) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc5405ff700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 23 (Thread 0x7fc5404fe700 (LWP 15255)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f7ee50, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1f7ec80) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x1f7eec8) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7eec8) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc5404fe700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 22 (Thread 0x7fc5403fd700 (LWP 15256)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=10, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=10, buf=0x185cab221e "", len=2980, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x0000000000780df6 in SimpleMessenger::Pipe::read_message (this=0x1f7e280, pm=0x7fc5403fcd28)
at ../../src/msg/SimpleMessenger.cc:1896
#4 0x000000000077e82b in SimpleMessenger::Pipe::reader (this=0x1f7e280) at ../../src/msg/SimpleMessenger.cc:1599
#5 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x1f7e4b0) at ../../src/msg/SimpleMessenger.h:205
#6 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7e4b0) at ../../src/common/Thread.cc:45
#7 0x00007fc544ebfd8c in start_thread (arg=0x7fc5403fd700) at pthread_create.c:304
#8 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9 0x0000000000000000 in ?? ()

Thread 21 (Thread 0x7fc5402fc700 (LWP 15258)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=9, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=9, buf=0x185ca3460a "", len=3521, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x0000000000780df6 in SimpleMessenger::Pipe::read_message (this=0x1f7e500, pm=0x7fc5402fbd28)
at ../../src/msg/SimpleMessenger.cc:1896
#4 0x000000000077e82b in SimpleMessenger::Pipe::reader (this=0x1f7e500) at ../../src/msg/SimpleMessenger.cc:1599
#5 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x1f7e730) at ../../src/msg/SimpleMessenger.h:205
#6 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7e730) at ../../src/common/Thread.cc:45
#7 0x00007fc544ebfd8c in start_thread (arg=0x7fc5402fc700) at pthread_create.c:304
#8 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#9 0x0000000000000000 in ?? ()

Thread 20 (Thread 0x7fc53fdf7700 (LWP 15276)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=13, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=13, buf=0x7fc53fdf6daf "\377\310\351\362\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x1f9b500) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x1f9b730) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f9b730) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53fdf7700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 19 (Thread 0x7fc53fcf6700 (LWP 15277)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f9b6d0, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1f9b500) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x1f9b748) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f9b748) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53fcf6700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 18 (Thread 0x7fc53fbf5700 (LWP 15284)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=12, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=12, buf=0x7fc53fbf4daf "\377\260\373\362\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x1f9b280) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x1f9b4b0) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f9b4b0) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53fbf5700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 17 (Thread 0x7fc53faf4700 (LWP 15285)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f9b450, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1f9b280) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x1f9b4c8) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f9b4c8) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53faf4700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7fc5401fb700 (LWP 15328)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=14, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=14, buf=0x7fc5401fadaf "\377\270\365\362\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x467da00) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x467dc30) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x467dc30) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc5401fb700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 15 (Thread 0x7fc5400fa700 (LWP 15336)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x467dbd0, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x467da00) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x467dc48) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x467dc48) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc5400fa700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 14 (Thread 0x7fc53fff9700 (LWP 15349)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=15, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=15, buf=0x7fc53fff8daf "\377\320\343\362\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x467d780) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x467d9b0) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x467d9b0) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53fff9700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7fc53fef8700 (LWP 15350)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x467d950, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x467d780) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x467d9c8) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x467d9c8) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53fef8700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 12 (Thread 0x7fc53f9f3700 (LWP 15356)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=16, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=16, buf=0x7fc53f9f2daf "\377", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x467d500) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x467d730) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x467d730) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53f9f3700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7fc53f8f2700 (LWP 15358)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x467d6d0, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x467d500) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x467d748) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x467d748) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53f8f2700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7fc53f7f1700 (LWP 15856)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=11, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=11, buf=0x7fc53f7f0daf "\377p+\363\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x1f7ec80) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x1f7eeb0) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x1f7eeb0) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53f7f1700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7fc53f6f0700 (LWP 18127)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=19, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=19, buf=0x7fc53f6efdaf "\377@O\363\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x1858955c80) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x1858955eb0) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x1858955eb0) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53f6f0700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7fc53f194700 (LWP 18129)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=21, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=21, buf=0x7fc53f193daf "\377 g\363\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x1858955000) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x1858955230) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x1858955230) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53f194700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7fc53ce83700 (LWP 18130)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=22, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=22, buf=0x7fc53ce82daf "\377HI\363\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x1858955280) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x18589554b0) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x18589554b0) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53ce83700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7fc53d9fc700 (LWP 18131)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x18589551d0, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1858955000) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x1858955248) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1858955248) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53d9fc700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fc53d085700 (LWP 18132)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1858955e50, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1858955c80) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x1858955ec8) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x1858955ec8) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53d085700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fc53dafd700 (LWP 18133)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1858955450, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x1858955280) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x18589554c8) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x18589554c8) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53dafd700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fc53ede6700 (LWP 18135)):
#0 0x00007fc5438f7f03 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=<value optimized out>)
at ../sysdeps/unix/sysv/linux/poll.c:87
#1 0x000000000076fc02 in tcp_read_wait (sd=17, timeout=900000) at ../../src/msg/tcp.cc:48
#2 0x000000000076fb51 in tcp_read (cct=0x1f59000, sd=17, buf=0x7fc53ede5daf "\377PC\363\001", len=1, timeout=900000)
at ../../src/msg/tcp.cc:25
#3 0x000000000077e245 in SimpleMessenger::Pipe::reader (this=0x184dbfe000) at ../../src/msg/SimpleMessenger.cc:1567
#4 0x0000000000624e76 in SimpleMessenger::Pipe::Reader::entry (this=0x184dbfe230) at ../../src/msg/SimpleMessenger.h:205
#5 0x00000000006f9e61 in Thread::_entry_func (arg=0x184dbfe230) at ../../src/common/Thread.cc:45
#6 0x00007fc544ebfd8c in start_thread (arg=0x7fc53ede6700) at pthread_create.c:304
#7 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#8 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fc53eff8700 (LWP 18136)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x184dbfe1d0, mutex=...) at ../../src/common/Cond.h:46
#2 0x0000000000780000 in SimpleMessenger::Pipe::writer (this=0x184dbfe000) at ../../src/msg/SimpleMessenger.cc:1782
#3 0x0000000000624ed0 in SimpleMessenger::Pipe::Writer::entry (this=0x184dbfe248) at ../../src/msg/SimpleMessenger.h:213
#4 0x00000000006f9e61 in Thread::_entry_func (arg=0x184dbfe248) at ../../src/common/Thread.cc:45
#5 0x00007fc544ebfd8c in start_thread (arg=0x7fc53eff8700) at pthread_create.c:304
#6 0x00007fc54390504d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fc5452e6760 (LWP 15246)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1 0x0000000000788557 in Cond::Wait (this=0x1f7f1c0, mutex=...) at ../../src/common/Cond.h:46
#2 0x00000000007865ca in SimpleMessenger::wait (this=0x1f7f000) at ../../src/msg/SimpleMessenger.cc:2629
#3 0x0000000000622096 in main (argc=5, argv=0x7fff8253aac8) at ../../src/cmon.cc:285

Actions #2

Updated by Greg Farnum over 12 years ago

Hmm. What version are you running? (For some reason it's not being dumped in the backtrace...)

Also, you have 5 monitors and the other 4 are running correctly?

Actions #3

Updated by Sam Lang over 12 years ago

stable branch as of: 7a8ab747addf493cb4b82351aeb3c2e07ba46a95

Actions #4

Updated by Sage Weil over 12 years ago

  • Target version set to v0.36

Any chance the original crash was on ENOSPC?

And for the restart bug, can you capture a log with debug ms = 1 and debug mon = 10?

Thanks!
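
For reference, a minimal sketch of how the requested debug settings might look in ceph.conf. It assumes they are placed in the [mon] section and that the monitor is restarted afterwards so it picks them up:

```ini
[mon]
    debug ms = 1
    debug mon = 10
```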

Actions #5

Updated by Sage Weil over 12 years ago

Actions #6

Updated by Sam Lang over 12 years ago

Ah yes, probably ENOSPC. With a fresh restart of the whole cluster, the monitor process seemed fine. If it happens again I'll try to get some debugging output.

Actions #7

Updated by Sage Weil over 12 years ago

  • Status changed from New to Closed

Ok. The monitor should also log what error it hits now (though of course that won't help if the logs are on the same partition that's getting ENOSPC).
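
As a rough illustration of the idea (not the actual Ceph change), the sketch below shows a write helper that reports which error it hit instead of failing a bare assert. The name `write_or_report` and its signature are hypothetical and are not part of MonitorStore. On a full filesystem, `write()` would fail with ENOSPC, and that is what would end up in the message.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <unistd.h>

// Hypothetical helper (not Ceph's MonitorStore code): write the whole buffer
// to fd. Returns 0 on success or -errno on failure, and fills *err with a
// readable description so the caller can log it before deciding to abort.
int write_or_report(int fd, const char* buf, std::size_t len, std::string* err) {
  assert(buf != nullptr || len == 0);
  std::size_t done = 0;
  while (done < len) {
    ssize_t r = ::write(fd, buf + done, len - done);
    if (r < 0) {
      if (errno == EINTR)
        continue;  // interrupted by a signal, retry the same chunk
      int e = errno;
      if (err)
        *err = std::string("write failed: ") + std::strerror(e) +
               " (errno " + std::to_string(e) + ")";
      return -e;
    }
    if (r == 0) {
      // No progress on a non-empty write; report it rather than spin forever.
      if (err)
        *err = "write failed: wrote 0 bytes";
      return -EIO;
    }
    done += static_cast<std::size_t>(r);
  }
  return 0;
}
```

A caller could log the returned message and only then assert or exit, so the reason for the failure survives even if the core file does not.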

Actions #8

Updated by Sage Weil over 12 years ago

  • Target version deleted (v0.36)