Bug #2569

closed

msgr: connect_rank crash

Added by Sage Weil almost 12 years ago. Updated about 5 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


2012-06-12T16:46:51.280 INFO:teuthology.task.ceph.mon.g.err:msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7ff94b5d5700 time 2012-06-12 16:46:51.262641
2012-06-12T16:46:51.280 INFO:teuthology.task.ceph.mon.g.err:msg/SimpleMessenger.cc: 1323: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)
2012-06-12T16:46:51.280 INFO:teuthology.task.ceph.mon.g.err: ceph version 0.47.2-463-gb88a5e5 (commit:b88a5e5344b34db3404ab1a0b32b8f1b9eb83a09)
2012-06-12T16:46:51.280 INFO:teuthology.task.ceph.mon.g.err: 1: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c1300]
2012-06-12T16:46:51.280 INFO:teuthology.task.ceph.mon.g.err: 2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c4922]
2012-06-12T16:46:51.280 INFO:teuthology.task.ceph.mon.g.err: 3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d2bcb]
2012-06-12T16:46:51.281 INFO:teuthology.task.ceph.mon.g.err: 4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d3108]
2012-06-12T16:46:51.281 INFO:teuthology.task.ceph.mon.g.err: 5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x482312]
2012-06-12T16:46:51.281 INFO:teuthology.task.ceph.mon.g.err: 6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x48662b]
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err: 7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487964]
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err: 8: (Monitor::ms_dispatch(Message*)+0x32) [0x494b22]
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err: 9: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c5d6b]
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err: 10: (SimpleMessenger::DispatchThread::entry()+0xd) [0x597c7d]
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err: 11: (()+0x7e9a) [0x7ff95012ee9a]
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err: 12: (clone()+0x6d) [0x7ff94e8e74bd]
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2012-06-12T16:46:51.287 INFO:teuthology.task.ceph.mon.g.err:2012-06-12 16:46:51.264375 7ff94b5d5700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7ff94b5d5700 time 2012-06-12 16:46:51.262641
2012-06-12T16:46:51.288 INFO:teuthology.task.ceph.mon.g.err:msg/SimpleMessenger.cc: 1323: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

job was

kernel: &id001
  branch: testing
nuke-on-error: true
overrides:
  ceph:
    branch: master
    log-whitelist:
    - clocks not synchronized
    - slow request
roles:
- - mon.a
  - mon.d
  - mon.g
  - mon.j
  - mon.m
  - mon.p
  - mon.s
  - osd.0
- - mon.b
  - mon.e
  - mon.h
  - mon.k
  - mon.n
  - mon.q
  - mon.t
  - mds.a
- - mon.c
  - mon.f
  - mon.i
  - mon.l
  - mon.o
  - mon.r
  - mon.u
  - osd.1
targets:
  ubuntu@plana46.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrD6s9otJ5xCNH4nyv0iJu6AoqmlNTFd8D0X9RfFBnmOMrMBWU9kwsFzPIOsuJGbSYbA8LCtjWUwaWoXmbEFtTMitxaDXp47gbVNXknHq7TGZHkWWOwKKu+tlSQBpCVzO/rzBbvJ9fcG7tewq5XcIHz0IUXsUFuEuXR1HaTUJKic2twBpaeAGNvdd6IZ9Sz9TMkfiRV/aVdcHJ/yF8bsXi3pfRPR3puMK/Nyfq5Hz/aabQo1TSyK2o0weoWV7D8vD6S8f3D7p5/5ScBhL3zUcP85SsV47W+/hTFbU8kN1Grlv2sx0fVMB/TUB/UNVdsHKGn5Nv6zb/qMqBEx9nSeZ9
  ubuntu@plana53.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCyKZ4FptN6/LE+aYMkIRuwd/OexUuzb0oQao8qhiQNXwV7l0t/un6s7b+EZPWF08g67gCdeV5pQSF9mGGh49AA6UKAs1kACNQnF1OC/kngqICwXt6XacT4uJn6A5kBwrmdjAV97+WBH7a+5sZwtPbIXIlwi/Bxkl0YUtsdszTj6r303FsXywMwj296jaPkYMy4RZFmlLFOzP9X2tZ137lnFKyP1odLce0GQ70pPt96ojwylZT8OAtHHGGaxJzc60LD3isvr6U8W8pTA1B3jU4FhWDkBVvGtWEq6a/ZRvujVVjz6GYRbbb5/Mq+jTS+7SEw/2FUuFQQ1sEda6US5xFZ
  ubuntu@plana55.front.sepia.ceph.com: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXY+KFAAzWoJq5vLwy6PJHxNeqz3fHCisJDAbtdrnjhhxVyUQtQLlhIPqiQHi6PADNYNUS/4um0TNmDFYxJLJU9SxqmBQ3QTM9F56YQa9F/+98o4LyPLS5TXqq+nCDbU1vhMbpu0mv2MDZ9BVZAgdT/yYgYGErIQz2MnaCAbgp0SRSZOxq0/3KgMz4W0KxkagiNglZV3RvarYASdqZheYeQYtnIyEw+Hk/ZLHoxUirBthAuCu5RvYYTDptQDuOR0tjRaMS81kapD5VZhFbetSxJ9rJ21oepmLSY+0UoIufZS4CNJ/sP2HDDc1Pw1mjJhqClScxTOP1yUnNWhW1d0sP
tasks:
- internal.lock_machines: 3
- internal.save_config: null
- internal.check_lock: null
- internal.connect: null
- internal.check_conflict: null
- kernel: *id001
- internal.base: null
- internal.archive: null
- internal.coredump: null
- internal.syslog: null
- internal.timer: null
- chef: null
- ceph: null
- mon_recovery: null

ubuntu@teuthology:/a/master-2012-06-12_16:17:15/7473

Actions #1

Updated by Greg Farnum almost 12 years ago

  • Status changed from 12 to Need More Info

I'm attempting to reproduce this, but all that's available right now is the teuthology log: it didn't pull any of the daemon logs, the core dump, or anything else (the SSH connections appear to have failed).
Given that, all I can do is guess. My guess is that this comes from restarting a bunch of monitors at the same time while the monmap has no nonce, and that this combination breaks something that other scenarios don't exercise.
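
To make the guess above concrete, here is a minimal sketch of the invariant behind the failed assert. It is not the Ceph source: `PeerAddr` and `PipeTable` are hypothetical stand-ins for `entity_addr_t` and the messenger's `rank_pipe` map, and the sketch assumes the map key is (IP, port, nonce), reading the trailing `/0` in the logged addresses as the nonce. Under that assumption, a monitor that restarts with nonce 0 produces the same key as its previous incarnation, so a pipe entry left over from before the restart would trip the check.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-in for entity_addr_t: IP, port, and a per-process nonce.
struct PeerAddr {
    std::string ip;
    std::uint16_t port;
    std::uint32_t nonce;

    // Order by all three fields so PeerAddr can be used as a std::map key.
    bool operator<(const PeerAddr& o) const {
        return std::tie(ip, port, nonce) < std::tie(o.ip, o.port, o.nonce);
    }
};

// Hypothetical stand-in for the messenger's rank_pipe table.
struct PipeTable {
    std::map<PeerAddr, int> rank_pipe;  // peer address -> pipe id

    // True if no pipe is currently registered for this address.
    bool can_register(const PeerAddr& peer) const {
        return rank_pipe.count(peer) == 0;
    }

    // Mirrors the invariant from the log: at most one pipe per peer address.
    void register_pipe(const PeerAddr& peer, int pipe_id) {
        assert(rank_pipe.count(peer) == 0);
        rank_pipe[peer] = pipe_id;
    }

    // Drops the entry for this address, if any.
    void unregister_pipe(const PeerAddr& peer) { rank_pipe.erase(peer); }
};
```

With an entry for 10.214.146.24:6789/0 still in the table, `can_register` returns false for a restarted peer at the same address and nonce, and true if the nonce differs. Whether this is what actually happened in the crash is unconfirmed.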

Actions #2

Updated by Mark Nelson almost 12 years ago

Saw the following while debugging my aging test scripts. Seems to have happened when the mon was started. No core dump sadly.

2012-06-25 07:35:03.493418 7f50d6bc0780 1 store(/var/lib/ceph/mon/ceph-a) mount
2012-06-25 07:35:03.493775 7f50d6bc0780 0 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb), process ceph-mon, pid 4955
2012-06-25 07:35:03.495754 7f50d6bc0780 1 mon.a@-1(probing) e0 init fsid e010871a-8adb-4503-b17e-e0e68967a9b5
2012-06-25 07:35:03.498224 7f50d6bc0780 0 mon.a@-1(probing) e0 my rank is now 2 (was -1)
2012-06-25 07:35:03.499467 7f50d0a3e700 0 -- 10.214.146.30:6789/0 >> 10.214.146.24:6789/0 pipe(0x19c6280 sd=18 pgs=9 cs=1 l=0).fault initiating reconnect
2012-06-25 07:35:03.500275 7f50d093d700 0 -- 10.214.146.30:6789/0 >> 10.214.146.24:6789/0 pipe(0x19c6a00 sd=19 pgs=10 cs=1 l=0).fault with nothing to send, going to standby
2012-06-25 07:35:03.500349 7f50d1b41700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7f50d1b41700 time 2012-06-25 07:35:03.498577
msg/SimpleMessenger.cc: 1323: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
8: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
9: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
10: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
11: (()+0x7e9a) [0x7f50d679be9a]
12: (clone()+0x6d) [0x7f50d4f544bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-6> 2012-06-25 07:35:03.493418 7f50d6bc0780 1 store(/var/lib/ceph/mon/ceph-a) mount
-5> 2012-06-25 07:35:03.493775 7f50d6bc0780 0 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb), process ceph-mon, pid 4955
-4> 2012-06-25 07:35:03.495754 7f50d6bc0780 1 mon.a@-1(probing) e0 init fsid e010871a-8adb-4503-b17e-e0e68967a9b5
-3> 2012-06-25 07:35:03.498224 7f50d6bc0780 0 mon.a@-1(probing) e0 my rank is now 2 (was -1)
-2> 2012-06-25 07:35:03.499467 7f50d0a3e700 0 -- 10.214.146.30:6789/0 >> 10.214.146.24:6789/0 pipe(0x19c6280 sd=18 pgs=9 cs=1 l=0).fault initiating reconnect
-1> 2012-06-25 07:35:03.500275 7f50d093d700 0 -- 10.214.146.30:6789/0 >> 10.214.146.24:6789/0 pipe(0x19c6a00 sd=19 pgs=10 cs=1 l=0).fault with nothing to send, going to standby
0> 2012-06-25 07:35:03.500349 7f50d1b41700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7f50d1b41700 time 2012-06-25 07:35:03.498577
msg/SimpleMessenger.cc: 1323: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
8: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
9: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
10: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
11: (()+0x7e9a) [0x7f50d679be9a]
12: (clone()+0x6d) [0x7f50d4f544bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---
2012-06-25 07:35:03.502641 7f50d1b41700 -1 *** Caught signal (Aborted) **
in thread 7f50d1b41700

ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-mon() [0x52da0a]
2: (()+0xfcb0) [0x7f50d67a3cb0]
3: (gsignal()+0x35) [0x7f50d4e98445]
4: (abort()+0x17b) [0x7f50d4e9bbab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f50d57e669d]
6: (()+0xb5846) [0x7f50d57e4846]
7: (()+0xb5873) [0x7f50d57e4873]
8: (()+0xb596e) [0x7f50d57e496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x282) [0x5e4f62]
10: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
11: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
12: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
13: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
14: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
15: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
16: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
17: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
18: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
19: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
20: (()+0x7e9a) [0x7f50d679be9a]
21: (clone()+0x6d) [0x7f50d4f544bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
0> 2012-06-25 07:35:03.502641 7f50d1b41700 -1 *** Caught signal (Aborted) **
in thread 7f50d1b41700

ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
1: /usr/bin/ceph-mon() [0x52da0a]
2: (()+0xfcb0) [0x7f50d67a3cb0]
3: (gsignal()+0x35) [0x7f50d4e98445]
4: (abort()+0x17b) [0x7f50d4e9bbab]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f50d57e669d]
6: (()+0xb5846) [0x7f50d57e4846]
7: (()+0xb5873) [0x7f50d57e4873]
8: (()+0xb596e) [0x7f50d57e496e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x282) [0x5e4f62]
10: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
11: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
12: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
13: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
14: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
15: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
16: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
17: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
18: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
19: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
20: (()+0x7e9a) [0x7f50d679be9a]
21: (clone()+0x6d) [0x7f50d4f544bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---

Actions #3

Updated by Mark Nelson almost 12 years ago

All three mon nodes and a client node on the second aging cluster died over the weekend (kernel and all). Looks like it might have been the same bug as above. From the 3rd mon:

2012-06-20 14:55:09.752866 7f81706f8780  1 store(/srv/mon.c) mount
2012-06-20 14:55:09.766425 7f81706f8780  0 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb), process ceph-mon, pid 19110
2012-06-20 14:55:09.766587 7f81706f8780  1 -- 10.214.133.34:6789/0 accepter.bind my_inst.addr is 10.214.133.34:6789/0 need_addr=0
2012-06-20 14:55:09.767991 7f81706f8780  1 -- 10.214.133.34:6789/0 messenger.start
2012-06-20 14:55:09.768087 7f81706f8780  1 -- 10.214.133.34:6789/0 accepter.start
2012-06-20 14:55:09.768344 7f81706f8780  1 mon.c@-1(probing) e1 init fsid cc58a8e4-1465-49ff-8a47-02cf132fadba
2012-06-20 14:55:09.768462 7f81706f8780 10 mon.c@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)}
2012-06-20 14:55:09.768507 7f81706f8780 10 mon.c@-1(probing) e1 has_ever_joined = 0
2012-06-20 14:55:09.830962 7f81706f8780  7 mon.c@-1(probing).pg v0 update_from_paxos loading latest full pgmap v674387
2012-06-20 14:55:09.901029 7f81706f8780 10 mon.c@-1(probing).pg v674387 send_pg_creates to 0 pgs
2012-06-20 14:55:09.901058 7f81706f8780 10 mon.c@-1(probing).pg v674387 update_logger
2012-06-20 14:55:09.907927 7f81706f8780 10 mon.c@-1(probing).mds e0 update_from_paxos paxosv 1, my e 0
2012-06-20 14:55:09.907985 7f81706f8780 10 mon.c@-1(probing).mds e0 update_from_paxos  got 1
2012-06-20 14:55:09.908012 7f81706f8780  4 mon.c@-1(probing).mds e1 new map
2012-06-20 14:55:09.908014 7f81706f8780  7 mon.c@-1(probing).mds e1 print_map
epoch   1
flags   0
created 2012-05-22 06:30:16.233146
modified        2012-05-22 06:30:16.233184
tableserver     0
root    0
session_timeout 60
session_autoclose       300
last_failure    0
last_failure_osd_epoch  0
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
max_mds 1
in      
up      {}
failed  
stopped 
data_pools      [0]
metadata_pool   1

2012-06-20 14:55:09.908072 7f81706f8780 10 mon.c@-1(probing).mds e1 update_logger
2012-06-20 14:55:09.927544 7f81706f8780  7 mon.c@-1(probing).osd e0 update_from_paxos loading latest full map e3996
2012-06-20 14:55:09.928506 7f81706f8780 10 mon.c@-1(probing).osd e3996 send_to_waiting 3996
2012-06-20 14:55:09.928525 7f81706f8780 10 mon.c@-1(probing).osd e3996 update_logger
2012-06-20 14:55:09.931520 7f81706f8780  7 mon.c@-1(probing).log v186969 update_from_paxos loading summary e186969
2012-06-20 14:55:09.931700 7f81706f8780 10 mon.c@-1(probing).log v186969 check_subs
2012-06-20 14:55:09.932790 7f81706f8780 10 mon.c@-1(probing).auth v768 update_from_paxos()
2012-06-20 14:55:09.932835 7f81706f8780  7 mon.c@-1(probing).auth v768 update_from_paxos loading summary e768
2012-06-20 14:55:09.933231 7f81706f8780 10 mon.c@-1(probing).auth v768 update_from_paxos() last_allocated_id=4796 max_global_id=4796
2012-06-20 14:55:09.933428 7f81706f8780 -1 auth: error reading file: /srv/mon.c/keyring: can't open /srv/mon.c/keyring: (2) No such file or directory
2012-06-20 14:55:09.933444 7f81706f8780  1 mon.c@-1(probing) e1 copying mon. key from old db to external keyring
2012-06-20 14:55:09.934205 7f81706f4700  1 -- 10.214.133.34:6789/0 >> :/0 pipe(0x26e1000 sd=18 pgs=0 cs=0 l=0).accept sd=18
2012-06-20 14:55:09.977401 7f81706f8780 10 mon.c@-1(probing) e1 bootstrap
2012-06-20 14:55:09.977418 7f81706f8780 10 mon.c@-1(probing) e1 unregister_cluster_logger - not registered
2012-06-20 14:55:09.977421 7f81706f8780 10 mon.c@-1(probing) e1 cancel_probe_timeout (none scheduled)
2012-06-20 14:55:09.977424 7f81706f8780  0 mon.c@-1(probing) e1  my rank is now 0 (was -1)
2012-06-20 14:55:09.977427 7f81706f8780  1 -- 10.214.133.34:6789/0 mark_down_all
2012-06-20 14:55:09.977449 7f816b676700  1 -- 10.214.133.34:6789/0 <== mon.2 10.214.133.36:6789/0 1 ==== mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name a new) v2 ==== 74+0+0 (1965475612 0 0) 0x26fb000 con 0x2725280
2012-06-20 14:55:09.977494 7f81706f8780 10 mon.c@0(probing) e1 reset
2012-06-20 14:55:09.977514 7f81706f8780 10 mon.c@0(probing) e1 cancel_probe_timeout (none scheduled)
2012-06-20 14:55:09.977520 7f81706f8780 10 mon.c@0(probing) e1 reset_probe_timeout 0x2f7fcf0 after 2 seconds
2012-06-20 14:55:09.977585 7f81706f8780 10 mon.c@0(probing) e1 probing other monitors
2012-06-20 14:55:09.977592 7f81706f8780  1 -- 10.214.133.34:6789/0 --> mon.1 10.214.133.35:6789/0 -- mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name c new) v2 -- ?+0 0x2cd7900
2012-06-20 14:55:09.977682 7f81706f8780  1 -- 10.214.133.34:6789/0 --> mon.2 10.214.133.36:6789/0 -- mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name c new) v2 -- ?+0 0x26fbc00
2012-06-20 14:55:09.977909 7f816b676700 10 mon.c@0(probing) e1 do not have session, making new one
2012-06-20 14:55:09.977930 7f816b676700 10 mon.c@0(probing) e1 ms_dispatch new session MonSession: mon.2 10.214.133.36:6789/0 is open for mon.2 10.214.133.36:6789/0
2012-06-20 14:55:09.977984 7f816b676700  5 mon.c@0(probing) e1 setting monitor caps on this connection
2012-06-20 14:55:09.977989 7f816b676700 10 mon.c@0(probing) e1 handle_probe mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name a new) v2
2012-06-20 14:55:09.978010 7f816b676700 10 mon.c@0(probing) e1 handle_probe_probe mon.2 10.214.133.36:6789/0mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name a new) v2
2012-06-20 14:55:09.978033 7f816b676700  1 -- 10.214.133.34:6789/0 --> 10.214.133.36:6789/0 -- mon_probe(reply cc58a8e4-1465-49ff-8a47-02cf132fadba name c versions {auth=768,logm=186969,mdsmap=1,monmap=1,osdmap=3996,pgmap=674387} new) v2 -- ?+0 0x26fd000 con 0x2725280
2012-06-20 14:55:09.978200 7f816a674700  1 -- 10.214.133.34:6789/0 >> :/0 pipe(0x26ff280 sd=19 pgs=0 cs=0 l=0).accept sd=19
2012-06-20 14:55:09.978439 7f816b777700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
2012-06-20 14:55:09.978623 7f81706f4700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
2012-06-20 14:55:09.978884 7f816a573700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
2012-06-20 14:55:09.979218 7f816a472700  0 -- 10.214.133.34:6789/0 >> 10.214.133.36:6789/0 pipe(0x26e1a00 sd=18 pgs=4 cs=1 l=0).fault initiating reconnect
2012-06-20 14:55:09.979735 7f816b777700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
2012-06-20 14:55:09.980024 7f816a270700  0 -- 10.214.133.34:6789/0 >> 10.214.133.36:6789/0 pipe(0x26ff500 sd=20 pgs=5 cs=1 l=0).fault with nothing to send, going to standby
2012-06-20 14:55:09.980246 7f816b676700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7f816b676700 time 2012-06-20 14:55:09.978281
msg/SimpleMessenger.cc: 1323: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
 2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
 3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
 4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
 5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
 6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
 7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
 8: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
 9: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
 10: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
 11: (()+0x7e9a) [0x7f81702d0e9a]
 12: (clone()+0x6d) [0x7f816ea894bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
   -52> 2012-06-20 14:55:09.752866 7f81706f8780  1 store(/srv/mon.c) mount
   -51> 2012-06-20 14:55:09.766425 7f81706f8780  0 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb), process ceph-mon, pid 19110
   -50> 2012-06-20 14:55:09.766587 7f81706f8780  1 -- 10.214.133.34:6789/0 accepter.bind my_inst.addr is 10.214.133.34:6789/0 need_addr=0
   -49> 2012-06-20 14:55:09.767991 7f81706f8780  1 -- 10.214.133.34:6789/0 messenger.start
   -48> 2012-06-20 14:55:09.768087 7f81706f8780  1 -- 10.214.133.34:6789/0 accepter.start
   -47> 2012-06-20 14:55:09.768344 7f81706f8780  1 mon.c@-1(probing) e1 init fsid cc58a8e4-1465-49ff-8a47-02cf132fadba
   -46> 2012-06-20 14:55:09.768462 7f81706f8780 10 mon.c@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)}
   -45> 2012-06-20 14:55:09.768507 7f81706f8780 10 mon.c@-1(probing) e1 has_ever_joined = 0
   -44> 2012-06-20 14:55:09.830962 7f81706f8780  7 mon.c@-1(probing).pg v0 update_from_paxos loading latest full pgmap v674387
   -43> 2012-06-20 14:55:09.901029 7f81706f8780 10 mon.c@-1(probing).pg v674387 send_pg_creates to 0 pgs
   -42> 2012-06-20 14:55:09.901058 7f81706f8780 10 mon.c@-1(probing).pg v674387 update_logger
   -41> 2012-06-20 14:55:09.907927 7f81706f8780 10 mon.c@-1(probing).mds e0 update_from_paxos paxosv 1, my e 0
   -40> 2012-06-20 14:55:09.907985 7f81706f8780 10 mon.c@-1(probing).mds e0 update_from_paxos  got 1
   -39> 2012-06-20 14:55:09.908012 7f81706f8780  4 mon.c@-1(probing).mds e1 new map
   -38> 2012-06-20 14:55:09.908014 7f81706f8780  7 mon.c@-1(probing).mds e1 print_map
epoch   1
flags   0
created 2012-05-22 06:30:16.233146
modified        2012-05-22 06:30:16.233184
tableserver     0
root    0
session_timeout 60
session_autoclose       300
last_failure    0
last_failure_osd_epoch  0
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
max_mds 1
in      
up      {}
failed  
stopped 
data_pools      [0]
metadata_pool   1

   -37> 2012-06-20 14:55:09.908072 7f81706f8780 10 mon.c@-1(probing).mds e1 update_logger
   -36> 2012-06-20 14:55:09.927544 7f81706f8780  7 mon.c@-1(probing).osd e0 update_from_paxos loading latest full map e3996
   -35> 2012-06-20 14:55:09.928506 7f81706f8780 10 mon.c@-1(probing).osd e3996 send_to_waiting 3996
   -34> 2012-06-20 14:55:09.928525 7f81706f8780 10 mon.c@-1(probing).osd e3996 update_logger
   -33> 2012-06-20 14:55:09.931520 7f81706f8780  7 mon.c@-1(probing).log v186969 update_from_paxos loading summary e186969
   -32> 2012-06-20 14:55:09.931700 7f81706f8780 10 mon.c@-1(probing).log v186969 check_subs
   -31> 2012-06-20 14:55:09.932790 7f81706f8780 10 mon.c@-1(probing).auth v768 update_from_paxos()
   -30> 2012-06-20 14:55:09.932835 7f81706f8780  7 mon.c@-1(probing).auth v768 update_from_paxos loading summary e768
   -29> 2012-06-20 14:55:09.933231 7f81706f8780 10 mon.c@-1(probing).auth v768 update_from_paxos() last_allocated_id=4796 max_global_id=4796
   -28> 2012-06-20 14:55:09.933428 7f81706f8780 -1 auth: error reading file: /srv/mon.c/keyring: can't open /srv/mon.c/keyring: (2) No such file or directory
   -27> 2012-06-20 14:55:09.933444 7f81706f8780  1 mon.c@-1(probing) e1 copying mon. key from old db to external keyring
   -26> 2012-06-20 14:55:09.934205 7f81706f4700  1 -- 10.214.133.34:6789/0 >> :/0 pipe(0x26e1000 sd=18 pgs=0 cs=0 l=0).accept sd=18
   -25> 2012-06-20 14:55:09.977401 7f81706f8780 10 mon.c@-1(probing) e1 bootstrap
   -24> 2012-06-20 14:55:09.977418 7f81706f8780 10 mon.c@-1(probing) e1 unregister_cluster_logger - not registered
   -23> 2012-06-20 14:55:09.977421 7f81706f8780 10 mon.c@-1(probing) e1 cancel_probe_timeout (none scheduled)
   -22> 2012-06-20 14:55:09.977424 7f81706f8780  0 mon.c@-1(probing) e1  my rank is now 0 (was -1)
   -21> 2012-06-20 14:55:09.977427 7f81706f8780  1 -- 10.214.133.34:6789/0 mark_down_all
   -20> 2012-06-20 14:55:09.977449 7f816b676700  1 -- 10.214.133.34:6789/0 <== mon.2 10.214.133.36:6789/0 1 ==== mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name a new) v2 ==== 74+0+0 (1965475612 0 0) 0x26fb000 con 0x2725280
   -19> 2012-06-20 14:55:09.977494 7f81706f8780 10 mon.c@0(probing) e1 reset
   -18> 2012-06-20 14:55:09.977514 7f81706f8780 10 mon.c@0(probing) e1 cancel_probe_timeout (none scheduled)
   -17> 2012-06-20 14:55:09.977520 7f81706f8780 10 mon.c@0(probing) e1 reset_probe_timeout 0x2f7fcf0 after 2 seconds
   -16> 2012-06-20 14:55:09.977585 7f81706f8780 10 mon.c@0(probing) e1 probing other monitors
   -15> 2012-06-20 14:55:09.977592 7f81706f8780  1 -- 10.214.133.34:6789/0 --> mon.1 10.214.133.35:6789/0 -- mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name c new) v2 -- ?+0 0x2cd7900
   -14> 2012-06-20 14:55:09.977682 7f81706f8780  1 -- 10.214.133.34:6789/0 --> mon.2 10.214.133.36:6789/0 -- mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name c new) v2 -- ?+0 0x26fbc00
   -13> 2012-06-20 14:55:09.977909 7f816b676700 10 mon.c@0(probing) e1 do not have session, making new one
   -12> 2012-06-20 14:55:09.977930 7f816b676700 10 mon.c@0(probing) e1 ms_dispatch new session MonSession: mon.2 10.214.133.36:6789/0 is open for mon.2 10.214.133.36:6789/0
   -11> 2012-06-20 14:55:09.977984 7f816b676700  5 mon.c@0(probing) e1 setting monitor caps on this connection
   -10> 2012-06-20 14:55:09.977989 7f816b676700 10 mon.c@0(probing) e1 handle_probe mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name a new) v2
    -9> 2012-06-20 14:55:09.978010 7f816b676700 10 mon.c@0(probing) e1 handle_probe_probe mon.2 10.214.133.36:6789/0mon_probe(probe cc58a8e4-1465-49ff-8a47-02cf132fadba name a new) v2
    -8> 2012-06-20 14:55:09.978033 7f816b676700  1 -- 10.214.133.34:6789/0 --> 10.214.133.36:6789/0 -- mon_probe(reply cc58a8e4-1465-49ff-8a47-02cf132fadba name c versions {auth=768,logm=186969,mdsmap=1,monmap=1,osdmap=3996,pgmap=674387} new) v2 -- ?+0 0x26fd000 con 0x2725280
    -7> 2012-06-20 14:55:09.978200 7f816a674700  1 -- 10.214.133.34:6789/0 >> :/0 pipe(0x26ff280 sd=19 pgs=0 cs=0 l=0).accept sd=19
    -6> 2012-06-20 14:55:09.978439 7f816b777700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
    -5> 2012-06-20 14:55:09.978623 7f81706f4700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
    -4> 2012-06-20 14:55:09.978884 7f816a573700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
    -3> 2012-06-20 14:55:09.979218 7f816a472700  0 -- 10.214.133.34:6789/0 >> 10.214.133.36:6789/0 pipe(0x26e1a00 sd=18 pgs=4 cs=1 l=0).fault initiating reconnect
    -2> 2012-06-20 14:55:09.979735 7f816b777700 10 mon.c@0(probing) e1 ms_get_authorizer for mon
    -1> 2012-06-20 14:55:09.980024 7f816a270700  0 -- 10.214.133.34:6789/0 >> 10.214.133.36:6789/0 pipe(0x26ff500 sd=20 pgs=5 cs=1 l=0).fault with nothing to send, going to standby
     0> 2012-06-20 14:55:09.980246 7f816b676700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7f816b676700 time 2012-06-20 14:55:09.978281
msg/SimpleMessenger.cc: 1323: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
 2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
 3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
 4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
 5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
 6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
 7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
 8: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
 9: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
 10: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
 11: (()+0x7e9a) [0x7f81702d0e9a]
 12: (clone()+0x6d) [0x7f816ea894bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---
2012-06-20 14:55:09.983303 7f816b676700 -1 *** Caught signal (Aborted) **
 in thread 7f816b676700

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: /usr/bin/ceph-mon() [0x52da0a]
 2: (()+0xfcb0) [0x7f81702d8cb0]
 3: (gsignal()+0x35) [0x7f816e9cd445]
 4: (abort()+0x17b) [0x7f816e9d0bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f816f31b69d]
 6: (()+0xb5846) [0x7f816f319846]
 7: (()+0xb5873) [0x7f816f319873]
 8: (()+0xb596e) [0x7f816f31996e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x282) [0x5e4f62]
 10: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
 11: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
 12: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
 13: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
 14: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
 15: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
 16: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
 17: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
 18: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
 19: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
 20: (()+0x7e9a) [0x7f81702d0e9a]
 21: (clone()+0x6d) [0x7f816ea894bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
     0> 2012-06-20 14:55:09.983303 7f816b676700 -1 *** Caught signal (Aborted) **
 in thread 7f816b676700

 ceph version 0.47.2-521-g88c7629 (commit:88c7629e041699c25a7c91114bd1ac4ffc64c3eb)
 1: /usr/bin/ceph-mon() [0x52da0a]
 2: (()+0xfcb0) [0x7f81702d8cb0]
 3: (gsignal()+0x35) [0x7f816e9cd445]
 4: (abort()+0x17b) [0x7f816e9d0bab]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f816f31b69d]
 6: (()+0xb5846) [0x7f816f319846]
 7: (()+0xb5873) [0x7f816f319873]
 8: (()+0xb596e) [0x7f816f31996e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x282) [0x5e4f62]
 10: (SimpleMessenger::Pipe::register_pipe()+0x270) [0x5c2d90]
 11: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x782) [0x5c63b2]
 12: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0x99b) [0x5d45cb]
 13: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x228) [0x5d4b08]
 14: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x4826f2]
 15: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x486a0b]
 16: (Monitor::_ms_dispatch(Message*)+0x1304) [0x487d44]
 17: (Monitor::ms_dispatch(Message*)+0x32) [0x4950b2]
 18: (SimpleMessenger::dispatch_entry()+0x92b) [0x5c782b]
 19: (SimpleMessenger::DispatchThread::entry()+0xd) [0x5992cd]
 20: (()+0x7e9a) [0x7f81702d0e9a]
 21: (clone()+0x6d) [0x7f816ea894bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---

Actions #4

Updated by Sage Weil almost 12 years ago

  • Status changed from Need More Info to 7
  • Assignee set to Sage Weil

The fix for this is in wip-msgr; still testing.

Actions #5

Updated by Sage Weil almost 12 years ago

I've merged the fix for this into master, commit:204bc594be1a6046d1b362693d086b49294c2a27 (with possible side effects from surrounding commits).

If all goes well, it'll get merged into argonaut/stable/next as well.

Actions #6

Updated by Sage Weil almost 12 years ago

  • Status changed from 7 to Resolved

Actions #7

Updated by Tamilarasi muthamizhan over 11 years ago

Hit this on a mixed cluster running argonaut v0.48.3 and v0.56 (ceph version 0.56-193-g00898c1).

The monitors, MDS, and OSDs were on argonaut; only osd.4 was on v0.56.

ubuntu@burnupi24:~$ sudo cat /etc/ceph/ceph.conf
[global]
auth client required = none
auth cluster required = none
auth service required = none
# debug objclass = 20
# debug librbd = 20
# debug rbd = 20
# debug ms = 10

[client]
log file = /var/log/ceph/client.admin.log
debug client = 20

[osd]
osd journal size = 1000
filestore xattr use omap = true
debug osd = 20
debug ms = 1

[osd.1]
host = burnupi21

[osd.2]
host = burnupi22

[osd.3]
host = burnupi23

[osd.4]
host = burnupi24
osd min pg log entries = 10

[mon.a]
host = burnupi21
mon addr = 10.214.134.10:6789

[mon.b]
host = burnupi22
mon addr = 10.214.134.8:6789

[mon.c]
host = burnupi23
mon addr = 10.214.134.6:6789

[mds.a]
host = burnupi21

0> 2013-01-10 16:07:20.715212 7faef6e22700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7faef6e22700 time 2013-01-10 16:07:20.097488
msg/SimpleMessenger.cc: 1297: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)
ceph version 0.48.3argonaut (commit:920f82e805efec2cae05b79c155c07df0f3ed5dd)
1: (SimpleMessenger::Pipe::register_pipe()+0x2cf) [0x5bae2f]
2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x316) [0x5bec86]
3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0xb13) [0x5cdcd3]
4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x280) [0x5ce290]
5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x477a72]
6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x47bebb]
7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x47d1f4]
8: (Monitor::ms_dispatch(Message*)+0x32) [0x48aa72]
9: (SimpleMessenger::DispatchQueue::entry()+0x903) [0x5bfcf3]
10: (SimpleMessenger::dispatch_entry()+0x24) [0x5c0a94]
11: (SimpleMessenger::DispatchThread::entry()+0xd) [0x58fb3d]
12: (()+0x7e9a) [0x7faefc70ae9a]
13: (clone()+0x6d) [0x7faefb5abcbd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- end dump of recent events ---

Actions #8

Updated by Greg Farnum over 11 years ago

I believe this was caused by issues whose fixes we decided not to backport because of their size; Sage can confirm or correct.

Actions #9

Updated by Tamilarasi muthamizhan over 11 years ago

yes, you are right, Greg. I just wanted to put a note of this somewhere, so chose to update the bug itself :)

Actions #10

Updated by Tamilarasi muthamizhan almost 11 years ago

  • Status changed from Resolved to In Progress
  • Priority changed from Urgent to High

hit this on burnupi39 on argonaut branch, when trying to run upgrade test from argonaut to cuttlefish.

Actions #11

Updated by Tamilarasi muthamizhan almost 11 years ago

2013-05-01 11:39:20.698027 7fefeaa9f700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7fefeaa9f700 time 2013-05-01 11:39:20.694735
msg/SimpleMessenger.cc: 1297: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

ceph version 0.48.3argonaut-22-g2382d9b (commit:2382d9b7c0a283d0cab6188c92e5cf970b713f8f)
1: (SimpleMessenger::Pipe::register_pipe()+0x2cf) [0x5c783f]
2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x316) [0x5cb696]
3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0xb13) [0x5da6e3]
4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x280) [0x5daca0]
5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x482fd2]
6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x48741b]
7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x488754]
8: (Monitor::ms_dispatch(Message*)+0x32) [0x495fd2]
9: (SimpleMessenger::DispatchQueue::entry()+0x903) [0x5cc703]
10: (SimpleMessenger::dispatch_entry()+0x24) [0x5cd4a4]
11: (SimpleMessenger::DispatchThread::entry()+0xd) [0x59c54d]
12: (()+0x7e9a) [0x7fefef6fde9a]
13: (clone()+0x6d) [0x7fefedeb3ccd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Actions #12

Updated by Tamilarasi muthamizhan almost 11 years ago

re-pasting it

2013-05-01 11:39:20.698027 7fefeaa9f700 -1 msg/SimpleMessenger.cc: In function 'void SimpleMessenger::Pipe::register_pipe()' thread 7fefeaa9f700 time 2013-05-01 11:39:20.694735
msg/SimpleMessenger.cc: 1297: FAILED assert(msgr->rank_pipe.count(peer_addr) == 0)

 ceph version 0.48.3argonaut-22-g2382d9b (commit:2382d9b7c0a283d0cab6188c92e5cf970b713f8f)
 1: (SimpleMessenger::Pipe::register_pipe()+0x2cf) [0x5c783f]
 2: (SimpleMessenger::connect_rank(entity_addr_t const&, int, Connection*)+0x316) [0x5cb696]
 3: (SimpleMessenger::submit_message(Message*, Connection*, entity_addr_t const&, int, bool)+0xb13) [0x5da6e3]
 4: (SimpleMessenger::_send_message(Message*, Connection*, bool)+0x280) [0x5daca0]
 5: (Monitor::handle_probe_probe(MMonProbe*)+0x542) [0x482fd2]
 6: (Monitor::handle_probe(MMonProbe*)+0x35b) [0x48741b]
 7: (Monitor::_ms_dispatch(Message*)+0x1304) [0x488754]
 8: (Monitor::ms_dispatch(Message*)+0x32) [0x495fd2]
 9: (SimpleMessenger::DispatchQueue::entry()+0x903) [0x5cc703]
 10: (SimpleMessenger::dispatch_entry()+0x24) [0x5cd4a4]
 11: (SimpleMessenger::DispatchThread::entry()+0xd) [0x59c54d]
 12: (()+0x7e9a) [0x7fefef6fde9a]
 13: (clone()+0x6d) [0x7fefedeb3ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Actions #13

Updated by Sage Weil almost 11 years ago

  • Status changed from In Progress to Resolved

this is a known problem in argonaut that we aren't going to backport the fix for.

Actions #14

Updated by Greg Farnum about 5 years ago

  • Project changed from Ceph to Messengers
  • Category deleted (msgr)