Bug #1131

closed

OSD assert failure in update_heartbeat_peers()

Added by Sam Lang almost 13 years ago. Updated almost 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%


Description

I'm not sure I can reproduce this, because my system state is a bit out of whack due to a previous bug (#1130), but I've been seeing OSDs fail at seemingly random times. Below is the log from one of the OSDs. I can try to provide more info if needed. I'm using the stable branch (7330c3c473aa128b1e3ecb8752278f655bc79620).

2011-06-02 09:11:27.439711 7fc5480b6740 journal _open /data/ceph/osd.1/journal fd 11: 1048576000 bytes, block size 4096 bytes, directio = 1
2011-06-02 09:11:36.583793 7fc537647700 -- 0.0.0.0:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
2011-06-02 09:11:36.583961 7fc537647700 -- 0.0.0.0:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=17 pgs=0 cs=0 l=0).fault first fault
2011-06-02 09:11:36.584179 7fc53a250700 -- 192.168.60.109:6802/30627 >> 192.168.60.104:6802/22836 pipe(0x7fc530070950 sd=14 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/22836 not 192.168.60.104:6802/22836 - presumably this is the same node!
2011-06-02 09:11:36.584252 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
2011-06-02 09:11:36.584276 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=16 pgs=0 cs=0 l=0).fault first fault
2011-06-02 09:11:36.584464 7fc53a14f700 -- 192.168.60.109:6802/30627 >> 192.168.60.134:6802/2299 pipe(0x7fc530070370 sd=13 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/4059 not 192.168.60.134:6802/2299 - wrong node!
2011-06-02 09:11:36.584488 7fc53a14f700 -- 192.168.60.109:6802/30627 >> 192.168.60.134:6802/2299 pipe(0x7fc530070370 sd=13 pgs=0 cs=0 l=0).fault first fault
2011-06-02 09:11:36.584559 7fc537546700 -- 0.0.0.0:6803/30627 >> 192.168.60.134:6803/2299 pipe(0x7fc530072c00 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6803/4059 not 192.168.60.134:6803/2299 - wrong node!
2011-06-02 09:11:36.584581 7fc537546700 -- 0.0.0.0:6803/30627 >> 192.168.60.134:6803/2299 pipe(0x7fc530072c00 sd=18 pgs=0 cs=0 l=0).fault first fault
2011-06-02 09:11:36.584710 7fc537445700 -- 0.0.0.0:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=19 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:36.584733 7fc537445700 -- 0.0.0.0:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=19 pgs=0 cs=0 l=0).fault first fault
2011-06-02 09:11:36.585247 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
2011-06-02 09:11:36.585422 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=21 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:36.585445 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=21 pgs=0 cs=0 l=0).fault first fault
2011-06-02 09:11:36.585510 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=19 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:36.585918 7fc537546700 -- 192.168.60.109:6803/30627 >> 192.168.60.134:6803/2299 pipe(0x7fc530072c00 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6803/4059 not 192.168.60.134:6803/2299 - wrong node!
2011-06-02 09:11:36.585976 7fc53a14f700 -- 192.168.60.109:6802/30627 >> 192.168.60.134:6802/2299 pipe(0x7fc530070370 sd=13 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/4059 not 192.168.60.134:6802/2299 - wrong node!
2011-06-02 09:11:36.586032 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
2011-06-02 09:11:36.586100 7fc537142700 -- 192.168.60.109:6803/30627 >> 192.168.60.104:6803/22836 pipe(0x7fc53009b490 sd=22 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6803/22836 not 192.168.60.104:6803/22836 - presumably this is the same node!
2011-06-02 09:11:36.587717 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=21 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:36.786072 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
2011-06-02 09:11:36.786562 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=19 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:36.787054 7fc53a14f700 -- 192.168.60.109:6802/30627 >> 192.168.60.134:6802/2299 pipe(0x7fc530070370 sd=13 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/4059 not 192.168.60.134:6802/2299 - wrong node!
2011-06-02 09:11:36.787092 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
2011-06-02 09:11:36.787127 7fc537546700 -- 192.168.60.109:6803/30627 >> 192.168.60.134:6803/2299 pipe(0x7fc530072c00 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6803/4059 not 192.168.60.134:6803/2299 - wrong node!
2011-06-02 09:11:36.788709 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=21 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:37.186862 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
2011-06-02 09:11:37.187662 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=19 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:37.188188 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
2011-06-02 09:11:37.188225 7fc53a14f700 -- 192.168.60.109:6802/30627 >> 192.168.60.134:6802/2299 pipe(0x7fc530070370 sd=13 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/4059 not 192.168.60.134:6802/2299 - wrong node!
2011-06-02 09:11:37.188260 7fc537546700 -- 192.168.60.109:6803/30627 >> 192.168.60.134:6803/2299 pipe(0x7fc530072c00 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6803/4059 not 192.168.60.134:6803/2299 - wrong node!
2011-06-02 09:11:37.189796 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=21 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:37.310139 7fc537546700 -- 192.168.60.109:6803/30627 >> 192.168.60.134:6803/2299 pipe(0x7fc530072c00 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6803/4059 not 192.168.60.134:6803/2299 - wrong node!
2011-06-02 09:11:37.310184 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=21 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:37.310233 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=19 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:37.798154 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=13 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
2011-06-02 09:11:37.799406 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
2011-06-02 09:11:37.801684 7fc537546700 -- 192.168.60.109:6803/30627 >> 192.168.60.134:6803/2299 pipe(0x7fc530072c00 sd=17 pgs=0 cs=0 l=1).connect claims to be 0.0.0.0:6803/4059 not 192.168.60.134:6803/2299 - wrong node!
2011-06-02 09:11:38.610327 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:38.610380 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:39.398924 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=13 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
2011-06-02 09:11:39.400342 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
2011-06-02 09:11:39.410529 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:39.410580 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:40.610750 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:40.610818 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:42.010743 7fc537445700 -- 192.168.60.109:6803/30627 >> 192.168.60.135:6802/32009 pipe(0x7fc530073470 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/6788 not 192.168.60.135:6802/32009 - wrong node!
2011-06-02 09:11:42.010954 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=18 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:42.599926 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=13 pgs=0 cs=0 l=0).connect claims to be 192.168.60.132:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
2011-06-02 09:11:42.601313 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
../../src/osd/OSD.cc: In function 'void OSD::update_heartbeat_peers()', in thread '0x7fc53c456700'
../../src/osd/OSD.cc: 1484: FAILED assert(p->second <= osdmap->get_epoch())
ceph version (commit:)
1: (OSD::update_heartbeat_peers()+0x2ac7) [0x52a577]
2: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x2f2) [0x52a982]
3: (OSD::handle_osd_map(MOSDMap*)+0x239e) [0x53a9ce]
4: (OSD::_dispatch(Message*)+0x2b8) [0x53d1f8]
5: (OSD::ms_dispatch(Message*)+0xbc) [0x53e09c]
6: (SimpleMessenger::dispatch_entry()+0x667) [0x4a6a57]
7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49882c]
8: (()+0x6d8c) [0x7fc547717d8c]
9: (clone()+0x6d) [0x7fc546abe04d]
ceph version (commit:)
1: (OSD::update_heartbeat_peers()+0x2ac7) [0x52a577]
2: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x2f2) [0x52a982]
3: (OSD::handle_osd_map(MOSDMap*)+0x239e) [0x53a9ce]
4: (OSD::_dispatch(Message*)+0x2b8) [0x53d1f8]
5: (OSD::ms_dispatch(Message*)+0xbc) [0x53e09c]
6: (SimpleMessenger::dispatch_entry()+0x667) [0x4a6a57]
7: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49882c]
8: (()+0x6d8c) [0x7fc547717d8c]
9: (clone()+0x6d) [0x7fc546abe04d]
2011-06-02 09:11:43.214656 7fc537243700 -- 192.168.60.109:6803/30627 >> 192.168.60.132:6802/22996 pipe(0x7fc53009ac20 sd=18 pgs=0 cs=0 l=0).connect claims to be 192.168.60.132:6802/30784 not 192.168.60.132:6802/22996 - wrong node!
2011-06-02 09:11:43.214696 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=16 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!
2011-06-02 09:11:43.214727 7fc537748700 -- 192.168.60.109:6802/30627 >> 192.168.60.132:6801/22996 pipe(0x7fc530071590 sd=13 pgs=0 cs=0 l=0).connect claims to be 192.168.60.132:6801/30784 not 192.168.60.132:6801/22996 - wrong node!
    *** Caught signal (Aborted) **
    in thread 0x7fc53c456700
    ceph version (commit:)
    1: /usr/ceph/bin/cosd() [0x665f39]
    2: (()+0xfc60) [0x7fc547720c60]
    3: (gsignal()+0x35) [0x7fc546a0bd05]
    4: (abort()+0x186) [0x7fc546a0fab6]
    5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc5472c26dd]
    6: (()+0xb9926) [0x7fc5472c0926]
    7: (()+0xb9953) [0x7fc5472c0953]
    8: (()+0xb9a5e) [0x7fc5472c0a5e]
    9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x362) [0x649362]
    10: (OSD::update_heartbeat_peers()+0x2ac7) [0x52a577]
    11: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x2f2) [0x52a982]
    12: (OSD::handle_osd_map(MOSDMap*)+0x239e) [0x53a9ce]
    13: (OSD::_dispatch(Message*)+0x2b8) [0x53d1f8]
    14: (OSD::ms_dispatch(Message*)+0xbc) [0x53e09c]
    15: (SimpleMessenger::dispatch_entry()+0x667) [0x4a6a57]
    16: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x49882c]
    17: (()+0x6d8c) [0x7fc547717d8c]
    18: (clone()+0x6d) [0x7fc546abe04d]
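
The repeated "connect claims to be ... not ..." lines above are easier to read when grouped by peer. The following is a minimal sketch for doing that, not part of Ceph: the names `MISMATCH_RE` and `summarize_mismatches` are made up for illustration, and the line format is inferred only from the log in this report, so it may not match other versions.

```python
import re
from collections import Counter

# Pattern inferred from the messenger lines in this report (an assumption,
# not a documented format):
#   "... >> <peer> pipe(...).connect claims to be <claimed> not <expected> - <verdict>"
MISMATCH_RE = re.compile(
    r">> (?P<peer>\S+) pipe\(.*?\)\.connect claims to be "
    r"(?P<claimed>\S+) not (?P<expected>\S+) - (?P<verdict>.+)$"
)

# Three lines copied from the log above, used as sample input.
SAMPLE = [
    "2011-06-02 09:11:36.583793 7fc537647700 -- 0.0.0.0:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!",
    "2011-06-02 09:11:36.584179 7fc53a250700 -- 192.168.60.109:6802/30627 >> 192.168.60.104:6802/22836 pipe(0x7fc530070950 sd=14 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6802/22836 not 192.168.60.104:6802/22836 - presumably this is the same node!",
    "2011-06-02 09:11:36.586032 7fc537647700 -- 192.168.60.109:6802/30627 >> 192.168.60.135:6801/32009 pipe(0x7fc530071cf0 sd=17 pgs=0 cs=0 l=0).connect claims to be 0.0.0.0:6801/6788 not 192.168.60.135:6801/32009 - wrong node!",
]


def summarize_mismatches(lines):
    """Count (expected, claimed, verdict) triples across the given log lines.

    Lines that do not match the pattern (backtraces, journal messages) are ignored.
    """
    counts = Counter()
    for line in lines:
        m = MISMATCH_RE.search(line)
        if m:
            counts[(m["expected"], m["claimed"], m["verdict"].strip())] += 1
    return counts


if __name__ == "__main__":
    # Print one summary row per distinct mismatch, sorted by expected address.
    for (expected, claimed, verdict), n in sorted(summarize_mismatches(SAMPLE).items()):
        print(f"{n:3d}x expected {expected} claimed {claimed} ({verdict})")
```

To run it over a whole OSD log, pass an open file object instead of `SAMPLE`, since the function accepts any iterable of lines.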
Actions #1

Updated by Samuel Just almost 13 years ago

  • Status changed from New to Resolved

Probably fixed in current stable: c5470e0f855b246cfbde6982ca90f565e7074600. Let us know if it persists!
