Bug #12654
unable to bind to IP:PORT on any port in range 6800-7300: (98) Address already in use
Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
OSD
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi there,
In my production Ceph cluster, several OSDs (~20) crashed with this error:
ceph-osd.34.log:2015-08-10 15:55:32.654670 7f792ea4c700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
Other OSDs also crashed with the same error:
[root@node-s01 ceph]# grep -i unable *.log
ceph-osd.34.log:2015-08-10 15:55:32.654670 7f792ea4c700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.34.log:2015-08-10 15:55:32.655310 7f792ea4c700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.34.log:2015-08-10 15:55:32.655769 7f792ea4c700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.34.log:    -4> 2015-08-10 15:55:32.654670 7f792ea4c700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.34.log:    -3> 2015-08-10 15:55:32.655310 7f792ea4c700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.34.log:    -2> 2015-08-10 15:55:32.655769 7f792ea4c700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.38.log:2015-08-10 15:55:31.149299 7f31a14bf700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.38.log:2015-08-10 15:55:31.150890 7f31a14bf700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.38.log:2015-08-10 15:55:31.154187 7f31a14bf700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.38.log:    -4> 2015-08-10 15:55:31.149299 7f31a14bf700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.38.log:    -3> 2015-08-10 15:55:31.150890 7f31a14bf700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.38.log:    -2> 2015-08-10 15:55:31.154187 7f31a14bf700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.41.log:2015-08-10 07:30:49.837903 7fe825c26700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.41.log:    -2> 2015-08-10 07:30:49.837903 7fe825c26700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.45.log:2015-08-10 15:55:31.147666 7f647fb54700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.45.log:2015-08-10 15:55:31.149575 7f647fb54700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.45.log:    -3> 2015-08-10 15:55:31.147666 7f647fb54700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.45.log:    -2> 2015-08-10 15:55:31.149575 7f647fb54700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.55.log:2015-08-10 15:55:31.199112 7fa2b02d6700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.55.log:2015-08-10 15:55:31.201311 7fa2b02d6700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.55.log:2015-08-10 15:55:31.204034 7fa2b02d6700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.55.log:    -4> 2015-08-10 15:55:31.199112 7fa2b02d6700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.55.log:    -3> 2015-08-10 15:55:31.201311 7fa2b02d6700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.55.log:    -2> 2015-08-10 15:55:31.204034 7fa2b02d6700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.58.log:2015-08-10 15:55:31.162660 7fd477c52700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.58.log:2015-08-10 15:55:31.166120 7fd477c52700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.58.log:    -3> 2015-08-10 15:55:31.162660 7fd477c52700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.58.log:    -2> 2015-08-10 15:55:31.166120 7fd477c52700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.8.log:2015-08-10 07:30:51.730846 7fe5e67f7700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.8.log:2015-08-10 07:30:51.731851 7fe5e67f7700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.8.log:    -3> 2015-08-10 07:30:51.730846 7fe5e67f7700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
ceph-osd.8.log:    -2> 2015-08-10 07:30:51.731851 7fe5e67f7700 -1 accepter.accepter.bind unable to bind to 10.100.50.1:7300 on any port in range 6800-7300: (98) Address already in use
[root@node-s01 ceph]#
Here is the backtrace from one of the OSDs:
   -10> 2015-08-10 12:38:02.766359 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.33 ever on either front or back, first ping sent 2015-08-10 12:37:00.655566 (cutoff 2015-08-10 12:37:42.766354)
    -9> 2015-08-10 12:38:02.766423 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.50 ever on either front or back, first ping sent 2015-08-10 12:37:00.655566 (cutoff 2015-08-10 12:37:42.766354)
    -8> 2015-08-10 12:38:02.766433 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.134 ever on either front or back, first ping sent 2015-08-10 12:37:23.469422 (cutoff 2015-08-10 12:37:42.766354)
    -7> 2015-08-10 12:38:02.766446 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.200 ever on either front or back, first ping sent 2015-08-10 12:37:15.361731 (cutoff 2015-08-10 12:37:42.766354)
    -6> 2015-08-10 12:38:02.766454 7faa0abce700 -1 osd.60 39761 heartbeat_check: no reply from osd.228 ever on either front or back, first ping sent 2015-08-10 12:37:00.655566 (cutoff 2015-08-10 12:37:42.766354)
    -5> 2015-08-10 12:38:03.259647 7fa9b5b9a700 0 -- 10.100.50.2:0/82807 >> 10.100.50.4:7142/147030592 pipe(0x4ff3200 sd=399 :0 s=1 pgs=0 cs=0 l=1 c=0x44b3de0).fault
    -4> 2015-08-10 12:38:03.259682 7fa9b5594700 0 -- 10.100.50.2:0/82807 >> 10.100.50.1:7204/408026440 pipe(0xf278f00 sd=411 :0 s=1 pgs=0 cs=0 l=1 c=0x44b7bc0).fault
    -3> 2015-08-10 12:38:03.271675 7fa9ecda2700 0 log [WRN] : map e39763 wrongly marked me down
    -2> 2015-08-10 12:38:03.306073 7fa9ecda2700 -1 accepter.accepter.bind unable to bind to 10.100.50.2:7300 on any port in range 6800-7300: (98) Address already in use
    -1> 2015-08-10 12:38:03.368817 7fa9ecda2700 0 osd.60 39763 prepare_to_stop starting shutdown
     0> 2015-08-10 12:38:03.372071 7fa9ecda2700 -1 common/Mutex.cc: In function 'void Mutex::Lock(bool)' thread 7fa9ecda2700 time 2015-08-10 12:38:03.368886
common/Mutex.cc: 93: FAILED assert(r == 0)
ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
1: (Mutex::Lock(bool)+0x1d3) [0xa83003]
2: (OSD::shutdown()+0x63) [0x63f3f3]
3: (OSD::handle_osd_map(MOSDMap*)+0x1829) [0x64dff9]
4: (OSD::_dispatch(Message*)+0x2fb) [0x6600eb]
5: (OSD::ms_dispatch(Message*)+0x211) [0x6607b1]
6: (DispatchQueue::entry()+0x5a2) [0xb5ac12]
7: (DispatchQueue::DispatchThread::entry()+0xd) [0xaf23ad]
8: /lib64/libpthread.so.0() [0x35952079d1]
9: (clone()+0x6d) [0x3594ee89dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Environment Details
- ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
- Kernel : 2.6.32-431.el6.x86_64
- CentOS release 6.5 (Final)
- I have 4 OSD nodes, but only 2 of them have shown this error
After restarting the crashed OSDs, the error did not recur. The cluster is currently running with the NOOUT and NODOWN flags set, and its health is OK. I am not sure whether the OSDs will die again if I unset noout and nodown.
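For anyone unfamiliar with the failure mode: the OSD accepter walks the configured port range and binds the first free port; when every port in 6800-7300 (or the specific port it needs) is taken, bind() fails with errno 98 (EADDRINUSE), which is the error in the logs above. A minimal, self-contained sketch of that scan-and-bind behavior (not Ceph's actual code; port numbers here are arbitrary illustrations):

```python
import errno
import socket

def bind_first_free(host, lo, hi):
    """Try each port in [lo, hi]; return (socket, port) for the first port
    that binds, or re-raise the last bind error (EADDRINUSE when every
    port in the range is occupied) -- analogous to what the OSD accepter does."""
    last_err = None
    for port in range(lo, hi + 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return s, port
        except OSError as e:
            s.close()
            last_err = e
    raise last_err

if __name__ == "__main__":
    # Occupy a port, then show that binding the same one-port "range"
    # again fails with EADDRINUSE (errno 98 on Linux), as in the logs.
    holder, port = bind_first_free("127.0.0.1", 16800, 16810)
    try:
        bind_first_free("127.0.0.1", port, port)
    except OSError as e:
        print(e.errno == errno.EADDRINUSE)
    finally:
        holder.close()
```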
On tracker.ceph.com, issues #3816 and #10494 are not directly but somewhat related to this problem.
Need help (I don't have debug 20 logs).
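One possible (unconfirmed) cause worth ruling out when a whole port range reports EADDRINUSE: the kernel's ephemeral port range (net.ipv4.ip_local_port_range) overlapping the OSD bind range, so outgoing connections can transiently occupy ports 6800-7300. The Ceph options ms_bind_port_min / ms_bind_port_max control that range (6800-7300 per the logs above). A small diagnostic sketch, offered as a speculative check rather than a diagnosis:

```python
def overlaps_osd_range(local_port_range, osd_lo=6800, osd_hi=7300):
    """local_port_range: the contents of /proc/sys/net/ipv4/ip_local_port_range,
    e.g. "32768\t60999". Returns True if the kernel's ephemeral port range
    overlaps the OSD bind range (ms_bind_port_min..ms_bind_port_max)."""
    lo, hi = (int(x) for x in local_port_range.split())
    return lo <= osd_hi and osd_lo <= hi

if __name__ == "__main__":
    try:
        with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
            if overlaps_osd_range(f.read()):
                print("ephemeral port range overlaps 6800-7300; "
                      "outgoing connections can occupy OSD ports")
    except FileNotFoundError:
        pass  # /proc sysctl interface not available (non-Linux host)
```

If the check fires, raising the lower bound of ip_local_port_range above 7300 (or moving the OSD range) would avoid the collision.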
Updated by karan singh over 8 years ago
By mistake, the issue was created twice. You can close this one.
See Bug #12655.