Bug #37882
msg/async: wrong source ip inferred
Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
In /a/sage-2019-01-11_15:59:52-rados-wip-sage2-testing-2019-01-10-1820-distro-basic-smithi/3448145 this leads to the OSD not being marked up and then, eventually, to
'Scrubbing terminated -- not all pgs were active and clean.'
The OSD connects from 172.21.15.79:52058:
2019-01-12 11:09:18.034 7f326a7b8700 10 -- v1:172.21.15.95:6790/0 >> conn(0x564b29854800 legacy=0x564b298b7600 :-1 s=STATE_NONE l=0).accept sd=37 listen_addr v1:172.21.15.95:6790/0 peer_addr v1:172.21.15.79:52058/0
2019-01-12 11:09:18.034 7f326afb9700 20 -- v1:172.21.15.95:6790/0 >> conn(0x564b29854800 legacy=0x564b298b7600 :-1 s=STATE_ACCEPTING l=0).process
2019-01-12 11:09:18.035 7f326afb9700 20 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :-1 s=START_ACCEPT pgs=0 cs=0 l=0).read_event
2019-01-12 11:09:18.035 7f326afb9700 20 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :-1 s=START_ACCEPT pgs=0 cs=0 l=0).send_server_banner
2019-01-12 11:09:18.035 7f326afb9700 1 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).send_server_banner sd=37 legacy v1:172.21.15.95:6790/0 socket_addr v1:172.21.15.95:6790/0 target_addr v1:172.21.15.79:52058/0
2019-01-12 11:09:18.035 7f326afb9700 10 -- v1:172.21.15.95:6790/0 >> conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 281 remaining bytes 0
2019-01-12 11:09:18.035 7f326afb9700 20 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).handle_server_banner_write r=0
2019-01-12 11:09:18.035 7f326afb9700 10 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).handle_server_banner_write write banner and addr done: -
2019-01-12 11:09:18.035 7f326afb9700 20 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).wait_client_banner
2019-01-12 11:09:18.035 7f326afb9700 20 -- v1:172.21.15.95:6790/0 >> conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=145
2019-01-12 11:09:18.035 7f326afb9700 20 -- v1:172.21.15.95:6790/0 >> conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=STATE_CONNECTION_ESTABLISHED l=0).process
2019-01-12 11:09:18.035 7f326afb9700 20 -- v1:172.21.15.95:6790/0 >> conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=145
2019-01-12 11:09:18.035 7f326afb9700 20 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner r=0
2019-01-12 11:09:18.035 7f326afb9700 10 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is v1:0.0.0.0:6809/34208
2019-01-12 11:09:18.035 7f326afb9700 0 --1- v1:172.21.15.95:6790/0 >> - 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).handle_client_banner accept peer addr is really v1:172.21.15.95:6809/34208 (socket is v1:172.21.15.95:6790/0)
2019-01-12 11:09:18.035 7f326afb9700 20 --1- v1:172.21.15.95:6790/0 >> v1:172.21.15.95:6809/34208 0x564b298b7600 conn(0x564b29854800 legacy=0x564b298b7600 :6790 s=ACCEPTING pgs=0 cs=0 l=0).wait_connect_message
But at the end we infer that the peer's IP is 172.21.15.95, which is our own address, rather than 172.21.15.79.
This is a mixup of socket_addr (our own socket address) and target_addr (the address of the peer we are connected to).
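The mixup can be illustrated with a minimal sketch, using the addresses from the log above. This is not the Ceph code: `Addr`, `infer_peer_addr` and `infer_peer_addr_buggy` are hypothetical names chosen for illustration, and the sketch assumes only that a peer advertising a blank IP (0.0.0.0) should have it filled in from the remote end of the socket while keeping its advertised port and nonce.

```python
from dataclasses import dataclass, replace

# Addresses treated as "blank", i.e. the peer does not know its own IP yet.
BLANK_IPS = {"0.0.0.0", "::"}


@dataclass(frozen=True)
class Addr:
    """Hypothetical stand-in for an ip:port/nonce address."""
    ip: str
    port: int
    nonce: int = 0


def infer_peer_addr(advertised: Addr, socket_addr: Addr, target_addr: Addr) -> Addr:
    """Fill a blank advertised IP from target_addr (the remote end of the socket)."""
    if advertised.ip not in BLANK_IPS:
        return advertised
    return replace(advertised, ip=target_addr.ip)


def infer_peer_addr_buggy(advertised: Addr, socket_addr: Addr, target_addr: Addr) -> Addr:
    """Same logic, but mistakenly using socket_addr (our own end of the socket)."""
    if advertised.ip not in BLANK_IPS:
        return advertised
    return replace(advertised, ip=socket_addr.ip)


# Values taken from the log excerpt in this report.
advertised = Addr("0.0.0.0", 6809, 34208)    # "accept peer addr is ..."
socket_addr = Addr("172.21.15.95", 6790)     # our listening socket
target_addr = Addr("172.21.15.79", 52058)    # where the OSD connected from

fixed = infer_peer_addr(advertised, socket_addr, target_addr)
buggy = infer_peer_addr_buggy(advertised, socket_addr, target_addr)
```

With these inputs the buggy variant yields 172.21.15.95:6809/34208, matching the "is really" line in the log, while the intended result is 172.21.15.79:6809/34208.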