Actions
Bug #856
closedbusy loop osd reconnects if osd is down
Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):
Description
This is fallout from commit:ab48fe45f3f51e54975be158a9b4ca3ec90d4aad.
There is a confusing and subtle breakdown in retry/backoff between the msgr and osd_client. Need to look at this closely.
[ 139.890000] libceph: osd0 10.0.1.252:6800 connection failed [ 139.890000] libceph: messenger.c:1990 : fault 000000009f64a820 state 2066 to peer 10.0.1.252:6800 [ 139.890000] libceph: messenger.c:292 : con_close_socket on 000000009f64a820 sock 000000009bac0d10 [ 139.890000] libceph: messenger.c:162 : ceph_state_change 000000009f64a820 state = 2066 sk_state = 7 [ 139.890000] libceph: messenger.c:169 : ceph_state_change TCP_CLOSE [ 139.890000] libceph: messenger.c:171 : ceph_state_change TCP_CLOSE_WAIT [ 139.890000] libceph: messenger.c:2022 : fault queueing 000000009f64a820 delay 50 [ 139.890000] libceph: osd_client.c:580 : get_osd 000000009f64a7f0 3 -> 4 [ 139.890000] libceph: osd_client.c:591 : put_osd 000000009f64a7f0 4 -> 3 [ 139.890000] libceph: osd_client.c:543 : osd_reset osd0 [ 139.890000] libceph: osd_client.c:1121 : kick_requests osd0 [ 139.890000] libceph: osd_client.c:652 : __reset_osd 000000009f64a7f0 osd0 [ 139.890000] libceph: osd_client.c:660 : osd addr hasn't changed and connection still opening, letting msgr retry [ 139.890000] libceph: osd_client.c:1209 : 1 requests for down osds, need new map [ 139.890000] libceph: mon_client.c:291 : request_next_osdmap have 6 [ 139.890000] libceph: mon_client.c:190 : __send_subscribe sub_sent=0 exp=0 want_osd=1 [ 139.890000] libceph: mon_client.c:206 : __send_subscribe to 'osdmap' 6 [ 139.890000] libceph: mon_client.c:216 : __send_subscribe to 'mdsmap' 5+ [ 139.890000] libceph: messenger.c:2124 : ----- 000000009c885f30 to mon0 15=mon_subscribe len 85+0+0 ----- [ 139.890000] libceph: messenger.c:393 : con_get 000000009f7897f0 nref = 1 -> 2 [ 139.890000] libceph: messenger.c:1941 : queue_con 000000009f7897f0 [ 139.890000] libceph: osd_client.c:591 : put_osd 000000009f64a7f0 3 -> 2 [ 139.890000] libceph: messenger.c:1711 : try_write start 000000009f64a820 state 18 nref 1 [ 139.890000] libceph: messenger.c:1714 : try_write out_kvec_bytes 352 [ 139.890000] libceph: messenger.c:666 : prepare_write_connect 000000009f64a820 cseq=0 gseq=5 proto=24 [ 139.890000] libceph: messenger.c:924 : prepare_read_banner 000000009f64a820 [ 139.890000] libceph: messenger.c:1735 : try_write initiating connect on 000000009f64a820 new state 18 [ 139.890000] libceph: messenger.c:228 : connect 10.0.1.252:6800 [ 139.890000] libceph: messenger.c:162 : ceph_state_change 000000009f64a820 state = 18 sk_state = 7 [ 139.890000] libceph: messenger.c:169 : ceph_state_change TCP_CLOSE [ 139.890000] libceph: messenger.c:171 : ceph_state_change TCP_CLOSE_WAIT [ 139.890000] libceph: osd_client.c:580 : get_osd 000000009f64a7f0 2 -> 3 [ 139.890000] libceph: messenger.c:1941 : queue_con 000000009f64a820 [ 139.890000] libceph: messenger.c:235 : connect 10.0.1.252:6800 EINPROGRESS sk_state = 7 [ 139.890000] libceph: messenger.c:701 : write_partial_kvec 000000009f64a820 352 left [ 139.890000] libceph: messenger.c:729 : write_partial_kvec 000000009f64a820 352 left in 4 kvecs ret = -111 [ 139.890000] libceph: messenger.c:1800 : try_write done on 000000009f64a820 ret -111 [ 139.890000] libceph: osd0 10.0.1.252:6800 connection failed [ 139.890000] libceph: messenger.c:1990 : fault 000000009f64a820 state 2066 to peer 10.0.1.252:6800 [ 139.890000] libceph: messenger.c:292 : con_close_socket on 000000009f64a820 sock 000000009bac0d10 [ 139.890000] libceph: messenger.c:162 : ceph_state_change 000000009f64a820 state = 2066 sk_state = 7 [ 139.890000] libceph: messenger.c:169 : ceph_state_change TCP_CLOSE [ 139.890000] libceph: messenger.c:171 : ceph_state_change TCP_CLOSE_WAIT [ 139.890000] libceph: messenger.c:2022 : fault queueing 000000009f64a820 delay 100 [ 139.890000] libceph: osd_client.c:580 : get_osd 000000009f64a7f0 3 -> 4 [ 139.890000] libceph: osd_client.c:591 : put_osd 000000009f64a7f0 4 -> 3 [ 139.890000] libceph: osd_client.c:543 : osd_reset osd0 [ 139.890000] libceph: osd_client.c:1121 : kick_requests osd0 [ 139.890000] libceph: osd_client.c:652 : __reset_osd 000000009f64a7f0 osd0 [ 139.890000] libceph: osd_client.c:660 : osd addr hasn't changed and connection still opening, letting msgr retry [ 139.890000] libceph: osd_client.c:1209 : 1 requests for down osds, need new map [ 139.890000] libceph: mon_client.c:291 : request_next_osdmap have 6 [ 139.890000] libceph: osd_client.c:591 : put_osd 000000009f64a7f0 3 -> 2 [ 139.890000] libceph: messenger.c:1819 : try_read start on 000000009f7897f0 [ 139.890000] libceph: messenger.c:1823 : try_read tag 1 in_base_pos 0 [ 139.890000] libceph: messenger.c:1909 : try_read done on 000000009f7897f0 ret 0 [ 139.890000] libceph: messenger.c:1711 : try_write start 000000009f7897f0 state 21 nref 2 [ 139.890000] libceph: messenger.c:1714 : try_write out_kvec_bytes 0 [ 139.890000] libceph: messenger.c:510 : prepare_write_message 000000009c885f30 seq 5 type 15 len 85+0+0 0 pgs [ 139.890000] libceph: messenger.c:542 : prepare_write_message front_crc 4173134223 data_crc 0 [ 139.890000] libceph: messenger.c:452 : prepare_write_message_footer 000000009f7897f0 [ 139.890000] libceph: messenger.c:1714 : try_write out_kvec_bytes 152 [ 139.890000] libceph: messenger.c:701 : write_partial_kvec 000000009f7897f0 152 left [ 139.890000] libceph: messenger.c:729 : write_partial_kvec 000000009f7897f0 0 left in 0 kvecs ret = 1 [ 139.890000] libceph: messenger.c:1797 : try_write nothing else to write. [ 139.890000] libceph: messenger.c:1800 : try_write done on 000000009f7897f0 ret 0 [ 139.890000] libceph: messenger.c:402 : con_put 000000009f7897f0 nref = 2 -> 1 [ 139.890000] libceph: messenger.c:1711 : try_write start 000000009f64a820 state 18 nref 1 [ 139.890000] libceph: messenger.c:1714 : try_write out_kvec_bytes 352 [ 139.890000] libceph: messenger.c:666 : prepare_write_connect 000000009f64a820 cseq=0 gseq=6 proto=24 [ 139.890000] libceph: messenger.c:924 : prepare_read_banner 000000009f64a820 [ 139.890000] libceph: messenger.c:1735 : try_write initiating connect on 000000009f64a820 new state 18 [ 139.890000] libceph: messenger.c:228 : connect 10.0.1.252:6800 [ 139.890000] libceph: messenger.c:162 : ceph_state_change 000000009f64a820 state = 18 sk_state = 7 [ 139.890000] libceph: messenger.c:169 : ceph_state_change TCP_CLOSE [ 139.890000] libceph: messenger.c:171 : ceph_state_change TCP_CLOSE_WAIT [ 139.890000] libceph: osd_client.c:580 : get_osd 000000009f64a7f0 2 -> 3 [ 139.890000] libceph: messenger.c:1941 : queue_con 000000009f64a820 [ 139.890000] libceph: messenger.c:235 : connect 10.0.1.252:6800 EINPROGRESS sk_state = 7 [ 139.890000] libceph: messenger.c:701 : write_partial_kvec 000000009f64a820 352 left [ 139.890000] libceph: messenger.c:729 : write_partial_kvec 000000009f64a820 352 left in 4 kvecs ret = -111 [ 139.890000] libceph: messenger.c:1800 : try_write done on 000000009f64a820 ret -111
Updated by Sage Weil about 13 years ago
- Status changed from New to 7
hopefully fixed by commits in for-linus, see commit:c3d5a0cd56fa1b57db04aa151a994424657cf264 among others
Actions