Project

General

Profile

Actions

Bug #9505

closed

kcephfs: client gets stuck in reconnect loop?

Added by Sage Weil over 9 years ago. Updated over 9 years ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Development
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

2014-09-17 07:30:26.833872 7f806c1fe700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=99 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa801f1b0).accept peer addr is really 10.214.131.102:0/1174985195 (socket is 10.214.131.102:40070/0)
2014-09-17 07:30:26.834109 7f806c1fe700 10 mds.0.107  existing session 0x7f7fb401e9e0 for client.1633185 10.214.131.102:0/1174985195 existing con 0x7f7fa8016d00, new/authorizing con 0x7f7fa801f1b0
2014-09-17 07:30:26.834121 7f806c1fe700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=99 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa801f1b0).accept connect_seq 1 vs existing 1 state standby
2014-09-17 07:30:26.834566 7f806c1fe700 10 mds.0.107  existing session 0x7f7fb401e9e0 for client.1633185 10.214.131.102:0/1174985195 existing con 0x7f7fa8016d00, new/authorizing con 0x7f7fa801f1b0
2014-09-17 07:30:26.834577 7f806c1fe700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=99 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa801f1b0).accept connect_seq 2 vs existing 1 state standby
2014-09-17 07:30:26.834621 7f80658ad700  5 mds.0.107 ms_handle_reset on 10.214.131.102:0/1174985195
2014-09-17 07:30:26.834626 7f80658ad700  3 mds.0.107 ms_handle_reset closing connection for session client.1633185 10.214.131.102:0/1174985195
2014-09-17 07:30:26.834696 7f80658ad700  1 -- 10.214.134.10:6801/29320 mark_down 0x7f7fa801f1b0 -- 0x7f7fa8000980
2014-09-17 07:30:26.834723 7f80658ad700 10 mds.0.sessionmap remove_session s=0x7f7fb401e9e0 name=client.1633185
2014-09-17 07:30:26.834735 7f80658ad700 10 mds.0.107 ms_handle_accept 10.214.131.102:0/1174985195 con 0x7f7fa8016d00 session 0x7f7fb401e9e0
2014-09-17 07:30:28.725578 7f806c1fe700  1 -- 10.214.134.10:6801/29320 >> :/0 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8021c40).accept sd=98 10.214.131.102:40071/0
2014-09-17 07:30:28.736685 7f806c1fe700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8021c40).accept peer addr is really 10.214.131.102:0/1174985195 (socket is 10.214.131.102:40071/0)
2014-09-17 07:30:28.739602 7f806c1fe700 10 mds.0.107  new session 0x7f802805a2e0 for client.1633185 10.214.131.102:0/1174985195 con 0x7f7fa8021c40
2014-09-17 07:30:28.739611 7f806c1fe700 10 mds.0.sessionmap add_session s=0x7f802805a2e0 name=client.1633185
2014-09-17 07:30:28.739621 7f806c1fe700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8021c40).accept we reset (peer sent cseq 2), sending RESETSESSION
2014-09-17 07:30:28.740708 7f80658ad700  5 mds.0.107 ms_handle_reset on 10.214.131.102:0/1174985195
2014-09-17 07:30:28.740714 7f80658ad700  3 mds.0.107 ms_handle_reset closing connection for session client.1633185 10.214.131.102:0/1174985195
2014-09-17 07:30:28.740719 7f80658ad700  1 -- 10.214.134.10:6801/29320 mark_down 0x7f7fa8021c40 -- pipe dne
2014-09-17 07:30:28.740722 7f80658ad700 10 mds.0.sessionmap remove_session s=0x7f802805a2e0 name=client.1633185 
2014-09-17 07:30:28.742750 7f806c1fe700  1 -- 10.214.134.10:6801/29320 >> :/0 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa801f440).accept sd=98 10.214.131.102:40072/0
2014-09-17 07:30:28.742768 7f806c1fe700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa801f440).accept peer addr is really 10.214.131.102:0/1174985195 (socket is 10.214.131.102:40072/0)
2014-09-17 07:30:28.746093 7f806c1fe700 10 mds.0.107  new session 0x7f802805a2e0 for client.1633185 10.214.131.102:0/1174985195 con 0x7f7fa801f440
2014-09-17 07:30:28.746101 7f806c1fe700 10 mds.0.sessionmap add_session s=0x7f802805a2e0 name=client.1633185
2014-09-17 07:30:28.746404 7f80658ad700 10 mds.0.107 ms_handle_accept 10.214.131.102:0/1174985195 con 0x7f7fa801f440 session 0x7f802805a2e0
2014-09-17 07:30:28.746939 7f806c1fe700 -1 failed to decode message of type 24 v2: buffer::end_of_buffer
2014-09-17 07:30:28.746972 7f806c1fe700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=2 pgs=18357 cs=1 l=0 c=0x7f7fa801f440).fault with nothing to send, going to standby
2014-09-17 07:30:29.827737 7f7f9b9ac700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8009c40 sd=99 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8008c60).accept peer addr is really 10.214.131.102:0/1174985195 (socket is 10.214.131.102:40083/0)
2014-09-17 07:30:29.828129 7f7f9b9ac700 10 mds.0.107  existing session 0x7f802805a2e0 for client.1633185 10.214.131.102:0/1174985195 existing con 0x7f7fa801f440, new/authorizing con 0x7f7fa8008c60
2014-09-17 07:30:29.828142 7f7f9b9ac700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8009c40 sd=99 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8008c60).accept connect_seq 1 vs existing 1 state standby
2014-09-17 07:30:29.829755 7f7f9b9ac700 10 mds.0.107  existing session 0x7f802805a2e0 for client.1633185 10.214.131.102:0/1174985195 existing con 0x7f7fa801f440, new/authorizing con 0x7f7fa8008c60
2014-09-17 07:30:29.829765 7f7f9b9ac700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8009c40 sd=99 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8008c60).accept connect_seq 2 vs existing 1 state standby
2014-09-17 07:30:29.829897 7f80658ad700  5 mds.0.107 ms_handle_reset on 10.214.131.102:0/1174985195
2014-09-17 07:30:29.829904 7f80658ad700  3 mds.0.107 ms_handle_reset closing connection for session client.1633185 10.214.131.102:0/1174985195
2014-09-17 07:30:29.829909 7f80658ad700  1 -- 10.214.134.10:6801/29320 mark_down 0x7f7fa8008c60 -- 0x7f7fa8009c40
2014-09-17 07:30:29.829925 7f80658ad700 10 mds.0.sessionmap remove_session s=0x7f802805a2e0 name=client.1633185
2014-09-17 07:30:29.829931 7f7fcb1f5700  1 -- 10.214.134.10:6801/29320 <== osd.38 10.214.136.136:6800/23687 2378105 ==== osd_op_reply(180486510 100028588d3.001be3f9 [delete] v0'0 uv463374 ondisk = -2 ((2) No such file or directory)) v6 ==== 187+0+0 (3777003740 0 0) 0x7f802123b560 con 0x7f7fdc002c90
2014-09-17 07:30:29.829942 7f80658ad700 10 mds.0.107 ms_handle_accept 10.214.131.102:0/1174985195 con 0x7f7fa801f440 session 0x7f802805a2e0
2014-09-17 07:30:30.721489 7f7f9b9ac700  1 -- 10.214.134.10:6801/29320 >> :/0 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8018a50).accept sd=98 10.214.131.102:40084/0
2014-09-17 07:30:30.764545 7f7f9b9ac700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8018a50).accept peer addr is really 10.214.131.102:0/1174985195 (socket is 10.214.131.102:40084/0)
2014-09-17 07:30:30.766755 7f7f9b9ac700 10 mds.0.107  new session 0x7f7fb4011280 for client.1633185 10.214.131.102:0/1174985195 con 0x7f7fa8018a50
2014-09-17 07:30:30.766783 7f7f9b9ac700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8018a50).accept we reset (peer sent cseq 2), sending RESETSESSION
2014-09-17 07:30:30.767311 7f80658ad700  5 mds.0.107 ms_handle_reset on 10.214.131.102:0/1174985195
2014-09-17 07:30:30.767302 7f7f6db2d700  1 -- 10.214.134.10:6801/29320 <== osd.3 10.214.133.134:6800/8005 2509987 ==== osd_op_reply(180488961 10003402b59.0013cd3d [delete] v0'0 uv460711 ondisk = -2 ((2) No such file or directory)) v6 ==== 187+0+0 (4105353319 0 0) 0x7f806800eca0 con 0x7f80541ffb20
2014-09-17 07:30:30.767317 7f80658ad700  3 mds.0.107 ms_handle_reset closing connection for session client.1633185 10.214.131.102:0/1174985195
2014-09-17 07:30:30.767322 7f80658ad700  1 -- 10.214.134.10:6801/29320 mark_down 0x7f7fa8018a50 -- pipe dne
2014-09-17 07:30:30.767325 7f80658ad700 10 mds.0.sessionmap remove_session s=0x7f7fb4011280 name=client.1633185
2014-09-17 07:30:30.770221 7f7f9b9ac700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=0 pgs=0 cs=0 l=0 c=0x7f7fa8018a50).accept peer addr is really 10.214.131.102:0/1174985195 (socket is 10.214.131.102:40085/0)
2014-09-17 07:30:30.770938 7f7f9b9ac700 10 mds.0.107  new session 0x7f7fb40019b0 for client.1633185 10.214.131.102:0/1174985195 con 0x7f7fa8018a50
2014-09-17 07:30:30.770945 7f7f9b9ac700 10 mds.0.sessionmap add_session s=0x7f7fb40019b0 name=client.1633185
2014-09-17 07:30:30.770956 7f804e6f4700  1 -- 10.214.134.10:6801/29320 <== osd.6 10.214.133.134:6820/20014 2337016 ==== osd_op_reply(180489067 100033f7ee5.0014cbd0 [delete] v0'0 uv462507 ondisk = -2 ((2) No such file or directory)) v6 ==== 187+0+0 (3046909492 0 0) 0x7f80000433e0 con 0x7f804401f1d0
2014-09-17 07:30:30.770961 7f7fcb3f7700  1 -- 10.214.134.10:6801/29320 --> 10.214.133.134:6800/8005 -- osd_op(mds.0.107:180489207 100028e7f9c.00258bd1 [delete] 0.d221c5bc snapc 1=[] ondisk+write+known_if_redirected e362283) v4 -- ?+0 0x7f8014027d70 con 0x7f80541ffb20
2014-09-17 07:30:30.770973 7f80658ad700 10 mds.0.107 ms_handle_accept 10.214.131.102:0/1174985195 con 0x7f7fa8018a50 session 0x7f7fb40019b0
2014-09-17 07:30:30.771321 7f7f9b9ac700 -1 failed to decode message of type 24 v2: buffer::end_of_buffer
2014-09-17 07:30:30.771339 7f7f9b9ac700  0 -- 10.214.134.10:6801/29320 >> 10.214.131.102:0/1174985195 pipe(0x7f7fa8000980 sd=98 :6801 s=2 pgs=18362 cs=1 l=0 c=0x7f7fa8018a50).fault with nothing to send, going to standby

this is a partial log.. i was manually search for the addr and trying to catch relevant surrounding lines on the same thread, but probably missed a few things.

Actions #1

Updated by Sage Weil over 9 years ago

  • Project changed from CephFS to Linux kernel client
Actions #2

Updated by Zheng Yan over 9 years ago

dup of #9458

Actions #3

Updated by Zheng Yan over 9 years ago

  • Status changed from New to Duplicate
Actions

Also available in: Atom PDF