Bug #8278

closed

monclient: failure to retry after ill-timed connection reset during auth

Added by Sage Weil almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Monitor
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2014-05-02 04:03:10.750746 7f65f7d06700 10 mon.d@6(peon) e1 ms_verify_authorizer 10.214.133.38:0/8015495 client protocol 0
2014-05-02 04:03:10.753350 7f65f9e12700  1 -- 10.214.132.3:6792/0 <== client.? 10.214.133.38:0/8015495 1 ==== auth(proto 0 30 bytes epoch 0) v1 ==== 60+0+0 (673663173 0 0) 0x19f7000 con 0x187b000
2014-05-02 04:03:10.753378 7f65f9e12700 10 mon.d@6(peon) e1 ms_dispatch new session MonSession: client.? 10.214.133.38:0/8015495 is open for client.? 10.214.133.38:0/8015495
2014-05-02 04:03:10.753389 7f65f9e12700 10 mon.d@6(peon).paxosservice(auth 1..2) dispatch auth(proto 0 30 bytes epoch 0) v1 from client.? 10.214.133.38:0/8015495
2014-05-02 04:03:10.753404 7f65f9e12700 10 mon.d@6(peon).auth v2 preprocess_query auth(proto 0 30 bytes epoch 0) v1 from client.? 10.214.133.38:0/8015495
2014-05-02 04:03:10.753434 7f65f9e12700  1 -- 10.214.132.3:6792/0 --> 10.214.133.38:0/8015495 -- mon_map v1 -- ?+0 0x15f3800 con 0x187b000
2014-05-02 04:03:10.753463 7f65f9e12700  1 -- 10.214.132.3:6792/0 --> 10.214.133.38:0/8015495 -- auth_reply(proto 2 0 (0) Success) v1 -- ?+0 0x15f3600 con 0x187b000
2014-05-02 04:03:10.755999 7f65f9e12700 10 mon.d@6(peon) e1 ms_handle_reset 0x187b000 10.214.133.38:0/8015495
2014-05-02 04:03:10.756012 7f65f9e12700 10 mon.d@6(peon) e1 reset/close on session client.? 10.214.133.38:0/8015495
2014-05-02 04:03:10.756017 7f65f9e12700 10 mon.d@6(peon) e1 remove_session MonSession: client.? 10.214.133.38:0/8015495 is open client.? 10.214.133.38:0/8015495

That is the entirety of the mon's interaction with the client. Meanwhile, the client does this:
2014-05-02 04:02:04.623135 7f6e0d324700  0 monclient: hunting for new mon
2014-05-02 04:02:22.503186 7f6e0cb23700  0 -- :/4015495 >> 10.214.133.38:6791/0 pipe(0xc55cf0 sd=13 :0 s=1 pgs=0 cs=0 l=1 c=0xc5b010).fault
2014-05-02 04:02:25.503326 7f6df6ffd700  0 -- :/4015495 >> 10.214.133.38:6789/0 pipe(0x7f6de0001000 sd=14 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6de0001270).fault
2014-05-02 04:02:32.184061 7f6df6ffd700  0 -- :/5015495 >> 10.214.133.38:6791/0 pipe(0xc5b010 sd=15 :0 s=1 pgs=0 cs=0 l=1 c=0xc5b280).fault
2014-05-02 04:02:35.184302 7f6e0cb23700  0 -- :/5015495 >> 10.214.133.38:6789/0 pipe(0x7f6df8000c00 sd=16 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6df8000e70).fault
2014-05-02 04:02:42.101257 7f6e0cb23700  0 -- :/6015495 >> 10.214.133.38:6791/0 pipe(0xc5b150 sd=17 :0 s=1 pgs=0 cs=0 l=1 c=0xc5b3c0).fault
2014-05-02 04:02:45.101464 7f6df6ffd700  0 -- :/6015495 >> 10.214.133.38:6789/0 pipe(0x7f6e08000ea0 sd=18 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6e08001770).fault
2014-05-02 04:02:52.469771 7f6df6ffd700  0 -- :/7015495 >> 10.214.133.38:6791/0 pipe(0xc5b010 sd=19 :0 s=1 pgs=0 cs=0 l=1 c=0xc5b280).fault
2014-05-02 04:02:55.470062 7f6e0cb23700  0 -- :/7015495 >> 10.214.133.38:6789/0 pipe(0x7f6dec001830 sd=20 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6dec001aa0).fault
2014-05-02 04:02:58.471733 7f6e0cb23700  0 -- 10.214.133.38:0/7015495 >> 10.214.132.3:6792/0 pipe(0x7f6dec003010 sd=20 :46809 s=2 pgs=84 cs=1 l=1 c=0x7f6dec003280).injecting socket failure
2014-05-02 04:03:04.751557 7f6df6ffd700  0 -- :/8015495 >> 10.214.133.38:6791/0 pipe(0xc5b010 sd=21 :0 s=1 pgs=0 cs=0 l=1 c=0xc5b280).fault
2014-05-02 04:03:07.751762 7f6e0cb23700  0 -- :/8015495 >> 10.214.133.38:6789/0 pipe(0x7f6e000010d0 sd=22 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6e00001340).fault
2014-05-02 04:03:10.753784 7f6e0cb23700  0 -- 10.214.133.38:0/8015495 >> 10.214.132.3:6792/0 pipe(0x7f6e00003010 sd=22 :46828 s=2 pgs=88 cs=1 l=1 c=0x7f6e00003280).injecting socket failure
2014-05-02 04:03:13.749969 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:03:23.750145 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:03:33.750322 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:03:43.750497 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:03:53.750674 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:04:03.750868 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:04:13.751055 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:04:23.751245 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:04:33.751437 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:04:43.751615 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
2014-05-02 04:04:53.751794 7f6e0e326700  0 -- 10.214.133.38:0/8015495 send_keepalive con 0x7f6e00003280, no pipe.
....
2014-05-02 04:13:04.749471 7f6e12f62780  0 monclient: authenticate timed out after 600

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-05-02_02:30:10-rados-master-testing-basic-plana/229201
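
The stall signature in the client log above (an injected socket failure followed only by "send_keepalive ... no pipe" lines, with no new connection attempt) can be checked mechanically when triaging similar runs. This is a minimal sketch that assumes log lines in the format quoted in this report. The function name `find_stall`, the `min_keepalives` threshold, and the regular expressions are illustrative and not part of Ceph.

```python
import re

# Patterns derived from the client log lines quoted in this report.
RESET = re.compile(r"injecting socket failure")
KEEPALIVE_NO_PIPE = re.compile(r"send_keepalive con \S+ no pipe")
# Signs that the client is still trying other monitors.
RETRY = re.compile(r"hunting for new mon|\)\.fault")


def find_stall(lines, min_keepalives=3):
    """Return the index of an injected socket failure that is followed only by
    'send_keepalive ... no pipe' lines (no new connection attempt), or None."""
    last_reset = None
    keepalives = 0
    for i, line in enumerate(lines):
        if RESET.search(line):
            # Candidate stall point; start counting keepalives from here.
            last_reset = i
            keepalives = 0
        elif RETRY.search(line):
            # The client retried after the reset, so this was not a stall.
            last_reset = None
            keepalives = 0
        elif last_reset is not None and KEEPALIVE_NO_PIPE.search(line):
            keepalives += 1
    if last_reset is not None and keepalives >= min_keepalives:
        return last_reset
    return None
```

Run over the client log in the description, this returns the index of the 04:03:10.753784 "injecting socket failure" line. The earlier injected failure at 04:02:58 is not flagged, because the client opened new connections afterwards.
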
#1

Updated by Sage Weil almost 10 years ago

  • Status changed from In Progress to Fix Under Review
#2

Updated by Sage Weil almost 10 years ago

  • Status changed from Fix Under Review to Pending Backport
#3

Updated by Sage Weil almost 10 years ago

  • Priority changed from Urgent to High
#4

Updated by Sage Weil almost 10 years ago

  • Status changed from Pending Backport to Resolved