Bug #19312

ceph_test_rados_api_watch_notify --gtest_filter=LibRadosWatchNotify.Watch2Timeout fails due to reconnection

Added by Kefu Chai 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Immediate
Assignee: Kefu Chai
Category: -
Target version: -
Start date: 03/20/2017
Due date:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc: No

Description

see the failed tests at http://pulpito.ceph.com/kchai-2017-03-17_04:29:25-rados-wip-kefu-testing---basic-mira/

With the override "ms tcp read timeout = 5", the client disconnects after idling for 5 seconds and then reconnects to the OSD. On reconnect it sends a watch-reconnect request, which resets the watch's disconnect timeout, so the watch never times out and the test fails.
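For reference, the override would typically be applied through the test cluster's ceph.conf; a minimal fragment (the [client] section placement is an assumption):

```ini
[client]
ms tcp read timeout = 5
```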

see the client side log:

2017-03-20 15:17:50.407660 7fd29b7fe700  1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6789/0 conn(0x7fd290030710 :-1 s=STATE_CLOSED pgs=157 cs=1 l=1).mark_down
2017-03-20 15:17:50.407711 7fd29b7fe700  1 -- 127.0.0.1:0/3018425931 --> 127.0.0.1:6790/0 -- mon_subscribe({mgrmap=0+,monmap=3+,osdmap=761}) v2 -- 0x7fd29001b460 con 0
2017-03-20 15:17:50.408281 7fd29b7fe700  1 -- 127.0.0.1:0/3018425931 <== mon.1 127.0.0.1:6790/0 4 ==== mgrmap(e 9) v1 ==== 63+0+0 (4036315079 0 0) 0x7fd29c003c20 con 0x7fd290037c80
2017-03-20 15:17:50.613675 7fd2b16f1700  1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6812/34685 conn(0x7fd29000c140 :-1 s=STATE_OPEN pgs=782 cs=72 l=0).tick idle(5000024) more than 5000000 us, mark self fault.
2017-03-20 15:17:50.613736 7fd2b16f1700  0 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6812/34685 conn(0x7fd29000c140 :-1 s=STATE_OPEN pgs=782 cs=72 l=0).fault initiating reconnect
2017-03-20 15:17:51.284100 7fd2b0ef0700  1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6804/34641 conn(0x7fd29003ae20 :-1 s=STATE_OPEN pgs=162 cs=1 l=1).tick idle(9776047) more than 5000000 us, mark self fault.
2017-03-20 15:17:51.284119 7fd2b0ef0700  1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6804/34641 conn(0x7fd29003ae20 :-1 s=STATE_OPEN pgs=162 cs=1 l=1).fault on lossy channel, failing
2017-03-20 15:17:51.284477 7fd29b7fe700  1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6804/34641 conn(0x7fd29003ae20 :-1 s=STATE_CLOSED pgs=162 cs=1 l=1).mark_down
2017-03-20 15:17:51.284738 7fd29b7fe700  1 -- 127.0.0.1:0/3018425931 --> 127.0.0.1:6804/34641 -- osd_op(unknown.0.0:20 193.7 193:ff9d772f:test-rados-api-gen8-41028-2::foo:head [watch reconnect cookie 94846125621760 gen 17] snapc 0=[] ondisk+write+known_if_redirected e760) v8 -- 0x7fd29001d3d0 con 0
2017-03-20 15:17:51.505017 7fd2b0ef0700  1 -- 127.0.0.1:0/3018425931 <== osd.1 127.0.0.1:6804/34641 1 ==== osd_op_reply(20 foo [watch reconnect cookie 94846125621760 gen 17] v760'19 uv1 ondisk = 0) v7 ==== 123+0+0 (1061418294 0 0) 0x7fd294001de0 con 0x7fd29004b850
2017-03-20 15:17:55.396126 7fd29bfff700  1 -- 127.0.0.1:0/3018425931 --> 127.0.0.1:6804/34641 -- ping magic: 0 v1 -- 0x7fd2740017b0 con 0
2017-03-20 15:17:55.613861 7fd2b16f1700  1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6812/34685 conn(0x7fd29000c140 :-1 s=STATE_OPEN pgs=788 cs=74 l=0).read_bulk peer close file descriptor 17

History

#1 Updated by Kefu Chai 2 months ago

Reproducible on master. I have not bisected it.

#2 Updated by Kefu Chai 2 months ago

  • Status changed from New to In Progress
  • Assignee set to Kefu Chai

#3 Updated by Kefu Chai 2 months ago

  • Status changed from In Progress to Need Review

#4 Updated by Kefu Chai 2 months ago

  • Status changed from Need Review to Resolved
