Actions
Bug #19312
closedceph_test_rados_api_watch_notify --gtest_filter=LibRadosWatchNotify.Watch2Timeout fails due to reconnection
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
see the failed tests at http://pulpito.ceph.com/kchai-2017-03-17_04:29:25-rados-wip-kefu-testing---basic-mira/
with the override of "ms tcp read timeout = 5", the client disconnects after idling for 5 seconds, and reconnects to the osd, but it sends the watch request which resets the timeout for disconnecting the watch.
see the client side log:
2017-03-20 15:17:50.407660 7fd29b7fe700 1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6789/0 conn(0x7fd290030710 :-1 s=STATE_CLOSED pgs=157 cs=1 l=1).mark_down 2017-03-20 15:17:50.407711 7fd29b7fe700 1 -- 127.0.0.1:0/3018425931 --> 127.0.0.1:6790/0 -- mon_subscribe({mgrmap=0+,monmap=3+,osdmap=761}) v2 -- 0x7fd29001b460 con 0 2017-03-20 15:17:50.408281 7fd29b7fe700 1 -- 127.0.0.1:0/3018425931 <== mon.1 127.0.0.1:6790/0 4 ==== mgrmap(e 9) v1 ==== 63+0+0 (4036315079 0 0) 0x7fd29c003c20 con 0x7fd290037c80 2017-03-20 15:17:50.613675 7fd2b16f1700 1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6812/34685 conn(0x7fd29000c140 :-1 s=STATE_OPEN pgs=782 cs=72 l=0).tick idle(5000024) more than 5000000 us, mark self fault. 2017-03-20 15:17:50.613736 7fd2b16f1700 0 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6812/34685 conn(0x7fd29000c140 :-1 s=STATE_OPEN pgs=782 cs=72 l=0).fault initiating reconnect 2017-03-20 15:17:51.284100 7fd2b0ef0700 1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6804/34641 conn(0x7fd29003ae20 :-1 s=STATE_OPEN pgs=162 cs=1 l=1).tick idle(9776047) more than 5000000 us, mark self fault. 2017-03-20 15:17:51.284119 7fd2b0ef0700 1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6804/34641 conn(0x7fd29003ae20 :-1 s=STATE_OPEN pgs=162 cs=1 l=1).fault on lossy channel, failing 2017-03-20 15:17:51.284477 7fd29b7fe700 1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6804/34641 conn(0x7fd29003ae20 :-1 s=STATE_CLOSED pgs=162 cs=1 l=1).mark_down 2017-03-20 15:17:51.284738 7fd29b7fe700 1 -- 127.0.0.1:0/3018425931 --> 127.0.0.1:6804/34641 -- osd_op(unknown.0.0:20 193.7 193:ff9d772f:test-rados-api-gen8-41028-2::foo:head [watch reconnect cookie 94846125621760 gen 17] snapc 0=[] ondisk+write+known_if_redirected e760) v8 -- 0x7fd29001d3d0 con 0 2017-03-20 15:17:51.505017 7fd2b0ef0700 1 -- 127.0.0.1:0/3018425931 <== osd.1 127.0.0.1:6804/34641 1 ==== osd_op_reply(20 foo [watch reconnect cookie 94846125621760 gen 17] v760'19 uv1 ondisk = 0) v7 ==== 123+0+0 (1061418294 0 0) 0x7fd294001de0 con 0x7fd29004b850 G2017-03-20 15:17:55.396126 7fd29bfff700 1 -- 127.0.0.1:0/3018425931 --> 127.0.0.1:6804/34641 -- ping magic: 0 v1 -- 0x7fd2740017b0 con 0 2017-03-20 15:17:55.613861 7fd2b16f1700 1 -- 127.0.0.1:0/3018425931 >> 127.0.0.1:6812/34685 conn(0x7fd29000c140 :-1 s=STATE_OPEN pgs=788 cs=74 l=0).read_bulk peer close file descriptor 17
Actions