Project

General

Profile

Bug #44245

nautilus: mgr: connection halt

Added by Patrick Donnelly about 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Immediate
Category:
-
Target version:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

2020-02-21T12:22:55.057 INFO:teuthology.orchestra.run.smithi019:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs volume ls
2020-02-21T12:22:55.167 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2246ca7700  1  Processor -- start
2020-02-21T12:22:55.167 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2246ca7700  1 --  start start
2020-02-21T12:22:55.167 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2246ca7700  1 --2-  >> v2:172.21.15.195:3300/0 conn(0x7f22400f8630 0x7f22400e3ae0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
2020-02-21T12:22:55.167 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2246ca7700  1 --2-  >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).connect
2020-02-21T12:22:55.168 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2246ca7700  1 --  --> v1:172.21.15.195:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- 0x7f22400e72d0 con 0x7f22400f8a80
2020-02-21T12:22:55.168 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2246ca7700  1 --  --> v2:172.21.15.195:3300/0 -- mon_getmap magic: 0 v1 -- 0x7f22400e8c90 con 0x7f22400f8630
2020-02-21T12:22:55.168 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2246ca7700  1 --  --> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] -- mon_getmap magic: 0 v1 -- 0x7f22400fa1a0 con 0x7f224009de50
2020-02-21T12:22:55.168 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2244a6d700  1 --2-  >> v2:172.21.15.195:3300/0 conn(0x7f22400f8630 0x7f22400e3ae0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21T12:22:55.168 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f224526e700  1 --2-  >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21T12:22:55.168 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f223ffff700  1 --1-  >> v1:172.21.15.195:6789/0 conn(0x7f22400f8a80 0x7f22400f96a0 :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).handle_server_banner_and_identify peer v1:172.21.15.195:6789/0 says I am v1:172.21.15.19:56784/0 (socket says 172.21.15.19:56784)
2020-02-21T12:22:55.168 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f2244a6d700  1 --2-  >> v2:172.21.15.195:3300/0 conn(0x7f22400f8630 0x7f22400e3ae0 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello peer v2:172.21.15.195:3300/0 says I am v2:172.21.15.19:36718/0 (socket says 172.21.15.19:36718)
2020-02-21T12:22:55.169 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f223ffff700  1 -- 172.21.15.19:0/1054442872 learned_addr learned my addr 172.21.15.19:0/1054442872 (peer_addr_for_me v1:172.21.15.19:0/0)
2020-02-21T12:22:55.169 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f224526e700  1 -- 172.21.15.19:0/1054442872 >> v2:172.21.15.195:3300/0 conn(0x7f22400f8630 msgr2=0x7f22400e3ae0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2020-02-21T12:22:55.169 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f224526e700  1 --2- 172.21.15.19:0/1054442872 >> v2:172.21.15.195:3300/0 conn(0x7f22400f8630 0x7f22400e3ae0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rx=0 tx=0).stop
2020-02-21T12:22:55.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f224526e700  1 -- 172.21.15.19:0/1054442872 >> v1:172.21.15.195:6789/0 conn(0x7f22400f8a80 legacy=0x7f22400f96a0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-21T12:22:55.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.155 7f224526e700  1 -- 172.21.15.19:0/1054442872 --> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x7f22400f9ea0 con 0x7f224009de50
2020-02-21T12:22:55.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f224526e700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 secure :-1 s=AUTH_CONNECTING_SIGN pgs=0 cs=0 l=0 rx=0x7f2238008b10 tx=0x7f22400e8e40).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f224526e700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 secure :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=0 rx=0x7f2238008b10 tx=0x7f22400e8e40).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.171 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f224526e700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 secure :-1 s=READY pgs=358 cs=0 l=1 rx=0x7f2238008b10 tx=0x7f22400e8e40).ready entity=mon.2 client_cookie=a681eac99cb5b2bb server_cookie=0 in_seq=0 out_seq=0
2020-02-21T12:22:55.171 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f224526e700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 secure :-1 s=THROTTLE_DONE pgs=358 cs=0 l=1 rx=0x7f2238008b10 tx=0x7f22400e8e40).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.171 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f224526e700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 secure :-1 s=THROTTLE_DONE pgs=358 cs=0 l=1 rx=0x7f2238008b10 tx=0x7f22400e8e40).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.171 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f223f7fe700  1 -- 172.21.15.19:0/1054442872 <== mon.2 v2:172.21.15.195:3301/0 1 ==== mon_map magic: 0 v1 ==== 377+0+0 (secure 0 0 0) 0x7f22380117a0 con 0x7f224009de50
2020-02-21T12:22:55.171 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f224526e700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 secure :-1 s=THROTTLE_DONE pgs=358 cs=0 l=1 rx=0x7f2238008b10 tx=0x7f22400e8e40).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.171 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f223f7fe700  1 -- 172.21.15.19:0/1054442872 <== mon.2 v2:172.21.15.195:3301/0 2 ==== config(0 keys) v1 ==== 4+0+0 (secure 0 0 0) 0x7f22380119a0 con 0x7f224009de50
2020-02-21T12:22:55.171 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f223f7fe700  1 -- 172.21.15.19:0/1054442872 <== mon.2 v2:172.21.15.195:3301/0 3 ==== mon_map magic: 0 v1 ==== 377+0+0 (secure 0 0 0) 0x7f2238011df0 con 0x7f224009de50
2020-02-21T12:22:55.172 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 -- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 msgr2=0x7f224009e2a0 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-21T12:22:55.172 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 secure :-1 s=READY pgs=358 cs=0 l=1 rx=0x7f2238008b10 tx=0x7f22400e8e40).stop
2020-02-21T12:22:55.172 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 -- 172.21.15.19:0/1054442872 shutdown_connections
2020-02-21T12:22:55.172 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 --2- 172.21.15.19:0/1054442872 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f224009de50 0x7f224009e2a0 unknown :-1 s=CLOSED pgs=358 cs=0 l=1 rx=0 tx=0).stop
2020-02-21T12:22:55.172 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 --2- 172.21.15.19:0/1054442872 >> v2:172.21.15.195:3300/0 conn(0x7f22400f8630 0x7f22400e3ae0 unknown :-1 s=CLOSED pgs=0 cs=0 l=0 rx=0 tx=0).stop
2020-02-21T12:22:55.172 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 -- 172.21.15.19:0/1054442872 shutdown_connections
2020-02-21T12:22:55.173 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 -- 172.21.15.19:0/1054442872 wait complete.
2020-02-21T12:22:55.173 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.159 7f2246ca7700  1 -- 172.21.15.19:0/1054442872 >> 172.21.15.19:0/1054442872 conn(0x7f22400e47c0 msgr2=0x7f22400e7e50 unknown :-1 s=STATE_NONE l=0).mark_down
2020-02-21T12:22:55.176 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1  Processor -- start
2020-02-21T12:22:55.176 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 --  start start
2020-02-21T12:22:55.176 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 --2-  >> v2:172.21.15.195:3300/0 conn(0x7f2240096eb0 0x7f2240099000 unknown :-1 s=NONE pgs=0 cs=0 l=1 rx=0 tx=0).connect
2020-02-21T12:22:55.176 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 --2-  >> v2:172.21.15.19:3300/0 conn(0x7f22400d5190 0x7f22401c55a0 unknown :-1 s=NONE pgs=0 cs=0 l=1 rx=0 tx=0).connect
2020-02-21T12:22:55.176 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 --2-  >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 unknown :-1 s=NONE pgs=0 cs=0 l=1 rx=0 tx=0).connect
2020-02-21T12:22:55.176 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 --  --> v2:172.21.15.19:3300/0 -- mon_getmap magic: 0 v1 -- 0x7f2240088270 con 0x7f22400d5190
2020-02-21T12:22:55.177 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 --  --> v2:172.21.15.195:3300/0 -- mon_getmap magic: 0 v1 -- 0x7f22400d5cf0 con 0x7f2240096eb0
2020-02-21T12:22:55.177 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 --  --> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] -- mon_getmap magic: 0 v1 -- 0x7f2240097300 con 0x7f22401c5af0
2020-02-21T12:22:55.177 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f223ffff700  1 --2-  >> v2:172.21.15.19:3300/0 conn(0x7f22400d5190 0x7f22401c55a0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21T12:22:55.177 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f223ffff700  1 --2-  >> v2:172.21.15.19:3300/0 conn(0x7f22400d5190 0x7f22401c55a0 unknown :-1 s=HELLO_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0).handle_hello peer v2:172.21.15.19:3300/0 says I am v2:172.21.15.19:57168/0 (socket says 172.21.15.19:57168)
2020-02-21T12:22:55.177 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2-  >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21T12:22:55.177 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f223ffff700  1 -- 172.21.15.19:0/4189896993 learned_addr learned my addr 172.21.15.19:0/4189896993 (peer_addr_for_me v2:172.21.15.19:0/0)
2020-02-21T12:22:55.177 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2244a6d700  1 --2- 172.21.15.19:0/4189896993 >> v2:172.21.15.195:3300/0 conn(0x7f2240096eb0 0x7f2240099000 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21T12:22:55.178 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 -- 172.21.15.19:0/4189896993 >> v2:172.21.15.195:3300/0 conn(0x7f2240096eb0 msgr2=0x7f2240099000 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-21T12:22:55.178 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> v2:172.21.15.195:3300/0 conn(0x7f2240096eb0 0x7f2240099000 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0).stop
2020-02-21T12:22:55.178 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 -- 172.21.15.19:0/4189896993 >> v2:172.21.15.19:3300/0 conn(0x7f22400d5190 msgr2=0x7f22401c55a0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).mark_down
2020-02-21T12:22:55.178 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> v2:172.21.15.19:3300/0 conn(0x7f22400d5190 0x7f22401c55a0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0).stop
2020-02-21T12:22:55.178 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 -- 172.21.15.19:0/4189896993 --> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] -- mon_subscribe({config=0+,monmap=0+}) v3 -- 0x7f2240097e30 con 0x7f22401c5af0
2020-02-21T12:22:55.178 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f223ffff700  1 --2- 172.21.15.19:0/4189896993 >> v2:172.21.15.19:3300/0 conn(0x7f22400d5190 0x7f22401c55a0 unknown :-1 s=CLOSED pgs=0 cs=0 l=1 rx=0 tx=0).handle_auth_done state changed!
2020-02-21T12:22:55.178 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2244a6d700  1 --2- 172.21.15.19:0/4189896993 >> v2:172.21.15.195:3300/0 conn(0x7f2240096eb0 0x7f2240099000 unknown :-1 s=CLOSED pgs=0 cs=0 l=1 rx=0 tx=0).handle_auth_done state changed!
2020-02-21T12:22:55.179 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=AUTH_CONNECTING_SIGN pgs=0 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.179 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.179 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).ready entity=mon.2 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2020-02-21T12:22:55.179 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=THROTTLE_DONE pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.180 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f223d7fa700  1 -- 172.21.15.19:0/4189896993 <== mon.2 v2:172.21.15.195:3301/0 1 ==== mon_map magic: 0 v1 ==== 377+0+0 (secure 0 0 0) 0x7f2240097e30 con 0x7f22401c5af0
2020-02-21T12:22:55.180 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=THROTTLE_DONE pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.180 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f223d7fa700  1 -- 172.21.15.19:0/4189896993 <== mon.2 v2:172.21.15.195:3301/0 2 ==== config(0 keys) v1 ==== 4+0+0 (secure 0 0 0) 0x7f2238010de0 con 0x7f22401c5af0
2020-02-21T12:22:55.180 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=THROTTLE_DONE pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.180 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 -- 172.21.15.19:0/4189896993 --> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] -- mon_subscribe({mgrmap=0+}) v3 -- 0x7f22400d5920 con 0x7f22401c5af0
2020-02-21T12:22:55.181 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f223d7fa700  1 -- 172.21.15.19:0/4189896993 <== mon.2 v2:172.21.15.195:3301/0 3 ==== mon_map magic: 0 v1 ==== 377+0+0 (secure 0 0 0) 0x7f2238005ab0 con 0x7f22401c5af0
2020-02-21T12:22:55.181 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.163 7f2246ca7700  1 -- 172.21.15.19:0/4189896993 --> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] -- mon_subscribe({osdmap=0}) v3 -- 0x7f22400bb5e0 con 0x7f22401c5af0
2020-02-21T12:22:55.181 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=THROTTLE_DONE pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.181 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f223d7fa700  1 -- 172.21.15.19:0/4189896993 <== mon.2 v2:172.21.15.195:3301/0 4 ==== mgrmap(e 3) v1 ==== 32636+0+0 (secure 0 0 0) 0x7f223800fc10 con 0x7f22401c5af0
2020-02-21T12:22:55.181 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=THROTTLE_DONE pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.181 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f223d7fa700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] conn(0x7f221c027d30 0x7f221c02a280 unknown :-1 s=NONE pgs=0 cs=0 l=1 rx=0 tx=0).connect
2020-02-21T12:22:55.181 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f223d7fa700  1 -- 172.21.15.19:0/4189896993 <== mon.2 v2:172.21.15.195:3301/0 5 ==== osd_map(165..165 src has 1..165) v4 ==== 5924+0+0 (secure 0 0 0) 0x7f22380354e0 con 0x7f22401c5af0
2020-02-21T12:22:55.182 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f2244a6d700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] conn(0x7f221c027d30 0x7f221c02a280 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21T12:22:55.182 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f2244a6d700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] conn(0x7f221c027d30 0x7f221c02a280 crc :-1 s=READY pgs=915 cs=0 l=1 rx=0 tx=0).ready entity=mgr.4100 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2020-02-21T12:22:55.182 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.167 7f2246ca7700  1 -- 172.21.15.19:0/4189896993 --> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] -- mon_command({"prefix": "get_command_descriptions"} v 0) v1 -- 0x7f22400e2080 con 0x7f22401c5af0
2020-02-21T12:22:55.183 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.179 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=THROTTLE_DONE pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:22:55.184 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.179 7f223d7fa700  1 -- 172.21.15.19:0/4189896993 <== mon.2 v2:172.21.15.195:3301/0 6 ==== mon_command_ack([{"prefix": "get_command_descriptions"}]=0  v0) v1 ==== 72+0+108270 (secure 0 0 0) 0x7f22380109e0 con 0x7f22401c5af0
2020-02-21T12:22:55.354 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:22:55.347 7f2246ca7700  1 -- 172.21.15.19:0/4189896993 --> [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] -- command(tid 0: {"prefix": "fs volume ls", "target": ["mgr", ""]}) v1 -- 0x7f22400d9320 con 0x7f221c027d30
2020-02-21T12:23:05.169 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:23:05.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:23:05.897 INFO:teuthology.orchestra.run.smithi019:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:23:05.900 INFO:teuthology.orchestra.run.smithi195:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:23:15.169 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:23:15.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:23:25.169 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:23:25.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:23:35.169 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:23:35.167 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:23:36.009 INFO:teuthology.orchestra.run.smithi019:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:23:36.011 INFO:teuthology.orchestra.run.smithi195:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:23:45.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:23:45.167 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:23:55.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:23:55.163 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:24:05.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:24:05.167 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:24:06.057 INFO:teuthology.orchestra.run.smithi019:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:24:06.060 INFO:teuthology.orchestra.run.smithi195:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:24:15.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:24:15.166 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:24:25.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:24:25.166 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:24:35.170 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:24:35.166 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:24:36.178 INFO:teuthology.orchestra.run.smithi019:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:24:36.181 INFO:teuthology.orchestra.run.smithi195:> sudo logrotate /etc/logrotate.d/ceph-test.conf
2020-02-21T12:24:45.173 INFO:teuthology.orchestra.run.smithi019.stderr:2020-02-21 12:24:45.166 7f224526e700  1 --2- 172.21.15.19:0/4189896993 >> [v2:172.21.15.195:3301/0,v1:172.21.15.195:6790/0] conn(0x7f22401c5af0 0x7f22400d5ff0 secure :-1 s=READY pgs=359 cs=0 l=1 rx=0x7f22400af2e0 tx=0x7f22400d5ba0).handle_read_frame_epilogue_main read frame epilogue bytes=32
2020-02-21T12:24:55.079 DEBUG:teuthology.orchestra.run:got remote process result: 124
2020-02-21T12:24:55.116 INFO:tasks.cephfs_test_runner:test_subvolume_create_with_auto_cleanup_on_fail (tasks.cephfs.test_volumes.TestVolumes) ... ERROR

and from the mgr:

2020-02-21 12:22:55.167 7fe4acca8700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fafe77180 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
2020-02-21 12:22:55.167 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fafe77180 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21 12:22:55.167 7fe4ac4a7700 20 AuthRegistry(0x7ffdaa53e1a8) get_handler peer_type 8 method 2 cluster_methods [2] service_methods [2] client_methods [2]
2020-02-21 12:22:55.167 7fe4ac4a7700 10 cephx: verify_authorizer decrypted service mgr secret_id=2
2020-02-21 12:22:55.167 7fe4ac4a7700 10 cephx: verify_authorizer global_id=7094
2020-02-21 12:22:55.167 7fe4ac4a7700 10 cephx: verify_authorizer ok nonce d007861ba3866998 reply_bl.length()=36
2020-02-21 12:22:55.167 7fe4ac4a7700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x559fb10eda80 con 0x559fb112d600 entity client.admin addr
2020-02-21 12:22:55.167 7fe4ac4a7700 10 mgr.server ms_handle_authentication  session 0x559fb10eda80 client.admin has caps allow * 'allow *'
2020-02-21 12:22:55.167 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/4189896993 conn(0x559fb112d600 0x559fafe77180 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).ready entity=client.7094 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
...
2020-02-21 12:20:06.787 7fe4acca8700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fb10cc100 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
2020-02-21 12:20:06.787 7fe4abca6700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fb10cc100 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21 12:20:06.787 7fe4abca6700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x559fb74e8a20 con 0x559fb112d600 entity client.admin addr
2020-02-21 12:20:06.787 7fe4abca6700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/2366727719 conn(0x559fb112d600 0x559fb10cc100 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).ready entity=client.9183 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2020-02-21 12:20:06.947 7fe4abca6700  1 -- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/2366727719 conn(0x559fb112d600 msgr2=0x559fb10cc100 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 47
2020-02-21 12:20:06.947 7fe4abca6700  1 -- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/2366727719 conn(0x559fb112d600 msgr2=0x559fb10cc100 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2020-02-21 12:20:06.947 7fe4abca6700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/2366727719 conn(0x559fb112d600 0x559fb10cc100 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2020-02-21 12:20:06.947 7fe4abca6700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/2366727719 conn(0x559fb112d600 0x559fb10cc100 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).stop
2020-02-21 12:21:50.869 7fe4acca8700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fafeaec00 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
2020-02-21 12:21:50.869 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fafeaec00 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21 12:21:50.869 7fe4ac4a7700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x559fae7ae000 con 0x559fb112d600 entity client.admin addr
2020-02-21 12:21:50.869 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/258936092 conn(0x559fb112d600 0x559fafeaec00 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).ready entity=client.9744 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0
2020-02-21 12:21:51.057 7fe4ac4a7700  1 -- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/258936092 conn(0x559fb112d600 msgr2=0x559fafeaec00 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_bulk peer close file descriptor 48
2020-02-21 12:21:51.057 7fe4ac4a7700  1 -- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/258936092 conn(0x559fb112d600 msgr2=0x559fafeaec00 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read_until read failed
2020-02-21 12:21:51.057 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/258936092 conn(0x559fb112d600 0x559fafeaec00 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main read frame length and tag failed r=-1 ((1) Operation not permitted)
2020-02-21 12:21:51.057 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/258936092 conn(0x559fb112d600 0x559fafeaec00 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).stop
2020-02-21 12:22:55.167 7fe4acca8700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fafe77180 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept
2020-02-21 12:22:55.167 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >>  conn(0x559fb112d600 0x559fafe77180 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0
2020-02-21 12:22:55.167 7fe4ac4a7700 10 mgr.server ms_handle_authentication ms_handle_authentication new session 0x559fb10eda80 con 0x559fb112d600 entity client.admin addr
2020-02-21 12:22:55.167 7fe4ac4a7700  1 --2- [v2:172.21.15.195:6800/12024,v1:172.21.15.195:6801/12024] >> 172.21.15.19:0/4189896993 conn(0x559fb112d600 0x559fafe77180 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).ready entity=client.7094 client_cookie=0 server_cookie=0 in_seq=0 out_seq=0

From: /ceph/teuthology-archive/yuriw-2020-02-18_16:08:51-fs-nautilus-distro-basic-smithi/4777932


Related issues

Related to mgr - Backport #43046: nautilus: mgr: "mds metadata" to setup new DaemonState races with fsmap Resolved

History

#2 Updated by Patrick Donnelly about 2 months ago

Test run with --num 5 to see if we can get more information:

http://pulpito.ceph.com/pdonnell-2020-02-22_17:17:16-fs-nautilus-distro-basic-smithi/

#3 Updated by Sage Weil about 2 months ago

with debug_ms=20 we see, on the mgr,

2020-02-22 14:44:50.810 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.810 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read continue len=32
2020-02-22 14:44:50.810 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main r=0
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main got new segment: len=41 align=8
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=READY pgs=4 cs=0 l=1 rx=0 tx=0).handle_read_frame_preamble_main got new segment: len=73 align=8
2020-02-22 14:44:50.810 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.810 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.812 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.812 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).read_event
2020-02-22 14:44:50.812 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.812 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.812 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.814 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.814 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).read_event
2020-02-22 14:44:50.814 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.814 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.814 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.816 7fcf8af65700 20 -- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 msgr2=0x55eea7be1080 crc :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2020-02-22 14:44:50.816 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).read_event
2020-02-22 14:44:50.816 7fcf8af65700 20 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message
2020-02-22 14:44:50.816 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.816 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/592772018 conn(0x55eea8081c00 0x55eea7be1080 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
...

/a/sage-2020-02-22_13:50:50-fs:basic_functional-nautilus-distro-basic-smithi/4791175

#4 Updated by Sage Weil about 2 months ago

lots of connections are busy looping, also waiting on the same throttle

2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6818/33792,v1:172.21.15.44:6819/33792] conn(0x55eea7152400 0x55eea714eb00 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/2377610774,v1:172.21.15.44:6835/2377610774] conn(0x55eea818bc00 0x55eea7be0580 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/2377610774,v1:172.21.15.44:6835/2377610774] conn(0x55eea818bc00 0x55eea7be0580 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6818/33792,v1:172.21.15.44:6819/33792] conn(0x55eea7152400 0x55eea714eb00 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/1766558866,v1:172.21.15.49:6833/1766558866] conn(0x55eea815c800 0x55eea8149080 crc :-1 s=THROTTLE_MESSAGE pgs=9 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/1766558866,v1:172.21.15.49:6833/1766558866] conn(0x55eea815c800 0x55eea8149080 crc :-1 s=THROTTLE_MESSAGE pgs=9 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/835642329,v1:172.21.15.49:6833/835642329] conn(0x55eea818b800 0x55eea6f19600 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/1238526435,v1:172.21.15.44:6835/1238526435] conn(0x55eea7de4000 0x55eea807d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6834/1238526435,v1:172.21.15.44:6835/1238526435] conn(0x55eea7de4000 0x55eea807d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6832/835642329,v1:172.21.15.49:6833/835642329] conn(0x55eea818b800 0x55eea6f19600 crc :-1 s=THROTTLE_MESSAGE pgs=4 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32850 conn(0x55eea63eec00 0x55eea7c0cb00 secure :-1 s=THROTTLE_MESSAGE pgs=2 cs=0 l=1 rx=0x55eea6e0f800 tx=0x55eea6ee7900).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32850 conn(0x55eea63eec00 0x55eea7c0cb00 secure :-1 s=THROTTLE_MESSAGE pgs=2 cs=0 l=1 rx=0x55eea6e0f800 tx=0x55eea6ee7900).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6802/33790,v1:172.21.15.44:6803/33790] conn(0x55eea7152000 0x55eea714e580 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32977 conn(0x55eea34d8c00 0x55eea742d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf89f63700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32977 conn(0x55eea34d8c00 0x55eea742d600 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.44:6802/33790,v1:172.21.15.44:6803/33790] conn(0x55eea7152000 0x55eea714e580 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32851 conn(0x55eea6354800 0x55eea6ce8000 crc :-1 s=THROTTLE_MESSAGE pgs=8 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32978 conn(0x55eea6dea800 0x55eea6cea680 secure :-1 s=THROTTLE_MESSAGE pgs=1 cs=0 l=1 rx=0x55eea6dd1500 tx=0x55eea6cc8d00).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8af65700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.44:0/32851 conn(0x55eea6354800 0x55eea6ce8000 crc :-1 s=THROTTLE_MESSAGE pgs=8 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> 172.21.15.49:0/32978 conn(0x55eea6dea800 0x55eea6cea680 secure :-1 s=THROTTLE_MESSAGE pgs=1 cs=0 l=1 rx=0x55eea6dd1500 tx=0x55eea6cc8d00).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6815/34301,v1:172.21.15.49:6817/34301] conn(0x55eea70bec00 0x55eea7098680 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttler 128/128
2020-02-22 14:44:50.504 7fcf8a764700 10 --2- [v2:172.21.15.44:6800/32851,v1:172.21.15.44:6801/32851] >> [v2:172.21.15.49:6815/34301,v1:172.21.15.49:6817/34301] conn(0x55eea70bec00 0x55eea7098680 crc :-1 s=THROTTLE_MESSAGE pgs=6 cs=0 l=1 rx=0 tx=0).throttle_message wants 1 message from policy throttle 128/128 failed, just wait.

#5 Updated by Sage Weil about 2 months ago

My guess is we are leaking a message ref somewhere...

#7 Updated by Patrick Donnelly about 1 month ago

Sage Weil wrote:

My guess is we are leaking a message ref somewhere...

I see at least one instance of that here: https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446

I will write up a fix for that but this PR was merged after https://github.com/ceph/ceph/pull/33122 so there must be another one somewhere. Still looking...

#8 Updated by Patrick Donnelly about 1 month ago

  • Status changed from New to Fix Under Review
  • Assignee set to Patrick Donnelly
  • Priority changed from Urgent to Immediate
  • Target version set to v14.2.8
  • Pull request ID set to 33498

Testing a fix for broken backport #31905. I think there must be another one of these.

#9 Updated by Patrick Donnelly about 1 month ago

  • Related to Backport #43046: nautilus: mgr: "mds metadata" to setup new DaemonState races with fsmap added

#10 Updated by Venky Shankar about 1 month ago

Patrick Donnelly wrote:

Merging this PR seems to cause or expose the issue: https://github.com/ceph/ceph/pull/33122

Before merge:

http://pulpito.ceph.com/pdonnell-2020-02-23_18:32:02-fs-nautilus-distro-basic-smithi/

Failures unrelated to this tracker.

After merge:

http://pulpito.ceph.com/pdonnell-2020-02-23_18:25:22-fs-nautilus-distro-basic-smithi/

One of the failures (4794420) resemble this issue: https://tracker.ceph.com/issues/44207

all timeout failures.

#11 Updated by Venky Shankar about 1 month ago

Patrick Donnelly wrote:

Sage Weil wrote:

My guess is we are leaking a message ref somewhere...

I see at least one instance of that here: https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446

I will write up a fix for that but this PR was merged after https://github.com/ceph/ceph/pull/33122 so there must be another one somewhere. Still looking...

The other one might just be https://tracker.ceph.com/issues/44207. We should include backport of https://github.com/ceph/ceph/pull/33413

Patrick?

#12 Updated by Patrick Donnelly about 1 month ago

  • Status changed from Fix Under Review to Resolved

Venky Shankar wrote:

Patrick Donnelly wrote:

Sage Weil wrote:

My guess is we are leaking a message ref somewhere...

I see at least one instance of that here: https://github.com/ceph/ceph/pull/31905/files#diff-6e2b0f299672aec02388db8d25680537R446

I will write up a fix for that but this PR was merged after https://github.com/ceph/ceph/pull/33122 so there must be another one somewhere. Still looking...

The other one might just be https://tracker.ceph.com/issues/44207. We should include backport of https://github.com/ceph/ceph/pull/33413

Patrick?

Yes, I see the dead lock now. #33413 seems to help but there is still deadlocks, from your test: /ceph/teuthology-archive/vshankar-2020-02-24_12:33:54-fs-wip-vshankar-testing-testing-basic-smithi/4798102

I will create a new issue for this.

Also available in: Atom PDF