Project

General

Profile

Bug #36507

client: connection failure during reconnect causes client to hang

Added by Patrick Donnelly about 2 months ago. Updated about 1 month ago.

Status:
New
Priority:
Urgent
Category:
-
Target version:
Start date:
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:

Description

2018-10-16 17:39:27.001 7fe0caffd700 20 client.4493 trim_cache_for_reconnect mds.0 trimmed 12 dentries
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493 connect_mds_targets for mds.0
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493 send_request rebuilding request 1175 for mds.0
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493 send_request client_request(unknown.0:1175 rmdir #0x100000000d9/fstest_0d9406075aab89f5a64d5fdd4509065a 2018-10-16 17:39:08.087556 RETRY=1 REPLAY caller_uid=0, caller_gid=0{2685139136,}) v4 to mds.0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 Event(0x55d7b126a7a0 nevent=5000 time_id=42).create_file_event create event started fd=21 mask=1 original mask is 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 EpollDriver.add_event add event fd=21 cur_mask=0 add_mask=1 to 10
2018-10-16 17:39:27.001 7fe0caffd700  1 -- 172.21.15.6:38668/2121577321 --> 172.21.15.176:6817/1640707343 -- client_request(unknown.0:1175 rmdir #0x100000000d9/fstest_0d9406075aab89f5a64d5fdd4509065a 2018-10-16 17:39:08.087556 RETRY=1 REPLAY caller_uid=0, caller_gid=0{2685139136,}) v4 -- 0x7fe0a00751d0 con 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 Event(0x55d7b126a7a0 nevent=5000 time_id=42).create_file_event create event end fd=21 mask=1 original mask is 1
2018-10-16 17:39:27.001 7fe0d3fc9700 10 NetHandler reconnect reconnect: Operation already in progress
2018-10-16 17:39:27.001 7fe0caffd700 15 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=START_CONNECT pgs=0 cs=1 l=0).send_message inline write is denied, reschedule m=0x7fe0a00751d0
2018-10-16 17:39:27.001 7fe0caffd700 20 Event(0x55d7b126a7a0 nevent=5000 time_id=42).wakeup
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTING_RE l=0).process nonblock connect inprogress
2018-10-16 17:39:27.001 7fe0d3fc9700 20 Event(0x55d7b126a7a0 nevent=5000 time_id=42).create_file_event create event started fd=21 mask=2 original mask is 1
2018-10-16 17:39:27.001 7fe0d3fc9700 20 EpollDriver.add_event add event fd=21 cur_mask=1 add_mask=2 to 10
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493 send_request rebuilding request 1177 for mds.0
2018-10-16 17:39:27.001 7fe0caffd700 20 client.4493 encode_cap_releases enter (req: 0x7fe08c0038f0, mds: 0)
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493 send_request client_request(unknown.0:1177 lookup #0x100000000d9/fstest_b1f9e101fa675e0ca2f19c71523a8d5b 2018-10-16 17:39:08.155718 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4 to mds.0
2018-10-16 17:39:27.001 7fe0caffd700  1 -- 172.21.15.6:38668/2121577321 --> 172.21.15.176:6817/1640707343 -- client_request(unknown.0:1177 lookup #0x100000000d9/fstest_b1f9e101fa675e0ca2f19c71523a8d5b 2018-10-16 17:39:08.155718 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4 -- 0x7fe0a401f270 con 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 Event(0x55d7b126a7a0 nevent=5000 time_id=42).create_file_event create event end fd=21 mask=2 original mask is 3
2018-10-16 17:39:27.001 7fe0caffd700 15 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=START_CONNECT pgs=0 cs=1 l=0).send_message inline write is denied, reschedule m=0x7fe0a401f270
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTING_RE l=0).process
2018-10-16 17:39:27.001 7fe0d3fc9700 20 EpollDriver.del_event del event fd=21 cur_mask=3 delmask=2 to 10
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTING_RE l=0).process connect successfully, ready to send banner
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=START_CONNECT pgs=0 cs=1 l=0).read_event
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=START_CONNECT pgs=0 cs=1 l=0).send_client_banner
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x100000000d9.head pAsLsXsFsx wants -
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1/client.0/tmp/tmp
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  snaprealm snaprealm(0x1 nref=10 c=0 seq=1 parent=0x0 my_snaps=[] cached_snapc=1=[])
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x100000000d8.head pAsLsXsFsxcrwb wants pFscr
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x100000000d8
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x10000000001.head pAsLsXsFsx wants -
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1/client.0/tmp
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x10000000000.head pAsLsXsFsx wants -
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1/client.0
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x10000000005.head pAsLsXsFsx wants -
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1/client.0/tmp/pjd-fstest-20090130-RC/tests
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x10000000003.head pAsLsXsFsx wants -
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1/client.0/tmp/pjd-fstest-20090130-RC
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x1.head pAsLsXsFs wants -
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x10000000091.head pAsLsXsFsx wants -
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 9 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 Event(0x55d7b126a7a0 nevent=5000 time_id=42).create_file_event create event started fd=21 mask=2 original mask is 1
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1/client.0/tmp/pjd-fstest-20090130-RC/tests/mkdir
2018-10-16 17:39:27.001 7fe0d3fc9700 20 EpollDriver.add_event add event fd=21 cur_mask=1 add_mask=2 to 10
2018-10-16 17:39:27.001 7fe0d3fc9700 20 Event(0x55d7b126a7a0 nevent=5000 time_id=42).create_file_event create event end fd=21 mask=2 original mask is 3
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x100000001ba.head pAsLsXsFsx wants -
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x100000001ba
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING pgs=0 cs=1 l=0).handle_client_banner_write r=0
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493  caps on 0x1000000009b.head pAsLsXsFscr wants pFscr
2018-10-16 17:39:27.001 7fe0caffd700 10 client.4493     path #0x1000000009b
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING pgs=0 cs=1 l=0).handle_client_banner_write connect write banner done: 172.21.15.176:6817/1640707343
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=1 l=0).wait_server_banner
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=281
HERE4
2018-10-16 17:39:27.001 7fe0caffd700  1 -- 172.21.15.6:38668/2121577321 --> 172.21.15.176:6817/1640707343 -- client_reconnect(10 caps) v3 -- 0x7fe0b805cc40 con 0
2018-10-16 17:39:27.001 7fe0caffd700 15 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=1 l=0).send_message inline write is denied, reschedule m=0x7fe0b805cc40
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=1 l=0).handle_server_banner_and_identify r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=1 l=0).handle_server_banner_and_identify connect read peer addr 172.21.15.176:6817/1640707343 on socket 21
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=1 l=0).handle_server_banner_and_identify connect peer addr for me is 172.21.15.6:47640/0
2018-10-16 17:39:27.001 7fe0caffd700 10 -- 172.21.15.6:38668/2121577321 dispatch_throttle_release 647 to dispatch throttler 647/104857600
2018-10-16 17:39:27.001 7fe0caffd700 20 -- 172.21.15.6:38668/2121577321 done calling dispatch on 0x7fe0c40075c0
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 136 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=1 l=0).handle_my_addr_write r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=1 l=0).handle_my_addr_write connect sent my addr 172.21.15.6:38668/2121577321
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message connect_msg.authorizer_len=174 protocol=2
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message connect sending gseq=39 cseq=1 proto=32
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 207 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_message_write r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_message_write connect wrote (self +) cseq, waiting for reply
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=26
2018-10-16 17:39:27.001 7fe0d3fc9700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).write_event
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 0 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).write_event
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 0 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).write_event
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 0 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).write_event
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 0 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).write_event
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 0 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).process
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=26
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_1 r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_1 connect got reply tag 16 connect_seq 0 global_seq 0 proto 32 flags 0 features 4611087854031142911
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth reply.authorizer_len=32
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=32
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_auth r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_auth connect got auth challenge
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message connect_msg.authorizer_len=174 protocol=2
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).send_connect_message connect sending gseq=39 cseq=1 proto=32
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 207 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_message_write r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_message_write connect wrote (self +) cseq, waiting for reply
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=26
2018-10-16 17:39:27.001 7fe0d3fc9700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).handle_write
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).write_event
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 0 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).process
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=26
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_1 r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_1 connect got reply tag 2 connect_seq 0 global_seq 0 proto 32 flags 0 features 4611087854031142911
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).wait_connect_reply_auth reply.authorizer_len=36
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=36
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_auth r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_2
2018-10-16 17:39:27.001 7fe0d3fc9700  0 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_2 connect got RESETSESSION
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).session_reset started
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue started
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue discard 0x7fe0a00751d0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue discard 0x7fe0a401f270
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).discard_out_queue discard 0x7fe0b805cc40
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).send_connect_message
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).send_connect_message connect_msg.authorizer_len=174 protocol=2
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).send_connect_message connect sending gseq=39 cseq=0 proto=32
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 207 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 EpollDriver.del_event del event fd=21 cur_mask=3 delmask=2 to 10
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_message_write r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_message_write connect wrote (self +) cseq, waiting for reply
2018-10-16 17:39:27.001 7fe0caffd700  0 client.4493 ms_handle_remote_reset on 172.21.15.176:6817/1640707343
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).wait_connect_reply
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=26
2018-10-16 17:39:27.001 7fe0caffd700  1 -- 172.21.15.6:38668/2121577321 --> 172.21.15.176:6790/0 -- mon_subscribe({osdmap=22}) v3 -- 0x7fe0b8064060 con 0
2018-10-16 17:39:27.001 7fe0caffd700 15 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).send_message inline write is denied, reschedule m=0x7fe0b8064060
2018-10-16 17:39:27.001 7fe0caffd700 20 Event(0x55d7b123da00 nevent=5000 time_id=5).wakeup
2018-10-16 17:39:27.001 7fe0caffd700  1 client.4493 reset from mds we were open; mark session as stale
2018-10-16 17:39:27.001 7fe0d47ca700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1).handle_write
2018-10-16 17:39:27.001 7fe0d47ca700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).write_event
2018-10-16 17:39:27.001 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).prepare_send_message m mon_subscribe({osdmap=22}) v3
2018-10-16 17:39:27.001 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).prepare_send_message encoding features 4611087854031142911 0x7fe0b8064060 mon_subscribe({osdmap=22}) v3
2018-10-16 17:39:27.001 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).write_message no session security
2018-10-16 17:39:27.001 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).write_message sending message type=15 src client.4493 front=36 data=0 off 0
2018-10-16 17:39:27.001 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).write_message sending 72 0x7fe0b8064060
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).process
2018-10-16 17:39:27.001 7fe0d47ca700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send sent bytes 111 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read continue len=26
2018-10-16 17:39:27.001 7fe0d47ca700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).write_message sending 0x7fe0b8064060 done.
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_1 r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_1 connect got reply tag 13 connect_seq 1 global_seq 14 proto 32 flags 0 features 4611087854031142911
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).wait_connect_reply_auth
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).wait_connect_reply_auth reply.authorizer_len=36
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=36
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_auth r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_2 got CEPH_MSGR_TAG_SEQ, reading acked_seq and writing in_seq
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).wait_ack_seq
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=8
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_ack_seq r=0
2018-10-16 17:39:27.001 7fe0d3fc9700  2 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_ack_seq got newly_acked_seq 0 vs out_seq 0
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).discard_requeued_up_to 0
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0)._try_send sent bytes 8 remaining bytes 0
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_in_seq_write r=0
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_in_seq_write send in_seq done
2018-10-16 17:39:27.001 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).client_ready
2018-10-16 17:39:27.001 7fe0d3fc9700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=14 cs=1 l=0).client_ready connect success 1, lossy = 0, features 4611087854031142911
2018-10-16 17:39:27.005 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=OPENED pgs=14 cs=1 l=0).wait_message
2018-10-16 17:39:27.005 7fe0d3fc9700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6817/1640707343 conn(0x7fe0bc02bf30 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read start len=1
2018-10-16 17:39:27.005 7fe0caffd700 10 client.4493 ms_handle_connect on 172.21.15.176:6817/1640707343
2018-10-16 17:39:27.005 7fe097fff700  8 client.4493 _ll_getattr 0x1.head
2018-10-16 17:39:27.005 7fe097fff700 10 client.4493 _getattr mask pAsLsXsFs issued=1
2018-10-16 17:39:27.005 7fe097fff700 10 client.4493 fill_stat on 0x1 snap/devhead mode 041777 mtime 2018-10-16 17:38:19.464553 ctime 2018-10-16 17:38:19.464553
2018-10-16 17:39:27.005 7fe097fff700  3 client.4493 ll_getattr 0x1.head = 0
2018-10-16 17:39:27.005 7fe097fff700  8 client.4493 _ll_forget 0x1 1
2018-10-16 17:39:27.005 7fe0b1ffb700  3 client.4493 ll_opendir 0x1.head
2018-10-16 17:39:27.005 7fe0b1ffb700 20 client.4493 may_open 0x1.head(faked_ino=0 ref=3 ll_ref=1882 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2018-10-16 17:37:45.984716 mtime=2018-10-16 17:38:19.464553 ctime=2018-10-16 17:38:19.464553 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x7fe0b8006910); UserPerm(uid: 0, gid: 0)
2018-10-16 17:39:27.005 7fe0b1ffb700 10 client.4493 _getattr mask As issued=1
2018-10-16 17:39:27.005 7fe0b1ffb700  3 client.4493 may_open 0x7fe0b8006910 = 0
2018-10-16 17:39:27.005 7fe0b1ffb700 15 inode.get on 0x7fe0b8006910 0x1.head now 4
2018-10-16 17:39:27.005 7fe0b1ffb700  8 client.4493 _opendir(0x1) = 0 (0x7fe0a00a4270)
2018-10-16 17:39:27.005 7fe0b1ffb700  3 client.4493 ll_opendir 0x1.head = 0 (0x7fe0a00a4270)
2018-10-16 17:39:27.005 7fe0b1ffb700  8 client.4493 _ll_forget 0x1 1
2018-10-16 17:39:27.005 7fe0b0ff9700  8 client.4493 _ll_getattr 0x1.head
2018-10-16 17:39:27.005 7fe0b0ff9700 10 client.4493 _getattr mask pAsLsXsFs issued=1
2018-10-16 17:39:27.005 7fe0b0ff9700 10 client.4493 fill_stat on 0x1 snap/devhead mode 041777 mtime 2018-10-16 17:38:19.464553 ctime 2018-10-16 17:38:19.464553
2018-10-16 17:39:27.005 7fe0b0ff9700  3 client.4493 ll_getattr 0x1.head = 0
2018-10-16 17:39:27.005 7fe0b0ff9700  8 client.4493 _ll_forget 0x1 1
2018-10-16 17:39:27.005 7fe0b17fa700  3 client.4493 ll_releasedir 0x7fe0a00a4270
2018-10-16 17:39:27.005 7fe0b17fa700 10 client.4493 _closedir(0x7fe0a00a4270)
2018-10-16 17:39:27.005 7fe0b17fa700 10 client.4493 _closedir detaching inode 0x7fe0b8006910
2018-10-16 17:39:27.005 7fe0b17fa700 10 client.4493 put_inode on 0x1.head(faked_ino=0 ref=4 ll_ref=1883 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2018-10-16 17:37:45.984716 mtime=2018-10-16 17:38:19.464553 ctime=2018-10-16 17:38:19.464553 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x7fe0b8006910)
2018-10-16 17:39:27.005 7fe0b17fa700 15 inode.put on 0x7fe0b8006910 0x1.head now 3
2018-10-16 17:39:27.005 7fe0b17fa700 10 client.4493 _readdir_drop_dirp_buffer 0x7fe0a00a4270
2018-10-16 17:39:27.005 7fe097fff700  8 client.4493 _ll_getattr 0x1.head
2018-10-16 17:39:27.005 7fe097fff700 10 client.4493 _getattr mask pAsLsXsFs issued=1
2018-10-16 17:39:27.005 7fe097fff700 10 client.4493 fill_stat on 0x1 snap/devhead mode 041777 mtime 2018-10-16 17:38:19.464553 ctime 2018-10-16 17:38:19.464553
2018-10-16 17:39:27.005 7fe097fff700  3 client.4493 ll_getattr 0x1.head = 0
2018-10-16 17:39:27.005 7fe097fff700  8 client.4493 _ll_forget 0x1 1
2018-10-16 17:39:27.897 7fe0cbfff700 20 client.4493 trim_cache size 7 max 16384
2018-10-16 17:39:28.889 7fe0c9ffb700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).send_keepalive
2018-10-16 17:39:28.889 7fe0c9ffb700 20 Event(0x55d7b123da00 nevent=5000 time_id=5).wakeup
2018-10-16 17:39:28.889 7fe0d47ca700  4 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1).handle_write
2018-10-16 17:39:28.889 7fe0d47ca700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).write_event
2018-10-16 17:39:28.889 7fe0d47ca700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).append_keepalive_or_ack
2018-10-16 17:39:28.889 7fe0d47ca700 10 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send sent bytes 9 remaining bytes 0
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1).process
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read continue len=1
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).handle_message r=0
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).handle_message process tag 15
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read start len=8
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).handle_keepalive2_ack r=0
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).handle_keepalive2_ack got KEEPALIVE_ACK
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=OPENED pgs=153 cs=1 l=1).wait_message
2018-10-16 17:39:28.889 7fe0d47ca700 20 -- 172.21.15.6:38668/2121577321 >> 172.21.15.176:6790/0 conn(0x55d7b129b580 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=1).read start len=1
2018-10-16 17:39:28.897 7fe0cbfff700 20 client.4493 trim_cache size 7 max 16384
2018-10-16 17:39:29.897 7fe0cbfff700 20 client.4493 trim_cache size 7 max 16384
2018-10-16 17:39:30.897 7fe0cbfff700 20 client.4493 trim_cache size 7 max 16384
2018-10-16 17:39:31.897 7fe0cbfff700 20 client.4493 trim_cache size 7 max 16384
2018-10-16 17:39:32.897 7fe0cbfff700 20 client.4493 trim_cache size 7 max 16384
2018-10-16 17:39:33.897 7fe0cbfff700 20 client.4493 trim_cache size 7 max 16384

From: /ceph/teuthology-archive/pdonnell-2018-10-16_17:17:21-fs-wip-pdonnell-testing-20181011.152759-distro-basic-smithi/3148389/remote/smithi006/log/ceph-client.0.15648.log.gz

All of the messages got discarded and apparently nothing is driving resending the requests/control messages (MClientReconnect).


Related issues

Related to fs - Bug #36079: ceph-fuse: hang because it miss reconnect phase when hot standby mds switch occurs Need Review
Duplicated by fs - Bug #37544: mds: reconnect of client during thrashing fails Duplicate

History

#1 Updated by Patrick Donnelly about 2 months ago

For posterity, here's the original job that failed: /ceph/teuthology-archive/pdonnell-2018-10-11_17:55:20-fs-wip-pdonnell-testing-20181011.152759-distro-basic-smithi/3128045/teuthology.log

The job in the $description is the same failure with `debug ms = 20`.

2018-10-12 02:38:44.637 7fbad97fa700 10 client.4461 send_request client_request(unknown.0:1569 rename #0x1000000022b/fstest_b7a52942eefb0e03b9f3f456d6feb459 #0x1000000022b/fstest_313575b3505f500e307ef8202d668c03 2018-10-12 02:38:30.427363 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4 to mds.0
2018-10-12 02:38:44.637 7fbad97fa700  1 -- 172.21.15.17:52990/1137632058 --> 172.21.15.166:6816/3748471958 -- client_request(unknown.0:1569 rename #0x1000000022b/fstest_b7a52942eefb0e03b9f3f456d6feb459 #0x1000000022b/fstest_313575b3505f500e307ef8202d668c03 2018-10-12 02:38:30.427363 RETRY=1 caller_uid=0, caller_gid=0{0,}) v4 -- 0x7fbab0014d00 con 0
2018-10-12 02:38:44.637 7fbad97fa700 10 client.4461  caps on 0x1000000022f.head pAsXsFs wants -
2018-10-12 02:38:44.637 7fbae2bdc700  1 -- 172.21.15.17:52990/1137632058 >> 172.21.15.166:6816/3748471958 conn(0x7fbad40139d0 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_bulk peer close file descriptor 20
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1000000022f
2018-10-12 02:38:44.641 7fbae2bdc700  1 -- 172.21.15.17:52990/1137632058 >> 172.21.15.166:6816/3748471958 conn(0x7fbad40139d0 legacy :-1 s=STATE_CONNECTION_ESTABLISHED l=0).read_until read failed
2018-10-12 02:38:44.641 7fbae2bdc700  1 -- 172.21.15.17:52990/1137632058 >> 172.21.15.166:6816/3748471958 conn(0x7fbad40139d0 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).handle_connect_reply_1 read connect reply failed
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  snaprealm snaprealm(0x1 nref=14 c=0 seq=1 parent=0x0 my_snaps=[] cached_snapc=1=[])
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x1000000022b.head pAsLsXsFsx wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0/tmp/tmp/fstest_6ec4cd3b41514feb09c21fe4b2ad8868
2018-10-12 02:38:44.641 7fbae2bdc700  0 -- 172.21.15.17:52990/1137632058 >> 172.21.15.166:6816/3748471958 conn(0x7fbad40139d0 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=0).fault initiating reconnect
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x100000000d9.head pAsLsXsFsx wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0/tmp/tmp
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x100000000d8.head pAsLsXsFsxcrwb wants pFscr
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x100000000d8
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x10000000231.head pAsXsFs wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0/tmp/tmp/fstest_6ec4cd3b41514feb09c21fe4b2ad8868/fstest_313575b3505f500e307ef8202d668c03
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x1000000003a.head pAsLsXsFsx wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0/tmp/pjd-fstest-20090130-RC/tests/rename
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x1000000021e.head pAsLsXsFs wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1000000021e
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x10000000001.head pAsLsXsFsx wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0/tmp
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x10000000000.head pAsLsXsFsx wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x10000000005.head pAsLsXsFsx wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0/tmp/pjd-fstest-20090130-RC/tests
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x10000000230.head pAsXsFscr wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x10000000230
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x10000000048.head pAsLsXsFscr wants pFscr
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x10000000048
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x10000000003.head pAsLsXsFsx wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1/client.0/tmp/pjd-fstest-20090130-RC
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461  caps on 0x1.head pAsLsXsFs wants -
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461     path #0x1
2018-10-12 02:38:44.641 7fbad97fa700 20 client.4461  reflushing caps (early_kick) on 0x1000000021e.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={} open={1=0,2=0,3=0} mode=100770 size=0/4194304 nlink=0 btime=0.000000 mtime=2018-10-12 02:38:26.536732 ctime=2018-10-12 02:38:27.298530 caps=pAsLsXsFs(0=pAsLsXsFs) flushing_caps=Ax objectset[0x1000000021e ts 0/0 objects 0 dirty_or_tx 0] 0x7fbac807fb00) to mds.0
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461 flush_caps 0x7fbac807fb00 mds.0
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461 send_cap 0x1000000021e.head(faked_ino=0 ref=1 ll_ref=0 cap_refs={} open={1=0,2=0,3=0} mode=100770 size=0/4194304 nlink=0 btime=0.000000 mtime=2018-10-12 02:38:26.536732 ctime=2018-10-12 02:38:27.298530 caps=pAsLsXsFs(0=pAsLsXsFs) flushing_caps=Ax objectset[0x1000000021e ts 0/0 objects 0 dirty_or_tx 0] 0x7fbac807fb00) mds.0 seq 0 async  used Ax want - flush Ax retain pAsLsXsFs held pAsLsXsFs revoking - dropping -
2018-10-12 02:38:44.641 7fbad97fa700 15 client.4461 auth cap, setting max_size = 0
2018-10-12 02:38:44.641 7fbad97fa700  1 -- 172.21.15.17:52990/1137632058 --> 172.21.15.166:6816/3748471958 -- client_caps(update ino 0x1000000021e 544 seq 0 tid 276 caps=pAsLsXsFs dirty=Ax wanted=- follows 1 size 0/0 ts 1/18446744073709551615 mtime 2018-10-12 02:38:26.536732) v11 -- 0x7fbac806c2b0 con 0
2018-10-12 02:38:44.641 7fbad97fa700  1 -- 172.21.15.17:52990/1137632058 --> 172.21.15.166:6816/3748471958 -- client_reconnect(14 caps) v3 -- 0x7fbad40165a0 con 0
2018-10-12 02:38:44.641 7fbae2bdc700  0 -- 172.21.15.17:52990/1137632058 >> 172.21.15.166:6816/3748471958 conn(0x7fbad40139d0 legacy :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=1 l=0).handle_connect_reply_2 connect got RESETSESSION
2018-10-12 02:38:44.641 7fbad97fa700  0 client.4461 ms_handle_remote_reset on 172.21.15.166:6816/3748471958
2018-10-12 02:38:44.641 7fbad97fa700  1 -- 172.21.15.17:52990/1137632058 --> 172.21.15.17:6789/0 -- mon_subscribe({osdmap=22}) v3 -- 0x7fbac8080080 con 0
2018-10-12 02:38:44.641 7fbad97fa700  1 client.4461 reset from mds we were open; mark session as stale
2018-10-12 02:38:44.641 7fbad97fa700 10 client.4461 ms_handle_connect on 172.21.15.166:6816/3748471958
2018-10-12 02:38:44.645 7fbabaffd700  8 client.4461 _ll_getattr 0x1.head

From: /ceph/teuthology-archive/pdonnell-2018-10-11_17:55:20-fs-wip-pdonnell-testing-20181011.152759-distro-basic-smithi/3128045/remote/smithi017/log/ceph-client.0.15659.log.gz

#2 Updated by Patrick Donnelly about 2 months ago

  • Related to Bug #36079: ceph-fuse: hang because it miss reconnect phase when hot standby mds switch occurs added

#3 Updated by Zheng Yan about 2 months ago

client bug or messenger bug?

#4 Updated by Zheng Yan about 2 months ago

I think the reset was sent by following code

    if (connection->policy.resetcheck &&  // RESETSESSION only used by servers;
                                          // peers do not reset each other
        exproto->connect_seq == 0) {
      ldout(cct, 0) << __func__ << " accept we reset (peer sent cseq " 
                    << connect_msg.connect_seq << ", " << existing
                    << ".cseq = " << exproto->connect_seq
                    << "), sending RESETSESSION " << dendl;
      existing->lock.unlock();
      return send_connect_message_reply(CEPH_MSGR_TAG_RESETSESSION, reply,
                                        authorizer_reply);
    }

#5 Updated by Patrick Donnelly about 2 months ago

Zheng Yan wrote:

client bug or messenger bug?

It is probably two bugs (both).

The client should not get stuck during reconnect like this due to a connection reset. Fixing this is not trivial. According to the messenger debug, it drops the messages (see also doc comments for ms_handle_reset) and we sent while handling up:reconnect in Client::handle_mds_map. The client needs to recreate or resend those messages. (I do wonder if there's some knob we can turn in the msgr to make it not drop those messages and retry in the background. I haven't looked.)

The second bug would be in the msgr. It's too coincidental to see the same reset in the same time period like this. It is too extraordinary to be explained by a random network hiccup.

#6 Updated by Patrick Donnelly about 2 months ago

Another: /ceph/teuthology-archive/pdonnell-2018-10-24_02:35:37-fs-wip-pdonnell-testing-20181023.224346-distro-basic-smithi/3177818/remote/smithi189/log/ceph-client.0.38768.log.gz

#7 Updated by Patrick Donnelly about 1 month ago

  • Assignee set to Patrick Donnelly

#8 Updated by Patrick Donnelly 2 days ago

  • Duplicates Bug #37544: mds: reconnect of client during thrashing fails added

#9 Updated by Patrick Donnelly 2 days ago

  • Duplicates deleted (Bug #37544: mds: reconnect of client during thrashing fails)

#10 Updated by Patrick Donnelly 2 days ago

  • Duplicated by Bug #37544: mds: reconnect of client during thrashing fails added

Also available in: Atom PDF