Bug #36079

Updated by Patrick Donnelly over 5 years ago

Version: jewel (ceph-10.2.2)
MDS mode: active/hot-standby
Description:
As we know, the MDS kills a client's session if the client does not send renewcaps to the MDS within mds_session_autoclose (default 300 s). If the client does not receive the MDS map update (handle_mds_map) from the monitor when the active and hot-standby MDS switch over, it hangs indefinitely. I dumped the sessions on both the client and the MDS side and found that the client records its session as open, but the MDS side does not have it. After carefully studying the client and MDS logs, I found that the MDS had already killed the session, but the client did not know because its network was not working at the time. Afterwards, the active and hot-standby MDS switched over, and the client still did not receive the handle_mds_map message because of its bad network. Thus the client thinks its session is fine, but it cannot send any message to the MDS even after the network is back to normal, because its pipe has been marked down.

 Client log:
 <pre> 
 2018-09-13 15:30:22.414875 7f5b40989700 21 client.87547 tick 
 2018-09-13 15:30:22.414923 7f5b40989700 20 client.87547 trim_cache size 0 max 16384 
 2018-09-13 15:30:23.104030 7f5b39577700    3 client.87547 ll_getattr 1.head 
 2018-09-13 15:30:23.104067 7f5b39577700 10 client.87547 _getattr mask pAsLsXsFs issued=0 
 2018-09-13 15:30:23.104082 7f5b39577700 15 Dentry     inode.get on 0x7f5b542b7e00 1.head now 4 
 2018-09-13 15:30:23.104091 7f5b39577700 20 client.87547 choose_target_mds starting with req->inode 1.head(faked_ino=0 ref=4 ll_ref=273 cap_refs={} open={} mode=40755 size=0/0 mtime=2018-09-11 20:13:45.682305 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f5b542b7e00) 
 2018-09-13 15:30:23.104123 7f5b39577700 20 client.87547 choose_target_mds 1.head(faked_ino=0 ref=4 ll_ref=273 cap_refs={} open={} mode=40755 size=0/0 mtime=2018-09-11 20:13:45.682305 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f5b542b7e00) is_hash=0 hash=0 
 2018-09-13 15:30:23.104138 7f5b39577700 10 client.87547 choose_target_mds from caps on inode 1.head(faked_ino=0 ref=4 ll_ref=273 cap_refs={} open={} mode=40755 size=0/0 mtime=2018-09-11 20:13:45.682305 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f5b542b7e00) 
 2018-09-13 15:30:23.104146 7f5b39577700 20 client.87547 mds is 0 
 2018-09-13 15:30:23.104153 7f5b39577700 10 client.87547 send_request rebuilding request 9 for mds.0mds_name:    request op: 257 
 2018-09-13 15:30:23.104159 7f5b39577700 20 client.87547 encode_cap_releases enter (req: 0x7f5b5431b9c0, mds: 0) 
 2018-09-13 15:30:23.104163 7f5b39577700 25 client.87547 encode_cap_releases exit (req: 0x7f5b5431b9c0, mds 0 
 2018-09-13 15:30:23.104165 7f5b39577700 20 client.87547 send_request set sent_stamp to 2018-09-13 15:30:23.104164 
 2018-09-13 15:30:23.104169 7f5b39577700 10 client.87547 send_request client_request(unknown.0:9 getattr pAsLsXsFs #1 2018-09-13 15:30:23.104088) v3 to mds.0 
 2018-09-13 15:30:23.104184 7f5b39577700    1 -- 192.168.12.201:0/1506388177 --> 192.168.12.201:6803/14343 -- client_request(client.87547:9 getattr pAsLsXsFs #1 2018-09-13 15:30:23.104088) v3 -- ?+0 0x7f5b5431aec0 con 0x7f5b5428ea80 
 2018-09-13 15:30:23.104213 7f5b39577700    0 -- 192.168.12.201:0/1506388177 submit_message client_request(client.87547:9 getattr pAsLsXsFs #1 2018-09-13 15:30:23.104088) v3 remote, 192.168.12.201:6803/14343, failed lossy con, dropping message 0x7f5b5431aec0
 </pre>
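The last log line shows the other half of the problem: the getattr request is dropped at submit time because the client's connection to the MDS is a lossy one that has already failed. Below is a minimal sketch of that behavior, assuming (as the report describes) that a failed lossy connection does not reconnect by itself. `LossyConnection` is a hypothetical stand-in, not the messenger's real class, and the `network_up` flag is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class LossyConnection:
    """Hypothetical stand-in for a lossy client-to-MDS connection.

    Once marked failed it never reconnects on its own, so every later
    message is dropped, mirroring the 'failed lossy con' log line.
    """
    peer: str
    failed: bool = False
    delivered: list = field(default_factory=list)
    dropped: list = field(default_factory=list)

    def mark_down(self):
        self.failed = True

    def submit_message(self, msg, network_up=True):
        """Return True if msg was delivered, False if it was dropped."""
        if self.failed:
            # Already failed: drop without trying to reconnect.
            self.dropped.append(msg)
            return False
        if not network_up:
            # A lossy connection fails instead of queueing for retry.
            self.mark_down()
            self.dropped.append(msg)
            return False
        self.delivered.append(msg)
        return True


con = LossyConnection("192.168.12.201:6803/14343")
con.submit_message("renewcaps", network_up=False)  # link down: con marked failed
con.submit_message("getattr #1", network_up=True)  # network back, still dropped
print(con.dropped)                                  # ['renewcaps', 'getattr #1']

fresh = LossyConnection(con.peer)                   # a new connection to the same MDS
print(fresh.submit_message("getattr #1"))           # True
```

In the sketch the getattr is dropped even though the network is back, because the connection object is already marked failed; only a new connection to the same peer delivers it. When the real client would open such a new connection is not modeled here.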
