Bug #49736

Updated by Jos Collin about 3 years ago

Some clients are missing keys in the mgr/stats client_metadata, which causes the cephfs-top [2] exception reported in BZ [1]. Either cephfs-top should handle the missing metadata entries, or mgr/stats should fill in defaults until it can update the metadata. The exception occurs intermittently while cephfs-top is running, with no definite action or steps that trigger it.
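A minimal sketch of the cephfs-top side of the fix. It assumes the exception comes from indexing client_metadata[client][key] directly, which raises a KeyError for a key such as "hostname" that has not been populated yet. The helper name `client_label`, the `N/A` placeholder, and the `mount_point@hostname/IP` label format are assumptions for illustration, not the actual cephfs-top code:

```python
# Hypothetical helper, not the real cephfs-top code: build a display label
# from client_metadata without raising KeyError when keys are missing.
MISSING = "N/A"
LABEL_KEYS = ("mount_point", "hostname", "IP")


def client_label(client_metadata: dict, client_id: str) -> str:
    """Return 'mount_point@hostname/IP', using N/A for any missing key."""
    meta = client_metadata.get(client_id, {})
    mount_point, hostname, ip = (meta.get(key, MISSING) for key in LABEL_KEYS)
    return f"{mount_point}@{hostname}/{ip}"


if __name__ == "__main__":
    metadata = {
        "client.14504": {"IP": "127.0.0.1", "hostname": "smithi069",
                         "mount_point": "/mnt/cephfs"},
        "client.14585": {"IP": "127.0.0.1"},  # partially populated entry
    }
    print(client_label(metadata, "client.14504"))  # /mnt/cephfs@smithi069/127.0.0.1
    print(client_label(metadata, "client.14585"))  # N/A@N/A/127.0.0.1
```

With this approach a partially populated client is still listed, with placeholders, instead of crashing the display loop.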

Below is the `ceph fs perf stats` output dumped during the exception. Notice client.14585, whose client_metadata entry contains only the IP key.
<pre>
{"version": 1, "global_counters": ["cap_hit", "read_latency", "write_latency", "metadata_latency", "dentry_lease"], "counters": [],

"client_metadata":
{"client.14504": {"IP": "127.0.0.1", "hostname": "smithi069", "root": "/", "mount_point": "/mnt/cephfs", "valid_metrics": ["cap_hit", "read_latency", "write_latency", "metadata_latency", "dentry_lease"]},
"client.14507": {"IP": "127.0.0.1", "hostname": "smithi069", "root": "/", "mount_point": "/mnt/cephfs2", "valid_metrics": ["cap_hit", "read_latency", "write_latency", "metadata_latency", "dentry_lease"]},
"client.14585": {"IP": "127.0.0.1"}},

"global_metrics":
{"client.14504": [[2, 0], [0, 0], [0, 0], [0, 3038554], [0, 0]],
"client.14507": [[2, 0], [0, 0], [0, 0], [0, 3091147], [0, 0]],
"client.14585": [[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]},

"metrics": {"delayed_ranks": [], "mds.0": {"client.14504": [], "client.14507": [], "client.14585": []}}}
</pre>
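To check a dump like the one above for incomplete entries, the following sketch parses a trimmed copy of its client_metadata section (valid_metrics shortened) and reports the missing keys per client. The expected key set is an assumption taken from the complete entries (client.14504, client.14507), and `incomplete_clients` is a hypothetical helper:

```python
import json

# Trimmed copy of the client_metadata section of the dump above.
DUMP = """
{"client_metadata": {
  "client.14504": {"IP": "127.0.0.1", "hostname": "smithi069", "root": "/",
                   "mount_point": "/mnt/cephfs", "valid_metrics": ["cap_hit"]},
  "client.14585": {"IP": "127.0.0.1"}}}
"""

# Assumption: the keys of the complete entries are the full expected set.
EXPECTED_KEYS = {"IP", "hostname", "root", "mount_point", "valid_metrics"}


def incomplete_clients(stats: dict) -> dict:
    """Map each client id to the sorted list of metadata keys it lacks."""
    missing = {}
    for client_id, meta in stats.get("client_metadata", {}).items():
        absent = sorted(EXPECTED_KEYS.difference(meta))
        if absent:
            missing[client_id] = absent
    return missing


if __name__ == "__main__":
    print(incomplete_clients(json.loads(DUMP)))
    # {'client.14585': ['hostname', 'mount_point', 'root', 'valid_metrics']}
```

For this dump only client.14585 is reported; the same function could be applied to the parsed output of `ceph fs perf stats` while reproducing the issue.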

The mgr logs captured during the exception show the same state. They cannot be attached to this ticket because they exceed the tracker's maximum file size of 1000 KB.

More details:
In [3], mgr/stats sets the IP metadata first and then sends a request to the MDS for the remaining metadata. If cephfs-top queries mgr/stats in the meantime, the current stats are dumped with only the IP key for that client, which causes the exception. So either cephfs-top should be prepared to handle that, or mgr/stats should fill in defaults (N/A, not available) and update them when it receives the metadata query reply. On the MDS side, the metadata query reply was observed to contain no metadata for client.14585; this also needs to be debugged.
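A sketch of the mgr/stats alternative, assuming a store keyed by client id: the IP is recorded as soon as the client is seen, every other key gets a placeholder, and the MDS reply overwrites the placeholders when it arrives. `ClientMetadataStore`, its methods, and the default values are hypothetical and not the actual perf_stats.py code:

```python
import copy

# Assumed placeholder values used until the MDS metadata reply arrives.
DEFAULT_METADATA = {
    "hostname": "N/A",
    "root": "N/A",
    "mount_point": "N/A",
    "valid_metrics": [],
}


class ClientMetadataStore:
    """Hypothetical per-client metadata store; not the real mgr/stats code."""

    def __init__(self) -> None:
        self._metadata: dict = {}

    def add_client(self, client_id: str, ip: str) -> None:
        # Record the IP right away and give every other key a placeholder,
        # so a dump taken before the MDS replies still has all keys.
        entry = copy.deepcopy(DEFAULT_METADATA)
        entry["IP"] = ip
        self._metadata[client_id] = entry

    def apply_mds_reply(self, reply: dict) -> None:
        # Overwrite placeholders with whatever the MDS returned. Clients
        # missing from the reply keep their defaults; unknown ids are ignored.
        for client_id, meta in reply.items():
            if client_id in self._metadata:
                self._metadata[client_id].update(meta)

    def dump(self) -> dict:
        # Return a deep copy so callers cannot mutate the stored entries.
        return copy.deepcopy(self._metadata)


if __name__ == "__main__":
    store = ClientMetadataStore()
    store.add_client("client.14504", "127.0.0.1")
    store.add_client("client.14585", "127.0.0.1")
    store.apply_mds_reply({"client.14504": {"hostname": "smithi069", "root": "/",
                                            "mount_point": "/mnt/cephfs"}})
    print(store.dump()["client.14585"])
    # {'hostname': 'N/A', 'root': 'N/A', 'mount_point': 'N/A', 'valid_metrics': [], 'IP': '127.0.0.1'}
```

Clients absent from the MDS reply (as client.14585 was) keep their placeholders rather than lacking keys, so every dump has the same shape regardless of when cephfs-top queries it.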

 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1934426 
 [2] https://github.com/ceph/ceph/blob/master/src/tools/cephfs/top/cephfs-top#L256 
 [3] https://github.com/ceph/ceph/blob/master/src/pybind/mgr/stats/fs/perf_stats.py#L275
