Bug #50491

Updated by Ernesto Puerta about 2 years ago

h3. User Story 

 As a Ceph operator I want to have a unified view of the logs from the different daemons, so that: 
 * I can perform a post-mortem (backward) analysis of the events leading to an issue, 
 * I can monitor cluster events in real-time. 

 h3. Persona 

 * Ceph cluster operator/sys admin 
 * Support engineer 
 * Developers 

 h3. Context 

 Every daemon in Ceph stores its logs locally (there's a "cluster log", but it's extremely concise and not useful for troubleshooting). This means that if a user wants to perform a post-mortem analysis of an issue, they first have to collect log traces from multiple hosts, which involves the following steps for each daemon: 
 # Identify on which host a daemon is running, 
 # Log in to that host, 
 # Look for the log file in the filesystem, 
 # Open the log and perform a search. 

 When debugging a Ceph issue, users often have to follow the operational events from multiple daemons, so this task gets more complicated with every daemon involved. Additionally, it's almost impossible to perform real-time (vs. post-mortem) troubleshooting this way. 
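
 To illustrate what a "unified view" means here, the following is a minimal sketch (not a proposed implementation) that interleaves log lines from several daemons into one chronological stream. It assumes each log line starts with an ISO 8601 timestamp followed by a space, that all timestamps use the same UTC offset, and that each daemon's log is already in time order. The names @tag_lines@ and @merge_logs@ and the sample log lines are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

LogEntry = Tuple[str, str, str]  # (timestamp, daemon, message)


def tag_lines(daemon: str, lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield (timestamp, daemon, message) for each non-empty log line.

    Assumption: the timestamp is the first space-delimited token.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        timestamp, _, message = line.partition(" ")
        yield timestamp, daemon, message


def merge_logs(logs: Dict[str, Iterable[str]]) -> Iterator[LogEntry]:
    """Lazily merge per-daemon logs into a single time-ordered stream.

    Each input must already be sorted by timestamp; heapq.merge then
    interleaves them without loading every log into memory.
    """
    streams = [tag_lines(daemon, lines) for daemon, lines in logs.items()]
    return heapq.merge(*streams)


# Hypothetical sample data standing in for log files from two hosts.
sample_logs = {
    "osd.0": [
        "2021-04-22T10:00:01.000+0000 slow request detected",
        "2021-04-22T10:00:05.000+0000 marked down",
    ],
    "mon.a": [
        "2021-04-22T10:00:03.000+0000 election started",
    ],
}

merged = list(merge_logs(sample_logs))
for timestamp, daemon, message in merged:
    print(f"{timestamp} [{daemon}] {message}")
```

 A log aggregation stack would do this collection and ordering centrally and continuously, which is what would make real-time monitoring possible.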

 h3. Implementation details 

 Several log aggregation stacks should be explored: ELK, Fluentd, Loki, etc. 

 This might be embedded via iframe as already done for Grafana dashboards, or accessed stand-alone. 

 h3. References 

 "SUSE's Ceph + ELK":https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/summits/26/presentations/23563/slides/Know-more-about-your-Ceph-cluster-ELK-stack2.pdf
