Project

General

Profile

Actions

Bug #49161

open

common: ceph cluster and MON log being spammed with debug messages which cannot be turned off

Added by Janek Bevendorff about 3 years ago. Updated about 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

My MONs started running out of memory memory due to massive log spam. I had a /var/log/ceph-mon.*.log file of about 3.5GB and a /var/log/ceph/ceph.log cluster log of more than 60GB (!!).

Any debug_* settings were either at their defaults or at something such as 1/5 or even 0/0. I dumped the effective MON configs and found the setting mon_cluster_log_file_level, which I never set explicitly, but somehow its default value was debug all of a sudden (documentation says it should be info). Once I set this setting to warning, the spam to ceph.log stopped, but the MON daemon logs are still being spammed with messages like these:

Feb 04 12:18:19 XXX ceph-mon[25257]: 2021-02-04T12:18:19.267+0100 7f295eb0a700  0 log_channel(cluster) log [DBG] : mds.1 [v2:XXX:6800/3706045336,v1:XXX:6801/3706045336] up:active
Feb 04 12:18:19 XXX ceph-mon[25257]: 2021-02-04T12:18:19.267+0100 7f295eb0a700  0 log_channel(cluster) log [DBG] : fsmap cephfs.storage:3 {0=XXX=up:active,1=XXX=up:active,2=XXX=up:active} 3 up:standby-replay 1 up:standby
Feb 04 12:18:10 XXX ceph-mon[25257]: 2021-02-04T12:18:10.631+0100 7f296030d700  0 mon.XXX@0(leader) e22 handle_command mon_command({"prefix": "status"} v 0) v1                              
Feb 04 12:18:10 XXX ceph-mon[25257]: 2021-02-04T12:18:10.631+0100 7f296030d700  0 log_channel(audit) log [DBG] : from='client.? XXX:0/186853831' entity='client.admin' cmd=[{"prefix": "stat
us"}]: dispatch

So, both the cluster and audit channel are still writing to the mon log wit debug level. Setting clog_to_monitors to false doesn't change anything. I did not have this issue previously.

Here is my logging config:

global                            advanced  clog_to_syslog_level                                 warning                                                         
global                            basic     err_to_syslog                                        true                                                            
global                            basic     log_to_file                                          false                                                           
global                            basic     log_to_stderr                                        false                                                           
global                            basic     log_to_syslog                                        true                                                            
global                            advanced  mon_cluster_log_file_level                           error                                                           
global                            advanced  mon_cluster_log_to_file                              false                                                           
global                            advanced  mon_cluster_log_to_stderr                            false                                                           
global                            advanced  mon_cluster_log_to_syslog                            false                                                           
global                            advanced  mon_cluster_log_to_syslog_level                      warning 

The debug_* settings are all at their defaults right now (for debug_mon that is 1/5).

I completely disabled logging to files at the moment and redirect everything to syslog/journald, but the issue occurs regardless of the logging facility.

Actions #1

Updated by Janek Bevendorff about 3 years ago

It seems like this is only happening when logging to syslog/journald. If I set err_to_syslog to true, I see DBG messages in the MON logs coming from the "cluster" and "audit" facilities.The same log written to disk when log_to_file is enabled, does not contain these messages. I suppose there is something wrong with the log level set on application side for the syslog sink.

Yesterday, we had a very stability-threatening issues which required us to shut down all OSDs and reboot the cluster, because OSDs were thrashing the MONs with log messages. I don't know if this is related, but it could be. The issue is probably not the same as https://tracker.ceph.com/issues/39955, but the effects were quite similar. This only happened spontaneously, though, so the two incidents may not actually be connected.

For now, I have disabled syslog logging, but I would like to turn it back on once this is fixed. It makes it much easier to analyse and scrape logs to Promtail/Loki and it makes log creep more manageable on diskless servers where the rootfs is RAM.

Actions #2

Updated by Sebastian Wagner almost 3 years ago

  • Project changed from Ceph to CephFS
Actions #3

Updated by Patrick Donnelly almost 3 years ago

  • Project changed from CephFS to Ceph
  • Subject changed from Ceph cluster and MON log being spammed with debug messages which cannot be turned off to common: ceph cluster and MON log being spammed with debug messages which cannot be turned off

The log messages concerning CephFS are not abnormal and should not be changed. However, the log channel always writing to the debug log at level 0 could/should be changed or some other configuration to work around this problem.

Actions #4

Updated by Janek Bevendorff almost 2 years ago

Any progress here? I would like to re-enable cluster logging via syslog, but I can't, because my MONs keep logging all clog/audit messages at DBG level.

Actions #5

Updated by Janek Bevendorff almost 2 years ago

Actually, the same happens with other logging mechanisms as well. As soon as either log_to_file, log_to_stderr (which is captured by systemd), or log_to_syslog are true, the mon logs will receive those INFO/DBG audit messages. As to what happens when the mon_cluster_log_* settings are set:

mon_cluster_log_to_file = true: /var/log/ceph/ceph.log and /var/log/ceph/ceph.audit.log are being filled based on mon_cluster_log_file_level, but regardless of the log_to_file setting

mon_cluster_log_to_stderr = true: journalctl mon logs are being spammed no matter what

mon_cluster_log_to_syslog = true: mon log is being filled with cluster log messages depending on mon_cluster_log_to_syslog_level, but regardless of log_to_syslog

That means, stderr logging is entirely broken and I cannot enable daemon logs for my MONs if I don't want them to also log cluster messages at DBG level. I consider this a rather serious bug, since I basically have to disable mon daemon logging altogether.

Funnily, clog_to_monitors, clog_to_syslog, and clog_to_syslog_level seem to do nothing here.

Actions #6

Updated by Prashant D about 1 year ago

Hi Janek,

This tracker looks duplicate of https://tracker.ceph.com/issues/57061. This is a known behavior at the moment and I will get the fix merged soon. Let me know your thoughts on it.

Actions

Also available in: Atom PDF