Bug #49161
opencommon: ceph cluster and MON log being spammed with debug messages which cannot be turned off
0%
Description
My MONs started running out of memory memory due to massive log spam. I had a /var/log/ceph-mon.*.log
file of about 3.5GB and a /var/log/ceph/ceph.log
cluster log of more than 60GB (!!).
Any debug_*
settings were either at their defaults or at something such as 1/5 or even 0/0. I dumped the effective MON configs and found the setting mon_cluster_log_file_level
, which I never set explicitly, but somehow its default value was debug
all of a sudden (documentation says it should be info
). Once I set this setting to warning, the spam to ceph.log
stopped, but the MON daemon logs are still being spammed with messages like these:
Feb 04 12:18:19 XXX ceph-mon[25257]: 2021-02-04T12:18:19.267+0100 7f295eb0a700 0 log_channel(cluster) log [DBG] : mds.1 [v2:XXX:6800/3706045336,v1:XXX:6801/3706045336] up:active Feb 04 12:18:19 XXX ceph-mon[25257]: 2021-02-04T12:18:19.267+0100 7f295eb0a700 0 log_channel(cluster) log [DBG] : fsmap cephfs.storage:3 {0=XXX=up:active,1=XXX=up:active,2=XXX=up:active} 3 up:standby-replay 1 up:standby Feb 04 12:18:10 XXX ceph-mon[25257]: 2021-02-04T12:18:10.631+0100 7f296030d700 0 mon.XXX@0(leader) e22 handle_command mon_command({"prefix": "status"} v 0) v1 Feb 04 12:18:10 XXX ceph-mon[25257]: 2021-02-04T12:18:10.631+0100 7f296030d700 0 log_channel(audit) log [DBG] : from='client.? XXX:0/186853831' entity='client.admin' cmd=[{"prefix": "stat us"}]: dispatch
So, both the cluster and audit channel are still writing to the mon log wit debug level. Setting clog_to_monitors to false doesn't change anything. I did not have this issue previously.
Here is my logging config:
global advanced clog_to_syslog_level warning global basic err_to_syslog true global basic log_to_file false global basic log_to_stderr false global basic log_to_syslog true global advanced mon_cluster_log_file_level error global advanced mon_cluster_log_to_file false global advanced mon_cluster_log_to_stderr false global advanced mon_cluster_log_to_syslog false global advanced mon_cluster_log_to_syslog_level warning
The debug_*
settings are all at their defaults right now (for debug_mon
that is 1/5).
I completely disabled logging to files at the moment and redirect everything to syslog/journald, but the issue occurs regardless of the logging facility.
Updated by Janek Bevendorff about 3 years ago
It seems like this is only happening when logging to syslog/journald. If I set err_to_syslog
to true
, I see DBG
messages in the MON logs coming from the "cluster" and "audit" facilities.The same log written to disk when log_to_file
is enabled, does not contain these messages. I suppose there is something wrong with the log level set on application side for the syslog sink.
Yesterday, we had a very stability-threatening issues which required us to shut down all OSDs and reboot the cluster, because OSDs were thrashing the MONs with log messages. I don't know if this is related, but it could be. The issue is probably not the same as https://tracker.ceph.com/issues/39955, but the effects were quite similar. This only happened spontaneously, though, so the two incidents may not actually be connected.
For now, I have disabled syslog logging, but I would like to turn it back on once this is fixed. It makes it much easier to analyse and scrape logs to Promtail/Loki and it makes log creep more manageable on diskless servers where the rootfs is RAM.
Updated by Sebastian Wagner almost 3 years ago
- Project changed from Ceph to CephFS
Updated by Patrick Donnelly almost 3 years ago
- Project changed from CephFS to Ceph
- Subject changed from Ceph cluster and MON log being spammed with debug messages which cannot be turned off to common: ceph cluster and MON log being spammed with debug messages which cannot be turned off
The log messages concerning CephFS are not abnormal and should not be changed. However, the log channel always writing to the debug log at level 0 could/should be changed or some other configuration to work around this problem.
Updated by Janek Bevendorff almost 2 years ago
Any progress here? I would like to re-enable cluster logging via syslog, but I can't, because my MONs keep logging all clog/audit messages at DBG level.
Updated by Janek Bevendorff almost 2 years ago
Actually, the same happens with other logging mechanisms as well. As soon as either log_to_file
, log_to_stderr
(which is captured by systemd), or log_to_syslog
are true, the mon logs will receive those INFO/DBG audit messages. As to what happens when the mon_cluster_log_*
settings are set:
mon_cluster_log_to_file = true
: /var/log/ceph/ceph.log and /var/log/ceph/ceph.audit.log are being filled based on mon_cluster_log_file_level
, but regardless of the log_to_file
setting
mon_cluster_log_to_stderr = true
: journalctl mon logs are being spammed no matter what
mon_cluster_log_to_syslog = true
: mon log is being filled with cluster log messages depending on mon_cluster_log_to_syslog_level
, but regardless of log_to_syslog
That means, stderr logging is entirely broken and I cannot enable daemon logs for my MONs if I don't want them to also log cluster messages at DBG level. I consider this a rather serious bug, since I basically have to disable mon daemon logging altogether.
Funnily, clog_to_monitors
, clog_to_syslog
, and clog_to_syslog_level
seem to do nothing here.
Updated by Prashant D about 1 year ago
Hi Janek,
This tracker looks duplicate of https://tracker.ceph.com/issues/57061. This is a known behavior at the moment and I will get the fix merged soon. Let me know your thoughts on it.