Feature #3775

Updated by Sage Weil about 11 years ago

Add a 'log stop utilization = .95' option that will make the log code print one last line like

 --- suspending logging because disk utilization X > 'log stop utilization' threshold ---

and throw out events (as far as disk goes) until it drops down again.

We have some very insane logs.
The ceph logs will often loop through the same error and fill /var/log/ceph until the disk runs out of space. If the / disk runs out of space, the system will crash. This is not good for production.

We need to do a couple of things here.
1) If we are repeating errors, we can keep track of/trap them and report "this error has repeated N times". I have seen this done on linux systems, so I know it is possible. It may not be easy.

I will attach some log/email conversations I was having.
2) If we are flooding the logs, it would be nice to either stop before the root disk is full, clear older logs before the root disk is full, or maybe send a warning to the users, so catastrophe does not strike.

I expect these are both longer term issues, but I don't see how to file an RFE. I am seeing the problem happen on all of my VMs. I have had my debug set to 20, then removed all logs and reset it to 5. No data is being written yet. The logs were just very noisy due to a time skew (the first time). I am not sure what was causing the other cluster to do the same thing. Yes, my VMs have smaller root disks, between 2G and 8G, but we also can not assume the customer will have really large root disks either. A downed cluster is just bad.

We can not just assume the customer will handle this; we need to face the issue and work on it.
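
As a rough illustration of the two behaviours discussed above (collapsing repeated errors, and suspending logging once disk utilization passes a 'log stop utilization' threshold), here is a minimal sketch in Python. It is not Ceph's logging code: the `GuardedLog` class, its parameters, and the wording of the "repeated" line are hypothetical names chosen for this example, and the default of measuring the `/` filesystem is an assumption.

```python
import shutil
from typing import Callable, Optional


def disk_utilization(path: str) -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


class GuardedLog:
    """Hypothetical log front end; a sketch, not Ceph's implementation."""

    def __init__(
        self,
        sink: Callable[[str], None],
        stop_utilization: float = 0.95,
        utilization: Callable[[], float] = lambda: disk_utilization("/"),
    ) -> None:
        self._sink = sink                    # where accepted lines are written
        self.stop_utilization = stop_utilization
        self._utilization = utilization      # injectable so it can be faked in tests
        self.suspended = False
        self._last: Optional[str] = None     # last line actually written
        self._repeats = 0                    # suppressed copies of that line

    def _flush_repeats(self) -> None:
        # Summarize any suppressed duplicates in a single line.
        if self._repeats:
            self._sink(f"last message repeated {self._repeats} times")
        self._repeats = 0

    def log(self, msg: str) -> None:
        util = self._utilization()
        if util > self.stop_utilization:
            if not self.suspended:
                # Print one last line, then go quiet.
                self._flush_repeats()
                self._sink(
                    f"--- suspending logging because disk utilization {util:.2f}"
                    f" > 'log stop utilization' threshold {self.stop_utilization:.2f} ---"
                )
                self.suspended = True
                self._last = None
            return  # throw the event away while the disk is too full
        self.suspended = False  # utilization dropped back down; resume
        if msg == self._last:
            self._repeats += 1  # same line again: count it instead of writing it
            return
        self._flush_repeats()
        self._sink(msg)
        self._last = msg
```

Note that this sketch only emits the "repeated N times" summary when a different message arrives or logging is suspended, so a daemon stuck on a single error would stay silent after the first line; a real implementation would probably also flush the count on a timer.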