Cleanup #55835
open
mgr: mute/hide NOTIFY_TYPES log errors
Description
As mentioned here:
Now, if a module defines no NOTIFY_TYPES, as quite a few do, we get a rather ugly error on any manager (re)start, e.g. on quincy:

ceph-mgr[19176]: 2022-06-01T15:35:03.658+0200 7fd4a402fe80 -1 mgr[py] Module telemetry has missing NOTIFY_TYPES member
ceph-mgr[19176]: 2022-06-01T15:35:03.926+0200 7fd4a402fe80 -1 mgr[py] Module volumes has missing NOTIFY_TYPES member
ceph-mgr[19176]: 2022-06-01T15:35:04.078+0200 7fd4a402fe80 -1 mgr[py] Module rbd_support has missing NOTIFY_TYPES member
ceph-mgr[19176]: 2022-06-01T15:35:04.186+0200 7fd4a402fe80 -1 mgr[py] Module devicehealth has missing NOTIFY_TYPES member
ceph-mgr[19176]: 2022-06-01T15:35:04.302+0200 7fd4a402fe80 -1 mgr[py] Module crash has missing NOTIFY_TYPES member
ceph-mgr[19176]: 2022-06-01T15:35:04.386+0200 7fd4a402fe80 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
ceph-mgr[19176]: 2022-06-01T15:35:04.482+0200 7fd4a402fe80 -1 mgr[py] Module influx has missing NOTIFY_TYPES member
ceph-mgr[19176]: 2022-06-01T15:35:04.574+0200 7fd4a402fe80 -1 mgr[py] Module selftest has missing NOTIFY_TYPES member

It's not that bad, but QA complained during internal testing, and I'd figure some of our users would get scared too.

Without being too deep into the code base, I can think of the following options to improve this off the top of my head:

1. defuse the error to a less visible one
2. add an empty NOTIFY_TYPES array to all other modules too
3. modules without any such entry always get notified instead (possibly behavior changing)

The first and second options seem relatively non-invasive to me. For the third option I'd need to know what exactly the notify mechanism triggers (I could look it up just fine, but figured I'd post here first, as y'all probably know much better already).

FWIW, out of the 37 modules I'm counting here in src/pybind/mgr, 10 have a NOTIFY_TYPES array specified.
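For illustration, options 1 and 2 could look roughly like the following. This is only a sketch with stand-in names, not the actual mgr code: the classes, the `load_notify_types` helper, and the `"osd_map"` string are hypothetical, and the real check lives in the C++ loader rather than in Python.

```python
import logging

log = logging.getLogger("mgr_sketch")


class WithTypes:
    # Hypothetical module that declares the notifications it wants.
    # "osd_map" is a placeholder value, not taken from the real modules.
    NOTIFY_TYPES = ["osd_map"]


class WithoutTypes:
    # Hypothetical module that omits NOTIFY_TYPES entirely.
    pass


class EmptyTypes:
    # Option 2: declare an explicit empty list so nothing is "missing".
    NOTIFY_TYPES = []


def load_notify_types(module_cls):
    """Return the module's notify types, treating a missing member as empty."""
    types = getattr(module_cls, "NOTIFY_TYPES", None)
    if types is None:
        # Option 1: report at debug level instead of error level.
        log.debug("Module %s has missing NOTIFY_TYPES member",
                  module_cls.__name__)
        return []
    return list(types)
```

With either option the module ends up with an empty set of notifications; the difference is only whether the loader has anything to report.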
Updated by Dhairya Parmar about 1 year ago
Ernesto Puerta wrote:
As mentioned here:
[...]
This shows up in one of the failure logs I was investigating, and it scared me too. It is emitted at https://github.com/ceph/ceph/blob/main/src/mgr/PyModule.cc#L518, and using `derr` there does spam/pollute the logs. If it doesn't break anything, I think `dout(10)` or `dout(20)` could be used instead. That would give cleaner logs, and anyone who wants these lines can ramp up the debug level to get them logged.
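To make the intended effect concrete, here is a rough analogy using Python's standard `logging` module rather than Ceph's `dout`/`derr` macros. The `make_logger` and `report_missing` helpers are made up for this sketch, and Python's levels only approximate Ceph's numeric debug levels.

```python
import io
import logging


def make_logger(level):
    """Build an isolated logger that writes to an in-memory buffer."""
    stream = io.StringIO()
    logger = logging.getLogger(f"mgr_sketch_{level}")
    logger.handlers.clear()
    logger.setLevel(level)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger, stream


def report_missing(logger, name):
    # Emitted at debug level, so it only appears when debugging is enabled.
    logger.debug("Module %s has missing NOTIFY_TYPES member", name)


# Default-like configuration: debug messages are dropped.
quiet, quiet_out = make_logger(logging.WARNING)
# Raised debug level: the same message is logged.
verbose, verbose_out = make_logger(logging.DEBUG)

report_missing(quiet, "telemetry")
report_missing(verbose, "telemetry")
```

The message is still available to anyone who raises the level, but it no longer appears in a default configuration.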
Updated by John Mulligan 13 days ago
I made a draft PR for this: https://github.com/ceph/ceph/pull/57106
I didn't feel like verifying locally that the unit tests pass (I'm set up mainly for Python work locally, not C++), so I'll leave it in draft until I confirm that `make check` passes. If it does and there are no major objections, I'll update this tracker to list that PR as the proper fix.