audit: create an audit module that persists important cluster operations in RADOS
Currently, Ceph has an "audit" log: a free-form text log that contains mostly monitor commands executed on the cluster. This log provides an auditable history of modifications and rescue commands for support or security purposes. (Note: in Quincy, the LogMonitor also persists the recent history for query via the CLI.)
There are a few issues with this log as-is:
- It's a text file that can be deleted/rotated/lost. Sometimes rescue operations done months or years in the past may be of interest for a present disaster.
- Writing to the audit log requires "mon w" cephx caps. This is too restrictive for some use-cases. In particular, CephFS would like to write audit log entries for disaster recovery commands run on the cluster (#62856), including MDS scrubs.
So, this ticket proposes a new mgr module that accepts audit log entries and can persist them indefinitely. The module should support hierarchical namespaces like "cephfs/<fscid>/" and free-form entries (probably JSON). The module database in the .mgr pool can be used to persist the logs.
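To make the proposal concrete, here is a minimal sketch of how namespaced, free-form JSON entries could be stored and queried by namespace prefix. It is only an illustration: Python's standard sqlite3 module with an in-memory database stands in for the module database in the .mgr pool, and the table name, column names, function names, and entry fields are all hypothetical rather than a proposed API.

```python
import json
import sqlite3
import time

# Hypothetical schema; nothing here is an existing Ceph table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    stamp     REAL NOT NULL,   -- UNIX time the entry was recorded
    namespace TEXT NOT NULL,   -- hierarchical, e.g. "cephfs/1/"
    entry     TEXT NOT NULL    -- free-form JSON payload
);
CREATE INDEX IF NOT EXISTS audit_log_ns ON audit_log (namespace, stamp);
"""


def _normalize(namespace):
    # Always end with "/" so that "cephfs/1" never matches "cephfs/10/".
    return namespace if namespace.endswith("/") else namespace + "/"


def open_db(path=":memory:"):
    # Stand-in for the mgr module database; in-memory by default.
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_entry(db, namespace, entry, stamp=None):
    # Store one entry (any JSON-serializable value) under a namespace.
    stamp = time.time() if stamp is None else stamp
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO audit_log (stamp, namespace, entry) VALUES (?, ?, ?)",
            (stamp, _normalize(namespace), json.dumps(entry)),
        )
    return cur.lastrowid


def query(db, prefix):
    # Return (namespace, entry) pairs at or below the given namespace prefix.
    prefix = _normalize(prefix)
    rows = db.execute(
        "SELECT namespace, entry FROM audit_log "
        "WHERE substr(namespace, 1, ?) = ? ORDER BY id",
        (len(prefix), prefix),
    )
    return [(ns, json.loads(e)) for ns, e in rows]


if __name__ == "__main__":
    db = open_db()
    # Entry fields below are made up for illustration.
    add_entry(db, "cephfs/1", {"op": "scrub start", "path": "/"})
    add_entry(db, "cephfs/10", {"op": "journal reset"})
    print(query(db, "cephfs/1"))
    print(query(db, "cephfs"))
```

Normalizing every namespace to end in "/" keeps prefix queries from leaking across siblings (for example "cephfs/1" versus "cephfs/10"), which would matter if permissions or retention are later configured per namespace.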
Some open questions:
- What kind of permission/cephx key checks should there be? How should they be configured per-namespace?
- What the API will look like.
- Whether all audit module entries should also be sent (using the mgr cephx credentials) to the normal cluster log.
- How to configure a retention specification.