osd: do not keep full pg log entries in memory
Right now, we keep the full pg log in memory. Each of these entries contains the name of the object involved, which means user naming patterns dramatically alter the OSD's memory usage. In #5700 we had reports of OSDs taking 1.8GB of RAM on startup with 600 PGs. We haven't seen that elsewhere, but it works out to ~1KB per log entry, which is definitely excessive.
To deal with this, we can stop storing the full pg_log_entry in memory and just keep a map or list of osd_reqid_t's. Individual log entries (or the whole thing) can be loaded on-demand when replayed ops come through or the PG has to peer.
This will reduce memory consumption and make it more predictable based on PG counts and total log sizes, at the cost of making any necessary log accesses more expensive (peering most of all). Any change will need to be tested for serious performance costs before merging.
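To make the proposal above concrete, here is a rough sketch of the idea, not actual Ceph code. ReqId and LogEntry are simplified, hypothetical stand-ins for osd_reqid_t and pg_log_entry_t, and SlimPgLog and its Loader callback are invented for illustration. The in-memory index holds only a request id and a version per entry, so its size no longer depends on object names. The full entry is fetched from the backing store only when something actually needs it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for osd_reqid_t: identifies one client request.
struct ReqId {
  std::uint64_t client;
  std::uint64_t tid;
  bool operator<(const ReqId& o) const {
    return std::tie(client, tid) < std::tie(o.client, o.tid);
  }
};

// Hypothetical stand-in for a full log entry, including the object name.
struct LogEntry {
  std::uint64_t version;
  ReqId reqid;
  std::string object_name;
};

// Keeps only reqid -> version in memory; full entries stay in a backing
// store and are fetched through the loader callback when needed.
class SlimPgLog {
 public:
  using Loader = std::function<std::optional<LogEntry>(std::uint64_t)>;

  explicit SlimPgLog(Loader loader) : loader_(std::move(loader)) {}

  // Index an entry that has already been written to the backing store.
  void record(const LogEntry& e) {
    // Assumption of this sketch: versions are appended in increasing order.
    assert(index_.empty() || e.version > last_version_);
    index_[e.reqid] = e.version;
    last_version_ = e.version;
  }

  // Cheap duplicate-op check for replayed requests: no disk access.
  bool seen(const ReqId& r) const { return index_.count(r) != 0; }

  // Expensive path (e.g. peering): load the full entry on demand.
  std::optional<LogEntry> load(const ReqId& r) const {
    auto it = index_.find(r);
    if (it == index_.end()) return std::nullopt;
    return loader_(it->second);
  }

  std::size_t indexed() const { return index_.size(); }

 private:
  Loader loader_;
  std::map<ReqId, std::uint64_t> index_;
  std::uint64_t last_version_ = 0;
};
```

In this sketch the replayed-op check (seen) stays in memory, while anything that needs the object name goes through load and pays for a read from the store. That is the cost trade-off described above.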
#2 Updated by Corin Langosch almost 6 years ago
Thank you for taking care of this. This is really a huge problem for us.
I don't quite understand your statement "user naming patterns dramatically alter the OSD's memory usage". We only have three pools, their names are really normal like "kvm-images". All pools only contain rbd images, which are named using a uuid (36 characters), which also seems quite normal.
I have no idea how these logs are used, but why not keep the last version in memory and the rest on disk only?