Bug #57796
Status: open
after rebalance of pool via pg-upmap balancer, continuous issues in monitor log
Description
The pg-upmap balancer was not balancing well, and after setting mgr/balancer/upmap_max_deviation to 1 (ceph config-key ...), the balancer kicked in and moved data around, resulting in a nicely balanced set of OSDs and PGs. Awesome.
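For reference, a sketch of how the deviation knob can be set; on recent releases the config database is the usual path (the config-key path mentioned above also works on older releases). These commands assume a live cluster and an enabled balancer module:

```shell
# Lower the maximum per-OSD PG deviation the upmap balancer tolerates.
ceph config set mgr mgr/balancer/upmap_max_deviation 1

# Confirm the balancer is active and see its current mode/plan.
ceph balancer status
```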
However, it appears that after the rebalance, the monitor log (/var/log/ceph/ceph-mon.servername.log) fills up every three minutes with a line for every OSD that was affected by the rebalance. The lines have the following form:
2022-10-07T17:10:39.619+0000 7f7c2786d700 1 verify_upmap unable to get parent of osd.497, skipping for now
So, if the rebalance affected around 100 OSDs, there are around 100 such lines in my monitor log every 3 minutes. The pool in question is an EC pool.
I know the rebalance creates pg upmap items. But why does this warning/error happen, and is it a problem?
The pool that uses these OSDs (there is only one such pool) uses a custom CRUSH root of the form:
root mycustomroot
  rack rack1
    pod pod1
      host host1
      host host2
    pod pod2
      host host3
      host host4
  rack rack2
    pod pod3
      host host5
...
While typing this up, I noticed that the hosts are also part of the 'default' CRUSH root, which no pool uses. Perhaps that is the issue? Please advise.
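To check that hypothesis, a few diagnostic commands can show where the flagged OSD sits in the CRUSH hierarchy and which upmap items the balancer created. The OSD id 497 below is taken from the log line above; substitute one of your own:

```shell
# Show the OSD's host and crush_location.
ceph osd find 497

# Inspect the full CRUSH tree; compare the OSD's placement
# under 'mycustomroot' versus the unused 'default' root.
ceph osd crush tree

# List the pg_upmap_items the balancer created on the OSD map.
ceph osd dump | grep pg_upmap
```

If the hosts appear under both roots, verify_upmap may be walking the tree from a root whose rule does not reach the OSD, which would explain the "unable to get parent" message.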