Bug #57796


after rebalance of pool via pg upmap balancer, continuous issues in monitor log

Added by Chris Durham over 1 year ago. Updated about 1 year ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
Monitor
Target version:
-
% Done:
0%

Source:
Community (user)
Tags:
pg upmap
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
pgmap
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The pg upmap balancer was not balancing well. After setting mgr/balancer/upmap_max_deviation to 1 (ceph config-key ...), the balancer kicked in and moved things around, resulting in a nicely balanced set of OSDs and PGs. Awesome.
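
For reference, the deviation setting was changed with something along these lines (ceph config-key was the older mechanism; recent releases take it via ceph config set, so the exact invocation below is an assumption):

# lower the allowed per-OSD PG deviation so the balancer acts (assumed syntax)
ceph config set mgr mgr/balancer/upmap_max_deviation 1
# confirm the balancer module is enabled and what plan it is executing
ceph balancer status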

However, it appears that after the rebalance, the monitor log (/var/log/ceph/ceph-mon.servername.log) fills up every three minutes with a line for every OSD that was affected by the rebalance. Those lines are of the following form:

2022-10-07T17:10:39.619+0000 7f7c2786d700 1 verify_upmap unable to get parent of osd.497, skipping for now

So, since the rebalance affected around 100 OSDs, there are around 100 lines of this form in my monitor log every 3 minutes. The pool in question is an EC pool.
I know the rebalance creates pg upmap items, but why does this warning/error happen, and is it a problem?
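
For what it's worth, the upmap entries the rebalance created can be listed straight from the osdmap, and the OSD named in the warning can be located in crush:

# list the pg_upmap_items the balancer created
ceph osd dump | grep pg_upmap_items
# show the crush location of the OSD from the warning above
ceph osd find 497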

The pool with these OSDs (there is only one such pool) uses a custom crush root of the form:

root mycustomroot
    rack rack1
        pod pod1
            host host1
            host host2
        pod pod2
            host host3
            host host4
    rack rack2
        pod pod3
            host host5
            ...
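
The layout above is paraphrased from memory; the real hierarchy (including the custom pod bucket type) can be dumped directly:

# show the full hierarchy, including both mycustomroot and 'default'
ceph osd tree
# same, with bucket types and crush weights
ceph osd crush tree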

In typing this up, I noticed that the hosts are also part of the 'default' crush root that no pool uses. Perhaps that is the issue...? Please advise.
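
If it helps triage, the rule the pool actually uses can be checked as follows (pool and rule names here are placeholders):

# confirm which crush rule the pool uses
ceph osd pool get mypool crush_rule
# dump that rule to see the take/choose steps that verify_upmap walks
ceph osd crush rule dump mycustomrule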


Files

upmap.txt (3.04 KB), Chris Durham, 10/10/2022 06:30 PM

Related issues (1 open, 0 closed)

Related to RADOS - Bug #51729: Upmap verification fails for multi-level crush rule (In Progress, Laura Flores)
