Bug #57796: after rebalance of pool via pgupmap balancer, continuous issues in monitor log - RADOS - Ceph


Bug #57796

open


Added by Chris Durham over 1 year ago. Updated about 1 year ago.

Status: Need More Info
Priority: Normal
Category: Monitor
Source: Community (user)
Tags: pg upmap
Severity: 3 - minor
Affected Versions: Ceph - v16.2.9
Component(RADOS): pgmap

Description

The pgupmap balancer was not balancing well. After I set mgr/balancer/upmap_max_deviation to 1 (ceph config-key ...), the balancer kicked in and moved things around, resulting in a nicely balanced set of OSDs and PGs. Awesome.

However, it appears that after the rebalance, the monitor log (/var/log/ceph/ceph-mon.servername.log) is filling up every three minutes with a line for every OSD that was affected by this rebalance. Those lines are of the following form:

2022-10-07T17:10:39.619+0000 7f7c2786d700 1 verify_upmap unable to get parent of osd.497, skipping for now

So, if the rebalance affected around 100 OSDs, there are around 100 lines of this form in my monitor log every 3 minutes. The pool in question is an EC pool.
I know the rebalance creates pg upmap items. But why does this warning/error happen, and is it a problem?

The pool containing these OSDs (there is only one such pool) uses a custom CRUSH root of the form:

root mycustomroot
  rack rack1
    pod pod1
      host host1
      host host2
    pod pod2
      host host3
      host host4
  rack rack2
    pod pod3
      host host5
      ...

In typing this up, I noticed that the hosts are also part of the 'default' CRUSH root that no pool uses. Perhaps that is the issue...? Please advise.

Files

upmap.txt (3.04 KB): Chris Durham, 10/10/2022 06:30 PM

Related issues: 1 (1 open, 0 closed)


Updated by Chris Durham over 1 year ago

Preformatting the CRUSH info so it shows up properly:

root mycustomroot
  rack rack1
    pod pod1
      host host1
      host host2
    pod pod2
      host host3
      host host4
  rack rack2
    pod pod3
      host host5
      ...


Updated by Chris Durham over 1 year ago

Note that the balancer also balanced a replicated pool that uses its own custom CRUSH root. The hosts in that pool (not the affected EC pool) are also in the default CRUSH root, but none of the verify_upmap log entries complain about OSDs in that pool.


Updated by Chris Durham over 1 year ago

I removed the hosts holding the OSDs reported by verify_upmap from the unused default root, and the log entries continue.


Updated by Radoslaw Zarzynski over 1 year ago

Status changed from New to Need More Info

Thanks for the report! The log comes from here:

int CrushWrapper::verify_upmap(CephContext *cct,
                               int rule_id,
                               int pool_size,
                               const vector<int>& up)
{
    // ...
      {
        int numrep = curstep->arg1;
        int type = curstep->arg2;
        if (numrep <= 0)
          numrep += pool_size;
        type_stack.emplace(type, numrep);
        if (type == 0) // osd
          break;
        map<int, set<int>> osds_by_parent; // parent_of_desired_type -> osds
        for (auto osd : up) {
          auto parent = get_parent_of_type(osd, type, rule_id);
          if (parent < 0) {
            osds_by_parent[parent].insert(osd);
          } else {
            ldout(cct, 1) << __func__ << " unable to get parent of osd." << osd
                          << ", skipping for now" 
                          << dendl;
          }
        }

It looks like verify_upmap was looking for the parents of those OSDs (which should always be CRUSH buckets, i.e. have negative IDs) but got something with a non-negative ID (which is weird).

Could you please provide a dump of the CRUSH map as well as the output of ceph osd tree?

Hints:
* https://docs.ceph.com/en/pacific/man/8/crushtool/
* ceph osd getcrushmap


Updated by Chris Durham over 1 year ago

File upmap.txt added

Radoslaw,

Yes, I saw that piece of code too. But I think I figured it out a short time ago: I had the CRUSH hierarchy backwards. My CRUSH rule does: pick racks(4)->pods(2)->host(1)(leaf). (It is a 6+2 EC pool, so I get 8 chunks.) But in the type hierarchy, pods are HIGHER than racks. So I extracted the osdmap and ran: osdmaptool osdmap --upmap-cleanup. Doing so gives me exactly the same verify_upmap errors as in the ceph-mon log.

If I extract the crushmap from the osdmap, modify it to pick racks(4)->host(2)(leaf), put the crushmap back into the osdmap and run osdmaptool osdmap --upmap-cleanup, the verify_upmap messages do not occur (but I get other upmap add/rm ...).

My question is: if I actually deploy the crushmap without the pod choice (I can live without it), will I be OK, or will it cause more problems given the current state? I am surprised that CRUSH let me choose such a rule to begin with. The PGs look fine as far as their OSDs and such.

Thanks

See the attached message I sent to ceph-users; it has what you asked for.
