I am using Pacific 16.2.10 on Rocky Linux 8.6.

After setting max_deviations to 1 on the ceph balancer, I achieved a near-perfect balance of PGs and space across my OSDs. This is great.
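(For reference, the setting I'm referring to is the balancer's upmap deviation knob; I set it roughly like this — double-check the option name against your own config before copying:)

ceph config set mgr mgr/balancer/upmap_max_deviation 1
ceph balancer status    # confirm the balancer is active and in upmap mode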
However, I started getting the following errors in my ceph-mon logs, every three minutes, for each of the OSDs that had PGs mapped by the balancer:

2022-10-07T17:10:39.619+0000 7f7c2786d700 1 verify_upmap unable to get parent of osd.497, skipping for now
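(On my setup the mons log to plain files, so I was counting these with something like the grep below; the log path and file name are just what my deployment uses, so adjust for cephadm/containers:)

grep -c verify_upmap /var/log/ceph/ceph-mon.$(hostname -s).log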
After banging my head against the wall for a bit trying to figure this out, I think I have discovered the issue:

Currently, I have my EC pool (480 OSDs) configured with the following rule:
rule mypoolname {
        id -5
        type erasure
        step take myroot
        step choose indep 4 type rack
        step choose indep 2 type pod
        step chooseleaf indep 1 type host
        step emit
}
Basically: pick 4 racks, then 2 pods in each rack, then one host in each pod, for a total of 8 chunks. (The pool is a 6+2 EC pool.) The 4 racks are chosen from the myroot root entry, which is as follows:

root myroot {
        id -400
        item rack1 weight N
        item rack2 weight N
        item rack3 weight N
        item rack4 weight N
}
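(In case it helps, the commands below are how I've been double-checking the hierarchy and the rule; 'mypoolname' is just the name of my rule:)

ceph osd crush tree
ceph osd crush rule dump mypoolname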
This has worked fine since inception, over a year ago.

The verify_upmap errors above started after I set max_deviations to 1 in the balancer and let it move things around, creating pg_upmap entries.
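(These are the entries I mean; the balancer in upmap mode creates pg_upmap_items entries, which show up in the osdmap, e.g.:)

ceph osd dump | grep pg_upmap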
I then discovered, while trying to figure this out, that the types defined in my CRUSH map are:
type 0 osd
type 1 host
type 2 chassis
type 3 rack
...
type 6 pod
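(For anyone following along, this matches the '# types' section of a decompiled CRUSH map; it can also be pulled as JSON, e.g. with jq installed:)

ceph osd crush dump | jq '.types'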
So pod is HIGHER in the hierarchy than rack, but I have it lower in my rule.

What I want to do is remove the pods completely to work around this. Something like:
rule mypoolname {
        id -5
        type erasure
        step take myroot
        step choose indep 4 type rack
        step chooseleaf indep 2 type host
        step emit
}
This will pick 4 racks and then 2 hosts in each rack. Will this cause any problems? I can add the pod stuff back later as 'chassis' instead. I can live without the 'pod' separation if needed.
To test this, I tried the following:

1. grab the osdmap:

   ceph osd getmap -o /tmp/om

2. pull out the crushmap:

   osdmaptool /tmp/om --export-crush /tmp/crush.bin

3. convert it to text:

   crushtool -d /tmp/crush.bin -o /tmp/crush.txt
I then edited the rule for this pool as above, removing the pod step and going directly from 4 racks to 2 hosts in each rack. I then recompiled the crush map and imported it into the extracted osdmap:

crushtool -c /tmp/crush.txt -o /tmp/crush.bin
osdmaptool /tmp/om --import-crush /tmp/crush.bin
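(As an extra sanity check on the edited rule itself, crushtool has a test mode; the rule number below is just a placeholder — use whatever id the decompiled map shows for this rule:)

crushtool -i /tmp/crush.bin --test --rule <rule-id> --num-rep 8 --show-statistics
crushtool -i /tmp/crush.bin --test --rule <rule-id> --num-rep 8 --show-bad-mappings   # prints nothing if every mapping placed all 8 chunks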
I then ran upmap-cleanup on the new osdmap:
osdmaptool /tmp/om --upmap-cleanup
I did NOT get any of the verify_upmap messages. When I extracted the osdmap WITHOUT any changes to it and then ran the upmap-cleanup, I got the same verify_upmap errors I am now seeing in the ceph-mon logs.
So, should I just change the crushmap to remove the wrong rack->pod->host hierarchy, making it rack->host? Will I have other issues? I am surprised that CRUSH allowed me to create this out-of-order rule to begin with.
Thanks for any suggestions.