
Bug #57796 » upmap.txt

Chris Durham, 10/10/2022 06:30 PM

 

I am using pacific 16.2.10 on Rocky 8.6 Linux.

After setting max_deviations to 1 on the ceph balancer, I achieved a near perfect balance of PGs and space on my OSDs. This is great.

However, I started getting the following errors in my ceph-mon logs, every three minutes, for each of the OSDs that had PGs mapped by the balancer:

2022-10-07T17:10:39.619+0000 7f7c2786d700 1 verify_upmap unable to get parent of osd.497, skipping for now

After banging my head against the wall for a bit trying to figure this out, I think I have discovered the issue:

Currently, I have my EC pool, spread across 480 OSDs, configured with the following rule:

rule mypoolname {
    id -5
    type erasure
    step take myroot
    step choose indep 4 type rack
    step choose indep 2 type pod
    step chooseleaf indep 1 type host
    step emit
}
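
I believe the same rule can also be dumped as JSON, straight from the cluster, with:

ceph osd crush rule dump mypoolname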

Basically: pick 4 racks, then 2 pods in each rack, and then one host in each pod, for a total of
8 chunks. (The pool is a 6+2.) The 4 racks are chosen from the myroot root entry, which is as follows:


root myroot {
    id -400
    item rack1 weight N
    item rack2 weight N
    item rack3 weight N
    item rack4 weight N
}

This has worked fine since inception, over a year ago.

The verify_upmap errors above started after I set max_deviations to 1 in the balancer and let it
move things around, creating pg_upmap entries.
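
For reference, this is roughly how I had configured the balancer (the exact option name is from memory, so treat it as approximate):

ceph balancer mode upmap
ceph config set mgr mgr/balancer/upmap_max_deviation 1
ceph balancer on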

I then discovered, while trying to figure this out, that the CRUSH types are:

type 0 osd
type 1 host
type 2 chassis
type 3 rack
...
type 6 pod
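
I pulled these out of the crush map itself; I believe they can also be listed with something like the following, assuming jq is available:

ceph osd crush dump | jq '.types'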

So pod is HIGHER in the hierarchy than rack, but I have it lower in my rule.

What I want to do is remove the pods completely to work around this. Something like:

rule mypoolname {
    id -5
    type erasure
    step take myroot
    step choose indep 4 type rack
    step chooseleaf indep 2 type host
    step emit
}

This will pick 4 racks and then 2 hosts in each rack. Will this cause any problems? I can add the pod stuff back later as 'chassis' instead. I can live without the 'pod' separation if needed.

To test this, I tried the following:

1. Grab the osdmap:
ceph osd getmap -o /tmp/om
2. Pull out the crushmap:
osdmaptool /tmp/om --export-crush /tmp/crush.bin
3. Convert it to text:
crushtool -d /tmp/crush.bin -o /tmp/crush.txt

I then edited the rule for this pool as above, removing the pod step and going directly
to 4 racks and then 2 hosts in each rack. I then compiled the crush map
and imported it into the extracted osdmap:

crushtool -c /tmp/crush.txt -o /tmp/crush.bin
osdmaptool /tmp/om --import-crush /tmp/crush.bin
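
I believe the edited crush map can also be sanity-checked on its own with crushtool's test mode, using whatever rule id the decompiled map shows for mypoolname (rule_id below is a placeholder):

crushtool -i /tmp/crush.bin --test --rule <rule_id> --num-rep 8 --show-mappings
crushtool -i /tmp/crush.bin --test --rule <rule_id> --num-rep 8 --show-bad-mappings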

I then ran upmap-cleanup on the new osdmap:

osdmaptool /tmp/om --upmap-cleanup
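
If I am reading the osdmaptool docs right, the generated cleanup commands can also be written out to a file for review, something like:

osdmaptool /tmp/om --upmap-cleanup /tmp/upmap-cleanup.txt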

I did NOT get any of the verify_upmap messages. When I extracted the osdmap WITHOUT
any changes to it and then ran the upmap-cleanup, I got the same verify_upmap errors I am now
seeing in the ceph-mon logs.

So, should I just change the crushmap to remove the wrong rack->pod->host hierarchy, making it rack->host?
Will I have other issues? I am surprised that crush allowed me to create this out-of-order rule to begin with.
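
If simply fixing the hierarchy is the right answer, I assume I would push the edited crush map to the live cluster with something like:

ceph osd setcrushmap -i /tmp/crush.bin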

Thanks for any suggestions.



