Bug #12231
crush unable to generate 3 osds in teuthology run (closed)
Description
Lately, wip-sam-testing (basically master) runs reliably turn up 1 or 2 cases per run where 3 of 6 OSDs are out and CRUSH is unable to map more than 2 of the remaining 3 OSDs for at least one PG. I grabbed one of the osdmaps and found that, in this one, the bad PG is 1.37 (1.36, which maps to 3 OSDs as expected, is shown for comparison):
/home/sam/git-checkouts/ceph4/src/osdmaptool: osdmap file '/tmp/osdmap'
parsed '1.37' -> 1.37
1.37 raw ([4,1], p4) up ([1,4], p1) acting ([1,4], p1)
/home/sam/git-checkouts/ceph4/src/osdmaptool: osdmap file '/tmp/osdmap'
parsed '1.36' -> 1.36
1.36 raw ([4,1,3], p4) up ([4,1,3], p4) acting ([4,1,3], p4)
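hashes (attached) has the draws for r=0 through 999999 on that PG, and you'll see that osd 3 does not win for the first time until somewhere between draws 50 and 60. That is at the edge of the retry budget (choose_total_tries is 50 with the bobtail and later tunables), so CRUSH can give up before osd 3 ever wins.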
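To make the retry limit concrete, here is a minimal Python sketch loosely modeled on CRUSH's firstn selection over a single flat bucket of equally weighted OSDs. It is not Ceph code: `draw_hash` is a hypothetical SHA-256 stand-in for the real rjenkins hash, `straw_choose` and `choose_replicas` are illustrative names, and "out" is simplified to a set of excluded OSDs. It shows how a PG can come back with fewer than 3 OSDs when the last in OSD does not win any draw within the retry budget.

```python
import hashlib


def draw_hash(x: int, item: int, r: int) -> int:
    """Hypothetical stand-in for CRUSH's rjenkins hash: 32 bits of SHA-256."""
    data = f"{x}:{item}:{r}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "little")


def straw_choose(x: int, items: list[int], r: int) -> int:
    """Equal-weight straw draw: the item with the longest 'straw' wins draw r."""
    return max(items, key=lambda item: draw_hash(x, item, r))


def choose_replicas(x: int, items: list[int], out: set[int], num_rep: int,
                    choose_total_tries: int = 50) -> list[int]:
    """Simplified firstn-style selection over one flat bucket.

    For each replica, draw with r = rep + ftotal; if the winner is out or
    already chosen, bump ftotal and redraw. After choose_total_tries failed
    draws the replica is skipped, so the result can be shorter than num_rep.
    """
    result: list[int] = []
    for rep in range(num_rep):
        for ftotal in range(choose_total_tries):
            item = straw_choose(x, items, rep + ftotal)
            if item not in out and item not in result:
                result.append(item)
                break
        # Leaving the inner loop without a break means this replica was skipped.
    return result


if __name__ == "__main__":
    osds = list(range(6))
    out_osds = {0, 2, 5}  # hypothetical: 3 of 6 OSDs marked out
    short = [x for x in range(2000)
             if len(choose_replicas(x, osds, out_osds, 3)) < 3]
    print(f"{len(short)} of 2000 PGs mapped to fewer than 3 OSDs")
```

Lowering `choose_total_tries` in the sketch makes short mappings more frequent, which is the same trade-off the real tunable controls.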
I see nothing new in the CRUSH tunables. An osdmaptool built from firefly gives the same output, so this is not a change in CRUSH. The straw weights appear to be 65535, so there is nothing wonky about the CRUSH map construction. The two questions are:
1) Is this simply an indication that the hash is really bad and that we need to start switching to a new one (possibly before jewel)?
2) Why has this not come up before? We started testing regularly with size 3 pools in teuthology in February. I haven't seen it yet in hammer runs either. Odd.
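For a rough baseline on question 1, the arithmetic below assumes an ideal hash (independent, uniform draws) and a flat bucket of 6 equally weighted OSDs, which simplifies the real map. Under that assumption, the chance that one specific OSD loses every one of 50 consecutive draws is (5/6)^50:

```python
# Assumption: ideal hash (each draw independent and uniform) and a flat
# bucket of 6 equally weighted OSDs; the real map and hash differ.
n_osds = 6
tries = 50  # retry budget, matching choose_total_tries = 50

p_miss_one_draw = (n_osds - 1) / n_osds  # a specific OSD loses a single draw
p_never_wins = p_miss_one_draw ** tries  # ...and loses all 50 of them

print(f"{p_never_wins:.2e}")  # prints 1.10e-04
```

That is roughly 1 in 9,000 per PG even with a perfect hash. Whether 1 or 2 hits per run is consistent with it depends on how many PGs each run maps, so this is a point of comparison rather than an answer.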