Bug #10306
EC pool PGs stuck active+undersized+degraded with invalid OSDs in acting set (Closed)
Description
Ceph version: 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
OSD hosts: 9
OSDs: 575
EC profile:
directory=/usr/lib/ceph/erasure-code
k=5
m=4
plugin=jerasure
ruleset-failure-domain=host
technique=reed_sol_van
ceph osd pool create ec_test 8192 8192 erasure gw_backer
Result:
32 of the 8192 PGs got stuck, for example:
pg 23.da3 is active+undersized+degraded, acting [246,571,371,213,108,466,163,2147483647,435]
I don't know what that big number is, but I don't like it.
Updated by Aaron Bassett over 9 years ago
I tried making a pool with k=4 and m=4, and 1 PG got stuck this way.
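The two profiles tried so far differ in how many hosts each PG needs. A minimal sketch of that arithmetic, assuming `ruleset-failure-domain=host` means every one of the k+m shards must land on a different host (the helper `ec_layout` is hypothetical, not a Ceph API):

```python
def ec_layout(k: int, m: int, hosts: int) -> dict:
    """Simple placement arithmetic for a k+m erasure-coded pool.

    Assumes failure domain = host, i.e. each of the k+m shards must be
    placed on a distinct host (hypothetical helper, not part of Ceph).
    """
    shards = k + m
    return {
        "shards_per_pg": shards,
        "hosts_needed": shards,          # one shard per host
        "spare_hosts": hosts - shards,   # hosts left over for CRUSH to fall back to
        "raw_overhead": shards / k,      # raw bytes stored per byte of user data
        "tolerated_losses": m,           # shards that can be lost without data loss
    }


if __name__ == "__main__":
    for k, m in [(5, 4), (4, 4)]:
        print(f"k={k} m={m}:", ec_layout(k, m, hosts=9))
```

With k=5 m=4 on 9 hosts there is no spare host at all, while k=4 m=4 leaves one, which fits the observation of far fewer stuck PGs in the second test.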
Updated by Aaron Bassett over 9 years ago
It was brought to my attention that the large number is probably a signed 32-bit -1 printed as unsigned. Maybe this is some kind of error code?
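For reference, 2147483647 is 2^31 - 1 (0x7FFFFFFF, the largest signed 32-bit integer), whereas a signed 32-bit -1 reinterpreted as unsigned would print as 4294967295. The sketch below checks both values and finds which shard position in the acting set is empty. It assumes, to our understanding of Ceph's `crush.h`, that 0x7FFFFFFF is the `CRUSH_ITEM_NONE` marker for "no OSD chosen"; `missing_shards` is a hypothetical helper.

```python
import struct

# Assumption: Ceph's CRUSH code marks a slot with no chosen OSD as 0x7fffffff
# (CRUSH_ITEM_NONE), which prints as 2147483647.
CRUSH_ITEM_NONE = 0x7FFFFFFF


def missing_shards(acting):
    """Return the shard positions in an acting set that hold no OSD."""
    return [pos for pos, osd in enumerate(acting) if osd == CRUSH_ITEM_NONE]


# A signed 32-bit -1 reinterpreted as unsigned is 0xFFFFFFFF, not 0x7FFFFFFF.
minus_one_unsigned = struct.unpack("<I", struct.pack("<i", -1))[0]

# Acting set of pg 23.da3 from the report.
acting = [246, 571, 371, 213, 108, 466, 163, 2147483647, 435]

if __name__ == "__main__":
    print(CRUSH_ITEM_NONE == 2**31 - 1)  # True: INT32_MAX
    print(minus_one_unsigned)            # 4294967295
    print(missing_shards(acting))        # [7]
```

So the value is a placeholder rather than a mangled -1: shard 7 of that PG simply has no OSD assigned.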
Updated by Loïc Dachary over 9 years ago
- Status changed from New to Rejected
2147483647 means that there were not enough OSDs to map the PG. Although you have exactly 9 hosts and the rule expects to find 9 hosts, there is a non-zero probability that mapping will fail. You can resolve this by adding a new host, or by reducing the requirements so that the total number of hosts required is 8 instead of 9 (for instance, k=5 m=3). A third option is to ask CRUSH to try harder.
(marking the ticket as Rejected because this is the expected behavior; feel free to re-open if you think differently)
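Why can mapping fail even when the host count matches exactly? The sketch below is a deliberately simplified model, not the real CRUSH algorithm: each slot draws hosts pseudo-randomly up to a fixed number of tries and gives up if every draw hits a host already used. In this toy model, once 8 of 9 hosts are taken the last slot finds the remaining host with probability 1/9 per draw, so it fails with probability of roughly (8/9)^tries. A spare host or a larger retry budget shrinks that sharply. The names `toy_map` and `failure_rate` are hypothetical; in real CRUSH rules the retry budget is governed by settings such as `set_choose_tries`, but treat that correspondence as an assumption and check the CRUSH map documentation.

```python
import random

NONE = 0x7FFFFFFF  # marker for a slot that could not be filled (toy model)


def toy_map(num_hosts: int, width: int, tries: int, rng: random.Random) -> list:
    """Fill `width` slots with distinct hosts using at most `tries`
    pseudo-random draws per slot. A slot whose draws all hit hosts that
    are already used is left as NONE. Toy model only, not real CRUSH."""
    chosen = []
    for _ in range(width):
        for _ in range(tries):
            host = rng.randrange(num_hosts)
            if host not in chosen:
                chosen.append(host)
                break
        else:  # no break: every draw collided with an already used host
            chosen.append(NONE)
    return chosen


def failure_rate(num_hosts: int, width: int, tries: int, pgs: int, seed: int = 0) -> float:
    """Fraction of `pgs` toy mappings that contain at least one NONE slot."""
    rng = random.Random(seed)
    failed = sum(NONE in toy_map(num_hosts, width, tries, rng) for _ in range(pgs))
    return failed / pgs


if __name__ == "__main__":
    # width 9 ~ k=5 m=4, width 8 ~ k=5 m=3, both on 9 hosts
    for width, tries in [(9, 50), (9, 100), (8, 50)]:
        rate = failure_rate(num_hosts=9, width=width, tries=tries, pgs=8192)
        print(f"hosts=9 width={width} tries={tries}: {rate:.4%} of PGs incomplete")
```

The exact percentages depend on the seed and on the toy assumptions, so they should not be read as predictions for a real cluster; the point is the trend across hosts and retries.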