Bug #10306

closed

EC pool pgs stuck active+undersized+degraded with invalid osds in acting set

Added by Aaron Bassett over 9 years ago. Updated over 9 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Source:
other
Severity:
3 - minor

Description

Ceph version: 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
OSD hosts: 9
OSDs: 575
EC profile:
directory=/usr/lib/ceph/erasure-code
k=5
m=4
plugin=jerasure
ruleset-failure-domain=host
technique=reed_sol_van

ceph osd pool create ec_test 8192 8192 erasure gw_backer

result:
32 out of 8192 get stuck:
pg 23.da3 is active+undersized+degraded, acting [246,571,371,213,108,466,163,2147483647,435]
I don't know what that big number is, but I don't like it.

Actions #1

Updated by Aaron Bassett over 9 years ago

I tried making a pool with k=4 and m=4, and it had 1 PG get stuck this way.

Actions #2

Updated by Aaron Bassett over 9 years ago

It was brought to my attention that that large number is probably a signed 32 bit -1, printed unsigned. Maybe this is some kind of error message?
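
The arithmetic is easy to check. The sketch below (Python, used only for illustration; the variable names are made up for this example) compares the value from the acting set with the two obvious 32-bit candidates.

```python
# Value copied from the acting set in the description.
observed = 2147483647

# Largest value a signed 32-bit integer can hold (0x7fffffff).
int32_max = 2**31 - 1

# What a signed 32-bit -1 looks like when printed as unsigned.
minus_one_as_uint32 = -1 & 0xFFFFFFFF

print(hex(observed))            # 0x7fffffff
print(observed == int32_max)    # True
print(minus_one_as_uint32)      # 4294967295
```

So the number is the largest signed 32-bit value (0x7fffffff) rather than -1 printed unsigned, which would be 4294967295. As far as I know, 0x7fffffff is the value CRUSH uses for CRUSH_ITEM_NONE, meaning "no OSD found for this slot"; treat that as an assumption to verify against the Ceph source.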

Actions #3

Updated by Loïc Dachary over 9 years ago

  • Status changed from New to Rejected

2147483647 means that there were not enough OSDs to map the PG. Although you have exactly 9 hosts and the rule needs 9 distinct hosts (k + m = 5 + 4), there is a non-zero probability that the mapping will fail. You can resolve this by adding a new host or by reducing the requirements so that only 8 hosts are needed instead of 9 (k=5 m=3, for instance). A third option is to ask CRUSH to try harder.
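
Why a mapping can fail even with exactly enough hosts can be illustrated with a toy model. The sketch below is not CRUSH: it assumes uniform random picks among hosts and a fixed retry budget per slot, and `map_pg`, `failure_rate`, and `max_tries` are hypothetical names invented for this example. It only shows the general effect of a bounded number of retries.

```python
import random


def map_pg(num_hosts, needed, max_tries, rng):
    """Toy model: fill `needed` slots with distinct hosts.

    Each slot gets at most `max_tries` random picks; if every pick
    collides with an already used host, the slot stays None.
    """
    chosen = []
    used = set()
    for _ in range(needed):
        pick = None
        for _ in range(max_tries):
            candidate = rng.randrange(num_hosts)
            if candidate not in used:
                pick = candidate
                break
        if pick is not None:
            used.add(pick)
        chosen.append(pick)
    return chosen


def failure_rate(num_hosts, needed, max_tries, trials=20000, seed=1):
    """Fraction of simulated PGs that end up with at least one empty slot."""
    rng = random.Random(seed)
    failed = sum(
        None in map_pg(num_hosts, needed, max_tries, rng)
        for _ in range(trials)
    )
    return failed / trials


# (hosts, slots needed, retry budget per slot): all values are assumptions.
for hosts, needed, tries in [(9, 9, 50), (9, 8, 50), (10, 9, 50), (9, 9, 100)]:
    print(hosts, needed, tries, failure_rate(hosts, needed, tries))
```

In this toy model, needing 9 of 9 hosts with a limited retry budget fails occasionally, while needing only 8 of 9, having a tenth host, or raising the retry budget makes failures much rarer. That mirrors the three options listed above, though real CRUSH behaviour depends on the map hierarchy, weights, and tunables.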

(marking the ticket as Rejected because it is the expected behavior, feel free to re-open if you think differently)
