Bug #7368
ceph osd repair * blocks after some minutes and prevents other ceph pg repair commands
Status: Closed
Description
Hello,
This is a follow-up to http://tracker.ceph.com/issues/7367.
An unfortunate update to 0.75 ended with many (~3000) PGs flagged inconsistent.
As iterating over the inconsistent PGs one by one is slow, I tried the ceph osd repair * command instead.
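Iterating over the inconsistent PGs can at least be scripted. The sketch below is an illustration under stated assumptions, not something from the original report: it assumes ceph health detail lists each inconsistent PG on a line of the form "pg <pgid> is <state>, acting [...]" (the exact output format differs between Ceph releases), extracts those PG ids, and builds one ceph pg repair command per PG. The helper names are hypothetical.

```python
import re
from typing import List

# Assumption: lines look like "pg 2.3f is active+clean+inconsistent, acting [1,2,3]".
# The real output of `ceph health detail` may differ between releases.
PG_LINE = re.compile(r"^pg\s+(\S+)\s+is\s+(\S+)")


def inconsistent_pgs(health_detail: str) -> List[str]:
    """Return the ids of PGs whose state includes the 'inconsistent' flag."""
    pgs = []
    for line in health_detail.splitlines():
        match = PG_LINE.match(line.strip())
        if not match:
            continue
        # The state token may carry a trailing comma, e.g. "active+clean+inconsistent,".
        states = match.group(2).rstrip(",").split("+")
        if "inconsistent" in states:
            pgs.append(match.group(1))
    return pgs


def repair_commands(pgids: List[str]) -> List[List[str]]:
    """Build one `ceph pg repair <pgid>` argument list per PG (hypothetical helper)."""
    return [["ceph", "pg", "repair", pgid] for pgid in pgids]
```

Each argument list could then be passed to subprocess.run, optionally with a pause between commands; whether pacing avoids the stall described in this report is not established here.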
At first it works fine and many PGs are fixed. After a few minutes, the rate of fixed PGs decreases steadily until it finally halts after ~10 minutes.
Repeating the ceph osd repair * command works again at the same initial speed, then slows down and halts after approximately the same time.
I wouldn't call the slowdown exponential, because after 10 minutes no more PGs are fixed at all.
This is the first problem.
The second problem is that after the ceph osd repair * command has halted,
all ceph pg repair p.xx commands are ignored (maybe they are queued: the OSD seems to be instructed to run the check, but something appears to be stuck and prevents the OSD from executing it).
Restarting the OSD cures the problem, and ceph pg repair p.xx works again.
Also observed with 0.76 and 0.72.