Feature #8609 » pg_repair.txt

cheng li, 01/25/2017 03:33 PM

 
pg inconsistency tests:
1. check that ceph health is OK:
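(the standard checks; output not captured in this run:)
ceph health
ceph -s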
2. get the pg number and osds of the test object:
root@ceph-ssd-1:~# ceph osd map rbd_ssd pgtest.txt
osdmap e179 pool 'rbd_ssd' (1) object 'pgtest.txt' -> pg 1.4004b70b (1.10b) -> up ([17,8,3], p17) acting ([17,8,3], p17)
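(For reference: the pg comes from the hash of the object name. Assuming this pool has 512 pgs, it is just the low 9 bits of the hash:
    0x4004B70B & 0x1FF = 0x10B  ->  pg 1.10b
The same hash appears later in the on-disk filename pgtest.txt__head_4004B70B__1.)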

3. check the contents of the pg data dir; there are no files yet except the head file:
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ll
total 12
drwxr-xr-x 2 ceph ceph 31 Dec 14 08:19 ./
drwxr-xr-x 180 ceph ceph 8192 Dec 13 07:10 ../
-rw-r--r-- 1 ceph ceph 0 Dec 13 07:10 __head_0000010B__1

4. create a text file pgtest.txt and upload it to the ceph cluster:
root@ceph-ssd-1:~# for i in {1..10} ; do echo original_text >> pgtest.txt ; done
root@ceph-ssd-1:~# cat pgtest.txt
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

root@ceph-ssd-1:~# rados -p rbd_ssd put pgtest.txt pgtest.txt
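(optional sanity check that the object landed in the pool; output not captured here:)
rados -p rbd_ssd stat pgtest.txt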

5. check the data dir on the primary osd; pgtest.txt now appears with a hash suffix:
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ll
total 16
drwxr-xr-x 2 ceph ceph 66 Dec 28 07:50 ./
drwxr-xr-x 180 ceph ceph 8192 Dec 13 07:10 ../
-rw-r--r-- 1 ceph ceph 0 Dec 13 07:10 __head_0000010B__1
-rw-r--r-- 1 ceph ceph 140 Dec 28 07:50 pgtest.txt__head_4004B70B__1
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

Also checked osd 8 and osd 3; their copies are identical to the one on the primary, osd 17.
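(One quick way to verify that is to checksum each replica on its host; osd 17 is on ceph-ssd-3 and osd 8 on ceph-ssd-2 per the prompts in this log, while osd 3's host and path are assumed from the standard layout:)
md5sum /var/lib/ceph/osd/ceph-17/current/1.10b_head/pgtest.txt__head_4004B70B__1
md5sum /var/lib/ceph/osd/ceph-8/current/1.10b_head/pgtest.txt__head_4004B70B__1
md5sum /var/lib/ceph/osd/ceph-3/current/1.10b_head/pgtest.txt__head_4004B70B__1
(all three sums should match)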

6. modify the file content on the primary osd (change a few characters only, keeping the same length):
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
modified_test

7. run pg deep-scrub, and monitor ceph with ceph -w until the deep-scrub is completed.
ceph health is still OK: deep-scrub does not detect this small same-length change. This could be a bug.
root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph pg deep-scrub 1.10b
instructing pg 1.10b on osd.17 to deep-scrub
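(One way to confirm the deep-scrub actually finished, besides watching ceph -w: check that the scrub timestamps advanced. Sketch; the fields appear in the pg query JSON:)
ceph pg 1.10b query | grep scrub_stamp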

8. download pgtest.txt; we get the original file content, not the modified content.
root@ceph-ssd-1:~/test# rados -p rbd_ssd get pgtest.txt pgtest.txt
root@ceph-ssd-1:~/test# cat pgtest.txt
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

9. this time, we modify the file length too (one way to make the edit is sketched after the listing):
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
modify_length_test
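(For reproducibility, one way to make that edit on the primary; sed is just an illustration, any editor that rewrites the last line works:)
sed -i '$ s/original_text/modify_length_test/' pgtest.txt__head_4004B70B__1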

10. run deep-scrub again; this time ceph reports a pg inconsistent error:
root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 1.10b is active+clean+inconsistent, acting [17,8,3]
1 scrub errors
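(Depending on the release, rados can also report exactly which object and which shard are inconsistent; these commands exist from roughly Jewel on:)
rados list-inconsistent-pg rbd_ssd
rados list-inconsistent-obj 1.10b --format=json-pretty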

11. now, we run pg repair directly; ceph comes back to HEALTH_OK soon after:
root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph pg repair 1.10b
root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph health detail
HEALTH_OK

12. check the file content on the primary osd; it is back to the original content. pg repair works fine here:
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

Then I modified the primary copy and one other copy with the same change (echo modify_test >> pgtest.txt__head_4004B70B__1 on both).
In the end, pg repair restored both modified copies.
ceph knew the single unmodified copy was the right one, even though the other two copies had identical content.
ceph may decide which copy is correct by timestamp rather than by voting.
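(A way to probe that hypothesis: compare the replicas' mtimes right before repairing. Sketch, run on each replica's host; osd 3's path assumed as before:)
stat -c '%y %s %n' /var/lib/ceph/osd/ceph-17/current/1.10b_head/pgtest.txt__head_4004B70B__1
stat -c '%y %s %n' /var/lib/ceph/osd/ceph-8/current/1.10b_head/pgtest.txt__head_4004B70B__1
stat -c '%y %s %n' /var/lib/ceph/osd/ceph-3/current/1.10b_head/pgtest.txt__head_4004B70B__1
(the untouched replica should be the only one whose mtime predates the edits)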

13. Now, let's delete the file on the primary node.
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# rm pgtest.txt__head_4004B70B__1
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ls
__head_0000010B__1

14. run pg deep-scrub; ceph reports a pg inconsistent error. Then run pg repair; ceph comes back to HEALTH_OK again.
Now check the file on the primary osd: the deleted file is back, with the correct content.
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ls
__head_0000010B__1 pgtest.txt__head_4004B70B__1
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

15. Now, let's test with a large file, pgtest.bin:
root@ceph-ssd-1:~/test# ceph osd map rbd_ssd pgtest.bin
osdmap e179 pool 'rbd_ssd' (1) object 'pgtest.bin' -> pg 1.19dfbcd2 (1.d2) -> up ([4,16,11], p4) acting ([4,16,11], p4)
root@ceph-ssd-1:~/test# dd if=/dev/zero of=pgtest.bin bs=1M count=1k
root@ceph-ssd-1:~/test# rados -p rbd_ssd put pgtest.bin pgtest.bin

16. check the file on the primary node: it's 1GB. Then we append 1MiB at the end of the file (dd with seek=1k writes past the original end):
root@ceph-ssd-1:/var/lib/ceph/osd/ceph-4/current/1.d2_head# ll -h pgtest.bin__head_19DFBCD2__1
-rw-r--r-- 1 ceph ceph 1.0G Dec 28 08:20 pgtest.bin__head_19DFBCD2__1
root@ceph-ssd-1:/var/lib/ceph/osd/ceph-4/current/1.d2_head# dd if=/dev/zero of=pgtest.bin__head_19DFBCD2__1 bs=1M seek=1k count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00262424 s, 400 MB/s
root@ceph-ssd-1:/var/lib/ceph/osd/ceph-4/current/1.d2_head# ll -h pgtest.bin__head_19DFBCD2__1
-rw-r--r-- 1 ceph ceph 1.1G Dec 28 08:28 pgtest.bin__head_19DFBCD2__1

17. run pg deep-scrub, then pg repair. The file size comes back to 1GB (command sketch below).
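(Command sketch, reusing the pair from steps 7 and 11 on the new pg, plus a size check on the primary; wait for ceph health detail to flag the inconsistency between the two:)
ceph pg deep-scrub 1.d2
ceph pg repair 1.d2
ls -lh /var/lib/ceph/osd/ceph-4/current/1.d2_head/pgtest.bin__head_19DFBCD2__1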