PG inconsistent tests

1. Check that ceph health is OK.

2. Get the PG number and the OSDs of the test object:

root@ceph-ssd-1:~# ceph osd map rbd_ssd pgtest.txt
osdmap e179 pool 'rbd_ssd' (1) object 'pgtest.txt' -> pg 1.4004b70b (1.10b) -> up ([17,8,3], p17) acting ([17,8,3], p17)

The PG id 1.10b is just the low bits of the object hash 0x4004b70b (apparently pg_num is 512 in this pool: 0x4004b70b & 0x1ff = 0x10b).

3. Check the PG's data directory on the primary OSD; there are no files yet except the head file:

root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ll
total 12
drwxr-xr-x   2 ceph ceph   31 Dec 14 08:19 ./
drwxr-xr-x 180 ceph ceph 8192 Dec 13 07:10 ../
-rw-r--r--   1 ceph ceph    0 Dec 13 07:10 __head_0000010B__1

4. Create the text file pgtest.txt and upload it to the ceph cluster:

root@ceph-ssd-1:~# for i in {1..10} ; do echo original_text >> pgtest.txt ; done
root@ceph-ssd-1:~# cat pgtest.txt
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
root@ceph-ssd-1:~# rados -p rbd_ssd put pgtest.txt pgtest.txt

5. Check the data directory on the primary OSD again; pgtest.txt is now there, stored with a suffix:

root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ll
total 16
drwxr-xr-x   2 ceph ceph   66 Dec 28 07:50 ./
drwxr-xr-x 180 ceph ceph 8192 Dec 13 07:10 ../
-rw-r--r--   1 ceph ceph    0 Dec 13 07:10 __head_0000010B__1
-rw-r--r--   1 ceph ceph  140 Dec 28 07:50 pgtest.txt__head_4004B70B__1
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

osd.8 and osd.3 were checked as well; their copies are identical to the one on the primary, osd.17.

6. Modify the file content on the primary OSD (change just a few characters, keeping the file length the same):

root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
modified_test

7. Run a PG deep-scrub and monitor with ceph -w until the deep-scrub completes. ceph health is still OK; deep-scrub cannot find this small change. This could be a bug.

root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph pg deep-scrub 1.10b
instructing pg 1.10b on osd.17 to deep-scrub

8. Download pgtest.txt; we get the original file content, not the modified content:

root@ceph-ssd-1:~/test# rados -p rbd_ssd get pgtest.txt pgtest.txt
root@ceph-ssd-1:~/test# cat pgtest.txt
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

(A possible explanation, not verified here: the running OSD may still be serving the object from its page cache or an already-open file handle, so neither the scrub nor the read necessarily sees bytes changed behind its back.)

9. This time, modify the file length as well:

root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
modify_length_test

10. Run a deep-scrub again; this time ceph reports a PG inconsistent error:

root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 1.10b is active+clean+inconsistent, acting [17,8,3]
1 scrub errors
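Before repairing, the scrub findings can also be inspected directly. A minimal sketch, assuming a Ceph release (Jewel or later) that ships rados list-inconsistent-obj; the stat comparison has to be run on each replica host:

# show which objects deep-scrub flagged in this PG, and why
rados list-inconsistent-obj 1.10b --format=json-pretty

# on each replica host: byte-exact size and mtime of the on-disk copy,
# handy for testing the timestamp theory in step 12 below
stat -c '%y %s %n' /var/lib/ceph/osd/ceph-*/current/1.10b_head/pgtest.txt__head_4004B70B__1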
11. Now run pg repair directly; ceph comes back to HEALTH_OK soon after:

root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph pg repair 1.10b
root@ceph-ssd-2:/var/lib/ceph/osd/ceph-8/current/1.10b_head# ceph health detail
HEALTH_OK

12. Check the file content on the primary OSD; it is back to the original content, so pg repair works fine here:

root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

Then the primary copy and one other copy were modified with the same change:

echo modify_test >> pgtest.txt__head_4004B70B__1

In the end, pg repair fixed both modified copies: ceph knew the unmodified one was the right copy, even though the other two copies had identical content. So ceph may decide which copy is correct by timestamp (or other per-object metadata), not by majority voting.

13. Now delete the file on the primary node:

root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# rm pgtest.txt__head_4004B70B__1
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ls
__head_0000010B__1

14. Run pg deep-scrub; ceph reports a PG inconsistent error. Then run pg repair; ceph comes back to HEALTH_OK again. Checking the primary OSD, the deleted file is back, with the correct content in it:

root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# ls
__head_0000010B__1  pgtest.txt__head_4004B70B__1
root@ceph-ssd-3:/var/lib/ceph/osd/ceph-17/current/1.10b_head# cat pgtest.txt__head_4004B70B__1
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text
original_text

15. Now test with a large file, pgtest.bin:

root@ceph-ssd-1:~/test# ceph osd map rbd_ssd pgtest.bin
osdmap e179 pool 'rbd_ssd' (1) object 'pgtest.bin' -> pg 1.19dfbcd2 (1.d2) -> up ([4,16,11], p4) acting ([4,16,11], p4)
root@ceph-ssd-1:~/test# dd if=/dev/zero of=pgtest.bin bs=1M count=1k
root@ceph-ssd-1:~/test# rados -p rbd_ssd put pgtest.bin pgtest.bin

16. Check the file on the primary node; it is 1 GiB. Then write 1 MiB at the end of the file:

root@ceph-ssd-1:/var/lib/ceph/osd/ceph-4/current/1.d2_head# ll -h pgtest.bin__head_19DFBCD2__1
-rw-r--r-- 1 ceph ceph 1.0G Dec 28 08:20 pgtest.bin__head_19DFBCD2__1
root@ceph-ssd-1:/var/lib/ceph/osd/ceph-4/current/1.d2_head# dd if=/dev/zero of=pgtest.bin__head_19DFBCD2__1 bs=1M seek=1k count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00262424 s, 400 MB/s
root@ceph-ssd-1:/var/lib/ceph/osd/ceph-4/current/1.d2_head# ll -h pgtest.bin__head_19DFBCD2__1
-rw-r--r-- 1 ceph ceph 1.1G Dec 28 08:28 pgtest.bin__head_19DFBCD2__1

17. Run pg deep-scrub and then pg repair. The file size comes back to 1 GiB.
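As a final cross-check of the repair, the object size recorded by the cluster can be compared against the on-disk copy. A minimal sketch (the 1.0G printed by ll -h is rounded, so byte counts are easier to compare; paths as in step 16):

# cluster-side view: size and mtime of the object as ceph records them
rados -p rbd_ssd stat pgtest.bin

# primary-side view: byte-exact size of the head file on osd.4
stat -c '%s %n' /var/lib/ceph/osd/ceph-4/current/1.d2_head/pgtest.bin__head_19DFBCD2__1

Both should report 1073741824 bytes after a successful repair.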