Bug #23365
Ceph device class not honored for erasure coding.
Description
To start, this cluster isn't happy. It is my destructive testing/learning cluster.
Recently I rebuilt the cluster, adding SSDs (it had used only HDDs before), and I have been having some issues. First it was performance, which dropped by a fair amount (down to just 2 MB/s per stream); then I had PGs suddenly go "not active" without any failures. And now it is all sorts of mad. BUT, for this case, that is just context.
When I first added the SSDs I had their reweight set VERY low to prevent data from the old pools (since removed) being migrated onto them. Thinking that could have been causing some of my issues, I returned the weight to normal. This triggered some rebalancing, but by the next day the SSDs had all filled (one died in the process, bad hardware). This puzzled me, as the metadata pool is only about 80 MB and the write cache hovers around 3.5 GB.
So I started digging, thinking that the data pool may have been migrated to the SSDs for some reason.
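(For reference, something like the commands below is what I mean by setting the reweight low and then returning it to normal; the OSD id and values are purely illustrative, not the exact ones I used:)
ceph osd reweight 11 0.001
ceph osd reweight 11 1.0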
Let's get our data pool:
root@MediaServer:~# ceph df
...
    NAME                       ID     USED       %USED     MAX AVAIL     OBJECTS
    MigrationPool              17      6308G    100.00             0     2019061
    MigrationPool-Meta         18     70932k    100.00             0       88098
    MigrationPool-WriteCache   19      3395M    100.00             0         875
...
We are interested in ID 17; here is that pool's erasure-code profile:
root@MediaServer:~# ceph osd pool get MigrationPool all
...
erasure_code_profile: Erasure-D5F1-HDD
...
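(It may also be worth dumping the CRUSH rule the pool actually maps through; for a class-restricted rule, the take step should reference an hdd shadow root such as default~hdd. The rule name below is a placeholder, take it from the first command's output:)
ceph osd pool get MigrationPool crush_rule
ceph osd crush rule dump <rule-name-from-above>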
That profile's device class:
root@MediaServer:~# ceph osd erasure-code-profile get Erasure-D5F1-HDD
crush-device-class=hdd
...
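(For context, a profile like this would normally have been created along the lines below; k, m and the failure domain here are guesses based on the profile name, not values read from the cluster:)
ceph osd erasure-code-profile set Erasure-D5F1-HDD k=5 m=1 crush-failure-domain=osd crush-device-class=hdd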
Let's pick an SSD:
root@MediaServer:~# ceph osd df | sort -n -k1
ID CLASS WEIGHT  REWEIGHT SIZE   USE    AVAIL %USE  VAR  PGS
...
11 ssd   0.09999 1.00000  95392M 88634M 6757M 92.92 1.55  14
...
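(Another sanity check, on releases with device classes, is to confirm that OSD 11 only appears under the ssd shadow hierarchy:)
ceph osd crush class ls
ceph osd crush tree --show-shadow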
And finally, let's look for pool ID 17 on that SSD (note OSD 11 in the acting set of every PG below):
root@MediaServer:~# ceph pg ls-by-osd 11 | grep '17\.'
17.16 31854 0 0 0 0 106917534705 1557 1557 active+clean+remapped 2018-03-14 04:11:17.392929 2996'82494 3490:274645 [2,1,2147483647,12,11,3] 2 [2,1,6,12,11,3] 2 2996'82494 2018-03-14 04:11:17.392736 2996'82494 2018-03-10 23:07:48.358707
17.35 31370 0 0 0 0 104997447663 1623 1623 active+clean 2018-03-14 05:31:53.160644 2996'81594 3490:315943 [12,3,1,6,2,11] 12 [12,3,1,6,2,11] 12 2996'81594 2018-03-14 05:31:53.160520 2993'81192 2018-03-07 15:01:11.463540
17.36 31787 0 0 0 0 106702587303 1500 1500 active+clean 2018-03-14 04:10:48.600589 2996'82305 3490:418755 [12,2,1,3,8,11] 12 [12,2,1,3,8,11] 12 2996'82305 2018-03-14 04:10:48.600464 2996'82305 2018-03-12 05:00:40.453809
17.3e 31662 0 0 0 0 106070903943 1553 1553 active+clean 2018-03-14 03:49:39.645138 3013'81823 3490:289769 [2,1,3,6,11,12] 2 [2,1,3,6,11,12] 2 3013'81823 2018-03-14 03:49:39.645014 3013'81823 2018-03-12 03:38:14.041181
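(To rule out a stale listing, the mapping of one of those PGs can be confirmed directly; the PG id is taken from the output above:)
ceph pg map 17.35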
That's not good.... Ideas?