Bug #48060
Data loss in EC pool
Description
We have data loss in our EC pool (profile k4m2).
The pool is used for RBD volumes; 15 RBD volumes have broken objects.
The broken objects contain shards with different versions. All OSDs are up and in. No SMART errors.
Network latency between components is not more than 0.2 ms (10-40 Gbit bonded network interfaces).
OSD hosts have more than 20 GB of RAM per OSD and more than 4 dedicated cores per OSD.
This is a production system.
root@osd-host:~# uname -a
Linux osd-host 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@ik01:~# ceph version
ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octopus (stable)
root@ik01:~# apt-cache policy ceph
ceph:
Installed: 15.2.5-1bionic
Candidate: 15.2.5-1bionic
Version table:
*** 15.2.5-1bionic 1001
1001 https://download.ceph.com/debian-octopus bionic/main amd64 Packages
root@mon-host:~# ceph osd dump | grep cinder-data
pool 30 'cinder-data' erasure profile k4m2 size 6 min_size 5 crush_rule 4 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode warn last_change 377788 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384 application rbd
root@mon-host:~# ceph osd erasure-code-profile get k4m2
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8
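For reference, a quick sanity check of what the profile above implies, using only values from the pool dump (this is just arithmetic, not a Ceph command):

```shell
# k=4 data chunks + m=2 coding chunks per object stripe; any 4 of the
# 6 shards can rebuild the object, and the pool stays writable down to
# min_size=5 shards.
k=4; m=2; stripe_width=16384
echo "chunk size:    $((stripe_width / k)) bytes"   # 16384 / 4 = 4096
echo "total shards:  $((k + m))"                    # 6, matching pool size 6
echo "shards needed: $k"                            # any 4 of the 6
echo "min_size:      $((k + 1))"                    # 5, as in the pool dump
```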
root@mon-host:~# ceph health detail
pg 30.e6 is stuck undersized for 44h, current state active+recovery_unfound+undersized+degraded+remapped, last acting [45,2147483647,6,2147483647,42,22]
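Note that the 2147483647 entries in the acting set are not real OSD ids: that value is 2^31 - 1 (INT_MAX), which Ceph prints for a shard slot with no OSD currently mapped, so two of the six slots are empty here:

```shell
# 2147483647 = 2^31 - 1, the placeholder Ceph prints for an unmapped
# shard slot in an EC acting set.
echo $(( (1 << 31) - 1 ))   # 2147483647
```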
root@ik01:~# ceph pg 30.e6 list_unfound | egrep '(rbd|need|have)'
"oid": "rbd_data.29.4ed8dede74eb6f.0000000000000224",
"need": "364963'36484992",
"have": "0'0",
We dumped out all shards (with ceph-objectstore-tool) and found that they have different versions:
"version": "363510'36481974",
"version": "364963'36484992",
"version": "364963'36484992",
"version": "364963'36484992",
"version": "363579'36482218",
"version": "359913'36472321",
"version": "359913'36472321",
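With k=4, at least 4 shards must agree on the version the PG needs (364963'36484992) before the object can be reconstructed. A recount of the versions dumped above makes the problem visible (versions.txt here just replays those seven values for illustration):

```shell
# Replay the dumped shard versions and count how many agree.
cat > versions.txt <<'EOF'
363510'36481974
364963'36484992
364963'36484992
364963'36484992
363579'36482218
359913'36472321
359913'36472321
EOF
sort versions.txt | uniq -c | sort -rn
# Top line is "3 364963'36484992": only 3 shards carry the version the
# PG needs, fewer than k=4, so the object cannot be reconstructed.
```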
We found more than 6 OSDs containing pieces of the missing object.
So we rebooted all OSDs, one at a time. No extra healthy shards were found.
Are there any commands to "glue" the broken pieces together and put them back in?
On our test system we tried to put a wrong shard version back in, and the result was:
'/build/ceph-15.2.5/src/osd/osd_types.cc: 5698: FAILED ceph_assert(clone_size.count(clone))'
Currently I/O hangs when we try to do any I/O against the broken objects (rbd import/export or rados get/put).
We have approximately 12 hours before we accept permanent data loss and run the following command:
root@mon-host:~# ceph pg 30.e6 mark_unfound_lost delete