Bug #24094
open
Some objects are lost after one of the OSDs in the cache tier is broken
0%
Description
- 9 machines
- 9*2 4T SSDs as cache tier (size 1)
- 9*14 8T HDDs as data pool (size 2)
- 9*1 256G PCIe SSDs as index pool (size 2)
Yesterday a cache-tier SSD failed, and the Ceph cluster could no longer be accessed.
The cache tier started rebalancing after the SSD was lost. To keep the service available, I removed the cache tier without flushing and evicting the cached data. Afterwards I found that some objects created a week earlier have problems: I can list them but cannot download them. The cache tier was configured as follows:
- min flush = 600
- min evict = 43600
Another issue is that I can list some files, get them, and write them, but the index is not updated. In the session below, the listing still shows the old timestamp and size after the upload:
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud cp Downloads/pg.txt s3://B6-2017-12-22-10-25-42/timecost.txt
upload: Downloads/pg.txt to s3://B6-2017-12-22-10-25-42/timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt
Thanks
Updated by Lei Xue almost 6 years ago
New findings:
For the object s3://B6-2017-12-22-10-25-42/timecost.txt, whose bucket index object is .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118, the omap key is still listed after the object is deleted:
[root@ceph5 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118 | grep time
B6-2017-12-22-10-25-42/timecost.txt
<----> s3cmd rm s3://B6-2017-12-22-10-25-42/timecost.txt
[root@ceph5 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118 | grep time
B6-2017-12-22-10-25-42/timecost.txt
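
The before/after check above can be repeated for other objects with a small wrapper. This is only a sketch: `index_has_key` is a hypothetical helper name, not a Ceph tool, and it assumes `rados` is on the PATH of a node with access to the index pool. It uses only the `rados -p <pool> listomapkeys <object>` call already shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): succeeds (exit status 0) if any
# omap key of the given bucket index object contains the given substring.
#   $1 = index pool, $2 = bucket index object, $3 = substring to look for
index_has_key() {
    pool=$1
    index_obj=$2
    needle=$3
    # -F: treat the needle as a fixed string, -q: only report via exit status
    rados -p "$pool" listomapkeys "$index_obj" | grep -qF -- "$needle"
}
```

For example, `index_has_key default.rgw.buckets.index .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118 timecost.txt && echo "stale key still present"` could be run right after the `s3cmd rm` to confirm whether the index entry was removed.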