Bug #24094
Some objects are lost after one of the OSDs in the cache tier fails
Status: New
Priority: Normal
Assignee: -
Category: Tiering
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: ceph-deploy
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I have a small cluster with the following configuration:
- 9 machines
- 9 × 2 SSDs (4 TB each) as the cache tier (size 1)
- 9 × 14 HDDs (8 TB each) as the data pool (size 2)
- 9 × 1 PCIe SSD (256 GB each) as the index pool (size 2)
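To make the layout above concrete, here is a rough capacity calculation. It is only a sketch: it assumes one OSD per device, evenly distributed data, that "4T"/"8T" mean TB and "256G" means GB, and that "size" is the pool replica count. All variable names are made up for illustration.

```shell
#!/bin/sh
# Rough capacity arithmetic for the layout listed above.
# Assumptions (not stated in the report): one OSD per device, even data
# distribution, "size" = number of replicas.
hosts=9

cache_osds=$((hosts * 2))              # 18 SSD OSDs
cache_raw_tb=$((cache_osds * 4))       # 72 TB raw
cache_usable_tb=$((cache_raw_tb / 1))  # size 1: 72 TB, no redundancy

data_osds=$((hosts * 14))              # 126 HDD OSDs
data_raw_tb=$((data_osds * 8))         # 1008 TB raw
data_usable_tb=$((data_raw_tb / 2))    # size 2: 504 TB

index_raw_gb=$((hosts * 1 * 256))      # 2304 GB raw
index_usable_gb=$((index_raw_gb / 2))  # size 2: 1152 GB

# With size 1 every cached object lives on exactly one SSD, so one failed
# SSD holds roughly 1/cache_osds of the cache-resident data (in permille).
at_risk_permille=$((1000 / cache_osds))

echo "cache tier: $cache_osds OSDs, $cache_raw_tb TB raw, $cache_usable_tb TB usable (size 1)"
echo "data pool:  $data_osds OSDs, $data_raw_tb TB raw, $data_usable_tb TB usable (size 2)"
echo "index pool: $index_raw_gb GB raw, $index_usable_gb GB usable (size 2)"
echo "one failed cache SSD: about $((at_risk_permille / 10)).$((at_risk_permille % 10))% of cache-resident data has no other copy"
```

The last figure only concerns data that exists solely in the cache tier (written but not yet flushed). Objects already flushed to the HDD pool still have their two copies there.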
Yesterday one of the cache-tier SSDs failed, and the Ceph cluster could no longer be accessed.
The cache tier started rebalancing because of the lost SSD. To keep the service available, I removed the cache tier without flushing and evicting the cached data. Afterwards I found that some objects created a week ago have problems: I can list them but cannot download them. The cache tier was configured as follows:
- min flush = 600
- min evict = 43600
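For comparison with the removal described above, the Ceph documentation describes flushing and evicting a writeback cache before detaching it. The sketch below prints that sequence in dry-run mode. The pool names `hot-cache` and `cold-data` and the helper functions are hypothetical. The cache mode used for draining has differed between releases (older documentation used `forward`), so check the documentation for the installed version. Whether the flush can complete while cache-tier placement groups are down after an OSD loss is not something this sketch addresses.

```shell
#!/bin/sh
# Dry-run by default: commands are printed instead of executed.
# Set DRY_RUN=0 to really run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = cache pool, $2 = backing (storage) pool; both names are hypothetical.
# Steps are chained with && so a failed step stops the sequence.
remove_writeback_tier() {
    # Stop caching new writes; requests are proxied to the backing pool.
    run ceph osd tier cache-mode "$1" proxy &&
    # Flush dirty objects to the backing pool and evict them from the cache.
    run rados -p "$1" cache-flush-evict-all &&
    # List what is left; this should print nothing before continuing.
    run rados -p "$1" ls &&
    # Stop redirecting client traffic to the cache pool.
    run ceph osd tier remove-overlay "$2" &&
    # Detach the cache pool from the backing pool.
    run ceph osd tier remove "$2" "$1"
}

remove_writeback_tier hot-cache cold-data
```

The key point is the ordering: the overlay and the tier relationship are removed only after the cache pool has been flushed and is empty, because dirty objects in a writeback cache exist nowhere else.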
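Another issue: I can list some files, download them, and overwrite them, but the bucket index is not updated afterwards. In the session below, the listing still shows the old timestamp and size after the upload: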
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud cp Downloads/pg.txt s3://B6-2017-12-22-10-25-42/timecost.txt
upload: Downloads/pg.txt to s3://B6-2017-12-22-10-25-42/timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt
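One way to confirm that the listing is stale is to compare the size the bucket index reports (`aws s3 ls`) with the size the object itself reports (`aws s3api head-object`). This is a minimal sketch. The helper names are made up, and the commented usage lines assume the endpoint, bucket, and key from the session above.

```shell
#!/bin/sh
# Extract the size column (third field) from one line of `aws s3 ls` output.
listed_size() {
    printf '%s\n' "$1" | awk '{print $3}'
}

# Succeeds (exit 0) when the two sizes differ, i.e. the index looks stale.
# $1 = size from the listing, $2 = size from head-object.
sizes_disagree() {
    [ "$1" != "$2" ]
}

# Usage against the bucket from the transcript (not executed here):
#   line=$(aws s3 --endpoint-url http://s3.cloud ls \
#          s3://B6-2017-12-22-10-25-42/timecost.txt)
#   head=$(aws s3api head-object --endpoint-url http://s3.cloud \
#          --bucket B6-2017-12-22-10-25-42 --key timecost.txt \
#          --query ContentLength --output text)
#   if sizes_disagree "$(listed_size "$line")" "$head"; then
#       echo "bucket index is out of date for timecost.txt"
#   fi
```

If the new file happens to have the same size as the old one, this check cannot tell them apart. Comparing the `LastModified` value from `head-object` with the listing timestamp would be the fallback in that case.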
Thanks