Bug #24094

open

Some objects are lost after one OSD in the cache tier fails

Added by Lei Xue almost 6 years ago. Updated almost 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Tiering
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
ceph-qa-suite:
ceph-deploy
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have set up a small cluster with the following configuration:
  • 9 machines
  • 9×2 4 TB SSDs as the cache tier (size 1)
  • 9×14 8 TB HDDs as the data pool (size 2)
  • 9×1 256 GB PCIe SSDs as the index pool (size 2)

Yesterday, one of the cache-tier SSDs failed, and the Ceph cluster became inaccessible.

Because an SSD was lost, the cache tier started rebalancing. To keep the service running, I removed the cache tier without flushing and evicting the cached data.
However, I then found that some objects created a week ago are broken: I can list them but cannot download them. The cache tier is configured as follows:
  • min flush = 600
  • min evict = 43600
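Assuming "min flush" and "min evict" above refer to the cache pool settings cache_min_flush_age and cache_min_evict_age (both in seconds), a configuration like this would typically be applied and checked with commands such as the following. The pool name cache-pool is a hypothetical placeholder, not the actual pool name from this cluster.

```shell
# Hypothetical cache pool name; substitute the real cache-tier pool.
CACHE_POOL=cache-pool

# Assumed mapping: "min flush" -> cache_min_flush_age (seconds, 600 s = 10 min)
ceph osd pool set "$CACHE_POOL" cache_min_flush_age 600
# Assumed mapping: "min evict" -> cache_min_evict_age (seconds, 43600 s ≈ 12 h)
ceph osd pool set "$CACHE_POOL" cache_min_evict_age 43600

# Read the values back to confirm what the cluster is actually using
ceph osd pool get "$CACHE_POOL" cache_min_flush_age
ceph osd pool get "$CACHE_POOL" cache_min_evict_age
```

As far as the documented semantics go, both values are minimum ages rather than deadlines: the tiering agent flushes dirty objects based on target ratios such as cache_target_dirty_ratio, so an object older than 600 s may still have been dirty and present only in the size-1 cache pool when the tier was removed.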

Another issue: for some files I can list, get, and write them, but the bucket index is not updated. In the example below, the listing still shows the original timestamp and size (66 bytes) after the upload:

xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud cp Downloads/pg.txt s3://B6-2017-12-22-10-25-42/timecost.txt
upload: Downloads/pg.txt to s3://B6-2017-12-22-10-25-42/timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt

Thanks

#1

Updated by Lei Xue almost 6 years ago

New findings:

For the object s3://B6-2017-12-22-10-25-42/timecost.txt, whose bucket index object is .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118, the index key is still present after the object is deleted with s3cmd:

[root@ceph5 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118 | grep time
B6-2017-12-22-10-25-42/timecost.txt

<----> s3cmd rm s3://B6-2017-12-22-10-25-42/timecost.txt

[root@ceph5 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118 | grep time
B6-2017-12-22-10-25-42/timecost.txt

#2

Updated by Orit Wasserman almost 6 years ago

  • Project changed from rgw to RADOS
#3

Updated by Josh Durgin almost 6 years ago

  • Category set to Tiering