Bug #24094: some objects are lost after one of osd in cache-tier is broken - RADOS - Ceph


Bug #24094

open

some objects are lost after one of osd in cache-tier is broken

Added by Lei Xue almost 6 years ago. Updated almost 6 years ago.

Status: New
Priority: Normal
Assignee:
Category: Tiering
Target version:
% Done:
Source:
Tags:
Backport:
Regression:
Severity: 2 - major
Reviewed:
Affected Versions: Ceph - v12.2.2, Ceph - v12.2.3, Ceph - v12.2.4, Ceph - v12.2.5
ceph-qa-suite: ceph-deploy
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have set up a small cluster with the following configuration:

9 machines
9*2 4T SSDs as the cache tier (size 1)
9*14 8T HDDs as the data pool (size 2)
9*1 256G PCIe SSD as the index pool (size 2)

Yesterday one of the cache-tier SSDs failed, and the Ceph cluster could no longer be accessed.

The cache tier started rebalancing because the SSD was lost. To keep the service available, I removed the cache tier without flushing or evicting the cached data.
Afterwards I found that some objects created a week earlier have problems: I can list them but cannot download them. The cache tier was configured as follows:

min flush = 600
min evict = 43600
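For comparison, the Ceph documentation describes retiring a writeback cache tier by flushing and evicting it before the overlay is removed. The following is a minimal sketch of that sequence, not a recovery procedure for this cluster. The helper names `run` and `drain_cache_tier`, the `DRY_RUN` variable, and the pool names in the usage line are hypothetical; `proxy` is the cache mode named in recent documentation, while older releases document `forward` for this step.

```shell
#!/bin/sh
# Sketch: drain a writeback cache tier before detaching it.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: drain_cache_tier <cache-pool> <base-pool>
drain_cache_tier() {
    cache_pool=$1
    base_pool=$2
    # Stop caching new writes; requests are proxied to the base pool.
    run ceph osd tier cache-mode "$cache_pool" proxy || return 1
    # Flush dirty objects to the base pool and evict clean ones.
    run rados -p "$cache_pool" cache-flush-evict-all || return 1
    # List what is left; the cache pool should be empty before continuing.
    run rados -p "$cache_pool" ls || return 1
    # Only then stop redirecting clients and detach the tier.
    run ceph osd tier remove-overlay "$base_pool" || return 1
    run ceph osd tier remove "$base_pool" "$cache_pool" || return 1
}
```

Calling `drain_cache_tier hot-pool cold-pool` prints the five commands in order. With a cache tier of size 1, objects whose only copy was on the failed SSD cannot be flushed this way, so the sketch shows the intended order of operations rather than a way to get those objects back.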

Another issue: I can list some files, download them, and overwrite them, but the bucket index is not updated afterwards.

xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud cp Downloads/pg.txt s3://B6-2017-12-22-10-25-42/timecost.txt
upload: Downloads/pg.txt to s3://B6-2017-12-22-10-25-42/timecost.txt
xueleis-MacBook-Pro:~ xuelei$ aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/
2018-04-27 13:19:51 66 timecost.txt
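The listing above still shows the old timestamp and size after the upload. A minimal sketch of how that check could be scripted: it reads `aws s3 ls` output and compares the listed size with the size of the local file that was just uploaded. The function names `listed_size` and `index_looks_stale` are hypothetical, and the parsing assumes keys without spaces.

```shell
#!/bin/sh
# Print the size column that `aws s3 ls` reports for one key.
# Reads listing lines of the form "date time size key" on stdin.
listed_size() {
    awk -v key="$1" '$4 == key { print $3 }'
}

# Succeeds (exit 0) when the listed size differs from the size in bytes
# of the local file that was uploaded, i.e. the listing looks stale.
# Usage: index_looks_stale <listed-size> <local-file>
index_looks_stale() {
    listed=$1
    local_file=$2
    actual=$(wc -c < "$local_file" | tr -d ' ')
    [ "$listed" != "$actual" ]
}
```

For the case in this report, the listing would be piped in as `aws s3 --endpoint http://s3.cloud ls s3://B6-2017-12-22-10-25-42/ | listed_size timecost.txt` and the result passed to `index_looks_stale` together with `Downloads/pg.txt`. A size comparison misses the case where the new file happens to have the same size as the old one, so the unchanged timestamp remains the more direct signal.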

Thanks


Updated by Lei Xue almost 6 years ago

New findings:

For the object s3://B6-2017-12-22-10-25-42/timecost.txt, whose bucket index object is .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118, the omap key is still present after the object has been removed:

[root@ceph5 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118 | grep time
B6-2017-12-22-10-25-42/timecost.txt

Between the two listings I ran: s3cmd rm s3://B6-2017-12-22-10-25-42/timecost.txt

[root@ceph5 ~]# rados -p default.rgw.buckets.index listomapkeys .dir.0089274c-7a8b-4e66-83dd-d45e638415d7.1033843.1.118 | grep time
B6-2017-12-22-10-25-42/timecost.txt
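The same before-and-after comparison can be applied to many keys at once. A minimal sketch, assuming one file holding the keys that were deleted (one per line) and another holding the output of `rados ... listomapkeys` taken after the deletes; the function name `still_indexed` and both file arguments are hypothetical.

```shell
#!/bin/sh
# Print every deleted key that still appears in the index listing.
# Usage: still_indexed <deleted-keys-file> <listomapkeys-output-file>
# -F treats keys as fixed strings, -x requires a whole-line match.
# Exit status is grep's: 0 if at least one stale key was found, 1 if none.
still_indexed() {
    grep -Fx -f "$1" "$2"
}
```

Any line it prints is an index entry that outlived its object, which is the symptom shown above for timecost.txt.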
