Bug #19267

open

rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad

Added by cheng li about 7 years ago. Updated almost 7 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Scrub/Repair
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I tested Ceph 10.2.3 on a cluster with 3 OSD nodes.

I uploaded a text file to the cluster, then manually changed the file's content on the OSD nodes.
After a deep-scrub, Ceph reports an inconsistency error. But rados list-inconsistent-obj <pg> doesn't report all copies as bad; it flags only two of them.

steps:
1. upload text file to ceph
rados -p rbd_ssd put testfile testfile
2. get the file location on osds
root@cheng-ceph1:~# ceph osd map rbd_ssd testfile
osdmap e97 pool 'rbd_ssd' (1) object 'testfile' -> pg 1.551a2b36 (1.36) -> up ([1,0,2], p1) acting ([1,0,2], p1)

3. make all 3 copies bad by editing /var/lib/ceph/osd/ceph-x/current/1.36_head/testfile__head_551A2B36__1 on each OSD
4. trigger deep-scrub
5. Ceph now reports an inconsistency error
root@cheng-ceph3:~# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 3 scrub errors
pg 1.36 is active+clean+inconsistent, acting [1,0,2]
3 scrub errors

6. but list-inconsistent-obj shows that only two copies are bad. In fact, all 3 copies are bad and differ in size from the original text file, which has only 18 chars.
root@cheng-ceph3:~# rados -p rbd_ssd list-inconsistent-obj 1.36 | python -m json.tool
{
    "epoch": 95,
    "inconsistents": [
        {
            "errors": [
                "size_mismatch"
            ],
            "object": {
                "locator": "",
                "name": "testfile",
                "nspace": "",
                "snap": "head"
            },
            "shards": [
                {
                    "data_digest": "0xa3ba020a",
                    "errors": [
                        "size_mismatch"
                    ],
                    "omap_digest": "0xffffffff",
                    "osd": 0,
                    "size": 21
                },
                {
                    "data_digest": "0xa3ba020a",
                    "errors": [
                        "size_mismatch"
                    ],
                    "omap_digest": "0xffffffff",
                    "osd": 1,
                    "size": 22
                },
                {
                    "data_digest": "0xa3ba020a",
                    "errors": [],
                    "omap_digest": "0xffffffff",
                    "osd": 2,
                    "size": 23
                }
            ]
        }
    ]
}
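
To make the gap in the report above easier to spot, here is a minimal Python sketch that scans a parsed list-inconsistent-obj report for shards whose size differs from an expected value but whose per-shard "errors" list does not contain "size_mismatch". It is not part of Ceph. The function name unflagged_size_outliers and the trimmed SAMPLE dict are hypothetical, and expected_size=18 is an assumption taken from the statement that the original file has 18 chars.

```python
def unflagged_size_outliers(report, expected_size):
    """Return (object name, osd, size) for every shard whose size differs
    from expected_size but which is not marked with "size_mismatch"."""
    missed = []
    for item in report.get("inconsistents", []):
        name = item["object"]["name"]
        for shard in item.get("shards", []):
            wrong_size = shard["size"] != expected_size
            flagged = "size_mismatch" in shard["errors"]
            if wrong_size and not flagged:
                missed.append((name, shard["osd"], shard["size"]))
    return missed


# Trimmed copy of the report above (only the fields the check needs).
SAMPLE = {
    "epoch": 95,
    "inconsistents": [
        {
            "errors": ["size_mismatch"],
            "object": {"name": "testfile", "snap": "head"},
            "shards": [
                {"osd": 0, "size": 21, "errors": ["size_mismatch"]},
                {"osd": 1, "size": 22, "errors": ["size_mismatch"]},
                {"osd": 2, "size": 23, "errors": []},
            ],
        }
    ],
}

# expected_size=18 is an assumption based on the original file length.
print(unflagged_size_outliers(SAMPLE, expected_size=18))
# prints [('testfile', 2, 23)]
```

On this report the check singles out osd.2: its size (23) is wrong, yet it carries no error string.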

Another thing I don't understand is why Ceph doesn't block users from putting an object even when all 3 copies are bad.

Actions #1

Updated by Greg Farnum about 7 years ago

I don't understand. What about this output says that two copies are bad and one isn't?

Actions #2

Updated by cheng li about 7 years ago

Greg Farnum wrote:

I don't understand. What about this output says that two copies are bad and one isn't?

Thanks for your reply, and sorry for my poor English. Let me explain in more detail.

I made all 3 copies of the data bad, so I expected list-inconsistent-obj to report size_mismatch for all 3 copies.

But in the output, osd.0 and osd.1 are flagged with size_mismatch, while osd.2 has no errors.

Actions #3

Updated by Greg Farnum about 7 years ago

  • Subject changed from rados list-inconsistent-obj can't list all bad replicas when all 3 copies are bad to rados list-inconsistent-obj sometimes doesn't flag that all 3 copies are bad

Oh I see, it's missing the error string.

I'm not sure if in this case it's just taking one of them as authoritative and publishing the ones that differ (since none are the same) or what. It should know what the correct size is based on the object_info though, so maybe that test is being skipped.
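
To illustrate the two possibilities, here is a rough Python sketch. It is not Ceph's scrub code. Both function names are hypothetical, the choice of osd.2 as the authoritative copy is an assumption made to match the reported output, and recorded_size=18 is taken from the reporter's statement about the original file.

```python
# Shard sizes as reported by list-inconsistent-obj for pg 1.36.
SHARDS = [
    {"osd": 0, "size": 21},
    {"osd": 1, "size": 22},
    {"osd": 2, "size": 23},
]


def flag_against_peer(shards, authoritative_osd):
    """Hypothesis 1: treat one shard as authoritative and flag only the
    shards whose size differs from it."""
    auth_size = next(s["size"] for s in shards if s["osd"] == authoritative_osd)
    return {
        s["osd"]: (["size_mismatch"] if s["size"] != auth_size else [])
        for s in shards
    }


def flag_against_recorded_size(shards, recorded_size):
    """Hypothesis 2: compare every shard with the size recorded in the
    object's metadata (object_info), so no shard is trusted by default."""
    return {
        s["osd"]: (["size_mismatch"] if s["size"] != recorded_size else [])
        for s in shards
    }


# Assumption: osd.2 was picked as authoritative.
print(flag_against_peer(SHARDS, authoritative_osd=2))
# prints {0: ['size_mismatch'], 1: ['size_mismatch'], 2: []}

# Assumption: the recorded size is the original 18 bytes.
print(flag_against_recorded_size(SHARDS, recorded_size=18))
# prints {0: ['size_mismatch'], 1: ['size_mismatch'], 2: ['size_mismatch']}
```

The first result matches the output in the description, which is consistent with the "one copy taken as authoritative" reading. The second is what the reporter expected to see.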

Actions #4

Updated by Greg Farnum almost 7 years ago

  • Project changed from Ceph to RADOS
  • Category changed from OSD to Scrub/Repair