Bug #43175
pgs inconsistent, union_shard_errors=missing
Status: Closed
Description
Hi,
Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).
I have "pgs inconsistent" for about 80 PGs.
For example, here is one of the inconsistent objects:
rados list-inconsistent-obj 6.f32 | jq
{
  "epoch": 31466,
  "inconsistents": [
    {
      "object": {
        "name": "d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck/SEKRETERYA 28.06.2019/SEBLA-HUKUK/Sebla-temyiz dilekçesine cevap - birleştirme için.docx",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 41926
      },
      "errors": [],
      "union_shard_errors": [
        "missing"
      ],
      "selected_object_info": {
        "oid": {
          "oid": "d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck/SEKRETERYA 28.06.2019/SEBLA-HUKUK/Sebla-temyiz dilekçesine cevap - birleştirme için.docx",
          "key": "",
          "snapid": -2,
          "hash": 2273537842,
          "max": 0,
          "pool": 6,
          "namespace": ""
        },
        "version": "31462'45912",
        "prior_version": "31410'41926",
        "last_reqid": "osd.47.0:57943766",
        "user_version": 41926,
        "size": 62411,
        "mtime": "2019-11-21 07:52:29.497853",
        "local_mtime": "2019-11-21 07:52:29.513779",
        "lost": 0,
        "flags": [
          "dirty",
          "data_digest",
          "omap_digest"
        ],
        "legacy_snaps": [],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0x3b9127ee",
        "omap_digest": "0xffffffff",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0,
          "redirect_target": {
            "oid": "",
            "key": "",
            "snapid": 0,
            "hash": 0,
            "max": 0,
            "pool": -9223372036854776000,
            "namespace": ""
          }
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 9,
          "primary": false,
          "errors": [
            "missing"
          ]
        },
        {
          "osd": 47,
          "primary": true,
          "errors": [
            "missing"
          ]
        },
        {
          "osd": 62,
          "primary": false,
          "errors": [],
          "size": 62411,
          "omap_digest": "0xffffffff",
          "data_digest": "0x3b9127ee"
        }
      ]
    }
  ]
}
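The per-shard errors in a report like the one above can be condensed into one line per OSD with jq (the same tool the listing is already piped through); the sample below embeds a trimmed copy of the shard list from this report:

```shell
# Summarize the shard report from `rados list-inconsistent-obj <pgid>`.
# Sample input trimmed to the "shards" array of the report above.
cat > /tmp/inconsistent.json <<'EOF'
{"inconsistents":[{"shards":[
  {"osd":9,"primary":false,"errors":["missing"]},
  {"osd":47,"primary":true,"errors":["missing"]},
  {"osd":62,"primary":false,"errors":[]}]}]}
EOF
# One line per shard: OSD id, primary flag, and its error list.
jq -r '.inconsistents[].shards[]
       | "osd.\(.osd) primary=\(.primary) errors=\(.errors | join(","))"' \
  /tmp/inconsistent.json
```

For this PG it prints three lines, showing at a glance that osd.9 and the primary osd.47 report "missing" while osd.62 holds an error-free copy.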
As you can see, 2/3 OSDs report "errors": ["missing"]. The primary OSD (47) reports this error too, yet I can GET the object via awscli (through the S3 API), and its MD5 matches the ETag (the object's integrity is not broken). Even if I stop OSD 62 (the one that holds the object according to the report), I can still successfully get the object through the S3 API.
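The MD5-vs-ETag comparison mentioned above can be sketched as a small helper. Note the caveat: an S3 ETag equals the MD5 of the body only for single-part (non-multipart) uploads; the file and ETag values here are placeholders, not taken from the report:

```shell
# Compare a downloaded object's MD5 against its S3 ETag.
# Only valid for single-part uploads, where ETag == MD5 of the body;
# multipart ETags carry a "-<parts>" suffix and will never match.
check_etag() {
  local file="$1" etag="$2"
  local md5
  md5=$(md5sum "$file" | awk '{print $1}')
  if [ "$md5" = "$etag" ]; then
    echo "integrity OK"
  else
    echo "MISMATCH: md5=$md5 etag=$etag"
  fi
}

# Hypothetical usage; BUCKET/KEY are placeholders:
#   aws s3 cp "s3://BUCKET/KEY" ./object.docx
#   etag=$(aws s3api head-object --bucket BUCKET --key KEY \
#            --query ETag --output text | tr -d '"')
#   check_etag ./object.docx "$etag"
```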
If I run a PG repair, I see the following in the cluster log:
36:45.145709 osd.47 osd.47 172.19.0.17:6860/1093343 4254 : cluster [ERR] 6.f32 repair : stat mismatch, got 3556/3555 objects, 0/0 clones, 3556/3555 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 11169954579/11169814803 bytes, 0/0 hit_set_archive bytes.
2019-12-05 20:36:45.148312 osd.47 osd.47 172.19.0.17:6860/1093343 4255 : cluster [ERR] 6.f32 repair 1 missing, 0 inconsistent objects
2019-12-05 20:36:45.148434 osd.47 osd.47 172.19.0.17:6860/1093343 4256 : cluster [ERR] 6.f32 repair 4 errors, 2 fixed
After the repair, I ran a deep-scrub again:
2019-12-06 12:50:30.742346 osd.47 osd.47 172.19.0.17:6860/1093343 4268 : cluster [ERR] 6.f32 shard 62 6:4cf6c1e1:::d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck%2fSEKRETERYA 28.06.2019%2fSEBLA-HUKUK%2fSebla-temyiz dilek%c3%a7esine cevap - birle%c5%9ftirme i%c3%a7in.docx:head : missing
2019-12-06 12:50:32.872768 osd.47 osd.47 172.19.0.17:6860/1093343 4269 : cluster [ERR] 6.f32 shard 9 6:4cf6c1e1:::d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck%2fSEKRETERYA 28.06.2019%2fSEBLA-HUKUK%2fSebla-temyiz dilek%c3%a7esine cevap - birle%c5%9ftirme i%c3%a7in.docx:head : missing
2019-12-06 12:50:32.872781 osd.47 osd.47 172.19.0.17:6860/1093343 4270 : cluster [ERR] 6.f32 shard 47 6:4cf6c1e1:::d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck%2fSEKRETERYA 28.06.2019%2fSEBLA-HUKUK%2fSebla-temyiz dilek%c3%a7esine cevap - birle%c5%9ftirme i%c3%a7in.docx:head : missing
...
2019-12-06 13:14:45.485929 osd.47 osd.47 172.19.0.17:6860/1093343 4272 : cluster [ERR] 6.f32 deep-scrub 1 missing, 0 inconsistent objects
2019-12-06 13:14:45.485941 osd.47 osd.47 172.19.0.17:6860/1093343 4273 : cluster [ERR] 6.f32 deep-scrub 4 errors
After the deep-scrub, I still see "missing" on the same 2/3 OSDs.
Why can I successfully get the object from S3 when 2/3 OSDs report it as missing?
What does "missing" mean?