
Bug #43173

pgs inconsistent, union_shard_errors=missing

Added by Aleksandr Rudenko 4 months ago. Updated 4 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
Scrub/Repair
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
OSD
Pull request ID:
Crash signature:

Description

Hi,

Luminous 12.2.12.
2/3 OSDs - Filestore, 1/3 - Bluestore
size=3, min_size=2
Cluster used as S3 (RadosGW).

I have "pgs inconsistent" for about 80 PGs.

For example, one of the inconsistent objects:

rados list-inconsistent-obj 6.f32 | jq
{
  "epoch": 31466,
  "inconsistents": [
    {
      "object": {
        "name": "d48c233a-cef5-4072-8fee-8e425695b655.319082.2_birleştirme için.docx",
        "nspace": "",
        "locator": "",
        "snap": "head",
        "version": 41926
      },
      "errors": [],
      "union_shard_errors": [
        "missing" 
      ],
      "selected_object_info": {
        "oid": {
          "oid": "d48c233a-cef5-4072-8fee-8e425695b655.319082.2_birleştirme için.docx",
          "key": "",
          "snapid": -2,
          "hash": 2273537842,
          "max": 0,
          "pool": 6,
          "namespace": "" 
        },
        "version": "31462'45912",
        "prior_version": "31410'41926",
        "last_reqid": "osd.47.0:57943766",
        "user_version": 41926,
        "size": 62411,
        "mtime": "2019-11-21 07:52:29.497853",
        "local_mtime": "2019-11-21 07:52:29.513779",
        "lost": 0,
        "flags": [
          "dirty",
          "data_digest",
          "omap_digest" 
        ],
        "legacy_snaps": [],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0x3b9127ee",
        "omap_digest": "0xffffffff",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0,
          "redirect_target": {
            "oid": "",
            "key": "",
            "snapid": 0,
            "hash": 0,
            "max": 0,
            "pool": -9223372036854776000,
            "namespace": "" 
          }
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 9,
          "primary": false,
          "errors": [
            "missing" 
          ]
        },
        {
          "osd": 47,
          "primary": true,
          "errors": [
            "missing" 
          ]
        },
        {
          "osd": 62,
          "primary": false,
          "errors": [],
          "size": 62411,
          "omap_digest": "0xffffffff",
          "data_digest": "0x3b9127ee" 
        }
      ]
    }
  ]
}
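(For anyone reproducing this across many PGs: the per-shard errors can be pulled out of the `list-inconsistent-obj` JSON programmatically. A minimal sketch below, using an abbreviated copy of the report above — object name shortened and only the fields the summary needs are kept.)

```python
import json

# Abbreviated list-inconsistent-obj output from the report above;
# the object name is shortened and unused fields are dropped.
report = json.loads("""
{
  "epoch": 31466,
  "inconsistents": [
    {
      "object": {"name": "obj.docx", "snap": "head", "version": 41926},
      "union_shard_errors": ["missing"],
      "shards": [
        {"osd": 9,  "primary": false, "errors": ["missing"]},
        {"osd": 47, "primary": true,  "errors": ["missing"]},
        {"osd": 62, "primary": false, "errors": [],
         "size": 62411, "data_digest": "0x3b9127ee"}
      ]
    }
  ]
}
""")

def shard_errors(report):
    """Map OSD id -> list of shard errors across all inconsistent objects."""
    out = {}
    for obj in report["inconsistents"]:
        for shard in obj["shards"]:
            out[shard["osd"]] = shard["errors"]
    return out

print(shard_errors(report))  # {9: ['missing'], 47: ['missing'], 62: []}
```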

As you can see, 2/3 of the OSDs report "errors": ["missing"], including the primary OSD (47). Nevertheless, I can GET this object with awscli (through the S3 API), and its md5 matches the ETag, so the integrity of the object is not broken. Even if I stop OSD-62 (the only OSD that has the object according to the report), I can still successfully get the object via the S3 API.
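(The integrity check mentioned above is roughly the following sketch. Note this comparison is only valid for single-part uploads, where the S3 ETag is the plain hex MD5 of the body; multipart ETags carry a "-&lt;parts&gt;" suffix and are not a plain MD5.)

```python
import hashlib

def etag_matches(body: bytes, etag: str) -> bool:
    """Compare a downloaded object's MD5 with its S3 ETag.

    Only valid for single-part uploads; the quotes S3 returns
    around the ETag are stripped before comparing."""
    return hashlib.md5(body).hexdigest() == etag.strip('"')

# Hypothetical example: body as downloaded via the S3 API, ETag as
# reported by a head-object call on the same key.
body = b"example object body"
etag = '"' + hashlib.md5(body).hexdigest() + '"'
print(etag_matches(body, etag))  # True
```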

If I run a PG repair, I see the following in the cluster log:

36:45.145709 osd.47 osd.47 172.19.0.17:6860/1093343 4254 : cluster [ERR] 6.f32 repair : stat mismatch, got 3556/3555 objects, 0/0 clones, 3556/3555 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 11169954579/11169814803 bytes, 0/0 hit_set_archive bytes.
2019-12-05 20:36:45.148312 osd.47 osd.47 172.19.0.17:6860/1093343 4255 : cluster [ERR] 6.f32 repair 1 missing, 0 inconsistent objects
2019-12-05 20:36:45.148434 osd.47 osd.47 172.19.0.17:6860/1093343 4256 : cluster [ERR] 6.f32 repair 4 errors, 2 fixed
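(With ~80 inconsistent PGs, the repair/deep-scrub summary lines are worth collecting in one place; a small sketch that pulls the counts out of log lines shaped like the ones above:)

```python
import re

# Matches cluster-log summary lines such as:
#   ... cluster [ERR] 6.f32 repair 4 errors, 2 fixed
#   ... cluster [ERR] 6.f32 deep-scrub 4 errors
SUMMARY = re.compile(
    r"\[ERR\] (?P<pg>\S+) (?P<op>repair|deep-scrub) "
    r"(?P<errors>\d+) errors(?:, (?P<fixed>\d+) fixed)?"
)

def parse_summary(line):
    """Return pg/op/error counts from a summary log line, or None."""
    m = SUMMARY.search(line)
    if not m:
        return None
    return {
        "pg": m.group("pg"),
        "op": m.group("op"),
        "errors": int(m.group("errors")),
        "fixed": int(m.group("fixed")) if m.group("fixed") else 0,
    }

line = ("2019-12-05 20:36:45.148434 osd.47 osd.47 172.19.0.17:6860/1093343 "
        "4256 : cluster [ERR] 6.f32 repair 4 errors, 2 fixed")
print(parse_summary(line))  # {'pg': '6.f32', 'op': 'repair', 'errors': 4, 'fixed': 2}
```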

After repairing, I run a deep-scrub again:

2019-12-06 12:50:30.742346 osd.47 osd.47 172.19.0.17:6860/1093343 4268 : cluster [ERR] 6.f32 shard 62 6:4cf6c1e1:::d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck%2fSEKRETERYA 28.06.2019%2fSEBLA-HUKUK%2fSebla-temyiz dilek%c3%a7esine cevap - birle%c5%9ftirme i%c3%a7in.docx:head : missing
2019-12-06 12:50:32.872768 osd.47 osd.47 172.19.0.17:6860/1093343 4269 : cluster [ERR] 6.f32 shard 9 6:4cf6c1e1:::d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck%2fSEKRETERYA 28.06.2019%2fSEBLA-HUKUK%2fSebla-temyiz dilek%c3%a7esine cevap - birle%c5%9ftirme i%c3%a7in.docx:head : missing
2019-12-06 12:50:32.872781 osd.47 osd.47 172.19.0.17:6860/1093343 4270 : cluster [ERR] 6.f32 shard 47 6:4cf6c1e1:::d48c233a-cef5-4072-8fee-8e425695b655.319082.2_ovBck%2fSEKRETERYA 28.06.2019%2fSEBLA-HUKUK%2fSebla-temyiz dilek%c3%a7esine cevap - birle%c5%9ftirme i%c3%a7in.docx:head : missing
...
2019-12-06 13:14:45.485929 osd.47 osd.47 172.19.0.17:6860/1093343 4272 : cluster [ERR] 6.f32 deep-scrub 1 missing, 0 inconsistent objects
2019-12-06 13:14:45.485941 osd.47 osd.47 172.19.0.17:6860/1093343 4273 : cluster [ERR] 6.f32 deep-scrub 4 errors

After the deep-scrub, "missing" is still reported on the same 2/3 OSDs.

Why can I successfully get the object from S3 when 2/3 of the OSDs are missing it?
What does "missing" mean?

History

#1 Updated by Neha Ojha 4 months ago

  • Status changed from New to Duplicate
