Bug #39115
ceph pg repair doesn't fix itself if osd is bluestore
Status: Closed
Description
When running ceph pg repair on an inconsistent PG with missing data, I usually notice that the OSD is marked down and then up again before the cluster returns to healthy.
For example:
2019-04-04 11:32:47.226989 osd.3 [ERR] 51.10d shard 3 51:b0edece3::17308400658964302403:169:head : missing
2019-04-04 11:32:58.018797 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:169:head : missing
2019-04-04 11:32:58.018799 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:21:head : missing
2019-04-04 11:32:58.018813 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:22:head : missing
2019-04-04 11:32:58.018814 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:23:head : missing
2019-04-04 11:33:03.522849 mon.120 [INF] osd.3 failed (root=default,host=121) (connection refused reported by osd.6)
2019-04-04 11:33:04.122713 mon.120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-04 11:33:06.449766 mon.120 [WRN] Health check failed: Reduced data availability: 21 pgs peering (PG_AVAILABILITY)
2019-04-04 11:33:06.449797 mon.120 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 17 scrub errors)
2019-04-04 11:33:06.449812 mon.120 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-04 11:33:08.785071 mon.120 [WRN] Health check failed: Degraded data redundancy: 1903317/24115186 objects degraded (7.893%), 121 pgs degraded (PG_DEGRADED)
2019-04-04 11:33:11.944066 mon.120 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 21 pgs peering)
2019-04-04 11:33:15.813380 mon.120 [WRN] Health check update: Degraded data redundancy: 2432784/24115186 objects degraded (10.088%), 155 pgs degraded (PG_DEGRADED)
2019-04-04 11:33:31.774861 mon.120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-04 11:33:31.996274 mon.120 [INF] osd.3 172.28.19.6:6804/3630259 boot
2019-04-04 11:33:32.959333 mon.120 [WRN] Health check update: Degraded data redundancy: 2177355/24115186 objects degraded (9.029%), 140 pgs degraded (PG_DEGRADED)
2019-04-04 11:33:38.913328 mon.120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1849337/24115186 objects degraded (7.669%), 116 pgs degraded)
2019-04-04 11:33:38.913364 mon.120 [INF] Cluster is now healthy
If the OSD is bluestore, however, this does not happen; the same objects are just reported as missing again and again.
2019-04-04 14:45:33.743300 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:45:33.743301 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:45:33.743302 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:45:33.743303 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:45:33.743304 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:45:33.743304 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:45:33.743305 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:45:33.743306 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:45:33.743307 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:45:33.743308 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:45:33.743309 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:45:33.743310 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:45:39.026046 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:45:39.026048 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:45:39.026049 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:45:39.026050 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:45:39.026051 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:45:39.026051 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:45:39.026052 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:45:39.026053 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:45:39.026054 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:45:39.026055 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:45:39.026056 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:45:39.026063 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:46:05.500587 osd.9 [ERR] 51.1e5 shard 9 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:46:10.795413 osd.9 [ERR] 51.1e5 shard 5 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:47:46.115763 osd.9 [ERR] 51.1e5 repair 13 missing, 0 inconsistent objects
2019-04-04 14:47:46.115809 osd.9 [ERR] 51.1e5 repair 26 errors, 13 fixed
2019-04-04 14:47:51.417526 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:47:51.417527 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:47:51.417528 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:47:51.417529 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:47:51.417530 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:47:51.417538 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:47:51.417539 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:47:51.417540 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:47:51.417543 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:47:51.417544 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:47:51.417545 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:47:51.417545 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:47:56.711487 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:47:56.711488 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:47:56.711489 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:47:56.711490 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:47:56.711491 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:47:56.711492 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:47:56.711492 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:47:56.711493 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:47:56.711494 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:47:56.711495 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:47:56.711496 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:47:56.711496 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:48:23.182563 osd.9 [ERR] 51.1e5 shard 9 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:48:28.472711 osd.9 [ERR] 51.1e5 shard 5 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:50:03.649604 osd.9 [ERR] 51.1e5 deep-scrub 13 missing, 0 inconsistent objects
2019-04-04 14:50:03.649608 osd.9 [ERR] 51.1e5 deep-scrub 26 errors
As it happens, the OSD on which the objects are reported missing (osd.5) uses filestore, and I verified by inspecting the disk that all of the objects exist there.
If I restart the OSD that is using bluestore (osd.9), the cluster returns to healthy.
This is reproducible 100% of the time.
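For reference, the loop I go through to reproduce and observe this looks roughly like the following (the PG id is just an example, and the commands assume a systemd deployment; the OSD restart at the end is the only thing that gets the cluster healthy again when the bluestore OSD is involved):

# ceph pg deep-scrub 51.10d                        # wait for the deep-scrub to finish
# rados list-inconsistent-obj 51.10d --format=json-pretty
# ceph pg repair 51.10d
# ceph -w                                          # the same objects are reported missing again
# systemctl restart ceph-osd@9                     # workaround: restart the bluestore OSD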
Updated by David Zafman about 5 years ago
It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output of rados list-inconsistent-obj.
Updated by Iain Buclaw about 5 years ago
David Zafman wrote:
It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output of rados list-inconsistent-obj.
Actually, this is more related to #39116 than I first thought.
The following behaviour:
2019-04-04 11:33:03.522849 mon.120 [INF] osd.3 failed (root=default,host=121) (connection refused reported by osd.6)
Happens as a result of the assert in #39116; it just so happens that the cluster returns to healthy after the filestore OSD boots back up.
So perhaps the bluestore OSD does not assert / crash, but the cluster does not return to healthy either.
Updated by Iain Buclaw about 5 years ago
# ceph pg deep-scrub 51.177
instructing pg 51.177 on osd.9 to deep-scrub
ceph -w logs:
2019-04-05 07:12:01.200548 osd.9 [ERR] 51.177 shard 0 51:eedfc6f8::4653376157940334334:47:head : missing
2019-04-05 07:12:06.580296 osd.9 [ERR] 51.177 shard 9 51:eedfc6f8::4653376157940334334:47:head : missing
2019-04-05 07:12:38.708078 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, got 19714/19713 objects, 0/0 clones, 19714/19713 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 246025587/246016008 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 07:12:38.708093 osd.9 [ERR] 51.177 deep-scrub 1 missing, 0 inconsistent objects
2019-04-05 07:12:38.708096 osd.9 [ERR] 51.177 deep-scrub 3 errors
On osd.0, I confirmed that the file is not missing:
# find 51.177_head/ -name "*47_4653376157940334334*"
51.177_head/DIR_7/DIR_7/DIR_B/DIR_F/47_4653376157940334334_head_1F63FB77__33
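osd.9 is bluestore, so there is no file to find on its disk; for completeness, a sketch of how the object's presence could be checked in its objectstore (assuming the default data path and a systemd deployment; the OSD has to be stopped while ceph-objectstore-tool runs):

# systemctl stop ceph-osd@9
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
      --pgid 51.177 --op list | grep 4653376157940334334
# systemctl start ceph-osd@9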
rados list-inconsistent-obj
{ "epoch": 53141, "inconsistents": [ { "object": { "name": "47", "nspace": "", "locator": "4653376157940334334", "snap": "head", "version": 4526675 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "47", "key": "4653376157940334334", "snapid": -2, "hash": 526646135, "max": 0, "pool": 51, "namespace": "" }, "version": "52284'454181", "prior_version": "52249'377382", "last_reqid": "client.36576007.0:15677702", "user_version": 4526675, "size": 9579, "mtime": "2019-02-07 10:58:34.211435", "local_mtime": "2019-02-07 10:58:34.218297", "lost": 0, "flags": [ "dirty", "data_digest", "omap_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x39906511", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": false, "errors": [], "size": 9579, "omap_digest": "0xffffffff", "data_digest": "0x39906511" }, { "osd": 9, "primary": true, "errors": [ "missing" ] } ] } ] }
Repairing:
# ceph pg repair 51.177
instructing pg 51.177 on osd.9 to repair
ceph -w logs:
2019-04-05 07:31:34.248658 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:31:39.532834 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:31:39.532835 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:31:44.802083 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:32:06.433237 mon.ap-120 [INF] osd.9 failed (root=default,host=ap-124) (connection refused reported by osd.8)
2019-04-05 07:32:06.485653 mon.ap-120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-05 07:32:09.979661 mon.ap-120 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 41 pgs peering (PG_AVAILABILITY)
2019-04-05 07:32:09.979712 mon.ap-120 [WRN] Health check failed: Degraded data redundancy: 312778/23936452 objects degraded (1.307%), 18 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:11.240626 mon.ap-120 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 3 scrub errors)
2019-04-05 07:32:11.240652 mon.ap-120 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-05 07:32:13.483748 mon.ap-120 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 3 pgs inactive, 41 pgs peering)
2019-04-05 07:32:19.295771 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936452 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:25.339488 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936488 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:35.207942 mon.ap-120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-05 07:32:35.580655 mon.ap-120 [INF] osd.9 172.28.19.9:6800/379046 boot
2019-04-05 07:32:37.834878 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2228323/23936488 objects degraded (9.309%), 147 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:41.458732 mon.ap-120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1335902/23936488 objects degraded (5.581%), 85 pgs degraded)
2019-04-05 07:32:41.458757 mon.ap-120 [INF] Cluster is now healthy
Running deep scrub again:
2019-04-05 07:43:42.636565 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:43:47.926850 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:43:47.926851 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:43:53.195249 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:44:14.234075 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, got 19733/19731 objects, 0/0 clones, 19733/19731 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 247108485/247093139 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 07:44:14.234086 osd.9 [ERR] 51.177 deep-scrub 2 missing, 0 inconsistent objects
2019-04-05 07:44:14.234092 osd.9 [ERR] 51.177 deep-scrub 5 errors
2019-04-05 07:44:16.666888 mon.ap-120 [ERR] Health check update: 24 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 07:44:16.666934 mon.ap-120 [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
rados list-inconsistent-obj again:
{ "epoch": 53148, "inconsistents": [ { "object": { "name": "169", "nspace": "", "locator": "171336252089860175", "snap": "head", "version": 4440812 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "169", "key": "171336252089860175", "snapid": -2, "hash": 2202740599, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'367208", "prior_version": "50939'336508", "last_reqid": "client.36575077.0:14561030", "user_version": 4440812, "size": 2005, "mtime": "2019-01-20 20:54:12.394827", "local_mtime": "2019-01-20 20:54:12.392736", "lost": 0, "flags": [ "dirty", "data_digest", "omap_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x54b33d8e", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": false, "errors": [ "missing" ] }, { "osd": 9, "primary": true, "errors": [], "size": 2005, "omap_digest": "0xffffffff", "data_digest": "0x54b33d8e" } ] }, { "object": { "name": "41", "nspace": "", "locator": "383290891110620472", "snap": "head", "version": 4373192 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "41", "key": "383290891110620472", "snapid": -2, "hash": 399267703, "max": 0, "pool": 51, "namespace": "" }, "version": "53145'527997", "prior_version": "50939'286779", "last_reqid": "osd.9.0:707605", "user_version": 4373192, "size": 13341, "mtime": "2018-12-12 06:37:42.094992", "local_mtime": "2018-12-12 06:37:42.096127", "lost": 0, "flags": [ "dirty", "data_digest", "omap_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0xc89910ed", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": false, "errors": [], "size": 13341, "omap_digest": "0xffffffff", "data_digest": "0xc89910ed" }, { "osd": 9, "primary": true, "errors": [ "missing" ] } ] } ] }
ceph pg repair again:
2019-04-05 07:49:09.475060 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:49:14.755573 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:49:14.755575 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:49:20.021187 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:49:41.227801 mon.ap-120 [INF] osd.9 failed (root=default,host=ap-124) (connection refused reported by osd.7)
2019-04-05 07:49:41.787921 mon.ap-120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-05 07:49:44.132558 mon.ap-120 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 69 pgs peering (PG_AVAILABILITY)
2019-04-05 07:49:46.327214 mon.ap-120 [WRN] Health check failed: Degraded data redundancy: 1052659/23936848 objects degraded (4.398%), 74 pgs degraded (PG_DEGRADED)
2019-04-05 07:49:46.327246 mon.ap-120 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 24 scrub errors)
2019-04-05 07:49:46.327259 mon.ap-120 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 2 pgs inconsistent)
2019-04-05 07:49:49.745143 mon.ap-120 [WRN] Health check update: Reduced data availability: 1 pg inactive, 39 pgs peering (PG_AVAILABILITY)
2019-04-05 07:49:49.858132 mon.ap-120 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 39 pgs peering)
2019-04-05 07:49:54.745487 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936848 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:50:11.077110 mon.ap-120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-05 07:50:11.339766 mon.ap-120 [INF] osd.9 172.28.19.9:6800/386390 boot
2019-04-05 07:50:14.761104 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 1757404/23936848 objects degraded (7.342%), 112 pgs degraded (PG_DEGRADED)
2019-04-05 07:50:16.859843 mon.ap-120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1757404/23936848 objects degraded (7.342%), 112 pgs degraded)
2019-04-05 07:50:16.859884 mon.ap-120 [INF] Cluster is now healthy
Restarted both osd.0 and osd.9, then ran a deep scrub for the third time:
2019-04-05 08:00:44.170028 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 08:00:49.461396 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 08:00:49.461398 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 08:00:54.735733 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 08:01:20.882362 mon.ap-120 [ERR] Health check update: 24 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 08:01:20.882390 mon.ap-120 [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2019-04-05 08:01:15.761672 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, got 19733/19731 objects, 0/0 clones, 19733/19731 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 247108485/247093139 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 08:01:15.761683 osd.9 [ERR] 51.177 deep-scrub 2 missing, 0 inconsistent objects
2019-04-05 08:01:15.761685 osd.9 [ERR] 51.177 deep-scrub 5 errors
So my initial assumption was wrong; pg repair isn't working at all.
Updated by Iain Buclaw about 5 years ago
Another PG, this time where the missing objects are reported on osd.0 (filestore) rather than on osd.9 (bluestore) as in the previous example.
# ceph pg deep-scrub 51.1fa
instructing pg 51.1fa on osd.0 to deep-scrub
Logs:
2019-04-05 08:11:42.606348 osd.0 [ERR] 51.1fa shard 9 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:11:47.944111 osd.0 [ERR] 51.1fa shard 0 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:11:58.658938 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:11:58.658940 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:11:58.658940 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:11:58.658941 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:12:04.019376 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:12:04.019379 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:12:04.019380 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:12:04.019381 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:12:25.412986 osd.0 [ERR] 51.1fa deep-scrub : stat mismatch, got 19524/19519 objects, 0/0 clones, 19524/19519 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 227386438/227371540 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 08:12:25.412999 osd.0 [ERR] 51.1fa deep-scrub 5 missing, 0 inconsistent objects
2019-04-05 08:12:25.413002 osd.0 [ERR] 51.1fa deep-scrub 11 errors
rados list-inconsistent-obj
{ "epoch": 53170, "inconsistents": [ { "object": { "name": "169", "nspace": "", "locator": "193061343227287063", "snap": "head", "version": 4309859 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "169", "key": "193061343227287063", "snapid": -2, "hash": 3807419386, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'299113", "prior_version": "0'0", "last_reqid": "client.36576007.0:7932947", "user_version": 4309859, "size": 2160, "mtime": "2018-12-19 10:15:30.411967", "local_mtime": "2018-12-19 10:15:30.412661", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0xd7ca497f", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 2160, "omap_digest": "0xffffffff", "data_digest": "0xd7ca497f" } ] }, { "object": { "name": "169", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285966 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "169", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271117", "prior_version": "50905'206245", "last_reqid": "client.36574111.0:5414590", "user_version": 4285966, "size": 4794, "mtime": "2018-12-06 01:14:21.215731", "local_mtime": "2018-12-06 01:14:21.216145", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x5edc7438", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 4794, "omap_digest": "0xffffffff", "data_digest": "0x5edc7438" } ] }, { "object": { "name": "21", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285956 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "21", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271107", "prior_version": "50905'206235", "last_reqid": "client.36574111.0:5414580", "user_version": 4285956, "size": 1014, "mtime": "2018-12-06 01:14:21.144941", "local_mtime": "2018-12-06 01:14:21.145108", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x224edafa", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 1014, "omap_digest": "0xffffffff", "data_digest": "0x224edafa" } ] }, { "object": { "name": "22", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285957 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "22", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271108", "prior_version": "50905'206236", "last_reqid": "client.36574111.0:5414581", "user_version": 4285957, "size": 2136, "mtime": "2018-12-06 01:14:21.151310", "local_mtime": "2018-12-06 
01:14:21.151584", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x5f174e41", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 2136, "omap_digest": "0xffffffff", "data_digest": "0x5f174e41" } ] }, { "object": { "name": "23", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285958 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "23", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271109", "prior_version": "50905'206237", "last_reqid": "client.36574111.0:5414582", "user_version": 4285958, "size": 4794, "mtime": "2018-12-06 01:14:21.157871", "local_mtime": "2018-12-06 01:14:21.158320", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x5edc7438", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 4794, "omap_digest": "0xffffffff", "data_digest": "0x5edc7438" } ] } ] }
I checked one of the reportedly missing files on osd.0, and the file is present on the physical disk.
# find 51.1fa_head -name "*169_193061343227287063*"
51.1fa_head/DIR_A/DIR_F/DIR_B/DIR_9/DIR_0/169_193061343227287063_head_E2F09BFA__33
ceph pg repair 51.1fa logs
2019-04-05 08:16:56.694007 osd.0 [ERR] 51.1fa shard 9 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:17:01.955389 osd.0 [ERR] 51.1fa shard 0 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:17:12.493701 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:17:12.493703 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:17:12.493704 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:17:12.493704 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:17:17.773788 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:17:17.773790 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:17:17.773791 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:17:17.773792 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:17:38.850212 mon.ap-120 [INF] osd.0 failed (root=default,host=ap-120) (connection refused reported by osd.4)
2019-04-05 08:17:39.202716 mon.ap-120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-05 08:17:42.869226 mon.ap-120 [ERR] Health check update: 19 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 08:17:42.869254 mon.ap-120 [ERR] Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-05 08:17:42.869269 mon.ap-120 [WRN] Health check failed: Degraded data redundancy: 199697/23937388 objects degraded (0.834%), 15 pgs degraded (PG_DEGRADED)
2019-04-05 08:17:50.515536 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2394605/23937388 objects degraded (10.004%), 158 pgs degraded (PG_DEGRADED)
2019-04-05 08:18:04.646456 mon.ap-120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-05 08:18:04.817317 mon.ap-120 [INF] osd.0 172.28.19.5:6804/977646 boot
2019-04-05 08:18:09.017805 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 1450925/23937388 objects degraded (6.061%), 84 pgs degraded (PG_DEGRADED)
2019-04-05 08:18:11.384578 mon.ap-120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1450925/23937388 objects degraded (6.061%), 84 pgs degraded)
osd.0 crashed with the same assert stack trace.
Running deep-scrub again...
2019-04-05 08:21:15.039834 osd.0 [ERR] 51.1fa shard 9 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:21:20.318156 osd.0 [ERR] 51.1fa shard 0 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:21:30.853106 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:21:30.853108 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:21:30.853109 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:21:30.853111 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:21:36.134245 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:21:36.134246 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:21:36.134247 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:21:36.134248 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:22:00.949764 mon.ap-120 [ERR] Health check update: 30 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 08:22:00.949791 mon.ap-120 [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2019-04-05 08:21:57.076095 osd.0 [ERR] 51.1fa deep-scrub : stat mismatch, got 19524/19519 objects, 0/0 clones, 19524/19519 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 227386438/227371540 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 08:21:57.076105 osd.0 [ERR] 51.1fa deep-scrub 5 missing, 0 inconsistent objects
2019-04-05 08:21:57.076107 osd.0 [ERR] 51.1fa deep-scrub 11 errors
And rados list-inconsistent-obj again.
{ "epoch": 53177, "inconsistents": [ { "object": { "name": "169", "nspace": "", "locator": "193061343227287063", "snap": "head", "version": 4309859 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "169", "key": "193061343227287063", "snapid": -2, "hash": 3807419386, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'299113", "prior_version": "0'0", "last_reqid": "client.36576007.0:7932947", "user_version": 4309859, "size": 2160, "mtime": "2018-12-19 10:15:30.411967", "local_mtime": "2018-12-19 10:15:30.412661", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0xd7ca497f", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 2160, "omap_digest": "0xffffffff", "data_digest": "0xd7ca497f" } ] }, { "object": { "name": "169", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285966 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "169", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271117", "prior_version": "50905'206245", "last_reqid": "client.36574111.0:5414590", "user_version": 4285966, "size": 4794, "mtime": "2018-12-06 01:14:21.215731", "local_mtime": "2018-12-06 01:14:21.216145", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x5edc7438", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 4794, "omap_digest": "0xffffffff", "data_digest": "0x5edc7438" } ] }, { "object": { "name": "21", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285956 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "21", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271107", "prior_version": "50905'206235", "last_reqid": "client.36574111.0:5414580", "user_version": 4285956, "size": 1014, "mtime": "2018-12-06 01:14:21.144941", "local_mtime": "2018-12-06 01:14:21.145108", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x224edafa", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 1014, "omap_digest": "0xffffffff", "data_digest": "0x224edafa" } ] }, { "object": { "name": "22", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285957 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "22", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271108", "prior_version": "50905'206236", "last_reqid": "client.36574111.0:5414581", "user_version": 4285957, "size": 2136, "mtime": "2018-12-06 01:14:21.151310", "local_mtime": "2018-12-06 
01:14:21.151584", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x5f174e41", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 2136, "omap_digest": "0xffffffff", "data_digest": "0x5f174e41" } ] }, { "object": { "name": "23", "nspace": "", "locator": "238938732312934742", "snap": "head", "version": 4285958 }, "errors": [], "union_shard_errors": [ "missing" ], "selected_object_info": { "oid": { "oid": "23", "key": "238938732312934742", "snapid": -2, "hash": 2967967738, "max": 0, "pool": 51, "namespace": "" }, "version": "50939'271109", "prior_version": "50905'206237", "last_reqid": "client.36574111.0:5414582", "user_version": 4285958, "size": 4794, "mtime": "2018-12-06 01:14:21.157871", "local_mtime": "2018-12-06 01:14:21.158320", "lost": 0, "flags": [ "dirty", "data_digest" ], "truncate_seq": 0, "truncate_size": 0, "data_digest": "0x5edc7438", "omap_digest": "0xffffffff", "expected_object_size": 0, "expected_write_size": 0, "alloc_hint_flags": 0, "manifest": { "type": 0 }, "watchers": {} }, "shards": [ { "osd": 0, "primary": true, "errors": [ "missing" ] }, { "osd": 9, "primary": false, "errors": [], "size": 4794, "omap_digest": "0xffffffff", "data_digest": "0x5edc7438" } ] } ] }
So the same behaviour occurs on both the filestore and the bluestore OSD.
Updated by David Zafman almost 5 years ago
Since OSDs are crashing, we should get stack traces out of the logs (e.g. osd.9). Per http://tracker.ceph.com/issues/39115#note-5, osd.9 crashes right when the missing copy would be getting pulled (copied) from osd.0 to osd.9.
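If it is the same assertion failure as in #39116, the backtrace can usually be pulled straight out of the OSD log, along these lines (assuming the default log location):

# grep -A 40 'FAILED assert' /var/log/ceph/ceph-osd.9.log | tail -50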
Updated by Iain Buclaw almost 5 years ago
See #39116 for the stack trace.
I initially thought that this and the other issue were two separate problems. However, they could be related, in which case one of them is a duplicate.
Updated by David Zafman over 4 years ago
- Is duplicate of Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash added
Updated by David Zafman over 4 years ago
- Status changed from New to Duplicate
OSD crashes are the underlying issue here, and we can't say anything about repair until there are no more crashes.