Bug #39115

closed

ceph pg repair doesn't fix itself if osd is bluestore

Added by Iain Buclaw about 5 years ago. Updated over 4 years ago.

Status: Duplicate
Priority: Normal
Assignee: David Zafman
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I run ceph pg repair on an inconsistent PG with missing objects, the OSD is usually marked down and then up again before the cluster returns to healthy.

For example:

2019-04-04 11:32:47.226989 osd.3 [ERR] 51.10d shard 3 51:b0edece3::17308400658964302403:169:head : missing
2019-04-04 11:32:58.018797 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:169:head : missing
2019-04-04 11:32:58.018799 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:21:head : missing
2019-04-04 11:32:58.018813 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:22:head : missing
2019-04-04 11:32:58.018814 osd.3 [ERR] 51.10d shard 9 51:b0fd8af8::2389926346209815023:23:head : missing
2019-04-04 11:33:03.522849 mon.120 [INF] osd.3 failed (root=default,host=121) (connection refused reported by osd.6)
2019-04-04 11:33:04.122713 mon.120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-04 11:33:06.449766 mon.120 [WRN] Health check failed: Reduced data availability: 21 pgs peering (PG_AVAILABILITY)
2019-04-04 11:33:06.449797 mon.120 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 17 scrub errors)
2019-04-04 11:33:06.449812 mon.120 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-04 11:33:08.785071 mon.120 [WRN] Health check failed: Degraded data redundancy: 1903317/24115186 objects degraded (7.893%), 121 pgs degraded (PG_DEGRADED)
2019-04-04 11:33:11.944066 mon.120 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 21 pgs peering)
2019-04-04 11:33:15.813380 mon.120 [WRN] Health check update: Degraded data redundancy: 2432784/24115186 objects degraded (10.088%), 155 pgs degraded (PG_DEGRADED)
2019-04-04 11:33:31.774861 mon.120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-04 11:33:31.996274 mon.120 [INF] osd.3 172.28.19.6:6804/3630259 boot
2019-04-04 11:33:32.959333 mon.120 [WRN] Health check update: Degraded data redundancy: 2177355/24115186 objects degraded (9.029%), 140 pgs degraded (PG_DEGRADED)
2019-04-04 11:33:38.913328 mon.120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1849337/24115186 objects degraded (7.669%), 116 pgs degraded)
2019-04-04 11:33:38.913364 mon.120 [INF] Cluster is now healthy

If the OSD is bluestore, however, this does not happen: the same objects are simply reported as missing again and again.

2019-04-04 14:45:33.743300 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:45:33.743301 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:45:33.743302 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:45:33.743303 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:45:33.743304 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:45:33.743304 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:45:33.743305 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:45:33.743306 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:45:33.743307 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:45:33.743308 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:45:33.743309 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:45:33.743310 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:45:39.026046 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:45:39.026048 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:45:39.026049 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:45:39.026050 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:45:39.026051 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:45:39.026051 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:45:39.026052 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:45:39.026053 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:45:39.026054 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:45:39.026055 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:45:39.026056 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:45:39.026063 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:46:05.500587 osd.9 [ERR] 51.1e5 shard 9 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:46:10.795413 osd.9 [ERR] 51.1e5 shard 5 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:47:46.115763 osd.9 [ERR] 51.1e5 repair 13 missing, 0 inconsistent objects
2019-04-04 14:47:46.115809 osd.9 [ERR] 51.1e5 repair 26 errors, 13 fixed
2019-04-04 14:47:51.417526 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:47:51.417527 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:47:51.417528 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:47:51.417529 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:47:51.417530 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:47:51.417538 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:47:51.417539 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:47:51.417540 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:47:51.417543 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:47:51.417544 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:47:51.417545 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:47:51.417545 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:47:56.711487 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:169:head : missing
2019-04-04 14:47:56.711488 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:21:head : missing
2019-04-04 14:47:56.711489 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:22:head : missing
2019-04-04 14:47:56.711490 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:23:head : missing
2019-04-04 14:47:56.711491 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:24:head : missing
2019-04-04 14:47:56.711492 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:25:head : missing
2019-04-04 14:47:56.711492 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:26:head : missing
2019-04-04 14:47:56.711493 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:27:head : missing
2019-04-04 14:47:56.711494 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:28:head : missing
2019-04-04 14:47:56.711495 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:33:head : missing
2019-04-04 14:47:56.711496 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:34:head : missing
2019-04-04 14:47:56.711496 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:36:head : missing
2019-04-04 14:48:23.182563 osd.9 [ERR] 51.1e5 shard 9 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:48:28.472711 osd.9 [ERR] 51.1e5 shard 5 51:a7a6a3fe::18256662680708088428:169:head : missing
2019-04-04 14:50:03.649604 osd.9 [ERR] 51.1e5 deep-scrub 13 missing, 0 inconsistent objects
2019-04-04 14:50:03.649608 osd.9 [ERR] 51.1e5 deep-scrub 26 errors

As it happens, one of the OSDs where the objects are reported missing (osd.5) is filestore, and I verified by inspecting its disk that all of the objects exist.

If I restart the OSD that is using bluestore (osd.9), then the cluster returns to healthy.

This is reproducible 100% of the time.
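
The relationship between "13 missing" and "26 errors" in the repair summary above can be checked by counting the per-shard lines. The following is a minimal sketch, assuming the `ceph -w` line layout shown in this report; the names `MISSING_RE`, `summarize_missing` and `SAMPLE_LINES` are made up for illustration and are not part of any Ceph tool. It counts distinct objects separately from shard-level "missing" lines.

```python
import re
from collections import defaultdict

# Matches per-object scrub error lines as they appear in this report, e.g.
# "... osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:169:head : missing"
# The layout is taken from the logs above; other Ceph versions may differ.
MISSING_RE = re.compile(
    r"\[ERR\] (?P<pg>\S+) shard (?P<shard>\d+) (?P<obj>\S+) : missing$"
)


def summarize_missing(lines):
    """Return (distinct objects, shard-level missing lines, count per shard)."""
    objects = set()
    per_shard = defaultdict(int)
    for line in lines:
        m = MISSING_RE.search(line.strip())
        if m is None:
            continue  # summary lines and monitor messages are ignored
        objects.add((m["pg"], m["obj"]))
        per_shard[int(m["shard"])] += 1
    return len(objects), sum(per_shard.values()), dict(per_shard)


# A few lines copied from the log above (one scrub pass only).
SAMPLE_LINES = [
    "2019-04-04 14:45:33.743300 osd.9 [ERR] 51.1e5 shard 9 51:a785a6aa::3966486014759199568:169:head : missing",
    "2019-04-04 14:45:39.026046 osd.9 [ERR] 51.1e5 shard 5 51:a785a6aa::3966486014759199568:169:head : missing",
    "2019-04-04 14:46:05.500587 osd.9 [ERR] 51.1e5 shard 9 51:a7a6a3fe::18256662680708088428:169:head : missing",
    "2019-04-04 14:46:10.795413 osd.9 [ERR] 51.1e5 shard 5 51:a7a6a3fe::18256662680708088428:169:head : missing",
]

if __name__ == "__main__":
    print(summarize_missing(SAMPLE_LINES))
```

Feed it the lines from a single scrub or repair pass at a time. Mixing the repair pass with the deep-scrub pass that follows it would count every shard-level line twice while the number of distinct objects stays the same.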


Related issues: 1 (1 open, 0 closed)

Is duplicate of RADOS - Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash (New, 04/04/2019)

Actions #1

Updated by David Zafman about 5 years ago

It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output of rados list-inconsistent-obj.

Actions #2

Updated by Casey Bodley about 5 years ago

  • Project changed from rgw to Ceph

Actions #3

Updated by Neha Ojha about 5 years ago

  • Project changed from Ceph to RADOS

Actions #4

Updated by Iain Buclaw about 5 years ago

David Zafman wrote:

It would be helpful to see a ceph pg deep-scrub (wait for it to finish) followed by the output of rados list-inconsistent-obj.

Actually, this is more closely related to #39116 than I first thought.

The following behaviour:

2019-04-04 11:33:03.522849 mon.120 [INF] osd.3 failed (root=default,host=121) (connection refused reported by osd.6)

happens as a result of the assert in #39116; it just so happens that the cluster returns to healthy after the filestore OSD boots back up.

So perhaps the bluestore OSD does not assert / crash, but the cluster does not return to healthy either.

Actions #5

Updated by Iain Buclaw about 5 years ago

# ceph pg deep-scrub 51.177
instructing pg 51.177 on osd.9 to deep-scrub

ceph -w logs:

2019-04-05 07:12:01.200548 osd.9 [ERR] 51.177 shard 0 51:eedfc6f8::4653376157940334334:47:head : missing
2019-04-05 07:12:06.580296 osd.9 [ERR] 51.177 shard 9 51:eedfc6f8::4653376157940334334:47:head : missing
2019-04-05 07:12:38.708078 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, got 19714/19713 objects, 0/0 clones, 19714/19713 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 246025587/246016008 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 07:12:38.708093 osd.9 [ERR] 51.177 deep-scrub 1 missing, 0 inconsistent objects
2019-04-05 07:12:38.708096 osd.9 [ERR] 51.177 deep-scrub 3 errors

On osd.0, I confirmed that the file is not missing:

# find 51.177_head/ -name "*47_4653376157940334334*" 
51.177_head/DIR_7/DIR_7/DIR_B/DIR_F/47_4653376157940334334_head_1F63FB77__33

rados list-inconsistent-obj

{
    "epoch": 53141,
    "inconsistents": [
        {
            "object": {
                "name": "47",
                "nspace": "",
                "locator": "4653376157940334334",
                "snap": "head",
                "version": 4526675
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "47",
                    "key": "4653376157940334334",
                    "snapid": -2,
                    "hash": 526646135,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "52284'454181",
                "prior_version": "52249'377382",
                "last_reqid": "client.36576007.0:15677702",
                "user_version": 4526675,
                "size": 9579,
                "mtime": "2019-02-07 10:58:34.211435",
                "local_mtime": "2019-02-07 10:58:34.218297",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest",
                    "omap_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x39906511",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": false,
                    "errors": [],
                    "size": 9579,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x39906511" 
                },
                {
                    "osd": 9,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                }
            ]
        }
    ]
}
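
To read the report above at a glance, it helps to reduce each entry to which OSD is primary and which shards carry the "missing" error. The following is a minimal sketch, assuming the JSON layout shown in this output; `missing_by_object` and `SAMPLE_REPORT` are illustrative names, and the sample is trimmed from the output above to the fields the function reads.

```python
import json


def missing_by_object(report):
    """For each inconsistent object, list the OSDs that report it missing."""
    summary = []
    for item in report.get("inconsistents", []):
        shards = item.get("shards", [])
        summary.append({
            "name": item["object"]["name"],
            "locator": item["object"]["locator"],
            # OSD id of the primary shard, or None if no shard is flagged primary
            "primary": next((s["osd"] for s in shards if s.get("primary")), None),
            "missing_on": [s["osd"] for s in shards if "missing" in s.get("errors", [])],
            "present_on": [s["osd"] for s in shards if "missing" not in s.get("errors", [])],
        })
    return summary


# Trimmed copy of the list-inconsistent-obj output above.
SAMPLE_REPORT = json.loads("""
{
    "epoch": 53141,
    "inconsistents": [
        {
            "object": {"name": "47", "nspace": "", "locator": "4653376157940334334",
                       "snap": "head", "version": 4526675},
            "errors": [],
            "union_shard_errors": ["missing"],
            "shards": [
                {"osd": 0, "primary": false, "errors": [], "size": 9579},
                {"osd": 9, "primary": true, "errors": ["missing"]}
            ]
        }
    ]
}
""")

if __name__ == "__main__":
    for row in missing_by_object(SAMPLE_REPORT):
        print(row)
```

For this entry the summary shows the object missing on the primary (osd.9) and present on osd.0, which matches the find result on osd.0 above.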

Repairing:

# ceph pg repair 51.177
instructing pg 51.177 on osd.9 to repair

ceph -w logs:

2019-04-05 07:31:34.248658 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:31:39.532834 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:31:39.532835 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:31:44.802083 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:32:06.433237 mon.ap-120 [INF] osd.9 failed (root=default,host=ap-124) (connection refused reported by osd.8)
2019-04-05 07:32:06.485653 mon.ap-120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-05 07:32:09.979661 mon.ap-120 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 41 pgs peering (PG_AVAILABILITY)
2019-04-05 07:32:09.979712 mon.ap-120 [WRN] Health check failed: Degraded data redundancy: 312778/23936452 objects degraded (1.307%), 18 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:11.240626 mon.ap-120 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 3 scrub errors)
2019-04-05 07:32:11.240652 mon.ap-120 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-05 07:32:13.483748 mon.ap-120 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 3 pgs inactive, 41 pgs peering)
2019-04-05 07:32:19.295771 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936452 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:25.339488 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936488 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:35.207942 mon.ap-120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-05 07:32:35.580655 mon.ap-120 [INF] osd.9 172.28.19.9:6800/379046 boot
2019-04-05 07:32:37.834878 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2228323/23936488 objects degraded (9.309%), 147 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:41.458732 mon.ap-120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1335902/23936488 objects degraded (5.581%), 85 pgs degraded)
2019-04-05 07:32:41.458757 mon.ap-120 [INF] Cluster is now healthy

Running deep scrub again:

2019-04-05 07:43:42.636565 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:43:47.926850 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:43:47.926851 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:43:53.195249 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:44:14.234075 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, got 19733/19731 objects, 0/0 clones, 19733/19731 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 247108485/247093139 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 07:44:14.234086 osd.9 [ERR] 51.177 deep-scrub 2 missing, 0 inconsistent objects
2019-04-05 07:44:14.234092 osd.9 [ERR] 51.177 deep-scrub 5 errors
2019-04-05 07:44:16.666888 mon.ap-120 [ERR] Health check update: 24 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 07:44:16.666934 mon.ap-120 [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)

rados list-inconsistent-obj again:

{
    "epoch": 53148,
    "inconsistents": [
        {
            "object": {
                "name": "169",
                "nspace": "",
                "locator": "171336252089860175",
                "snap": "head",
                "version": 4440812
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "169",
                    "key": "171336252089860175",
                    "snapid": -2,
                    "hash": 2202740599,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'367208",
                "prior_version": "50939'336508",
                "last_reqid": "client.36575077.0:14561030",
                "user_version": 4440812,
                "size": 2005,
                "mtime": "2019-01-20 20:54:12.394827",
                "local_mtime": "2019-01-20 20:54:12.392736",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest",
                    "omap_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x54b33d8e",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": false,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": true,
                    "errors": [],
                    "size": 2005,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x54b33d8e" 
                }
            ]
        },
        {
            "object": {
                "name": "41",
                "nspace": "",
                "locator": "383290891110620472",
                "snap": "head",
                "version": 4373192
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "41",
                    "key": "383290891110620472",
                    "snapid": -2,
                    "hash": 399267703,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "53145'527997",
                "prior_version": "50939'286779",
                "last_reqid": "osd.9.0:707605",
                "user_version": 4373192,
                "size": 13341,
                "mtime": "2018-12-12 06:37:42.094992",
                "local_mtime": "2018-12-12 06:37:42.096127",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest",
                    "omap_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0xc89910ed",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": false,
                    "errors": [],
                    "size": 13341,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xc89910ed" 
                },
                {
                    "osd": 9,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                }
            ]
        }
    ]
}
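
The "stat mismatch" lines can be cross-checked against the object sizes in the two list-inconsistent-obj outputs above. The following is a minimal sketch, assuming the log layout shown in this report; `PAIR_RE`, `stat_deltas` and the two constants are illustrative names. It only subtracts the second number of each pair from the first and makes no claim about which side is the scrubbed value and which is the recorded one.

```python
import re

# Picks out the "N/M objects" and "N/M bytes" pairs of a stat mismatch line.
# Pairs such as "0/0 manifest objects" are skipped because a word sits
# between the numbers and the label.
PAIR_RE = re.compile(r"(\d+)/(\d+) (objects|bytes)\b")


def stat_deltas(line):
    """Return {'objects': first - second, 'bytes': first - second}."""
    return {
        label: int(first) - int(second)
        for first, second, label in PAIR_RE.findall(line)
    }


# Copied from the first and second deep-scrub runs on pg 51.177 above.
FIRST_SCRUB = (
    "2019-04-05 07:12:38.708078 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, "
    "got 19714/19713 objects, 0/0 clones, 19714/19713 dirty, 0/0 omap, 0/0 pinned, "
    "0/0 hit_set_archive, 0/0 whiteouts, 246025587/246016008 bytes, "
    "0/0 manifest objects, 0/0 hit_set_archive bytes."
)
SECOND_SCRUB = (
    "2019-04-05 07:44:14.234075 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, "
    "got 19733/19731 objects, 0/0 clones, 19733/19731 dirty, 0/0 omap, 0/0 pinned, "
    "0/0 hit_set_archive, 0/0 whiteouts, 247108485/247093139 bytes, "
    "0/0 manifest objects, 0/0 hit_set_archive bytes."
)

if __name__ == "__main__":
    print(stat_deltas(FIRST_SCRUB))
    print(stat_deltas(SECOND_SCRUB))
```

In the first run the difference is 1 object and 9579 bytes, which equals the size of the single object reported missing. In the second run it is 2 objects and 15346 bytes, which equals 2005 + 13341, the sizes of the two objects reported missing. This is an observation from the numbers in this report, not a diagnosis of the cause.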

ceph pg repair again:

2019-04-05 07:49:09.475060 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:49:14.755573 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:49:14.755575 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:49:20.021187 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:49:41.227801 mon.ap-120 [INF] osd.9 failed (root=default,host=ap-124) (connection refused reported by osd.7)
2019-04-05 07:49:41.787921 mon.ap-120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-05 07:49:44.132558 mon.ap-120 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 69 pgs peering (PG_AVAILABILITY)
2019-04-05 07:49:46.327214 mon.ap-120 [WRN] Health check failed: Degraded data redundancy: 1052659/23936848 objects degraded (4.398%), 74 pgs degraded (PG_DEGRADED)
2019-04-05 07:49:46.327246 mon.ap-120 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 24 scrub errors)
2019-04-05 07:49:46.327259 mon.ap-120 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 2 pgs inconsistent)
2019-04-05 07:49:49.745143 mon.ap-120 [WRN] Health check update: Reduced data availability: 1 pg inactive, 39 pgs peering (PG_AVAILABILITY)
2019-04-05 07:49:49.858132 mon.ap-120 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg inactive, 39 pgs peering)
2019-04-05 07:49:54.745487 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936848 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:50:11.077110 mon.ap-120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-05 07:50:11.339766 mon.ap-120 [INF] osd.9 172.28.19.9:6800/386390 boot
2019-04-05 07:50:14.761104 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 1757404/23936848 objects degraded (7.342%), 112 pgs degraded (PG_DEGRADED)
2019-04-05 07:50:16.859843 mon.ap-120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1757404/23936848 objects degraded (7.342%), 112 pgs degraded)
2019-04-05 07:50:16.859884 mon.ap-120 [INF] Cluster is now healthy

After restarting both osd.0 and osd.9, running deep scrub for a third time:

2019-04-05 08:00:44.170028 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 08:00:49.461396 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 08:00:49.461398 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 08:00:54.735733 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 08:01:20.882362 mon.ap-120 [ERR] Health check update: 24 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 08:01:20.882390 mon.ap-120 [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2019-04-05 08:01:15.761672 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, got 19733/19731 objects, 0/0 clones, 19733/19731 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 247108485/247093139 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 08:01:15.761683 osd.9 [ERR] 51.177 deep-scrub 2 missing, 0 inconsistent objects
2019-04-05 08:01:15.761685 osd.9 [ERR] 51.177 deep-scrub 5 errors

So my initial assumption was wrong: pg repair isn't working at all.

Actions #6

Updated by Iain Buclaw about 5 years ago

Another PG, where the objects are reported missing on osd.0 (filestore) rather than on osd.9 (bluestore) as in the previous example.

# ceph pg deep-scrub 51.1fa
instructing pg 51.1fa on osd.0 to deep-scrub

Logs:

2019-04-05 08:11:42.606348 osd.0 [ERR] 51.1fa shard 9 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:11:47.944111 osd.0 [ERR] 51.1fa shard 0 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:11:58.658938 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:11:58.658940 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:11:58.658940 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:11:58.658941 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:12:04.019376 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:12:04.019379 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:12:04.019380 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:12:04.019381 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:12:25.412986 osd.0 [ERR] 51.1fa deep-scrub : stat mismatch, got 19524/19519 objects, 0/0 clones, 19524/19519 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 227386438/227371540 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 08:12:25.412999 osd.0 [ERR] 51.1fa deep-scrub 5 missing, 0 inconsistent objects
2019-04-05 08:12:25.413002 osd.0 [ERR] 51.1fa deep-scrub 11 errors

rados list-inconsistent-obj

{
    "epoch": 53170,
    "inconsistents": [
        {
            "object": {
                "name": "169",
                "nspace": "",
                "locator": "193061343227287063",
                "snap": "head",
                "version": 4309859
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "169",
                    "key": "193061343227287063",
                    "snapid": -2,
                    "hash": 3807419386,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'299113",
                "prior_version": "0'0",
                "last_reqid": "client.36576007.0:7932947",
                "user_version": 4309859,
                "size": 2160,
                "mtime": "2018-12-19 10:15:30.411967",
                "local_mtime": "2018-12-19 10:15:30.412661",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0xd7ca497f",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 2160,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xd7ca497f" 
                }
            ]
        },
        {
            "object": {
                "name": "169",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285966
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "169",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271117",
                "prior_version": "50905'206245",
                "last_reqid": "client.36574111.0:5414590",
                "user_version": 4285966,
                "size": 4794,
                "mtime": "2018-12-06 01:14:21.215731",
                "local_mtime": "2018-12-06 01:14:21.216145",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x5edc7438",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 4794,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x5edc7438" 
                }
            ]
        },
        {
            "object": {
                "name": "21",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285956
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "21",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271107",
                "prior_version": "50905'206235",
                "last_reqid": "client.36574111.0:5414580",
                "user_version": 4285956,
                "size": 1014,
                "mtime": "2018-12-06 01:14:21.144941",
                "local_mtime": "2018-12-06 01:14:21.145108",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x224edafa",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 1014,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x224edafa" 
                }
            ]
        },
        {
            "object": {
                "name": "22",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285957
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "22",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271108",
                "prior_version": "50905'206236",
                "last_reqid": "client.36574111.0:5414581",
                "user_version": 4285957,
                "size": 2136,
                "mtime": "2018-12-06 01:14:21.151310",
                "local_mtime": "2018-12-06 01:14:21.151584",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x5f174e41",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 2136,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x5f174e41" 
                }
            ]
        },
        {
            "object": {
                "name": "23",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285958
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "23",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271109",
                "prior_version": "50905'206237",
                "last_reqid": "client.36574111.0:5414582",
                "user_version": 4285958,
                "size": 4794,
                "mtime": "2018-12-06 01:14:21.157871",
                "local_mtime": "2018-12-06 01:14:21.158320",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x5edc7438",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 4794,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x5edc7438" 
                }
            ]
        }
    ]
}

I checked one of the files reported missing on osd.0, and it is present on the physical disk.

# find 51.1fa_head -name "*169_193061343227287063*" 
51.1fa_head/DIR_A/DIR_F/DIR_B/DIR_9/DIR_0/169_193061343227287063_head_E2F09BFA__33

Logs from ceph pg repair 51.1fa:

2019-04-05 08:16:56.694007 osd.0 [ERR] 51.1fa shard 9 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:17:01.955389 osd.0 [ERR] 51.1fa shard 0 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:17:12.493701 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:17:12.493703 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:17:12.493704 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:17:12.493704 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:17:17.773788 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:17:17.773790 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:17:17.773791 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:17:17.773792 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:17:38.850212 mon.ap-120 [INF] osd.0 failed (root=default,host=ap-120) (connection refused reported by osd.4)
2019-04-05 08:17:39.202716 mon.ap-120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-05 08:17:42.869226 mon.ap-120 [ERR] Health check update: 19 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 08:17:42.869254 mon.ap-120 [ERR] Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-05 08:17:42.869269 mon.ap-120 [WRN] Health check failed: Degraded data redundancy: 199697/23937388 objects degraded (0.834%), 15 pgs degraded (PG_DEGRADED)
2019-04-05 08:17:50.515536 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2394605/23937388 objects degraded (10.004%), 158 pgs degraded (PG_DEGRADED)
2019-04-05 08:18:04.646456 mon.ap-120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-05 08:18:04.817317 mon.ap-120 [INF] osd.0 172.28.19.5:6804/977646 boot
2019-04-05 08:18:09.017805 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 1450925/23937388 objects degraded (6.061%), 84 pgs degraded (PG_DEGRADED)
2019-04-05 08:18:11.384578 mon.ap-120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1450925/23937388 objects degraded (6.061%), 84 pgs degraded)

osd.0 crashed with the same assert stack trace.
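
To make the repair log above easier to read, here is a minimal Python sketch that groups the "missing" errors by object and lists which shards each one was reported missing on. The regular expression and the helper name missing_by_object are illustrative assumptions based only on the log lines pasted above; they are not part of any Ceph tooling.

```python
import re
from collections import defaultdict

# The ten [ERR] lines from the repair log above, copied verbatim.
LOG = """\
2019-04-05 08:16:56.694007 osd.0 [ERR] 51.1fa shard 9 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:17:01.955389 osd.0 [ERR] 51.1fa shard 0 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:17:12.493701 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:17:12.493703 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:17:12.493704 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:17:12.493704 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:17:17.773788 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:17:17.773790 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:17:17.773791 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:17:17.773792 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:23:head : missing
"""

# Assumed line shape: "[ERR] <pgid> shard <n> <object> : missing".
PATTERN = re.compile(r"\[ERR\] (\S+) shard (\d+) (\S+) : missing")


def missing_by_object(text):
    """Map each object id to the set of shards that reported it missing."""
    result = defaultdict(set)
    for match in PATTERN.finditer(text):
        _pgid, shard, obj = match.groups()
        result[obj].add(int(shard))
    return dict(result)


by_obj = missing_by_object(LOG)
for obj in sorted(by_obj):
    print(obj, "missing on shards", sorted(by_obj[obj]))
```

On these lines, each of the five objects is reported missing on both shard 0 and shard 9 during the repair, whereas the list-inconsistent-obj output flags only osd 0.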

Running deep-scrub again...

2019-04-05 08:21:15.039834 osd.0 [ERR] 51.1fa shard 9 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:21:20.318156 osd.0 [ERR] 51.1fa shard 0 51:5fd90f47::193061343227287063:169:head : missing
2019-04-05 08:21:30.853106 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:21:30.853108 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:21:30.853109 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:21:30.853111 osd.0 [ERR] 51.1fa shard 9 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:21:36.134245 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:169:head : missing
2019-04-05 08:21:36.134246 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:21:head : missing
2019-04-05 08:21:36.134247 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:22:head : missing
2019-04-05 08:21:36.134248 osd.0 [ERR] 51.1fa shard 0 51:5fe9e70d::238938732312934742:23:head : missing
2019-04-05 08:22:00.949764 mon.ap-120 [ERR] Health check update: 30 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 08:22:00.949791 mon.ap-120 [ERR] Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2019-04-05 08:21:57.076095 osd.0 [ERR] 51.1fa deep-scrub : stat mismatch, got 19524/19519 objects, 0/0 clones, 19524/19519 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 227386438/227371540 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 08:21:57.076105 osd.0 [ERR] 51.1fa deep-scrub 5 missing, 0 inconsistent objects
2019-04-05 08:21:57.076107 osd.0 [ERR] 51.1fa deep-scrub 11 errors
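
The stat mismatch line above packs several "A/B" counter pairs into one message. As a rough sketch, the following Python pulls out the pairs that differ. The regular expression and the function name stat_deltas are my own assumptions about the message layout, derived only from the line pasted above.

```python
import re

# The stat mismatch message from the osd.0 deep-scrub log above
# (timestamp and daemon prefix dropped).
STAT_LINE = (
    "51.1fa deep-scrub : stat mismatch, got 19524/19519 objects, 0/0 clones, "
    "19524/19519 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, "
    "0/0 whiteouts, 227386438/227371540 bytes, 0/0 manifest objects, "
    "0/0 hit_set_archive bytes."
)


def stat_deltas(line):
    """Return {counter: first - second} for every 'A/B name' pair that differs."""
    deltas = {}
    # Each pair looks like "<int>/<int> <name>" and ends at a comma or full stop.
    for first, second, name in re.findall(r"(\d+)/(\d+) ([a-z_ ]+?)(?=[,.])", line):
        diff = int(first) - int(second)
        if diff:
            deltas[name] = diff
    return deltas


print(stat_deltas(STAT_LINE))
```

For this line the two sides differ by 5 objects, 5 dirty objects and 14898 bytes, which appears consistent with the "5 missing" count on the following log line.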

And the output of rados list-inconsistent-obj again:

{
    "epoch": 53177,
    "inconsistents": [
        {
            "object": {
                "name": "169",
                "nspace": "",
                "locator": "193061343227287063",
                "snap": "head",
                "version": 4309859
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "169",
                    "key": "193061343227287063",
                    "snapid": -2,
                    "hash": 3807419386,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'299113",
                "prior_version": "0'0",
                "last_reqid": "client.36576007.0:7932947",
                "user_version": 4309859,
                "size": 2160,
                "mtime": "2018-12-19 10:15:30.411967",
                "local_mtime": "2018-12-19 10:15:30.412661",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0xd7ca497f",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 2160,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xd7ca497f" 
                }
            ]
        },
        {
            "object": {
                "name": "169",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285966
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "169",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271117",
                "prior_version": "50905'206245",
                "last_reqid": "client.36574111.0:5414590",
                "user_version": 4285966,
                "size": 4794,
                "mtime": "2018-12-06 01:14:21.215731",
                "local_mtime": "2018-12-06 01:14:21.216145",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x5edc7438",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 4794,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x5edc7438" 
                }
            ]
        },
        {
            "object": {
                "name": "21",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285956
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "21",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271107",
                "prior_version": "50905'206235",
                "last_reqid": "client.36574111.0:5414580",
                "user_version": 4285956,
                "size": 1014,
                "mtime": "2018-12-06 01:14:21.144941",
                "local_mtime": "2018-12-06 01:14:21.145108",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x224edafa",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 1014,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x224edafa" 
                }
            ]
        },
        {
            "object": {
                "name": "22",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285957
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "22",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271108",
                "prior_version": "50905'206236",
                "last_reqid": "client.36574111.0:5414581",
                "user_version": 4285957,
                "size": 2136,
                "mtime": "2018-12-06 01:14:21.151310",
                "local_mtime": "2018-12-06 01:14:21.151584",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x5f174e41",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 2136,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x5f174e41" 
                }
            ]
        },
        {
            "object": {
                "name": "23",
                "nspace": "",
                "locator": "238938732312934742",
                "snap": "head",
                "version": 4285958
            },
            "errors": [],
            "union_shard_errors": [
                "missing" 
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "23",
                    "key": "238938732312934742",
                    "snapid": -2,
                    "hash": 2967967738,
                    "max": 0,
                    "pool": 51,
                    "namespace": "" 
                },
                "version": "50939'271109",
                "prior_version": "50905'206237",
                "last_reqid": "client.36574111.0:5414582",
                "user_version": 4285958,
                "size": 4794,
                "mtime": "2018-12-06 01:14:21.157871",
                "local_mtime": "2018-12-06 01:14:21.158320",
                "lost": 0,
                "flags": [
                    "dirty",
                    "data_digest" 
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0x5edc7438",
                "omap_digest": "0xffffffff",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 0,
                    "primary": true,
                    "errors": [
                        "missing" 
                    ]
                },
                {
                    "osd": 9,
                    "primary": false,
                    "errors": [],
                    "size": 4794,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x5edc7438" 
                }
            ]
        }
    ]
}

So the behaviour is the same on both the filestore and bluestore OSDs.
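
Since the JSON above is long, here is a small Python sketch that reduces a list-inconsistent-obj report to one line per object: which OSDs report the object missing, which report no errors, and the size recorded in selected_object_info. The sample is a trimmed copy of two of the five entries above, and the function name summarize is a hypothetical helper, not part of the rados CLI.

```python
import json

# Trimmed sample: two of the five entries from the output above, keeping
# only the fields this sketch reads.
SAMPLE = """
{
  "epoch": 53177,
  "inconsistents": [
    {
      "object": {"name": "169", "locator": "193061343227287063", "snap": "head"},
      "union_shard_errors": ["missing"],
      "selected_object_info": {"size": 2160},
      "shards": [
        {"osd": 0, "primary": true, "errors": ["missing"]},
        {"osd": 9, "primary": false, "errors": [], "size": 2160}
      ]
    },
    {
      "object": {"name": "21", "locator": "238938732312934742", "snap": "head"},
      "union_shard_errors": ["missing"],
      "selected_object_info": {"size": 1014},
      "shards": [
        {"osd": 0, "primary": true, "errors": ["missing"]},
        {"osd": 9, "primary": false, "errors": [], "size": 1014}
      ]
    }
  ]
}
"""


def summarize(report):
    """Return (locator, name, missing_on, clean_on, size) for each object."""
    rows = []
    for item in report["inconsistents"]:
        obj = item["object"]
        missing_on = [s["osd"] for s in item["shards"] if "missing" in s["errors"]]
        clean_on = [s["osd"] for s in item["shards"] if not s["errors"]]
        size = item["selected_object_info"]["size"]
        rows.append((obj["locator"], obj["name"], missing_on, clean_on, size))
    return rows


rows = summarize(json.loads(SAMPLE))
for locator, name, missing_on, clean_on, size in rows:
    print(f"{locator}:{name} missing on osd {missing_on}, "
          f"clean on osd {clean_on}, {size} bytes")
```

Over the full five-entry output above, the recorded sizes are 2160, 4794, 1014, 2136 and 4794 bytes, 14898 bytes in total, which equals the byte difference (227386438 vs 227371540) in the deep-scrub stat mismatch line.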

Actions #7

Updated by David Zafman almost 5 years ago

  • Assignee set to David Zafman

Actions #8

Updated by David Zafman almost 5 years ago

Since the OSDs are crashing, we should get stack traces out of the logs (e.g. osd.9). Per http://tracker.ceph.com/issues/39115#note-5, osd.9 crashes right when the missing copy would be pulled (copied) from osd.0 to osd.9.

Actions #9

Updated by Iain Buclaw almost 5 years ago

See #39116 for the stack trace.

I initially thought that this and the other issue were two separate problems. However, they could be related, in which case one of them is a duplicate.

Actions #10

Updated by David Zafman over 4 years ago

  • Is duplicate of Bug #39116: Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash added

Actions #11

Updated by David Zafman over 4 years ago

  • Status changed from New to Duplicate

OSD crashes are the underlying issue here and we can't say anything about repair until there aren't any more crashes.
