Deep-scrubbing the affected PG:
# ceph pg deep-scrub 51.177
instructing pg 51.177 on osd.9 to deep-scrub
ceph -w logs:
2019-04-05 07:12:01.200548 osd.9 [ERR] 51.177 shard 0 51:eedfc6f8::4653376157940334334:47:head : missing
2019-04-05 07:12:06.580296 osd.9 [ERR] 51.177 shard 9 51:eedfc6f8::4653376157940334334:47:head : missing
2019-04-05 07:12:38.708078 osd.9 [ERR] 51.177 deep-scrub : stat mismatch, got 19714/19713 objects, 0/0 clones, 19714/19713 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 246025587/246016008 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 07:12:38.708093 osd.9 [ERR] 51.177 deep-scrub 1 missing, 0 inconsistent objects
2019-04-05 07:12:38.708096 osd.9 [ERR] 51.177 deep-scrub 3 errors
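For cross-checking, the same errors are visible from the monitor side: `ceph health detail` names the inconsistent PG, and `rados list-inconsistent-pg` enumerates inconsistent PGs per pool. The pool name behind id 51 isn't shown here, so <pool> below is a placeholder:
# ceph health detail
# rados list-inconsistent-pg <pool>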
On osd.0, the object is still present in the FileStore PG directory:
# find 51.177_head/ -name "*47_4653376157940334334*"
51.177_head/DIR_7/DIR_7/DIR_B/DIR_F/47_4653376157940334334_head_1F63FB77__33
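To double-check whether osd.9 really has no copy on disk, the object store can be inspected directly with ceph-objectstore-tool while the OSD is stopped. Paths below assume a default FileStore deployment; adjust to your layout:
# systemctl stop ceph-osd@9
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-9 \
    --journal-path /var/lib/ceph/osd/ceph-9/journal \
    --pgid 51.177 --op list | grep 4653376157940334334
# systemctl start ceph-osd@9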
# rados list-inconsistent-obj 51.177 --format=json-pretty
{
  "epoch": 53141,
  "inconsistents": [
    {
      "object": {
        "name": "47",
        "nspace": "",
        "locator": "4653376157940334334",
        "snap": "head",
        "version": 4526675
      },
      "errors": [],
      "union_shard_errors": [
        "missing"
      ],
      "selected_object_info": {
        "oid": {
          "oid": "47",
          "key": "4653376157940334334",
          "snapid": -2,
          "hash": 526646135,
          "max": 0,
          "pool": 51,
          "namespace": ""
        },
        "version": "52284'454181",
        "prior_version": "52249'377382",
        "last_reqid": "client.36576007.0:15677702",
        "user_version": 4526675,
        "size": 9579,
        "mtime": "2019-02-07 10:58:34.211435",
        "local_mtime": "2019-02-07 10:58:34.218297",
        "lost": 0,
        "flags": [
          "dirty",
          "data_digest",
          "omap_digest"
        ],
        "truncate_seq": 0,
        "truncate_size": 0,
        "data_digest": "0x39906511",
        "omap_digest": "0xffffffff",
        "expected_object_size": 0,
        "expected_write_size": 0,
        "alloc_hint_flags": 0,
        "manifest": {
          "type": 0
        },
        "watchers": {}
      },
      "shards": [
        {
          "osd": 0,
          "primary": false,
          "errors": [],
          "size": 9579,
          "omap_digest": "0xffffffff",
          "data_digest": "0x39906511"
        },
        {
          "osd": 9,
          "primary": true,
          "errors": [
            "missing"
          ]
        }
      ]
    }
  ]
}
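Since the full JSON gets verbose, a jq filter (my own convenience, not part of the Ceph tooling) pulls out just the per-shard verdicts:
# rados list-inconsistent-obj 51.177 | jq '.inconsistents[] | {object: .object.name, locator: .object.locator, shards: [.shards[] | {osd, errors}]}'
Here that shows errors [] on osd.0 and ["missing"] on osd.9, i.e. the replica is intact and only the primary's copy is gone.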
Repairing:
# ceph pg repair 51.177
instructing pg 51.177 on osd.9 to repair
ceph -w logs:
2019-04-05 07:31:34.248658 osd.9 [ERR] 51.177 shard 9 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:31:39.532834 osd.9 [ERR] 51.177 shard 0 51:eee4d2c1::171336252089860175:169:head : missing
2019-04-05 07:31:39.532835 osd.9 [ERR] 51.177 shard 0 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:31:44.802083 osd.9 [ERR] 51.177 shard 9 51:eeea33e8::383290891110620472:41:head : missing
2019-04-05 07:32:06.433237 mon.ap-120 [INF] osd.9 failed (root=default,host=ap-124) (connection refused reported by osd.8)
2019-04-05 07:32:06.485653 mon.ap-120 [WRN] Health check failed: 1 osds down (OSD_DOWN)
2019-04-05 07:32:09.979661 mon.ap-120 [WRN] Health check failed: Reduced data availability: 3 pgs inactive, 41 pgs peering (PG_AVAILABILITY)
2019-04-05 07:32:09.979712 mon.ap-120 [WRN] Health check failed: Degraded data redundancy: 312778/23936452 objects degraded (1.307%), 18 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:11.240626 mon.ap-120 [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 3 scrub errors)
2019-04-05 07:32:11.240652 mon.ap-120 [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-05 07:32:13.483748 mon.ap-120 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 3 pgs inactive, 41 pgs peering)
2019-04-05 07:32:19.295771 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936452 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:25.339488 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2429737/23936488 objects degraded (10.151%), 159 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:35.207942 mon.ap-120 [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2019-04-05 07:32:35.580655 mon.ap-120 [INF] osd.9 172.28.19.9:6800/379046 boot
2019-04-05 07:32:37.834878 mon.ap-120 [WRN] Health check update: Degraded data redundancy: 2228323/23936488 objects degraded (9.309%), 147 pgs degraded (PG_DEGRADED)
2019-04-05 07:32:41.458732 mon.ap-120 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 1335902/23936488 objects degraded (5.581%), 85 pgs degraded)
2019-04-05 07:32:41.458757 mon.ap-120 [INF] Cluster is now healthy
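The OSD_DOWN and peering churn above is osd.9 aborting and restarting mid-repair (backtrace below), so "Cluster is now healthy" should be read with some suspicion. After the OSD boots, the PG state can be re-checked, e.g.:
# ceph pg 51.177 query | jq .state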
osd.9 crash log:
2019-04-05 07:32:05.864 7fd837b89700 -1 *** Caught signal (Aborted) **
in thread 7fd837b89700 thread_name:tp_osd_tp
ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)
1: /usr/bin/ceph-osd() [0xcaccb0]
2: (()+0x11390) [0x7fd85afdf390]
3: (gsignal()+0x38) [0x7fd85a512428]
4: (abort()+0x16a) [0x7fd85a51402a]
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7fd85c961710]
6: (()+0x2f9787) [0x7fd85c961787]
7: (ReplicatedBackend::prepare_pull(eversion_t, hobject_t const&, std::shared_ptr<ObjectContext>, ReplicatedBackend::RPGHandle*)+0x12d4) [0xa2c364]
8: (ReplicatedBackend::recover_object(hobject_t const&, eversion_t, std::shared_ptr<ObjectContext>, std::shared_ptr<ObjectContext>, PGBackend::RecoveryHandle*)+0xdc) [0xa2ef4c]
9: (PrimaryLogPG::recover_missing(hobject_t const&, eversion_t, int, PGBackend::RecoveryHandle*)+0x1d9) [0x8b6b39]
10: (PrimaryLogPG::recover_primary(unsigned long, ThreadPool::TPHandle&)+0x9c9) [0x8f1979]
11: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0x210) [0x8f7c70]
12: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x36a) [0x76793a]
13: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x9c5b69]
14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x769240]
15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x476) [0x7fd85c9676f6]
16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fd85c9688b0]
17: (()+0x76ba) [0x7fd85afd56ba]
18: (clone()+0x6d) [0x7fd85a5e441d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
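The abort happens inside ReplicatedBackend::prepare_pull(), i.e. the OSD died while trying to pull the missing object from a replica as part of the repair-triggered recovery. With the matching debug symbols installed, individual frame addresses can usually be resolved directly (the addresses in this trace look absolute, so addr2line should work without rebasing):
# addr2line -Cfe /usr/bin/ceph-osd 0xa2c364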
Shortly after osd.9 starts back up, another PG (51.137) reports as inconsistent, but I suspect that is only because it was next in the scheduled scrub queue.
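That theory is easy to check against the last-scrub timestamps, which `ceph pg query` exposes (field names as in the mimic output; treat the exact jq path as an assumption):
# ceph pg 51.137 query | jq '.info.stats | {last_scrub_stamp, last_deep_scrub_stamp}'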
2019-04-05 07:33:03.211561 osd.9 [ERR] 51.137 shard 3 51:ec991f27::372956414560036835:41:head : missing
2019-04-05 07:33:08.422089 osd.9 [ERR] 51.137 shard 9 51:ec991f27::372956414560036835:41:head : missing
2019-04-05 07:34:31.983093 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:24:head : missing
2019-04-05 07:34:31.983094 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:25:head : missing
2019-04-05 07:34:31.983095 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:26:head : missing
2019-04-05 07:34:31.983096 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:27:head : missing
2019-04-05 07:34:31.983096 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:33:head : missing
2019-04-05 07:34:31.983097 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:34:head : missing
2019-04-05 07:34:31.983097 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:41:head : missing
2019-04-05 07:34:31.983098 osd.9 [ERR] 51.137 shard 3 51:ecef2e12::237649098503494993:42:head : missing
2019-04-05 07:34:37.212930 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:24:head : missing
2019-04-05 07:34:37.212931 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:25:head : missing
2019-04-05 07:34:37.212932 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:26:head : missing
2019-04-05 07:34:37.212933 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:27:head : missing
2019-04-05 07:34:37.212934 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:33:head : missing
2019-04-05 07:34:37.212935 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:34:head : missing
2019-04-05 07:34:37.212935 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:41:head : missing
2019-04-05 07:34:37.212936 osd.9 [ERR] 51.137 shard 9 51:ecef2e12::237649098503494993:42:head : missing
2019-04-05 07:34:52.624365 osd.9 [ERR] 51.137 scrub : stat mismatch, got 20016/20007 objects, 0/0 clones, 20016/20007 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 238983468/238906209 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.
2019-04-05 07:34:52.624374 osd.9 [ERR] 51.137 scrub 9 missing, 0 inconsistent objects
2019-04-05 07:34:52.624377 osd.9 [ERR] 51.137 scrub 19 errors
2019-04-05 07:34:57.022091 mon.ap-120 [ERR] Health check failed: 19 scrub errors (OSD_SCRUB_ERRORS)
2019-04-05 07:34:57.022119 mon.ap-120 [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
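51.137 can be enumerated the same way as 51.177 above to see which objects and shards are affected:
# rados list-inconsistent-obj 51.137 --format=json-pretty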