Bug #39116 (open)

Draining filestore osd, removing, and adding new bluestore osd causes OSDs to crash

Added by Iain Buclaw about 5 years ago. Updated over 4 years ago.
Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2019-04-04 10:40:22.600 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-04 10:42:27.859 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2019-04-04 10:59:59.998 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR 34 scrub errors; Possible data damage: 2 pgs inconsistent
2019-04-04 11:30:56.586 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-04 11:33:06.445 7f44cb2a9700  0 log_channel(cluster) log [INF] : Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-04 11:40:01.996 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-04 11:59:59.994 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR 5 scrub errors; Possible data damage: 1 pg inconsistent
2019-04-04 12:59:59.999 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR 8 scrub errors; Possible data damage: 1 pg inconsistent
2019-04-04 13:12:05.033 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2019-04-04 13:14:32.941 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check update: Possible data damage: 3 pgs inconsistent (PG_DAMAGED)
2019-04-04 13:25:08.437 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check update: Possible data damage: 2 pgs inconsistent (PG_DAMAGED)
2019-04-04 13:59:59.995 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR 10 scrub errors; Possible data damage: 2 pgs inconsistent
2019-04-04 14:18:04.007 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-04 14:19:26.517 7f44cb2a9700  0 log_channel(cluster) log [INF] : Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-04 14:29:22.432 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-04 14:53:59.230 7f44cb2a9700  0 log_channel(cluster) log [INF] : Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-04-04 14:56:22.430 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2019-04-04 14:59:59.995 7f44cb2a9700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR noout flag(s) set; 19 scrub errors; Possible data damage: 1 pg inconsistent
2019-04-04 15:09:24.482 7f44cb2a9700  0 log_channel(cluster) log [INF] : Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)

Each scrub on the new bluestore osd finds a new missing object. As per #39115, ceph pg repair is not enough to fix this. The newly added osd needs to be constantly restarted.
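For context, a minimal sketch of the inspection steps and the restart workaround described above, assuming a systemd-based deployment; the PG id 1.2a and osd.12 below are placeholders, not values taken from this report:

# Show which PGs are currently flagged inconsistent.
ceph health detail

# List the objects that scrub flagged on a given PG (1.2a is a placeholder PG id).
rados list-inconsistent-obj 1.2a --format=json-pretty

# Ask the primary OSD to repair the PG (per #39115 this is not enough here).
ceph pg repair 1.2a

# Workaround described above: restart the newly added bluestore OSD (osd.12 is a placeholder id).
systemctl restart ceph-osd@12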


Related issues: 2 (0 open, 2 closed)

Related to RADOS - Bug #43174: pgs inconsistent, union_shard_errors=missing (Resolved, assigned to Mykola Golub)

Has duplicate RADOS - Bug #39115: ceph pg repair doesn't fix itself if osd is bluestore (Duplicate, assigned to David Zafman, 04/04/2019)

