Bug #59333


PgScrubber: timeout on reserving replicas

Added by Laura Flores about 1 year ago. Updated 10 months ago.

Status: New
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2023-03-28_22:43:59-rados-wip-yuri11-testing-2023-03-28-0950-distro-default-smithi/7224215

2023-03-29T07:14:03.930 INFO:tasks.ceph:Checking cluster log for badness...
2023-03-29T07:14:03.930 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.091 INFO:teuthology.orchestra.run.smithi136.stdout:1680073594.2479906 osd.7 (osd.7) 154 : cluster [WRN] osd.7 PgScrubber: 2.1 timeout on reserving replicsa (since 2023-03-29T07:06:29.246198+0000)
2023-03-29T07:14:04.092 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2023-03-29T07:14:04.092 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.110 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.169 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.229 INFO:teuthology.orchestra.run.smithi136.stdout:1680073594.2479906 osd.7 (osd.7) 154 : cluster [WRN] osd.7 PgScrubber: 2.1 timeout on reserving replicsa (since 2023-03-29T07:06:29.246198+0000)
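
For reference, the flagged line appears to slip past the existing exclusions because the warning text reads "timeout on reserving replicsa" while the whitelist pattern is "timeout on replica". A minimal shell check, re-using the log line above, shows the line surviving that filter:

# Feed the flagged cluster-log line through the same exclusion the teuthology
# check applies; the line is printed, i.e. egrep -v does not drop it.
echo 'osd.7 PgScrubber: 2.1 timeout on reserving replicsa (since 2023-03-29T07:06:29.246198+0000)' | egrep -v 'timeout on replica'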

Also, note that "replicas" is misspelled as "replicsa" in the warning; this should be fixed.
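
If someone picks this up, a plain text search should locate the misspelled string (the path below is an assumption about where the scrubber sources live):

# Hypothetical search for the typo in the OSD sources of a Ceph checkout;
# adjust the path if PgScrubber lives elsewhere on the branch being fixed.
git grep -n 'replicsa' src/osd/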


Related issues (1 total: 0 open, 1 closed)

Related to RADOS - Bug #61815: PgScrubber cluster warning is misspelled (Resolved; assignee: Prashant D)

