Bug #59333 (open): PgScrubber: timeout on reserving replicas

Added by Laura Flores about 1 year ago. Updated 10 months ago.

Status: New
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: pacific,quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2023-03-28_22:43:59-rados-wip-yuri11-testing-2023-03-28-0950-distro-default-smithi/7224215

2023-03-29T07:14:03.930 INFO:tasks.ceph:Checking cluster log for badness...
2023-03-29T07:14:03.930 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.091 INFO:teuthology.orchestra.run.smithi136.stdout:1680073594.2479906 osd.7 (osd.7) 154 : cluster [WRN] osd.7 PgScrubber: 2.1 timeout on reserving replicsa (since 2023-03-29T07:06:29.246198+0000)
2023-03-29T07:14:04.092 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2023-03-29T07:14:04.092 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.110 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.169 DEBUG:teuthology.orchestra.run.smithi136:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSD_SLOW_PING_TIME' | egrep -v 'but it is still running' | egrep -v 'objects unfound and apparently lost' | egrep -v 'overall HEALTH_' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(PG_' | egrep -v '\(POOL_' | egrep -v '\(CACHE_POOL_' | egrep -v '\(SMALLER_PGP_NUM\)' | egrep -v '\(OBJECT_' | egrep -v '\(SLOW_OPS\)' | egrep -v '\(REQUEST_SLOW\)' | egrep -v '\(TOO_FEW_PGS\)' | egrep -v 'slow request' | egrep -v 'timeout on replica' | egrep -v 'late reservation from' | head -n 1
2023-03-29T07:14:04.229 INFO:teuthology.orchestra.run.smithi136.stdout:1680073594.2479906 osd.7 (osd.7) 154 : cluster [WRN] osd.7 PgScrubber: 2.1 timeout on reserving replicsa (since 2023-03-29T07:06:29.246198+0000)

Also note that "replicas" is misspelled as "replicsa" in the warning text; this should be fixed.
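For quick local triage, a minimal grep in the spirit of the filter pipeline above can pull out just this warning from the cluster log (the log path matches the run above; adjust it when working against a downloaded teuthology archive):

    sudo grep -E '\[WRN\].*PgScrubber.*timeout on reserving' /var/log/ceph/ceph.log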


Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #61815: PgScrubber cluster warning is misspelled (Resolved, assigned to Prashant D)

Actions #1

Updated by Laura Flores about 1 year ago

/a/yuriw-2023-03-29_17:58:41-rados-wip-yuri11-testing-2023-03-28-0950-distro-default-smithi/7225752

Actions #2

Updated by Radoslaw Zarzynski about 1 year ago

  • Assignee set to Ronen Friedman

Assigning for screening of whether this is a real problem or a testing issue.
If it is, we could reassign it, even just for the sake of learning.

Actions #3

Updated by Yuri Weinstein almost 1 year ago

Seeing the same on the pacific 16.2.13 RC:

http://qa-proxy.ceph.com/teuthology/yuriw-2023-04-25_14:15:06-smoke-pacific-release-distro-default-smithi/7251137/teuthology.log


2023-04-25T14:50:28.182 DEBUG:teuthology.orchestra.run.smithi019:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-04-25T14:50:28.201 INFO:teuthology.orchestra.run.smithi019.stdout:2023-04-25T14:46:14.940847+0000 mon.a (mon.0) 523 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
2023-04-25T14:50:28.202 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2023-04-25T14:50:28.203 DEBUG:teuthology.orchestra.run.smithi019:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-04-25T14:50:28.263 DEBUG:teuthology.orchestra.run.smithi019:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-04-25T14:50:28.281 DEBUG:teuthology.orchestra.run.smithi019:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-04-25T14:50:28.295 INFO:teuthology.orchestra.run.smithi019.stdout:2023-04-25T14:46:14.940847+0000 mon.a (mon.0) 523 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
Actions #4

Updated by Neha Ojha 12 months ago

  • Backport set to pacific,quincy
Actions #5

Updated by Neha Ojha 12 months ago

  • Backport changed from pacific,quincy to pacific,quincy,reef
Actions #6

Updated by Radoslaw Zarzynski 12 months ago

bump up

Actions #7

Updated by Sridhar Seshasayee 12 months ago

/a/sseshasa-2023-05-02_03:09:13-rados-wip-sseshasa-testing-2023-05-01-2145-distro-default-smithi/7260258

Actions #8

Updated by Laura Flores 11 months ago

/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271192

Actions #9

Updated by Radoslaw Zarzynski 11 months ago

Ronen, PTAL.

Actions #10

Updated by Laura Flores 11 months ago

/a/yuriw-2023-05-28_14:41:12-rados-reef-release-distro-default-smithi/7288683

Actions #11

Updated by Laura Flores 10 months ago

  • Related to Bug #61815: PgScrubber cluster warning is misspelled added
Actions #12

Updated by Laura Flores 10 months ago

Hey @Ronen, I added an extra tracker for the misspelling in the log since I thought it would be a good issue for Open Source Day. :)
