Bug #48029
Exiting scrub checking -- not all pgs scrubbed.
Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2020-10-28T09:09:10.147 INFO:tasks.ceph:pgid 1.7 last_scrub_stamp 2020-10-28T08:43:48.710032+0000 time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=43, tm_sec=48, tm_wday=2, tm_yday=302, tm_isdst=-1) <= time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=54, tm_sec=24, tm_wday=2, tm_yday=302, tm_isdst=0)
2020-10-28T09:09:10.147 INFO:tasks.ceph:pgid 2.3 last_scrub_stamp 2020-10-28T08:43:55.800877+0000 time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=43, tm_sec=55, tm_wday=2, tm_yday=302, tm_isdst=-1) <= time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=54, tm_sec=24, tm_wday=2, tm_yday=302, tm_isdst=0)
2020-10-28T09:09:10.148 INFO:tasks.ceph:pgid 1.3 last_scrub_stamp 2020-10-28T08:43:48.710032+0000 time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=43, tm_sec=48, tm_wday=2, tm_yday=302, tm_isdst=-1) <= time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=54, tm_sec=24, tm_wday=2, tm_yday=302, tm_isdst=0)
2020-10-28T09:09:10.149 INFO:tasks.ceph:pgid 2.d last_scrub_stamp 2020-10-28T08:43:55.800877+0000 time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=43, tm_sec=55, tm_wday=2, tm_yday=302, tm_isdst=-1) <= time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=54, tm_sec=24, tm_wday=2, tm_yday=302, tm_isdst=0)
2020-10-28T09:09:10.150 INFO:tasks.ceph:pgid 2.17 last_scrub_stamp 2020-10-28T08:43:55.800877+0000 time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=43, tm_sec=55, tm_wday=2, tm_yday=302, tm_isdst=-1) <= time.struct_time(tm_year=2020, tm_mon=10, tm_mday=28, tm_hour=8, tm_min=54, tm_sec=24, tm_wday=2, tm_yday=302, tm_isdst=0)
2020-10-28T09:09:10.150 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/ceph.py", line 1875, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/ceph.py", line 1277, in osd_scrub_pgs
    raise RuntimeError('Exiting scrub checking -- not all pgs scrubbed.')
/a/teuthology-2020-10-28_07:01:02-rados-master-distro-basic-smithi/5567239
Usually we see this when some PGs are not active+clean, but here they are.
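To illustrate the kind of check behind that observation: whether any PG is not active+clean can be read straight from the per-PG stats the task already consumes. A minimal sketch, assuming a stats list shaped like the entries returned by the qa manager's `get_pg_stats()` (the sample pgids and states below are hypothetical, modeled on the log above):

```python
# Hypothetical stats entries; real entries carry many more fields.
stats = [
    {'pgid': '1.7', 'state': 'active+clean'},
    {'pgid': '2.3', 'state': 'active+clean+scrubbing+deep'},
]

# PGs whose state is anything other than plain active+clean.
not_clean = [s['pgid'] for s in stats if s['state'] != 'active+clean']
print(not_clean)  # -> ['2.3']
```

In this failure the equivalent list was empty, which is what makes the unscrubbed PGs surprising.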
Updated by Neha Ojha over 3 years ago
rados/singleton-nomsgr/{all/osd_stale_reads mon_election/connectivity rados supported-random-distro$/{ubuntu_latest}} - same test as before
/a/teuthology-2020-11-04_07:01:02-rados-master-distro-basic-smithi/5590002
Updated by Laura Flores almost 2 years ago
- Backport set to pacific
/a/yuriw-2022-06-22_22:13:20-rados-wip-yuri3-testing-2022-06-22-1121-pacific-distro-default-smithi/6892691
Description: rados/singleton-nomsgr/{all/osd_stale_reads mon_election/classic rados supported-random-distro$/{centos_8}}
Updated by Radoslaw Zarzynski almost 2 years ago
The code that generated the exception is (from the main branch):
def osd_scrub_pgs(ctx, config):
    # ...
    while loop:
        stats = manager.get_pg_stats()
        timez = [(stat['pgid'], stat['last_scrub_stamp']) for stat in stats]
        loop = False
        thiscnt = 0
        re_scrub = []
        for (pgid, tmval) in timez:
            t = tmval[0:tmval.find('.')].replace(' ', 'T')
            pgtm = time.strptime(t, '%Y-%m-%dT%H:%M:%S')
            if pgtm > check_time_now:
                thiscnt += 1
            else:
                log.info('pgid %s last_scrub_stamp %s %s <= %s',
                         pgid, tmval, pgtm, check_time_now)
                loop = True
                re_scrub.append(pgid)
        if thiscnt > prev_good:
            prev_good = thiscnt
            gap_cnt = 0
        else:
            gap_cnt += 1
            if gap_cnt % 6 == 0:
                for pgid in re_scrub:
                    # re-request scrub every so often in case the earlier
                    # request was missed. do not do it every time because
                    # the scrub may be in progress or not reported yet and
                    # we will starve progress.
                    manager.raw_cluster_cmd('pg', 'deep-scrub', pgid)
            if gap_cnt > retries:
                raise RuntimeError('Exiting scrub checking -- not all pgs scrubbed.')
        if loop:
            log.info('Still waiting for all pgs to be scrubbed.')
            time.sleep(delays)
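The comparison that produced the "<=" log lines works on `time.struct_time` values: the stamp is truncated at the fractional seconds (which also drops the timezone suffix) and parsed with `strptime`, and `struct_time` compares field by field (year, month, day, hour, minute, ...). A minimal sketch reproducing the parse for the first logged pgid (the helper name `parse_scrub_stamp` is ours, not from qa/tasks/ceph.py):

```python
import time

def parse_scrub_stamp(tmval):
    # Same transform as the task: cut at '.', turn 'Y-m-d H:M:S' into ISO-ish form.
    t = tmval[0:tmval.find('.')].replace(' ', 'T')
    return time.strptime(t, '%Y-%m-%dT%H:%M:%S')

last_scrub = parse_scrub_stamp('2020-10-28T08:43:48.710032+0000')
check_time_now = time.strptime('2020-10-28T08:54:24', '%Y-%m-%dT%H:%M:%S')

# 08:43 <= 08:54, so this PG counts as "not scrubbed since the check started"
# and lands on the re_scrub list.
print(last_scrub <= check_time_now)  # -> True
```

Note the differing `tm_isdst` values in the log (-1 vs 0) are harmless here: `tm_isdst` is the last tuple field, so the comparison is decided by the earlier time fields.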
So the request to schedule a deep-scrub was somehow ignored.