Bug #45135

closed

nautilus: "too few PGs per OSD (2 < min 30) (TOO_FEW_PGS)" in smoke (all suites seem broken)

Added by Yuri Weinstein about 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados, smoke
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Run http://pulpito.ceph.com/teuthology-2020-04-17_07:00:05-smoke-nautilus-testing-basic-smithi/
Jobs: ['4960947', '4960951', '4960948', '4960959', '4960965', '4960950', '4960962', '4960952', '4960967', '4960945', '4960960', '4960944', '4960961', '4960964', '4960968', '4960966', '4960946', '4960957', '4960949']
Logs: http://qa-proxy.ceph.com/teuthology/teuthology-2020-04-17_07:00:05-smoke-nautilus-testing-basic-smithi/4960944/teuthology.log

description: smoke/basic/{clusters/{fixed-3-cephfs.yaml openstack.yaml} objectstore/bluestore-bitmap.yaml
  tasks/cfuse_workunit_suites_blogbench.yaml}
duration: 1587.0086629390717
failure_reason: '"2020-04-17 08:00:33.977883 mon.a (mon.0) 62 : cluster [WRN] Health
  check failed: too few PGs per OSD (2 < min 30) (TOO_FEW_PGS)" in cluster log'
flavor: basic

Note the timeout:

2020-04-17T08:19:59.075 INFO:teuthology.orchestra.run.smithi094.stderr:mon.a: injectargs:mon_health_to_clog = 'false'
2020-04-17T08:19:59.290 INFO:teuthology.orchestra.run.smithi094.stderr:mon.b: injectargs:mon_health_to_clog = 'false'
2020-04-17T08:19:59.505 INFO:teuthology.orchestra.run.smithi094.stderr:mon.c: injectargs:mon_health_to_clog = 'false'
2020-04-17T08:19:59.529 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/contextutil.py", line 34, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph_nautilus/qa/tasks/ceph.py", line 1922, in task
    healthy(ctx=ctx, config=dict(cluster=config['cluster']))
  File "/home/teuthworker/src/git.ceph.com_ceph_nautilus/qa/tasks/ceph.py", line 1484, in healthy
    ceph_cluster=cluster_name,
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/misc.py", line 867, in wait_until_healthy
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_py2/teuthology/contextutil.py", line 134, in __call__
    raise MaxWhileTries(error_msg)
MaxWhileTries: 'wait_until_healthy' reached maximum tries (150) after waiting for 900 seconds
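
For readers unfamiliar with this failure mode, the traceback above is a bounded polling loop giving up. The sketch below is a minimal illustration of that pattern, not the teuthology implementation: `wait_until`, `MaxTriesReached`, and the injectable `sleep` parameter are hypothetical names, and the 6-second interval is only inferred from the log (900 seconds / 150 tries).

```python
import time


class MaxTriesReached(Exception):
    """Hypothetical stand-in for the MaxWhileTries error seen in the log."""


def wait_until(check, tries=150, interval=6, sleep=time.sleep):
    """Poll check() up to `tries` times, pausing `interval` seconds after each failure.

    Returns the attempt number on success; raises MaxTriesReached otherwise.
    The defaults (150 tries, 6 s) are assumptions derived from the log message.
    """
    for attempt in range(1, tries + 1):
        if check():
            return attempt
        sleep(interval)
    raise MaxTriesReached(
        "reached maximum tries (%d) after waiting for %d seconds"
        % (tries, tries * interval)
    )
```

Because the TOO_FEW_PGS warning never clears, a health check like this can never return true, so the loop runs through every attempt and the job fails on the timeout rather than on the workload itself.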

Related issues: 1 (0 open, 1 closed)

Related to RADOS - Bug #41735: pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools (Resolved)

Actions #1

Updated by Neha Ojha about 4 years ago

  • Subject changed from "too few PGs per OSD (2 < min 30) (TOO_FEW_PGS)" in smoke (all suites seem broken) to nautilus: "too few PGs per OSD (2 < min 30) (TOO_FEW_PGS)" in smoke (all suites seem broken)
  • Status changed from New to Triaged

The problem is that https://github.com/ceph/ceph/pull/34055/commits/fd608af305745830778d826c8e29a8ecd14d4748 removed the "mon pg warn min per osd = 1" override for all tests. This change was made in master following 1ac34a5ea3d1aca299b02e574b295dd4bf6167f4. That commit is missing in nautilus, where mon_pg_warn_min_per_osd defaults to 30, which is why most tests are failing.
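
To make the effect of the threshold concrete, here is a rough sketch of the comparison behind the warning. It is an illustration only, not the monitor's actual code: `too_few_pgs_warning` is a hypothetical helper, and the real check computes the per-OSD PG count from cluster state rather than taking it as an argument.

```python
def too_few_pgs_warning(pgs_per_osd, warn_min_per_osd):
    """Return a warning string if PGs per OSD is below the minimum, else None.

    Hypothetical helper: mirrors the "(2 < min 30)" comparison in the log,
    with warn_min_per_osd standing in for mon_pg_warn_min_per_osd.
    """
    if pgs_per_osd < warn_min_per_osd:
        return "too few PGs per OSD (%d < min %d)" % (pgs_per_osd, warn_min_per_osd)
    return None


# With the test override removed, the default threshold of 30 applies.
print(too_few_pgs_warning(2, 30))
# With the former override "mon pg warn min per osd = 1", no warning is raised.
print(too_few_pgs_warning(2, 1))
```

Under these assumptions, the same small test cluster (2 PGs per OSD) is healthy with the override of 1 and unhealthy with the default of 30, which matches the failures reported across the suite.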

Actions #2

Updated by Yuri Weinstein about 4 years ago

This PR was merged without the usual testing. To fix this, we need to either backport 1ac34a5ea3d1aca299b02e574b295dd4bf6167f4 to nautilus or revert the PR.

Actions #3

Updated by Lenz Grimmer about 4 years ago

  • Related to Bug #41735: pg_autoscaler throws HEALTH_WARN with auto_scale on for all pools added
Actions #5

Updated by Neha Ojha almost 4 years ago

  • Status changed from Triaged to Resolved