Bug #42228

mgr/dashboard: backend API test failure "test_access_permissions"

Added by Laura Paduano 12 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): qa-suite
Labels (FS):
Pull request ID:
Crash signature:

Description

I got this error on my local system (based on master) and it also failed on a PR test (https://jenkins.ceph.com/job/ceph-dashboard-pr-backend/30/console):


2019-10-08 11:05:57,510.510 INFO:__main__:Stopped test: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest) in 22.789247s
2019-10-08 11:05:57,510.510 INFO:__main__:
2019-10-08 11:05:57,510.510 INFO:__main__:======================================================================
2019-10-08 11:05:57,510.510 INFO:__main__:ERROR: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)
2019-10-08 11:05:57,511.511 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,511.511 INFO:__main__:Traceback (most recent call last):
2019-10-08 11:05:57,511.511 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 154, in setUp
2019-10-08 11:05:57,511.511 INFO:__main__:    self.wait_for_health_clear(20)
2019-10-08 11:05:57,511.511 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 131, in wait_for_health_clear
2019-10-08 11:05:57,511.511 INFO:__main__:    self.wait_until_true(is_clear, timeout)
2019-10-08 11:05:57,511.511 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 163, in wait_until_true
2019-10-08 11:05:57,511.511 INFO:__main__:    raise RuntimeError("Timed out after {0}s".format(elapsed))
2019-10-08 11:05:57,511.511 INFO:__main__:RuntimeError: Timed out after 20s
2019-10-08 11:05:57,511.511 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,512.512 INFO:__main__:Ran 15 tests in 254.204s
2019-10-08 11:05:57,512.512 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:FAILED (errors=1)
2019-10-08 11:05:57,512.512 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:======================================================================
2019-10-08 11:05:57,512.512 INFO:__main__:ERROR: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)
2019-10-08 11:05:57,512.512 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,513.513 INFO:__main__:Traceback (most recent call last):
2019-10-08 11:05:57,513.513 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 154, in setUp
2019-10-08 11:05:57,513.513 INFO:__main__:    self.wait_for_health_clear(20)
2019-10-08 11:05:57,513.513 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 131, in wait_for_health_clear
2019-10-08 11:05:57,513.513 INFO:__main__:    self.wait_until_true(is_clear, timeout)
2019-10-08 11:05:57,513.513 INFO:__main__:  File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 163, in wait_until_true
2019-10-08 11:05:57,513.513 INFO:__main__:    raise RuntimeError("Timed out after {0}s".format(elapsed))
2019-10-08 11:05:57,513.513 INFO:__main__:RuntimeError: Timed out after 20s   
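
For readers unfamiliar with the helper named in the traceback: the error comes from a poll-until-true loop that gives up after a fixed timeout. The following is a minimal sketch of that pattern, not a copy of ceph_test_case.py. The `period` default and the injectable `sleep` parameter are assumptions made for illustration.

```python
import time


def wait_until_true(condition, timeout, period=5, sleep=time.sleep):
    """Poll condition() until it returns True.

    Raises RuntimeError once `timeout` seconds of polling have elapsed,
    using the same message format as the traceback above.
    `period` and `sleep` are illustrative assumptions.
    """
    elapsed = 0
    while not condition():
        if elapsed >= timeout:
            raise RuntimeError("Timed out after {0}s".format(elapsed))
        sleep(period)
        elapsed += period
    return elapsed


if __name__ == "__main__":
    # A condition that never becomes true, like a cluster stuck in HEALTH_WARN.
    try:
        wait_until_true(lambda: False, timeout=20, sleep=lambda s: None)
    except RuntimeError as exc:
        print(exc)
```

With this shape, a health check that never clears fails in setUp after the full timeout, which matches the 22.8s duration reported for the test.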

Related issues

Related to fs - Bug #42434: qa: TOO_FEW_PGS in mimic during upgrade suite tests Resolved
Related to mgr - Bug #44592: mgr/dashboard: ceph-api-nightly-nautilus-backend test failure Resolved
Copied to fs - Backport #44668: nautilus: mgr/dashboard: backend API test failure "test_access_permissions" Resolved

History

#1 Updated by Laura Paduano 12 months ago

  • Description updated (diff)

#2 Updated by Laura Paduano 12 months ago

  • Description updated (diff)

#3 Updated by Stephan Müller 12 months ago

  • Status changed from New to In Progress
  • Assignee set to Stephan Müller

#4 Updated by Stephan Müller 12 months ago

I tested it on a cluster compiled from an older build and the test passed. Since it fails on newer builds, I assume a code change somewhere else makes this test fail :(

#5 Updated by Stephan Müller 12 months ago

On my new build I get this error, so the change that broke the test comes from outside the dashboard.

#6 Updated by Stephan Müller 12 months ago

I found out that the cluster somehow ends up in HEALTH_WARN, which is why wait_for_health_clear times out:

  cluster:
    id:     50cd2934-64df-4f46-b868-154e688e6e42
    health: HEALTH_WARN
            2 pool(s) have non-power-of-two pg_num

  services:
    mon: 3 daemons, quorum a,b,c (age 26m)
    mgr: z(active, since 3m), standbys: x, y
    mds: cephfs:1 {0=a=up:active(laggy or crashed)}
    osd: 4 osds: 4 up (since 24m), 4 in (since 24m)
    rgw: 1 daemon active (8000)

  task status:

  data:
    pools:   6 pools, 50 pgs
    objects: 223 objects, 6.0 KiB
    usage:   8.0 GiB used, 4.0 TiB / 4.0 TiB avail
    pgs:     50 active+clean

The new non-power-of-two pg_num warning was merged on 2019-09-23: https://github.com/ceph/ceph/pull/30525
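
The condition this warning describes can be checked with a simple bit test. Below is a minimal sketch, not the monitor's actual implementation. The pool names and pg_num values in the example are hypothetical, since the status output above does not list per-pool values.

```python
def is_power_of_two(n):
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def pools_with_non_power_of_two_pg_num(pools):
    """Return the sorted names of pools whose pg_num is not a power of two.

    `pools` maps pool name -> pg_num (hypothetical input shape).
    """
    return sorted(name for name, pg_num in pools.items()
                  if not is_power_of_two(pg_num))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = {"pool_a": 8, "pool_b": 8, "fs_data": 12, "fs_metadata": 12}
    print(pools_with_non_power_of_two_pg_num(example))
```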

#7 Updated by Stephan Müller 12 months ago

Directly after vstart_runner.py is executed, the cluster seems to be fine:

  cluster:
    id:     760fd545-32e4-43d3-b7e5-c7588811e4c8
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: x(active, since 2m), standbys: y, z
    osd: 4 osds: 4 up (since 16s), 4 in (since 16s)
    rgw: 1 daemon active (8000)

  task status:

  data:
    pools:   4 pools, 32 pgs
    objects: 12 objects, 1.2 KiB
    usage:   8.0 GiB used, 4.0 TiB / 4.0 TiB avail
    pgs:     32 active+clean

#8 Updated by Stephan Müller 12 months ago

The problem seems to be this setting:

    CEPHFS = True

In our helper.py this flag is used to create a CephFS, and creating it is exactly what makes the cluster unhealthy.

If that line is commented out (along with all the lines that destroy any existing CephFS), no id will be found.

#9 Updated by Stephan Müller 12 months ago

Another possible cause is https://github.com/ceph/ceph/pull/30463, which introduced a new method for mounting CephFS.

#10 Updated by Patrick Donnelly 12 months ago

  • Project changed from mgr to fs
  • Category deleted (dashboard/qa)
  • Assignee changed from Stephan Müller to Patrick Donnelly
  • Target version set to v15.0.0
  • Start date deleted (10/08/2019)
  • Component(FS) qa-suite added

The value of mon_pg_warn_min_per_osd is used for selecting the number of PGs. For vstart clusters, its value is 3. That's used here:

https://github.com/ceph/ceph/blob/e373be849360977a739c803ba2bab60e346d3d24/qa/tasks/cephfs/filesystem.py#L525-L527

I was able to reproduce the issue with vstart_runner and a cephfs test. I'm going to move this ticket to the cephfs project and write up a patch.
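
To illustrate why this selection can trip the new warning, here is a minimal sketch. It assumes the linked lines multiply mon_pg_warn_min_per_osd by the number of OSDs (the link is the authority on what the code does). The function names are hypothetical, and rounding up to a power of two is shown only as one possible way to avoid the warning, not necessarily what the patch does.

```python
def next_power_of_two(n):
    """Return the smallest power of two that is >= n (and at least 1)."""
    if n < 1:
        return 1
    return 1 << (n - 1).bit_length()


def naive_pgs_per_pool(pg_warn_min_per_osd, osd_count):
    """Assumed selection: the per-OSD warning threshold times the OSD count."""
    return pg_warn_min_per_osd * osd_count


def rounded_pgs_per_pool(pg_warn_min_per_osd, osd_count):
    """Same selection, rounded up so pg_num is always a power of two."""
    return next_power_of_two(naive_pgs_per_pool(pg_warn_min_per_osd, osd_count))


if __name__ == "__main__":
    # vstart value of mon_pg_warn_min_per_osd (3) and the 4 OSDs from the
    # status output in comment #6.
    print(naive_pgs_per_pool(3, 4))    # 12, not a power of two
    print(rounded_pgs_per_pool(3, 4))  # 16
```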

#11 Updated by Patrick Donnelly 12 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 30816

#12 Updated by Patrick Donnelly 11 months ago

  • Status changed from Fix Under Review to Resolved

#13 Updated by Patrick Donnelly 11 months ago

  • Related to Bug #42434: qa: TOO_FEW_PGS in mimic during upgrade suite tests added

#14 Updated by Laura Paduano 6 months ago

  • Related to Bug #44592: mgr/dashboard: ceph-api-nightly-nautilus-backend test failure added

#15 Updated by Laura Paduano 6 months ago

  • Backport set to nautilus

#16 Updated by Laura Paduano 6 months ago

  • Status changed from Resolved to Pending Backport

#17 Updated by Laura Paduano 6 months ago

  • Copied to Backport #44668: nautilus: mgr/dashboard: backend API test failure "test_access_permissions" added

#18 Updated by Lenz Grimmer 5 months ago

  • Status changed from Pending Backport to Resolved
