Bug #42228
mgr/dashboard: backend API test failure "test_access_permissions"
Status: Closed
Description
I got this error on my local system (based on master) and it also failed on a PR test (https://jenkins.ceph.com/job/ceph-dashboard-pr-backend/30/console):
2019-10-08 11:05:57,510.510 INFO:__main__:Stopped test: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest) in 22.789247s
2019-10-08 11:05:57,510.510 INFO:__main__:
2019-10-08 11:05:57,510.510 INFO:__main__:======================================================================
2019-10-08 11:05:57,510.510 INFO:__main__:ERROR: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)
2019-10-08 11:05:57,511.511 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,511.511 INFO:__main__:Traceback (most recent call last):
2019-10-08 11:05:57,511.511 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 154, in setUp
2019-10-08 11:05:57,511.511 INFO:__main__: self.wait_for_health_clear(20)
2019-10-08 11:05:57,511.511 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 131, in wait_for_health_clear
2019-10-08 11:05:57,511.511 INFO:__main__: self.wait_until_true(is_clear, timeout)
2019-10-08 11:05:57,511.511 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 163, in wait_until_true
2019-10-08 11:05:57,511.511 INFO:__main__: raise RuntimeError("Timed out after {0}s".format(elapsed))
2019-10-08 11:05:57,511.511 INFO:__main__:RuntimeError: Timed out after 20s
2019-10-08 11:05:57,511.511 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,512.512 INFO:__main__:Ran 15 tests in 254.204s
2019-10-08 11:05:57,512.512 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:FAILED (errors=1)
2019-10-08 11:05:57,512.512 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:======================================================================
2019-10-08 11:05:57,512.512 INFO:__main__:ERROR: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)
2019-10-08 11:05:57,512.512 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,513.513 INFO:__main__:Traceback (most recent call last):
2019-10-08 11:05:57,513.513 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 154, in setUp
2019-10-08 11:05:57,513.513 INFO:__main__: self.wait_for_health_clear(20)
2019-10-08 11:05:57,513.513 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 131, in wait_for_health_clear
2019-10-08 11:05:57,513.513 INFO:__main__: self.wait_until_true(is_clear, timeout)
2019-10-08 11:05:57,513.513 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 163, in wait_until_true
2019-10-08 11:05:57,513.513 INFO:__main__: raise RuntimeError("Timed out after {0}s".format(elapsed))
2019-10-08 11:05:57,513.513 INFO:__main__:RuntimeError: Timed out after 20s
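For reference, the failing setUp waits for the cluster health to clear before each test. Below is a minimal sketch of that polling logic (simplified, not the verbatim qa/tasks/ceph_test_case.py code; get_health_status() is a hypothetical stand-in for querying the cluster):

import time


def wait_until_true(condition, timeout, period=5):
    # Poll `condition` until it returns True, raising once `timeout`
    # seconds have elapsed -- the RuntimeError seen in the traceback above.
    elapsed = 0
    while not condition():
        if elapsed >= timeout:
            raise RuntimeError("Timed out after {0}s".format(elapsed))
        time.sleep(period)
        elapsed += period


def get_health_status():
    # Hypothetical stand-in for asking the cluster ("ceph status").
    return "HEALTH_WARN"


def wait_for_health_clear(timeout):
    # setUp waits (here, 20s) for all health checks to clear before a test;
    # a cluster stuck in HEALTH_WARN makes this time out.
    wait_until_true(lambda: get_health_status() == "HEALTH_OK", timeout)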
Updated by Stephan Müller over 4 years ago
- Status changed from New to In Progress
- Assignee set to Stephan Müller
Updated by Stephan Müller over 4 years ago
I tested it on an older compiled cluster and it worked... Since it fails on newer builds, I assume a code change somewhere else is what makes this test fail :(
Updated by Stephan Müller over 4 years ago
On my new build I get this error, so the change that broke the test comes from outside the dashboard.
Updated by Stephan Müller over 4 years ago
I found out that the cluster somehow gets into an unhealthy state, which causes the problem.
  cluster:
    id:     50cd2934-64df-4f46-b868-154e688e6e42
    health: HEALTH_WARN
            2 pool(s) have non-power-of-two pg_num

  services:
    mon: 3 daemons, quorum a,b,c (age 26m)
    mgr: z(active, since 3m), standbys: x, y
    mds: cephfs:1 {0=a=up:active(laggy or crashed)}
    osd: 4 osds: 4 up (since 24m), 4 in (since 24m)
    rgw: 1 daemon active (8000)

  task status:

  data:
    pools:   6 pools, 50 pgs
    objects: 223 objects, 6.0 KiB
    usage:   8.0 GiB used, 4.0 TiB / 4.0 TiB avail
    pgs:     50 active+clean
The new power-of-two warning was merged on 2019-09-23 -> https://github.com/ceph/ceph/pull/30525
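The numbers fit: the fresh vstart cluster (see the HEALTH_OK status in the next update) has 4 pools and 32 PGs, while the unhealthy cluster above has 6 pools and 50 PGs, so the two CephFS pools account for the remaining 18 PGs and, per the warning, both got a non-power-of-two pg_num. A minimal sketch of the condition such a warning checks (my own illustration, not the code from the PR):

def is_power_of_two(n):
    # A pool's pg_num trips the warning when this returns False.
    return n > 0 and (n & (n - 1)) == 0

# Purely illustrative split of the 18 PGs across the two CephFS pools
# (the per-pool values are not shown in the status output above):
for pg_num in (12, 6):
    print(pg_num, "power of two?", is_power_of_two(pg_num))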
Updated by Stephan Müller over 4 years ago
Directly after vstart_runner.py is executed, the cluster seems to be fine:
  cluster:
    id:     760fd545-32e4-43d3-b7e5-c7588811e4c8
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: x(active, since 2m), standbys: y, z
    osd: 4 osds: 4 up (since 16s), 4 in (since 16s)
    rgw: 1 daemon active (8000)

  task status:

  data:
    pools:   4 pools, 32 pgs
    objects: 12 objects, 1.2 KiB
    usage:   8.0 GiB used, 4.0 TiB / 4.0 TiB avail
    pgs:     32 active+clean
Updated by Stephan Müller over 4 years ago
The problem seems to be
CEPHFS = True
In our helper.py this flag is used to create a CephFS, and exactly that creation is what makes the cluster unhealthy.
If that line is commented out (along with all the lines that destroy a potential CephFS), no filesystem id will be found.
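A self-contained sketch of the pattern this describes (simplified and hypothetical, not the real qa/tasks/mgr/dashboard/helper.py; create_cephfs() is a stand-in): a CEPHFS class flag makes setUpClass create a filesystem, and it is exactly that step that puts the cluster into HEALTH_WARN before setUp's health-clear wait.

import unittest


def create_cephfs():
    # Hypothetical stand-in for the helper that creates the CephFS
    # data and metadata pools on the vstart cluster.
    print("creating cephfs data/metadata pools ...")


class DashboardTestCase(unittest.TestCase):
    CEPHFS = False  # subclasses such as CephfsTest set this to True

    @classmethod
    def setUpClass(cls):
        if cls.CEPHFS:
            # This is the step that introduces the pools with a
            # non-power-of-two pg_num, and thus the HEALTH_WARN above.
            create_cephfs()


class CephfsTest(DashboardTestCase):
    CEPHFS = True

    def test_access_permissions(self):
        self.assertTrue(True)  # placeholder body


if __name__ == '__main__':
    unittest.main()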
Updated by Stephan Müller over 4 years ago
Another possible cause is https://github.com/ceph/ceph/pull/30463, as it introduced a new method one could use to mount CephFS.
Updated by Patrick Donnelly over 4 years ago
- Project changed from mgr to CephFS
- Category deleted (151)
- Assignee changed from Stephan Müller to Patrick Donnelly
- Target version set to v15.0.0
- Start date deleted (10/08/2019)
- Component(FS) qa-suite added
The value of mon_pg_warn_min_per_osd is used for selecting the number of PGs. For vstart clusters, its value is 3. That's used here:
I was able to reproduce the issue with vstart_runner and a cephfs test. I'm going to move this ticket to the cephfs project and write up a patch.
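The snippet the comment points to is not quoted in the ticket. As a hedged illustration only (assuming pg_num is derived from mon_pg_warn_min_per_osd times the OSD count, which is my reading of the comment rather than the actual qa code; the real fix is in PR 30816), this shows how a non-power-of-two value can fall out of that derivation and how rounding up to the next power of two would avoid the warning:

MON_PG_WARN_MIN_PER_OSD = 3   # vstart default, per the comment above
NUM_OSDS = 4                  # the vstart cluster runs 4 OSDs


def next_power_of_two(n):
    # Smallest power of two greater than or equal to n.
    p = 1
    while p < n:
        p *= 2
    return p


pg_num = MON_PG_WARN_MIN_PER_OSD * NUM_OSDS                # hypothetical derivation -> 12
print(pg_num, "is a power of two:", (pg_num & (pg_num - 1)) == 0)  # False for 12
print("rounded up:", next_power_of_two(pg_num))            # 16, which would not warn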
Updated by Patrick Donnelly over 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 30816
Updated by Patrick Donnelly over 4 years ago
- Status changed from Fix Under Review to Resolved
Updated by Patrick Donnelly over 4 years ago
- Related to Bug #42434: qa: TOO_FEW_PGS in mimic during upgrade suite tests added
Updated by Laura Paduano about 4 years ago
- Related to Bug #44592: mgr/dashboard: ceph-api-nightly-nautilus-backend test failure added
Updated by Laura Paduano about 4 years ago
- Status changed from Resolved to Pending Backport
Updated by Laura Paduano about 4 years ago
- Copied to Backport #44668: nautilus: mgr/dashboard: backend API test failure "test_access_permissions" added
Updated by Lenz Grimmer about 4 years ago
- Status changed from Pending Backport to Resolved