Bug #42228
mgr/dashboard: backend API test failure "test_access_permissions"
Description
I got this error on my local system (based on master) and it also failed on a PR test (https://jenkins.ceph.com/job/ceph-dashboard-pr-backend/30/console):
2019-10-08 11:05:57,510.510 INFO:__main__:Stopped test: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest) in 22.789247s
2019-10-08 11:05:57,510.510 INFO:__main__:
2019-10-08 11:05:57,510.510 INFO:__main__:======================================================================
2019-10-08 11:05:57,510.510 INFO:__main__:ERROR: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)
2019-10-08 11:05:57,511.511 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,511.511 INFO:__main__:Traceback (most recent call last):
2019-10-08 11:05:57,511.511 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 154, in setUp
2019-10-08 11:05:57,511.511 INFO:__main__: self.wait_for_health_clear(20)
2019-10-08 11:05:57,511.511 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 131, in wait_for_health_clear
2019-10-08 11:05:57,511.511 INFO:__main__: self.wait_until_true(is_clear, timeout)
2019-10-08 11:05:57,511.511 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 163, in wait_until_true
2019-10-08 11:05:57,511.511 INFO:__main__: raise RuntimeError("Timed out after {0}s".format(elapsed))
2019-10-08 11:05:57,511.511 INFO:__main__:RuntimeError: Timed out after 20s
2019-10-08 11:05:57,511.511 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,512.512 INFO:__main__:Ran 15 tests in 254.204s
2019-10-08 11:05:57,512.512 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:FAILED (errors=1)
2019-10-08 11:05:57,512.512 INFO:__main__:
2019-10-08 11:05:57,512.512 INFO:__main__:======================================================================
2019-10-08 11:05:57,512.512 INFO:__main__:ERROR: test_access_permissions (tasks.mgr.dashboard.test_cephfs.CephfsTest)
2019-10-08 11:05:57,512.512 INFO:__main__:----------------------------------------------------------------------
2019-10-08 11:05:57,513.513 INFO:__main__:Traceback (most recent call last):
2019-10-08 11:05:57,513.513 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/mgr/dashboard/helper.py", line 154, in setUp
2019-10-08 11:05:57,513.513 INFO:__main__: self.wait_for_health_clear(20)
2019-10-08 11:05:57,513.513 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 131, in wait_for_health_clear
2019-10-08 11:05:57,513.513 INFO:__main__: self.wait_until_true(is_clear, timeout)
2019-10-08 11:05:57,513.513 INFO:__main__: File "/home/jenkins-build/build/workspace/ceph-dashboard-pr-backend/qa/tasks/ceph_test_case.py", line 163, in wait_until_true
2019-10-08 11:05:57,513.513 INFO:__main__: raise RuntimeError("Timed out after {0}s".format(elapsed))
2019-10-08 11:05:57,513.513 INFO:__main__:RuntimeError: Timed out after 20s
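For context, the timeout comes from the generic polling helper in qa/tasks/ceph_test_case.py: setUp waits for the cluster health to clear and gives up after the given number of seconds. A minimal sketch of such a polling loop (illustrative only, not the exact upstream code) could look like this:

import time

def wait_until_true(condition, timeout, period=5):
    # Poll `condition` every `period` seconds; raise once `timeout`
    # seconds have elapsed without it returning True.
    elapsed = 0
    while not condition():
        if elapsed >= timeout:
            raise RuntimeError("Timed out after {0}s".format(elapsed))
        time.sleep(period)
        elapsed += period

# In the failing test the condition is "cluster health is clear", which
# never becomes true here, so setUp raises after the 20s timeout.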
History
#1 Updated by Laura Paduano over 4 years ago
- Description updated (diff)
#2 Updated by Laura Paduano over 4 years ago
- Description updated (diff)
#3 Updated by Stephan Müller over 4 years ago
- Status changed from New to In Progress
- Assignee set to Stephan Müller
#4 Updated by Stephan Müller over 4 years ago
I tested this on a cluster built from an older commit and it worked... Since it fails on newer builds, I assume a code change somewhere else is what makes this test fail :(
#5 Updated by Stephan Müller over 4 years ago
On my new build I get this error, so the change that broke the test comes from outside the dashboard.
#6 Updated by Stephan Müller over 4 years ago
I found out that the cluster somehow gets into an unhealthy state, which causes the problem:
  cluster:
    id:     50cd2934-64df-4f46-b868-154e688e6e42
    health: HEALTH_WARN
            2 pool(s) have non-power-of-two pg_num

  services:
    mon: 3 daemons, quorum a,b,c (age 26m)
    mgr: z(active, since 3m), standbys: x, y
    mds: cephfs:1 {0=a=up:active(laggy or crashed)}
    osd: 4 osds: 4 up (since 24m), 4 in (since 24m)
    rgw: 1 daemon active (8000)

  task status:

  data:
    pools:   6 pools, 50 pgs
    objects: 223 objects, 6.0 KiB
    usage:   8.0 GiB used, 4.0 TiB / 4.0 TiB avail
    pgs:     50 active+clean
The new power-of-two warning was merged on 2019-09-23 -> https://github.com/ceph/ceph/pull/30525
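The warning itself comes down to a power-of-two test on each pool's pg_num; a rough, illustrative sketch of such a check (not the code from the PR) is:

def is_power_of_two(pg_num):
    # A positive integer is a power of two iff exactly one bit is set.
    return pg_num > 0 and (pg_num & (pg_num - 1)) == 0

# Values like 8, 16 or 32 pass; values like 9 or 12 trigger the
# "pool(s) have non-power-of-two pg_num" health warning.
print(is_power_of_two(32))  # True
print(is_power_of_two(12))  # False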
#7 Updated by Stephan Müller over 4 years ago
Directly after vstart_runner.py is executed, the cluster seems to be fine:
  cluster:
    id:     760fd545-32e4-43d3-b7e5-c7588811e4c8
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: x(active, since 2m), standbys: y, z
    osd: 4 osds: 4 up (since 16s), 4 in (since 16s)
    rgw: 1 daemon active (8000)

  task status:

  data:
    pools:   4 pools, 32 pgs
    objects: 12 objects, 1.2 KiB
    usage:   8.0 GiB used, 4.0 TiB / 4.0 TiB avail
    pgs:     32 active+clean
#8 Updated by Stephan Müller over 4 years ago
The problem seems to be
CEPHFS = True
In our helper.py this flag causes a CephFS to be created, and exactly that is what puts the cluster into the unhealthy state.
If that line is commented out (together with all the lines that tear down a potentially existing CephFS), no filesystem id will be found. A simplified sketch of this setup pattern follows below.
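The sketch below shows how a class-level flag can drive per-suite CephFS setup in a test helper; the names are illustrative stand-ins, not the actual code in qa/tasks/mgr/dashboard/helper.py:

class FakeCluster(object):
    def delete_all_filesystems(self):
        print("tearing down any existing CephFS")

    def create_filesystem(self):
        # Creating a filesystem adds two new pools; with the PR above,
        # their pg_num can now trip the non-power-of-two health warning.
        print("creating CephFS (adds a data pool and a metadata pool)")
        return "cephfs"

class DashboardTestCase(object):
    CEPHFS = False  # test classes that need a filesystem set this to True
    cluster = FakeCluster()

    @classmethod
    def setUpClass(cls):
        if cls.CEPHFS:
            cls.cluster.delete_all_filesystems()
            cls.fs = cls.cluster.create_filesystem()

class CephfsTest(DashboardTestCase):
    CEPHFS = True

CephfsTest.setUpClass()  # prints the two setup steps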
#9 Updated by Stephan Müller over 4 years ago
Another possible cause is https://github.com/ceph/ceph/pull/30463, as it introduced a new method that can be used to mount CephFS.
#10 Updated by Patrick Donnelly over 4 years ago
- Project changed from mgr to CephFS
- Category deleted (151)
- Assignee changed from Stephan Müller to Patrick Donnelly
- Target version set to v15.0.0
- Start date deleted (10/08/2019)
- Component(FS) qa-suite added
The value of mon_pg_warn_min_per_osd is used for selecting the number of PGs. For vstart clusters, its value is 3. That's used here:
I was able to reproduce the issue with vstart_runner and a cephfs test. I'm going to move this ticket to the cephfs project and write up a patch.
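Assuming the pg count for the new CephFS pools is derived from that per-OSD minimum times the OSD count (an assumption for illustration, not a quote of the actual code), it is easy to see how a non-power-of-two value comes out:

# Hypothetical arithmetic: mon_pg_warn_min_per_osd is 3 in vstart clusters
# and the test cluster has 4 OSDs.
mon_pg_warn_min_per_osd = 3
num_osds = 4

pg_num = mon_pg_warn_min_per_osd * num_osds       # 12
print(pg_num, bin(pg_num))                        # 12 0b1100 -> more than one bit set
print(pg_num > 0 and pg_num & (pg_num - 1) == 0)  # False: not a power of two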
#11 Updated by Patrick Donnelly over 4 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 30816
#12 Updated by Patrick Donnelly over 4 years ago
- Status changed from Fix Under Review to Resolved
#13 Updated by Patrick Donnelly over 4 years ago
- Related to Bug #42434: qa: TOO_FEW_PGS in mimic during upgrade suite tests added
#14 Updated by Laura Paduano about 4 years ago
- Related to Bug #44592: mgr/dashboard: ceph-api-nightly-nautilus-backend test failure added
#15 Updated by Laura Paduano about 4 years ago
- Backport set to nautilus
#16 Updated by Laura Paduano about 4 years ago
- Status changed from Resolved to Pending Backport
#17 Updated by Laura Paduano about 4 years ago
- Copied to Backport #44668: nautilus: mgr/dashboard: backend API test failure "test_access_permissions" added
#18 Updated by Lenz Grimmer almost 4 years ago
- Status changed from Pending Backport to Resolved