Bug #47423 (Closed): volume rm throws Permission denied error
Description
2020-09-12T07:36:20.915 INFO:teuthology.orchestra.run.smithi152.stderr:mount.nfs: mounting 172.21.15.152:/ceph failed, reason given by server: No such file or directory
2020-09-12T07:36:20.916 DEBUG:teuthology.orchestra.run:got remote process result: 32
2020-09-12T07:36:20.917 INFO:teuthology.orchestra.run.smithi152:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early fs volume rm user_test_fs --yes-i-really-mean-it
2020-09-12T07:36:21.259 INFO:teuthology.orchestra.run.smithi152.stderr:Error EPERM: Permission denied: 'user_test_fs'
2020-09-12T07:36:21.261 DEBUG:teuthology.orchestra.run:got remote process result: 1
2020-09-12T07:36:21.262 INFO:teuthology.orchestra.run.smithi152:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log 'Ended test tasks.cephfs.test_nfs.TestNFS.test_cluster_set_reset_user_config'
2020-09-12T07:36:22.327 INFO:tasks.cephfs_test_runner:test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) ... ERROR
/a/teuthology-2020-09-12_07:01:02-rados-master-distro-basic-smithi/5427875
Updated by Kefu Chai over 3 years ago
- Project changed from RADOS to CephFS
- Priority changed from Normal to Urgent
Updated by Patrick Donnelly over 3 years ago
- Status changed from New to Triaged
- Assignee set to Varsha Rao
- Target version set to v16.0.0
- Source set to Q/A
- Backport set to octopus
Updated by Kefu Chai over 3 years ago
I suspect that it is https://github.com/ceph/ceph/pull/32581 which broke `test_cluster_set_reset_user_config` in `tasks.cephfs.test_nfs.TestNFS`
Updated by Varsha Rao over 3 years ago
Kefu Chai wrote:
I suspect that it is https://github.com/ceph/ceph/pull/32581 which broke `test_cluster_set_reset_user_config` in `tasks.cephfs.test_nfs.TestNFS`
Kefu, you are right. I am not sure why it breaks, as ganesha can write to cephfs successfully. When we try to delete the fs, it fails with a permission error.
2020-09-13T07:32:10.238 INFO:teuthology.orchestra.run.smithi184:> sudo mount -t nfs -o port=2049 172.21.15.184:/ceph /mnt
2020-09-13T07:32:10.597 INFO:teuthology.orchestra.run.smithi184:> sudo touch /mnt/test
2020-09-13T07:32:10.630 INFO:teuthology.orchestra.run.smithi184:> sudo sudo ls /mnt
2020-09-13T07:32:10.709 INFO:teuthology.orchestra.run.smithi184.stdout:test
http://qa-proxy.ceph.com/teuthology/teuthology-2020-09-13_07:01:02-rados-master-distro-basic-smithi/5429864/teuthology.log
Updated by Rishabh Dave over 3 years ago
From what I see on master in my local repo, this issue (getting Permission denied on volume rm) is not limited to this testcase. The issue occurs when I run volume rm on a new cluster too.
Updated by Varsha Rao over 3 years ago
- Subject changed from Test failure: test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) to volume rm throws Permission denied error
- Assignee changed from Varsha Rao to Rishabh Dave
Updated by Sebastian Wagner over 3 years ago
2020-09-14T14:04:52.962 INFO:teuthology.orchestra.run.smithi079:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early nfs cluster config reset test
2020-09-14T14:04:57.701 INFO:teuthology.orchestra.run.smithi079.stdout:NFS-Ganesha Config Reset Successfully
2020-09-14T14:04:57.724 INFO:teuthology.orchestra.run.smithi079:> sudo rados -p nfs-ganesha -N test ls
2020-09-14T14:04:57.777 INFO:teuthology.orchestra.run.smithi079.stdout:rec-0000000000000002:nfs.ganesha-test.smithi079
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:grace
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:rec-0000000000000004:nfs.ganesha-test.smithi079
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:rec-0000000000000006:nfs.ganesha-test.smithi079
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:conf-nfs.ganesha-test
2020-09-14T14:05:27.782 INFO:teuthology.orchestra.run.smithi079:> sudo mount -t nfs -o port=2049 172.21.15.79:/ceph /mnt
2020-09-14T14:05:27.971 INFO:teuthology.orchestra.run.smithi079.stderr:mount.nfs: Protocol not supported
2020-09-14T14:05:27.973 DEBUG:teuthology.orchestra.run:got remote process result: 32
2020-09-14T14:05:27.973 INFO:teuthology.orchestra.run.smithi079:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early fs volume rm user_test_fs --yes-i-really-mean-it
2020-09-14T14:05:28.317 INFO:teuthology.orchestra.run.smithi079.stderr:Error EPERM: Permission denied: 'user_test_fs'
2020-09-14T14:05:28.321 DEBUG:teuthology.orchestra.run:got remote process result: 1
2020-09-14T14:05:28.321 INFO:teuthology.orchestra.run.smithi079:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log 'Ended test tasks.cephfs.test_nfs.TestNFS.test_cluster_set_reset_user_config'
2020-09-14T14:05:29.151 INFO:tasks.cephfs_test_runner:test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) ... ERROR
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:======================================================================
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:ERROR: test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS)
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/cephfs/test_nfs.py", line 487, in test_cluster_set_reset_user_config
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:    self._cmd('fs', 'volume', 'rm', fs_name, '--yes-i-really-mean-it')
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/cephfs/test_nfs.py", line 16, in _cmd
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:    return self.mgr_cluster.mon_manager.raw_cluster_cmd(*args)
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/ceph_manager.py", line 1354, in raw_cluster_cmd
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:    'stdout': StringIO()}).stdout.getvalue()
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/ceph_manager.py", line 1347, in run_cluster_cmd
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:    return self.controller.run(**kwargs)
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 215, in run
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:    r.wait()
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:    node=self.hostname, label=self.label
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed on smithi079 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early fs volume rm user_test_fs --yes-i-really-mean-it'
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:Ran 3 tests in 147.605s
2020-09-14T14:05:29.157 INFO:tasks.cephfs_test_runner:
2020-09-14T14:05:29.157 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
Updated by Rishabh Dave over 3 years ago
- Assignee changed from Rishabh Dave to Varsha Rao
Unlike volume rm, fs fail does not fail -
$ ./bin/ceph fs fail cephfs2
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:38:33.306+0530 7f084e002700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:38:33.329+0530 7f084e002700 -1 WARNING: all dangerous and experimental features are enabled.
cephfs2 marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
$ ./bin/ceph fs rm cephfs2 --yes-i-really-mean-it
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:38:42.961+0530 7fbaf5f0a700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:38:42.986+0530 7fbaf5f0a700 -1 WARNING: all dangerous and experimental features are enabled.
$ ./bin/ceph fs volume rm a --yes-i-really-mean-it
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:39:41.097+0530 7f98f50e2700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:39:41.125+0530 7f98f50e2700 -1 WARNING: all dangerous and experimental features are enabled.
Error EPERM: Permission denied: 'a'
volume rm too runs fs fail, which, unlike when run manually, fails with -1/Permission denied, which volume rm eventually returns. The issue is around mgr.x's auth caps -
$ ./bin/ceph auth get mgr.x
exported keyring for mgr.x
[mgr.x]
	key = AQCrgl9f/Tg0DBAApL7TchAhtteBQu4w9X42hg==
	caps mds = "allow *"
	caps mon = "allow profile mgr"
	caps osd = "allow *"
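Since fs volume rm internally fails the filesystem before removing it, an EPERM from the fs fail step surfaces as the volume rm error. A minimal Python sketch of that flow (function and command names here are illustrative, not the actual mgr volumes module API):

```python
# Sketch of the "fs volume rm" flow discussed in this ticket.
# mon_command stands in for issuing a mon command and getting back
# (returncode, stdout, stderr); names are illustrative only.

def volume_rm(mon_command, fs_name):
    """Fail the fs first, then remove it.

    If the caller's mon caps do not permit "fs fail", that first step
    returns EPERM and the whole removal aborts, matching the
    "Error EPERM: Permission denied" failures seen in the logs above.
    """
    ret, out, err = mon_command({'prefix': 'fs fail', 'fs_name': fs_name})
    if ret != 0:
        return ret, out, err  # EPERM surfaces here; "fs rm" never runs
    return mon_command({'prefix': 'fs rm', 'fs_name': fs_name,
                        'yes_i_really_mean_it': True})
```

The key point is that the second step is unreachable once the first is denied, so the user only ever sees the Permission denied from fs fail.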
I confirmed this by creating a client that had the same caps and running fs fail myself. fs fail failed -
$ ./bin/ceph fs fail a --name client.mgrx -k ceph.client.mgrx.keyring
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:21:42.623+0530 7f6846f0f700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:21:42.645+0530 7f6846f0f700 -1 WARNING: all dangerous and experimental features are enabled.
Error EPERM: Permission denied: 'a'
$ ./bin/ceph fs fail a
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:22:17.419+0530 7f352801a700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:22:17.443+0530 7f352801a700 -1 WARNING: all dangerous and experimental features are enabled.
a marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
I think a relatively minor patch should fix this issue -
$ ./bin/ceph fs fail a --id mgrx -k ceph.client.mgrx.keyring
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:49:15.163+0530 7fe4542f5700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:49:15.186+0530 7fe4542f5700 -1 WARNING: all dangerous and experimental features are enabled.
a marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
$ cat ceph.client.mgrx.keyring
[client.mgrx]
	key = AQBBh19fRU+1MhAAAfBVrEUbi2rDam451MkQ7g==
	caps mds = "allow *"
	caps mon = "allow rw, allow profile mgr"
	caps osd = "allow *"
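For an existing cluster, the equivalent caps change can be applied directly to the mgr's key with ceph auth caps. This is a workaround sketch, assuming the entity is mgr.x as above, not the eventual code fix; note that ceph auth caps replaces the entity's caps wholesale, so the mds and osd caps must be restated too:

```shell
# Workaround sketch: grant generic mon rw alongside the mgr profile.
# "ceph auth caps" replaces all of the entity's caps, so every
# daemon's caps must be restated, not just the one being changed.
ceph auth caps mgr.x \
    mds 'allow *' \
    mon 'allow rw, allow profile mgr' \
    osd 'allow *'
```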
Updated by Rishabh Dave over 3 years ago
- Assignee changed from Varsha Rao to Rishabh Dave
Updated by Rishabh Dave over 3 years ago
The ticket assignee mix-up happened because my page wasn't refreshed before I hit the submit button.
Updated by Patrick Donnelly over 3 years ago
Rishabh Dave wrote:
Unlike volume rm, fs fail does not fail -
[...]
volume rm too runs fs fail, which, unlike when run manually, fails with -1/Permission denied, which volume rm eventually returns. The issue is around mgr.x's auth caps -
[...]
I confirmed this by creating a client that had the same caps and running fs fail myself. fs fail failed -
[...]
I think a relatively minor patch should fix this issue -
[...]
The mgr should already have that cap:
https://github.com/ceph/ceph/blob/9fcc49fae72c00a06aefd22786d9758792e69582/src/mon/MonCap.cc#L202
Must be something else.
Updated by Rishabh Dave over 3 years ago
- Status changed from Triaged to In Progress
Updated by Rishabh Dave over 3 years ago
- Status changed from In Progress to Fix Under Review
Updated by Patrick Donnelly over 3 years ago
- Status changed from Fix Under Review to Resolved
- Backport deleted (octopus)