Bug #47423

volume rm throws Permission denied error

Added by Kefu Chai over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Rishabh Dave
Category:
-
Target version:
v16.0.0
% Done:
0%

Source:
Q/A
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
37190
Crash signature (v1):
Crash signature (v2):

Description

2020-09-12T07:36:20.915 INFO:teuthology.orchestra.run.smithi152.stderr:mount.nfs: mounting 172.21.15.152:/ceph failed, reason given by server: No such file or directory
2020-09-12T07:36:20.916 DEBUG:teuthology.orchestra.run:got remote process result: 32
2020-09-12T07:36:20.917 INFO:teuthology.orchestra.run.smithi152:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early fs volume rm user_test_fs --yes-i-really-mean-it
2020-09-12T07:36:21.259 INFO:teuthology.orchestra.run.smithi152.stderr:Error EPERM: Permission denied: 'user_test_fs'
2020-09-12T07:36:21.261 DEBUG:teuthology.orchestra.run:got remote process result: 1
2020-09-12T07:36:21.262 INFO:teuthology.orchestra.run.smithi152:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log 'Ended test tasks.cephfs.test_nfs.TestNFS.test_cluster_set_reset_user_config'
2020-09-12T07:36:22.327 INFO:tasks.cephfs_test_runner:test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) ... ERROR

/a/teuthology-2020-09-12_07:01:02-rados-master-distro-basic-smithi/5427875

Actions #1

Updated by Kefu Chai over 3 years ago

  • Project changed from RADOS to CephFS
  • Priority changed from Normal to Urgent
Actions #2

Updated by Kefu Chai over 3 years ago

  • Regression changed from No to Yes
Actions #3

Updated by Patrick Donnelly over 3 years ago

  • Status changed from New to Triaged
  • Assignee set to Varsha Rao
  • Target version set to v16.0.0
  • Source set to Q/A
  • Backport set to octopus
Actions #4

Updated by Kefu Chai over 3 years ago

I suspect that https://github.com/ceph/ceph/pull/32581 broke `test_cluster_set_reset_user_config` in `tasks.cephfs.test_nfs.TestNFS`.

see https://pulpito.ceph.com/kchai-2020-09-12_13:41:38-rados-wip-kefu-testing-2020-09-12-2016-distro-basic-smithi

Actions #5

Updated by Varsha Rao over 3 years ago

Kefu Chai wrote:

I suspect that https://github.com/ceph/ceph/pull/32581 broke `test_cluster_set_reset_user_config` in `tasks.cephfs.test_nfs.TestNFS`.

see https://pulpito.ceph.com/kchai-2020-09-12_13:41:38-rados-wip-kefu-testing-2020-09-12-2016-distro-basic-smithi

Kefu, you are right. I am not sure why it breaks, since Ganesha can still write to CephFS successfully; it is only when we try to delete the fs that it fails with a permission error.

2020-09-13T07:32:10.238 INFO:teuthology.orchestra.run.smithi184:> sudo mount -t nfs -o port=2049 172.21.15.184:/ceph /mnt
2020-09-13T07:32:10.597 INFO:teuthology.orchestra.run.smithi184:> sudo touch /mnt/test
2020-09-13T07:32:10.630 INFO:teuthology.orchestra.run.smithi184:> sudo sudo ls /mnt
2020-09-13T07:32:10.709 INFO:teuthology.orchestra.run.smithi184.stdout:test

http://qa-proxy.ceph.com/teuthology/teuthology-2020-09-13_07:01:02-rados-master-distro-basic-smithi/5429864/teuthology.log
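
Note that the successful write and the failing removal exercise different permission paths: as the later analysis in this ticket shows, the NFS write goes through the client's MDS/OSD caps, while `volume rm` needs a mon-level `fs fail` issued by the mgr. A sketch of probing the two sides separately (server address and volume name taken from the logs above):

$ sudo mount -t nfs -o port=2049 172.21.15.184:/ceph /mnt
$ sudo touch /mnt/test && sudo ls /mnt                      # data path: MDS/OSD caps -- succeeds
$ ceph fs volume rm user_test_fs --yes-i-really-mean-it     # control path via the mgr -- EPERM on affected builds
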
Actions #6

Updated by Rishabh Dave over 3 years ago

From what I see on master in my local repo, this issue (getting Permission denied on volume rm) is not limited to this test case; it also occurs when I run `volume rm` on a brand-new cluster, as sketched below.
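
A minimal reproduction on a fresh vstart cluster (a sketch; `testvol` is an arbitrary volume name):

$ ./bin/ceph fs volume create testvol
$ ./bin/ceph fs volume rm testvol --yes-i-really-mean-it
Error EPERM: Permission denied: 'testvol'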

Actions #7

Updated by Varsha Rao over 3 years ago

  • Subject changed from Test failure: test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) to volume rm throws Permission denied error
  • Assignee changed from Varsha Rao to Rishabh Dave
Actions #8

Updated by Sebastian Wagner over 3 years ago

2020-09-14T14:04:52.962 INFO:teuthology.orchestra.run.smithi079:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early nfs cluster config reset test
2020-09-14T14:04:57.701 INFO:teuthology.orchestra.run.smithi079.stdout:NFS-Ganesha Config Reset Successfully
2020-09-14T14:04:57.724 INFO:teuthology.orchestra.run.smithi079:> sudo rados -p nfs-ganesha -N test ls
2020-09-14T14:04:57.777 INFO:teuthology.orchestra.run.smithi079.stdout:rec-0000000000000002:nfs.ganesha-test.smithi079
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:grace
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:rec-0000000000000004:nfs.ganesha-test.smithi079
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:rec-0000000000000006:nfs.ganesha-test.smithi079
2020-09-14T14:04:57.778 INFO:teuthology.orchestra.run.smithi079.stdout:conf-nfs.ganesha-test
2020-09-14T14:05:27.782 INFO:teuthology.orchestra.run.smithi079:> sudo mount -t nfs -o port=2049 172.21.15.79:/ceph /mnt
2020-09-14T14:05:27.971 INFO:teuthology.orchestra.run.smithi079.stderr:mount.nfs: Protocol not supported
2020-09-14T14:05:27.973 DEBUG:teuthology.orchestra.run:got remote process result: 32
2020-09-14T14:05:27.973 INFO:teuthology.orchestra.run.smithi079:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early fs volume rm user_test_fs --yes-i-really-mean-it
2020-09-14T14:05:28.317 INFO:teuthology.orchestra.run.smithi079.stderr:Error EPERM: Permission denied: 'user_test_fs'
2020-09-14T14:05:28.321 DEBUG:teuthology.orchestra.run:got remote process result: 1
2020-09-14T14:05:28.321 INFO:teuthology.orchestra.run.smithi079:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early log 'Ended test tasks.cephfs.test_nfs.TestNFS.test_cluster_set_reset_user_config'
2020-09-14T14:05:29.151 INFO:tasks.cephfs_test_runner:test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS) ... ERROR
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:======================================================================
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:ERROR: test_cluster_set_reset_user_config (tasks.cephfs.test_nfs.TestNFS)
2020-09-14T14:05:29.152 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/cephfs/test_nfs.py", line 487, in test_cluster_set_reset_user_config
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:    self._cmd('fs', 'volume', 'rm', fs_name, '--yes-i-really-mean-it')
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/cephfs/test_nfs.py", line 16, in _cmd
2020-09-14T14:05:29.153 INFO:tasks.cephfs_test_runner:    return self.mgr_cluster.mon_manager.raw_cluster_cmd(*args)
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/ceph_manager.py", line 1354, in raw_cluster_cmd
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:    'stdout': StringIO()}).stdout.getvalue()
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-swagner3-testing-2020-09-14-1344/qa/tasks/ceph_manager.py", line 1347, in run_cluster_cmd
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:    return self.controller.run(**kwargs)
2020-09-14T14:05:29.154 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", line 215, in run
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 446, in run
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:    r.wait()
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 160, in wait
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2020-09-14T14:05:29.155 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 182, in _raise_for_status
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:    node=self.hostname, label=self.label
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed on smithi079 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph --log-early fs volume rm user_test_fs --yes-i-really-mean-it'
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2020-09-14T14:05:29.156 INFO:tasks.cephfs_test_runner:Ran 3 tests in 147.605s
2020-09-14T14:05:29.157 INFO:tasks.cephfs_test_runner:
2020-09-14T14:05:29.157 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2020-09-14T14:05:29.157 INFO:tasks.cephfs_test_runner:

https://pulpito.ceph.com/swagner-2020-09-14_13:20:29-rados:cephadm-wip-swagner3-testing-2020-09-14-1344-distro-basic-smithi/5434701/

Actions #9

Updated by Rishabh Dave over 3 years ago

  • Assignee changed from Rishabh Dave to Varsha Rao

Unlike `volume rm`, running `fs fail` and `fs rm` directly does not fail:

$ ./bin/ceph fs fail cephfs2
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:38:33.306+0530 7f084e002700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:38:33.329+0530 7f084e002700 -1 WARNING: all dangerous and experimental features are enabled.
cephfs2 marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
$ ./bin/ceph fs rm cephfs2 --yes-i-really-mean-it
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:38:42.961+0530 7fbaf5f0a700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:38:42.986+0530 7fbaf5f0a700 -1 WARNING: all dangerous and experimental features are enabled.
$ ./bin/ceph fs volume rm a --yes-i-really-mean-it
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:39:41.097+0530 7f98f50e2700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:39:41.125+0530 7f98f50e2700 -1 WARNING: all dangerous and experimental features are enabled.
Error EPERM: Permission denied: 'a'

`volume rm` also runs `fs fail` internally; when issued through the mgr it fails with -1/Permission denied, and that error is what `volume rm` eventually returns (the mgr-side sequence is sketched after the keyring below). The issue is around mgr.x's auth caps:

$ ./bin/ceph auth get mgr.x
exported keyring for mgr.x
[mgr.x]
    key = AQCrgl9f/Tg0DBAApL7TchAhtteBQu4w9X42hg==
    caps mds = "allow *" 
    caps mon = "allow profile mgr" 
    caps osd = "allow *" 
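
Roughly, the mgr-side sequence behind `fs volume rm` looks like the following (a sketch; the `cephfs.<vol>.meta` / `cephfs.<vol>.data` pool names assume the volumes module's default naming scheme):

$ ceph fs fail a                        # this is the step that hits EPERM
$ ceph fs rm a --yes-i-really-mean-it
$ ceph osd pool rm cephfs.a.meta cephfs.a.meta --yes-i-really-really-mean-it
$ ceph osd pool rm cephfs.a.data cephfs.a.data --yes-i-really-really-mean-it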

I confirmed this by creating a client with the same caps and running `fs fail` myself; `fs fail` failed:

$ ./bin/ceph fs fail a --name client.mgrx -k ceph.client.mgrx.keyring 
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:21:42.623+0530 7f6846f0f700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:21:42.645+0530 7f6846f0f700 -1 WARNING: all dangerous and experimental features are enabled.
Error EPERM: Permission denied: 'a'
$ ./bin/ceph fs fail a
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:22:17.419+0530 7f352801a700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:22:17.443+0530 7f352801a700 -1 WARNING: all dangerous and experimental features are enabled.
a marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
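
For reference, the `client.mgrx` entity used above can be created with caps mirroring `mgr.x` like this (a sketch):

$ ./bin/ceph auth get-or-create client.mgrx \
      mds 'allow *' \
      mon 'allow profile mgr' \
      osd 'allow *' \
      -o ceph.client.mgrx.keyring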

I think a relatively minor patch should fix this issue; adding `allow rw` to the mon caps makes `fs fail` succeed:

$ ./bin/ceph fs fail a --id mgrx -k ceph.client.mgrx.keyring
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2020-09-14T20:49:15.163+0530 7fe4542f5700 -1 WARNING: all dangerous and experimental features are enabled.
2020-09-14T20:49:15.186+0530 7fe4542f5700 -1 WARNING: all dangerous and experimental features are enabled.
a marked not joinable; MDS cannot join the cluster. All MDS ranks marked failed.
$ cat ceph.client.mgrx.keyring 
[client.mgrx]
    key = AQBBh19fRU+1MhAAAfBVrEUbi2rDam451MkQ7g==
    caps mds = "allow *" 
    caps mon = "allow rw, allow profile mgr" 
    caps osd = "allow *" 
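
The widened keyring above can be produced by updating the caps in place, for example (a sketch that demonstrates the hypothesis, not the eventual code fix):

$ ./bin/ceph auth caps client.mgrx \
      mds 'allow *' \
      mon 'allow rw, allow profile mgr' \
      osd 'allow *'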

Actions #10

Updated by Rishabh Dave over 3 years ago

  • Assignee changed from Varsha Rao to Rishabh Dave
Actions #11

Updated by Rishabh Dave over 3 years ago

The back-and-forth on the ticket assignee happened because my page wasn't refreshed before I hit the submit button.

Actions #12

Updated by Patrick Donnelly over 3 years ago

Rishabh Dave wrote:

Unlike `volume rm`, running `fs fail` and `fs rm` directly does not fail:

[...]

`volume rm` also runs `fs fail` internally; when issued through the mgr it fails with -1/Permission denied, and that error is what `volume rm` eventually returns. The issue is around mgr.x's auth caps:

[...]

I confirmed this by creating a client with the same caps and running `fs fail` myself; `fs fail` failed:
[...]

I think a relatively minor patch should fix this issue; adding `allow rw` to the mon caps makes `fs fail` succeed:
[...]

The mgr should already have that cap:

https://github.com/ceph/ceph/blob/9fcc49fae72c00a06aefd22786d9758792e69582/src/mon/MonCap.cc#L202

Must be something else.

Actions #13

Updated by Rishabh Dave over 3 years ago

  • Status changed from Triaged to In Progress
Actions #14

Updated by Rishabh Dave over 3 years ago

  • Pull request ID set to 37190
Actions #15

Updated by Rishabh Dave over 3 years ago

  • Status changed from In Progress to Fix Under Review
Actions #16

Updated by Patrick Donnelly over 3 years ago

  • Status changed from Fix Under Review to Resolved
  • Backport deleted (octopus)