Bug #45252

cephadm: fail to insert modules when creating iSCSI targets

Added by Kiefer Chang 7 months ago. Updated 4 months ago.

Status: Resolved
Priority: High
Category: cephadm
Target version:
% Done: 0%
Source: Development
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

How to reproduce:

  • Enable cephadm, create a pool, and enable the rbd application on it.
  • Create an iSCSI container with that pool. Set up user/password/trusted_ip accordingly.
  • Add rbd-target-api's endpoint to the Dashboard, e.g.:
    ceph dashboard iscsi-gateway-add http://<user>:<pass>@<ip>:<port> 
    
  • Go to the Block/iSCSI/Targets page and create a target. The rbd-target-api log shows this error:
2020-04-24 05:15:51,513    ERROR [rbd-target-api:113:unhandled_exception()] - Unhandled Exception
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/rtslib_fb/node.py", line 71, in _create_in_cfs_ine
    os.mkdir(self.path)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/kernel/config/target/iscsi'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/rtslib_fb/fabric.py", line 156, in _check_self
    self._create_in_cfs_ine('any')
  File "/usr/lib/python3.6/site-packages/rtslib_fb/node.py", line 74, in _create_in_cfs_ine
    % self.__class__.__name__)
rtslib_fb.utils.RTSLibError: Could not create ISCSIFabricModule in configFS

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/rtslib_fb/utils.py", line 429, in modprobe
    kmod.Kmod().modprobe(module)
  File "kmod/kmod.pyx", line 106, in kmod.kmod.Kmod.modprobe
  File "kmod/kmod.pyx", line 82, in lookup
kmod.error.KmodError: Could not modprobe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/bin/rbd-target-api", line 106, in decorated
    return f(*args, **kwargs)
  File "/usr/bin/rbd-target-api", line 300, in target
    target.manage('init')
  File "/usr/lib/python3.6/site-packages/ceph_iscsi_config/target.py", line 710, in manage
    'mutual_password_encryption_enabled'])
  File "/usr/lib/python3.6/site-packages/ceph_iscsi_config/discovery.py", line 14, in set_discovery_auth_lio
    iscsi_fabric.clear_discovery_auth_settings()
  File "/usr/lib/python3.6/site-packages/rtslib_fb/fabric.py", line 224, in clear_discovery_auth_settings
    self._check_self()
  File "/usr/lib/python3.6/site-packages/rtslib_fb/fabric.py", line 158, in _check_self
    modprobe(self.kernel_module)
  File "/usr/lib/python3.6/site-packages/rtslib_fb/utils.py", line 431, in modprobe
    raise RTSLibError("Could not load module: %s" % module)
rtslib_fb.utils.RTSLibError: Could not load module: iscsi_target_mod
2020-04-24 05:15:51,514     INFO [_internal.py:87:_log()] - ::ffff:192.168.121.1 - - [24/Apr/2020 05:15:51] "PUT /api/target/iqn.2001-07.com.ceph:1587705336635 HTTP/1.1" 500 -
It looks like the LIO kernel module files are not bind-mounted inside the container.
Tested with these images:
  • quay.io/ceph-ci/ceph:master
  • docker.io/ceph/daemon-base latest-master-devel
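
A minimal sketch for checking that hypothesis from inside the container, assuming the usual /proc/modules and /lib/modules/<kernel release> layout. The helper names module_loaded and module_tree_present are made up for illustration and are not part of cephadm or rtslib:

```python
import os
import platform


def module_loaded(name, proc_modules_text):
    """Return True if `name` is listed as loaded in /proc/modules text."""
    # The kernel reports module names with underscores, so normalize hyphens.
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each /proc/modules line is the module name.
        if fields and fields[0] == wanted:
            return True
    return False


def module_tree_present(modules_root="/lib/modules", release=None):
    """Return True if the module directory for the given kernel release exists."""
    release = release or platform.release()
    return os.path.isdir(os.path.join(modules_root, release))


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        text = ""
    print("iscsi_target_mod loaded:", module_loaded("iscsi_target_mod", text))
    print("module tree present:", module_tree_present())
```

If the module is not loaded and the module tree is missing, a modprobe from inside the container has nothing to load, which would match the KmodError in the traceback.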

insert_error.txt View (3.74 KB) Kiefer Chang, 05/19/2020 02:44 AM

History

#1 Updated by Ricardo Marques 7 months ago

  • Category set to cephadm

#2 Updated by Kiefer Chang 7 months ago

  • Description updated (diff)

#3 Updated by Kiefer Chang 7 months ago

  • Priority changed from Normal to High

#5 Updated by Matthew Oliver 7 months ago

Just spitballing: the container shares the host kernel, so we could also insmod the required kernel modules before the container starts, i.e. add it to the unit.run script or something.
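
A rough sketch of that idea, assuming unit.run is a plain shell script whose lines are available as a list. The helper add_module_preload and the example container command are hypothetical, and it uses modprobe rather than insmod so that module dependencies get resolved:

```python
def add_module_preload(run_lines, modules=("iscsi_target_mod",)):
    """Return a copy of run_lines with one modprobe line per module added
    before the first command (hypothetical helper, not cephadm code)."""
    preload = ["modprobe %s" % name for name in modules]
    lines = list(run_lines)
    # Keep a leading shebang and any `set ...` lines at the top of the script.
    split = 0
    while split < len(lines) and (
        lines[split].startswith("#!") or lines[split].startswith("set ")
    ):
        split += 1
    return lines[:split] + preload + lines[split:]


if __name__ == "__main__":
    # Placeholder script content; the real unit.run is generated by cephadm.
    script = ["#!/bin/sh", "set -e", "podman run ceph-iscsi"]
    print("\n".join(add_module_preload(script)))
```

Because the modprobe runs on the host before the container starts, the container itself would not need access to the module files for this step.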

#6 Updated by Matthew Oliver 7 months ago

  • Status changed from New to In Progress
  • Assignee set to Matthew Oliver

OK, some progress. I've tried preloading the kernel module (iscsi_target_mod) and that works.

But the next error, which also shows up in the traceback in the description, is write access to configfs.

Because it's configfs, I can't just chmod /sys/kernel/config/target. I also tried mounting with `-o uid=xxx,gid=xxx`, but uid and gid aren't valid options for configfs, so they are just ignored.
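
One way to see which options actually took effect is to read the configfs entry back from /proc/mounts. A small sketch, assuming the standard six-field /proc/mounts format; configfs_mounts is an illustrative helper, not an existing function:

```python
def configfs_mounts(proc_mounts_text):
    """Return (mount_point, options) pairs for configfs entries in /proc/mounts text."""
    found = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 4 and fields[2] == "configfs":
            found.append((fields[1], fields[3].split(",")))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        text = ""
    for mount_point, options in configfs_mounts(text):
        print(mount_point, options)
```

If uid and gid are absent from the reported options after the mount, that would confirm they were dropped.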

The only thing that fixed this issue is making the container a privileged one :(

Once I did:

diff --git a/src/cephadm/cephadm b/src/cephadm/cephadm
index 54ca99701f..c0f97c1c0e 100755
--- a/src/cephadm/cephadm
+++ b/src/cephadm/cephadm
@@ -1649,6 +1649,9 @@ def get_container(fsid, daemon_type, daemon_id,
     elif daemon_type == CephIscsi.daemon_type:
         entrypoint = CephIscsi.entrypoint
         name = '%s.%s' % (daemon_type, daemon_id)
+        # So the container can modprobe iscsi_target_mod and have write perms
+        # to configfs we need to make this a privileged container.
+        privileged = True
     else:
         entrypoint = ''
         name = ''

Everything worked, including not having to preload the kernel module; the ceph-iscsi script could do that itself.

I assume we want to limit our privileged containers, but I'm not sure what else to try to get write permission on configfs. Happy to keep poking, though.

#7 Updated by Matthew Oliver 7 months ago

I've thrown the diff into a PR: https://github.com/ceph/ceph/pull/34898

But if we take this approach, we should probably discuss the possible security implications, and whether there is any other approach, before merging anything.

#8 Updated by Sebastian Wagner 7 months ago

  • Pull request ID set to 34898

#9 Updated by Sebastian Wagner 7 months ago

  • Status changed from In Progress to Pending Backport

#10 Updated by Kiefer Chang 6 months ago

Still seeing this after PR 34898 was merged.
The attached insert_error.txt contains more info.

#11 Updated by Matthew Oliver 6 months ago

Hmm, that didn't happen on my test system. I might need to rebuild to check, and maybe reboot the host just in case.

Maybe we also need to mount /lib/modules/.. into the container. Either that, or preload the kernel module in the systemd unit startup script.

Does anyone have a strong opinion for one over the other?

#12 Updated by Matthew Oliver 6 months ago

I've created a PR to bind-mount /lib/modules read-only: https://github.com/ceph/ceph/pull/35141

With the PR applied and an iscsi container deployed:

cephadm enter iscsi.iscsi.ironic-moliver.dgqkba
[ceph: root@ironic-moliver /]# mount |grep modules
/dev/sda1 on /usr/lib/modules type ext4 (ro,relatime,data=ordered)
[ceph: root@ironic-moliver /]# ls /lib/modules/4.12.14-lp15
4.12.14-lp150.12.82-default/ 4.12.14-lp151.28.36-default/
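
For illustration only, a read-only bind mount like this can be expressed as a `-v host:container:ro` argument to podman or docker. The sketch below assumes a simple host-path to container-path mapping; volume_args is a hypothetical helper and is not taken from the PR:

```python
def volume_args(mounts):
    """Turn a {host_path: container_path[:options]} mapping into -v arguments."""
    args = []
    # Sort for a stable argument order regardless of dict insertion order.
    for host_path, container_path in sorted(mounts.items()):
        args.extend(["-v", "%s:%s" % (host_path, container_path)])
    return args


if __name__ == "__main__":
    # The :ro suffix asks the container runtime for a read-only mount.
    iscsi_mounts = {"/lib/modules": "/lib/modules:ro"}
    print(" ".join(volume_args(iscsi_mounts)))
```

Mounting read-only lets modprobe inside the container find the host's module files without letting the container modify them.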

#13 Updated by Sebastian Wagner 6 months ago

  • Status changed from Pending Backport to Resolved
  • Target version set to v15.2.4

#14 Updated by Sebastian Wagner 4 months ago

  • Target version changed from v15.2.4 to v15.2.5
