Feature #44044


qa: add network namespaces to kernel/ceph-fuse mounts for partition testing

Added by Patrick Donnelly about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: v16.0.0
% Done: 0%
Source: Development
Tags: -
Backport: -
Reviewed: -
Affected Versions: -
Component(FS): ceph-fuse, kceph, qa-suite
Labels (FS): qa
Pull request ID: 33576

Description

In teuthology, we want to shut down the kernel mount without any kind of cleanup, the way sending SIGKILL to ceph-fuse does for the FUSE client. Today we do this by putting the kernel client on a separate node and using IPMI to hard-reset the machine. This is not optimal because it requires a separate node for each kernel client.

It'd be better if we had a way to shut down the CephFS mount without any kind of cleanup. That would let us put all the kernel clients on the same node and selectively "kill" them.

Obviously, this shouldn't necessarily cause an unmount: applications may still be using the mount (open fds) or have their cwd on it. All operations should return ESHUTDOWN (or similar), and `umount [-f]` should work normally.
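For illustration, a hypothetical session after such a "kill" might look like this (the error text assumes ESHUTDOWN; this shows the intended behavior, not output from an actual run):

    $ ls /mnt/cephfs
    ls: cannot access '/mnt/cephfs': Cannot send after transport endpoint shutdown
    $ umount -f /mnt/cephfs    # the forced unmount should still succeed
    $ echo $?
    0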


Related issues: 1 (0 open, 1 closed)

Related to CephFS - Bug #47734: client: hang after statfs (Resolved, Patrick Donnelly)

Actions #1

Updated by Xiubo Li about 4 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Xiubo Li about 4 years ago

I will add a mount option, "suspend=<on|off>", to suspend the specified mount point.

Currently remount is not working; that needs to be fixed first.
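As a sketch, the proposed option (which was not merged in this form) might have been used like this, assuming a standard kernel mount:

    $ mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin
    $ mount -o remount,suspend=on /mnt/cephfs     # stop all traffic to the cluster
    $ mount -o remount,suspend=off /mnt/cephfs    # resume normal operation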

Actions #3

Updated by Xiubo Li about 4 years ago

Following Jeff's idea and his comments on the first version, the "halt" mount option would try to close all the monc/osdc/mdsc connections without doing any cleanup beforehand. However, the socket close routine sends a FIN to the peer, so this cannot 100% simulate the pulled-cable or hard-reset-node case.

Digging into the iptables/netfilter code: we could implement the iptables DROP rules in kceph directly, if there are no potential problems with that, but it would bypass the userspace iptables tooling.
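For reference, the userspace equivalent would be DROP rules on the standard Ceph ports, along these lines (note this silences every client on the node, which is part of why per-mount isolation is attractive):

    $ iptables -A OUTPUT -p tcp --dport 3300 -j DROP       # msgr2 monitor port
    $ iptables -A OUTPUT -p tcp --dport 6789 -j DROP       # msgr1 monitor port
    $ iptables -A OUTPUT -p tcp --dport 6800:7300 -j DROP  # OSD/MDS port range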

Actions #4

Updated by Xiubo Li about 4 years ago

This is for ceph-fuse: https://github.com/ceph/ceph/pull/33576

This uses a separate network namespace to isolate the FUSE client from the OS. We can then simply shut down the veth interface of the network namespace container, which silently drops all the socket packets from the cluster without sending any response.

This covers the FUSE client in userspace; next I will try the same for the kernel client.
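In outline, the namespace setup looks like the following. This is a minimal sketch with illustrative names and addresses; the actual script (unshare_ns_mount.sh, shown below) also handles address allocation, routing, and cleanup:

    # Create a namespace and a veth pair bridged to the host:
    $ ip netns add ceph-ns0
    $ ip link add veth0 type veth peer name veth1
    $ ip link set veth1 netns ceph-ns0
    $ ip link add ceph-brx type bridge
    $ ip link set ceph-brx up
    $ ip addr add 192.168.255.254/16 dev ceph-brx
    $ ip link set veth0 master ceph-brx
    $ ip link set veth0 up
    $ ip netns exec ceph-ns0 ip addr add 192.168.0.1/16 dev veth1
    $ ip netns exec ceph-ns0 ip link set veth1 up
    # Mount from inside the namespace (NAT/forwarding to the cluster omitted):
    $ ip netns exec ceph-ns0 ceph-fuse -m <mon-addr>:6789 /mnt/cephfs
    # "Pull the cable": down the veth; packets are dropped and no FIN is sent.
    $ ip netns exec ceph-ns0 ip link set veth1 down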

Actions #5

Updated by Xiubo Li about 4 years ago

For now, both the kernel and FUSE clients are working with https://github.com/ceph/ceph/pull/33576:

# ./unshare_ns_mount.sh 

This will help to isolate the network namespace from the OS for the mount client!

usage: unshare_ns_mount.sh [OPTIONS [parameters]] [--brxip <ip_address/mask>]
OPTIONS:
  --fuse    <ceph-fuse options>
    The ceph-fuse command options
    $ unshare_ns_mount.sh --fuse -m 192.168.0.1:6789 /mnt/cephfs -o nonempty

  --kernel  <mount options>
    The mount command options
    $ unshare_ns_mount.sh --kernel -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o fs=a

  --suspend <mountpoint>
    Down the veth interface in the network namespace
    $ unshare_ns_mount.sh --suspend /mnt/cephfs

  --resume  <mountpoint>
    Up the veth interface in the network namespace
    $ unshare_ns_mount.sh --resume /mnt/cephfs

  --umount  <mountpoint>
    Umount and delete the network namespace
    $ unshare_ns_mount.sh --umount /mnt/cephfs

  --brxip   <ip_address/mask>
    Specify ip/mask for ceph-brx and it only makes sense for --fuse/--kernel options
    (default: 192.168.255.254/16, netns ip: 192.168.0.1/16 ~ 192.168.255.253/16)
    $ unshare_ns_mount.sh --fuse -m 192.168.0.1:6789 /mnt/cephfs --brxip 172.19.255.254/12
    $ unshare_ns_mount.sh --kernel 192.168.0.1:6789:/ /mnt/cephfs --brxip 172.19.255.254/12

  -h, --help
    Print help

By default it will use the 192.168.X.Y/16 private network IPs for the ceph-brx and the netnses, as above. You can also specify your own IP/mask for ceph-brx, like:

  $ unshare_ns_mount.sh --fuse /mnt/cephfs --brxip 172.19.100.100/12

Then each netns will get a new IP from the ranges:

 [172.16.0.1 ~ 172.19.100.99]/12 and [172.19.100.101 ~ 172.31.255.254]/12

Actions #6

Updated by Patrick Donnelly about 4 years ago

  • Tracker changed from Bug to Feature
  • Project changed from Linux kernel client to CephFS
  • Subject changed from fs/ceph: add sysfs control file to hard shutdown mount to qa: add network namespaces to kernel/ceph-fuse mounts for partition testing
  • Target version set to v16.0.0
  • Source set to Development
  • Pull request ID set to 33576
  • Component(FS) ceph-fuse, kceph, qa-suite added
  • Labels (FS) qa added
Actions #7

Updated by Xiubo Li about 4 years ago

I have added the qa/ test case by porting the bash code to Python.

In some cases it is enough to just s/mount_X.kill()/mount_X.suspend_netns()/ and s/mount_X.kill_cleanup() ... mount() again/mount_X.resume_netns()/.

Actions #8

Updated by Xiubo Li about 4 years ago

  • Status changed from In Progress to Fix Under Review
Actions #9

Updated by Greg Farnum about 4 years ago

  • Status changed from Fix Under Review to Resolved

I'm marking this Resolved for now, but it wouldn't surprise me if we develop some tests on it that we want to take back to Octopus, so that may change.

Actions #10

Updated by Patrick Donnelly over 3 years ago

  • Related to Bug #47734: client: hang after statfs added
