CephFS encryption support

Summary

Implement encryption support for CephFS. Encryption is at file level, and the algorithm is given under Detailed Description.
What are the advantages of this approach?
(1) Simplicity. It is almost independent of the OSDs and the MDS. The code is basically on the client side and self-contained.
(2) Encryption is done on the client side, which has these advantages:
    (1) The encrypted data is tied to the user's key.
    (2) The data path for plain text is short: it exists only in the page cache.
    (3) OSDs are intended for I/O-intensive work. We do not want to burden them with CPU-intensive work, so they can be cheap, low-power machines.
(3) The encryption is transparent to applications.

Why not use eCryptfs?
While the design is inspired by eCryptfs, using eCryptfs directly has many limitations, mainly due to its stacked filesystem design. The Linux VFS has no special support for stacked filesystems: the lower filesystem is never aware of the upper filesystem, in this case eCryptfs. This causes many synchronization and consistency problems, especially for network and distributed filesystems, which is why eCryptfs does not work well on NFS, CIFS, GFS and so on. The basic problem is that even when the lower filesystem has synchronized its state, it does not notify eCryptfs. The same happens when files are manipulated directly on the lower filesystem. Ceph already synchronizes metadata automatically when multiple clients operate on the same file, but with eCryptfs on top the application would not see that synchronization, because eCryptfs keeps its own metadata cache, which is not synchronized. The second problem is that eCryptfs maintains its own page cache, which typically results in double caching and consumes much more memory.

Why not use dm-crypt on the local filesystem of each OSD?
The encryption is independent of the user's key. The data path of the encrypted data is very short: data is encrypted only on disk. The encryption is coarse-grained.

Owners

  • Li Wang (ubuntukylin)

Interested Parties

  • Anip Patel (Arizona State University)

Current Status

With original design

Detailed Description

1 When a user mounts a Ceph directory, they can specify a passphrase, the encryption algorithm, the key length and so on as mount options. The passphrase is hashed several times to produce TOKEN. TOKEN, together with the other information, is held in the superblock. TOKEN is hashed several more times to produce HTOKEN, which is stored as an extended attribute of the mounted root directory.
2 When a user mounts an encrypted directory, a passphrase must be given. It is hashed to get TOKEN' and HTOKEN'. If HTOKEN' != HTOKEN, the user is warned that this key has not been used before. If the user chooses to proceed, TOKEN' is held in the superblock as TOKEN.
3 When a file is created, a random key (FEK, file encryption key) is generated and encrypted with TOKEN to give EFEK (encrypted FEK). EFEK, the other encryption-related information inherited from the superblock, and a hash of FEK (HFEK) are stored in the file's extended attributes.
4 When a file is opened, its extended attributes are retrieved to get EFEK. TOKEN is used to decrypt EFEK into FEK, and FEK is hashed to get HFEK'. If HFEK' != HFEK, the open returns EIO; otherwise FEK is cached in the inode.
5 When a file is read in readpage()/readpages(), the encrypted pages are decrypted transparently with FEK, and the plain data is returned to the application.
6 When a file is written in writepage()/writepages(), the pages are encrypted transparently with FEK and then written to the OSDs.
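
The key handling in steps 1 to 4 can be sketched in user space as follows. This is only an illustration under stated assumptions, not the proposed kernel code: the hash (SHA-256), the round count, the per-use labels, the per-file nonce and all function names are hypothetical, and toy_cipher() is a stand-in for whatever cipher the mount options would select. It must not be used to protect real data.

```python
import errno
import hashlib
import hmac
import os

HASH_ROUNDS = 1000  # assumption: the text only says "hashed several times"
FEK_BYTES = 32      # assumption: in the proposal the key length is a mount option


def iter_hash(label: bytes, data: bytes, rounds: int = HASH_ROUNDS) -> bytes:
    """Hash label+data repeatedly with SHA-256 (the label separates the uses)."""
    digest = hashlib.sha256(label + data).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest


def derive_token(passphrase: str) -> bytes:
    """Step 1: passphrase -> TOKEN (held in the superblock)."""
    return iter_hash(b"token", passphrase.encode("utf-8"))


def derive_htoken(token: bytes) -> bytes:
    """Step 1: TOKEN -> HTOKEN (stored as an xattr on the mounted root)."""
    return iter_hash(b"htoken", token)


def passphrase_matches(passphrase: str, stored_htoken: bytes) -> bool:
    """Step 2: recompute HTOKEN' and compare it with the stored HTOKEN."""
    candidate = derive_htoken(derive_token(passphrase))
    return hmac.compare_digest(candidate, stored_htoken)


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Placeholder XOR stream cipher; applying it twice restores the input."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = key + nonce + counter.to_bytes(8, "big")
        stream += hashlib.sha256(block).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def create_file_xattrs(token: bytes) -> dict:
    """Step 3: generate FEK, wrap it with TOKEN, record EFEK and HFEK."""
    fek = os.urandom(FEK_BYTES)
    nonce = os.urandom(16)  # assumption: a per-file nonce, not in the proposal
    return {
        "nonce": nonce,
        "efek": toy_cipher(token, nonce, fek),
        "hfek": iter_hash(b"hfek", fek),
    }


def open_file_key(token: bytes, xattrs: dict) -> bytes:
    """Step 4: unwrap EFEK, check it against HFEK, return FEK or raise EIO."""
    fek = toy_cipher(token, xattrs["nonce"], xattrs["efek"])
    if not hmac.compare_digest(iter_hash(b"hfek", fek), xattrs["hfek"]):
        raise OSError(errno.EIO, "FEK does not match HFEK")
    return fek
```

With these helpers, open_file_key(derive_token(p), create_file_xattrs(derive_token(p))) returns the FEK for the same passphrase p, while a TOKEN derived from a different passphrase fails the HFEK check and raises OSError with errno EIO, which mirrors the behaviour described in step 4.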

Some points:

(1) What if there is no page cache?
For both performance and simplicity, we prefer to use the page cache. Without the page cache there are two choices: with encryption enabled, a second writer is not allowed to open the same file; alternatively, we enforce O_LAZYIO on the file, but the application is expected to be aware of this.
(2) What about files opened with O_DIRECT?
Return a not-supported error (e.g. EOPNOTSUPP) if the file is encrypted.
(3) If the user mounts the root directory and then navigates down into an encrypted directory, when are they asked for the passphrase?
We think the duty of encryption is to protect the confidentiality of file contents, that is, a user cannot view the plain text without a valid passphrase. It is not to prevent the user from navigating into the directory, or even from damaging or deleting files; those are matters for access control (ACLs, SELinux etc.). So our design for the initial version does not address this: a user can navigate into the encrypted directory, but will see only encrypted text and encrypted file names. This is also what eCryptfs does: a user can read, write and delete the encrypted files directly on the lower filesystem, provided access control allows it.
(4) Can a user mount the same directory with another passphrase?
As described above, the user is warned that the key has not been used before. If the user insists on proceeding, the mount succeeds. Opening files encrypted with the old key then returns EIO, and newly created files are encrypted with the new key.
(5) Does the directory need to be empty before being encrypted?
No. An existing file remains unencrypted unless it is copied to a new file, the original is deleted, and the new file is renamed back.
(6) Can a parent or child directory of an encrypted directory be mounted with another passphrase?
The rule is: if a directory was not encrypted, its name remains unencrypted unless 'cp -r' is used to generate a new directory. If a directory was encrypted, it remains encrypted with the original key. So it depends only on whether the directory was encrypted before.
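
The copy-and-rename procedure from point (5) could look roughly like the user-space sketch below. It assumes the directory sits on a mount with encryption enabled, so that the newly created temporary file receives a fresh FEK as in step 3; the function name rewrite_in_place and the temporary file prefix are hypothetical. On an ordinary filesystem it simply rewrites the file with identical content.

```python
import os
import shutil
import tempfile


def rewrite_in_place(path: str) -> None:
    """Copy `path` to a new file in the same directory, then rename it back.

    The new file is created through the normal create path, so on an
    encrypting mount it would be encrypted. os.replace() removes the
    original and renames the copy in one step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".reencrypt-")
    os.close(fd)
    try:
        shutil.copyfile(path, tmp_path)   # copy the data into the new file
        shutil.copymode(path, tmp_path)   # keep the original permission bits
        os.replace(tmp_path, path)        # delete original, rename copy back
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Keeping the temporary file in the same directory means the final rename stays within one filesystem. Ownership and extended attributes are not copied in this sketch.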
