Bug #37489
RGW does not use SHA-NI extensions if available, could reduce CPU usage by ~66%
Description
The S3 protocol requires calculating the SHA256 hash of the whole file for multi-part uploads; this is currently implemented with libnss.
The libnss SHA256 implementation does not support SHA-NI and is therefore somewhat slow (~100 MB/s). This leads to unnecessarily high CPU usage when large amounts of data are uploaded. I've profiled rgw and it spends around 70% of its CPU time in the SHA256 implementation in libnss.
(Note that this is completely independent of TLS, which is not the bottleneck: any modern setup uses a fast AEAD cipher suite. This is only about the content hash in the S3 protocol.)
An implementation using SHA-NI requires only 2 cycles/byte (we got 1500 MB/s on a 3 GHz CPU), i.e., we could probably reduce the CPU usage by about a factor of 3 for the same throughput (the 70% spent in SHA256 drops to ~5%, plus the same 30% for everything else).
The proper fix would be to implement this in nss, but that would probably take an eternity to show up in your distribution's libnss. A good C++ implementation of SHA256 with the necessary optimizations can be found in Botan.
Related: what's the current state of CPU feature detection in Ceph? I remember that there were some problems with ARM64 in the past.
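For reference, a minimal sketch of what runtime detection of SHA-NI could look like on x86. This is not Ceph's existing detection code; `cpu_has_sha_ni` is a hypothetical name, and the sketch assumes GCC or Clang (it relies on `__get_cpuid_count` from `<cpuid.h>`). The SHA extensions are reported in CPUID leaf 7, sub-leaf 0, EBX bit 29. On other architectures or compilers it simply reports "not available".

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

// Hypothetical helper: returns true if the CPU advertises the SHA extensions.
// Assumption: GCC or Clang on x86; everything else falls through to false.
bool cpu_has_sha_ni() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count returns 0 if the requested leaf is not supported.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    // CPUID.(EAX=7, ECX=0):EBX bit 29 is the SHA feature flag.
    return (ebx & (1u << 29)) != 0;
#else
    return false;
#endif
}
```

A caller would check this once at startup and pick the accelerated SHA256 path only when it returns true, falling back to the generic implementation otherwise.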
Updated by Matt Benjamin over 5 years ago
The lack of acceleration in libnss has been called out in other areas as well. The TLS support in the new boost::asio HTTP front-end is via OpenSSL.
Matt
Updated by Matt Benjamin over 5 years ago
Oh, Botan has a Blake2b implementation as well, among other things.
Matt
Updated by Paul Emmerich over 5 years ago
Botan seems really nice compared to libnss for crypto in C++.
Updated by Casey Bodley over 5 years ago
- Status changed from New to 12
- Assignee set to Matt Benjamin