Feature #64375
Updated by Samuel Just 3 months ago
TLDR: let's drop this until 2025. GCC's support is too immature and it's measurably slower.

Chaining lambdas as continuations to futures is relatively error prone, as the developer must reason carefully about captured reference lifetimes. C++ coroutines should help with this, as a developer can co_await a future without losing the currently scoped variables.

Seastar already has support for using seastar::futures with C++ coroutines. https://github.com/athanatos/ceph/tree/sjust/backburner/sjust/wip-crimson-coroutines adds support to crimson's future wrappers, including static checking of errorated futures and support for checking interruptions upon resume for interruptible futures.

However, I suggest delaying this change until at least 2025 for two reasons:

1. The above branch only converts a small portion of the critical IO path in ClientRequest, but in the process I hit three serious code generation bugs with gcc 11.4.1. One seems to be fixed in 12.2.1, the other two seem to be fixed in 13.2.1. Still, debugging them was a significant time sink, and hitting so many in so little code strongly suggests that gcc's implementation simply isn't reliable yet.
2. With workarounds, I was able to do some performance testing with and without those changes. It seems the coroutine version is measurably slower (~2%) despite having only converted a relatively small portion of the code. There are likely improvements to be made, but I do not judge them worth pursuing at this time given the above compiler issues.

GCC Bugs:

- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98401
  The specific symptom I observed is the pg param being destructed multiple times, resulting in the refcount going rapidly to 0 and destroying the PG prematurely.
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102217
  This one appears to cause the generated code to double-free the awaiter holding the future. This one seems to be fixed in gcc 13.2.1.
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101244
  This example isn't precisely as described in the bug, but it seems similar. It causes the generated code to incorrectly execute process_pg_op unconditionally before the predicate. It seems to be fixed in gcc 12.2.1.

Perf result summary:

The below results are 4 concurrent fio rbd writers, each with a queue depth of 128 (well past saturation). Results are close enough that it's probable that further work could close the gap, but given the immaturity of the compiler support, it's not worth doing right now.

Results aren't meaningfully different with gcc 13.

gcc version 13.2.1 20231205 (Red Hat 13.2.1-6) (GCC)

Both branches have the following applied to src/seastar/CMakeLists.txt to allow building with gcc 13:

<pre>
-if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
-  if (NOT Cxx_Compiler_BZ107852_Free AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 13)
-    include (CheckGcc107852)
-    list (APPEND Seastar_PRIVATE_CXX_FLAGS
-      -Wno-error=stringop-overflow
-      -Wno-error=array-bounds)
-  endif()
-endif ()
-
</pre>

sjust/wip-crimson-coroutines 4c5681c099752b5e36e3a67859b59362343c1ed0

| Block_size | 4096 | 4096 | 4096 |
| Time | 100 | 100 | 100 |
| Core | 2 | 4 | 8 |
| Tool | Fio RBD | Fio RBD | Fio RBD |
| Version | 2024029 | 2024029 | 2024029 |
| OPtype | Rand Write | Rand Write | Rand Write |
| CommitID | 4c5681c0 | 4c5681c0 | 4c5681c0 |
| OSD | Crimson | Crimson | Crimson |
| Store | Bluestore | Bluestore | Bluestore |
| Thread_num | 128 | 128 | 128 |
| Client_num | 4 | 4 | 4 |
| Latency(ms) | 56.6480625 | 23.11821 | 9.99745 |
| Bandwidth(MB/s) | 37.0182656 | 90.7411251199 | 157.70769408 |
| IOPS | 9037.66 | 22153.61 | 38502.87 |

sjust/wip-crimson-coroutines-base e160811c5fec46717a117ac02b6b29609f067233

| Block_size | 4096 | 4096 | 4096 |
| Time | 100 | 100 | 100 |
| Core | 2 | 4 | 8 |
| Tool | Fio RBD | Fio RBD | Fio RBD |
| Version | 2024029 | 2024029 | 2024029 |
| OPtype | Rand Write | Rand Write | Rand Write |
| CommitID | e160811c | e160811c | e160811c |
| OSD | Crimson | Crimson | Crimson |
| Store | Bluestore | Bluestore | Bluestore |
| Thread_num | 128 | 128 | 128 |
| Client_num | 4 | 4 | 4 |
| Latency(ms) | 55.26808 | 22.9293124999 | 12.9919900000 |
| Bandwidth(MB/s) | 37.94628608 | 91.4753433600 | 161.4054912 |
| IOPS | 9264.22 | 22332.86 | 39405.64 |

gcc version 11.4.1 20230605 (Red Hat 11.4.1-2) (GCC)

About a 1/38 throughput hit.
No changes, CMakeLists change above not applied.

sjust/wip-crimson-coroutines 4c5681c099752b5e36e3a67859b59362343c1ed0

| Block_size | 4096 | 4096 | 4096 |
| Time | 100 | 100 | 100 |
| Core | 2 | 4 | 8 |
| Tool | Fio RBD | Fio RBD | Fio RBD |
| Version | 2024029 | 2024029 | 2024029 |
| OPtype | Rand Write | Rand Write | Rand Write |
| CommitID | 4c5681c0 | 4c5681c0 | 4c5681c0 |
| OSD | Crimson | Crimson | Crimson |
| Store | Bluestore | Bluestore | Bluestore |
| Thread_num | 128 | 128 | 128 |
| Client_num | 4 | 4 | 4 |
| Latency(ms) | 57.35153 | 23.1843599999 | 9.9319775 |
| Bandwidth(MB/s) | 36.57537536 | 90.4843059200 | 158.11516416 |
| IOPS | 8929.54 | 22090.89 | 38602.35 |

| Block_size | 4096 | 4096 | 4096 |
| Time | 100 | 100 | 100 |
| Core | 2 | 4 | 8 |
| Tool | Fio RBD | Fio RBD | Fio RBD |
| Version | 2024029 | 2024029 | 2024029 |
| OPtype | Rand Write | Rand Write | Rand Write |
| CommitID | 4c5681c0 | 4c5681c0 | 4c5681c0 |
| OSD | Crimson | Crimson | Crimson |
| Store | Bluestore | Bluestore | Bluestore |
| Thread_num | 128 | 128 | 128 |
| Client_num | 4 | 4 | 4 |
| Latency(ms) | 57.1692875 | 23.2074850000 | 13.169925 |
| Bandwidth(MB/s) | 36.6791475199 | 90.39757312 | 159.24107264 |
| IOPS | 8954.87 | 22069.7 | 38877.21 |

| Block_size | 4096 | 4096 | 4096 |
| Time | 100 | 100 | 100 |
| Core | 2 | 4 | 8 |
| Tool | Fio RBD | Fio RBD | Fio RBD |
| Version | 2024029 | 2024029 | 2024029 |
| OPtype | Rand Write | Rand Write | Rand Write |
| CommitID | 4c5681c0 | 4c5681c0 | 4c5681c0 |
| OSD | Crimson | Crimson | Crimson |
| Store | Bluestore | Bluestore | Bluestore |
| Thread_num | 128 | 128 | 128 |
| Client_num | 4 | 4 | 4 |
| Latency(ms) | 56.8740425 | 23.188265 | 13.3796974999 |
| Bandwidth(MB/s) | 36.87575552 | 90.49010176 | 156.84764672 |
| IOPS | 9002.89 | 22092.32 | 38292.88 |

sjust/wip-crimson-coroutines-base e160811c5fec46717a117ac02b6b29609f067233

| Block_size | 4096 | 4096 | 4096 |
| Time | 100 | 100 | 100 |
| Core | 2 | 4 | 8 |
| Tool | Fio RBD | Fio RBD | Fio RBD |
| Version | 2024029 | 2024029 | 2024029 |
| OPtype | Rand Write | Rand Write | Rand Write |
| CommitID | e160811c | e160811c | e160811c |
| OSD | Crimson | Crimson | Crimson |
| Store | Bluestore | Bluestore | Bluestore |
| Thread_num | 128 | 128 | 128 |
| Client_num | 4 | 4 | 4 |
| Latency(ms) | 55.7295474999 | 22.8178975000 | 12.99293 |
| Bandwidth(MB/s) | 37.63249152 | 91.93051136 | 161.46226176 |
| IOPS | 9187.62 | 22443.99 | 39419.51 |

| Block_size | 4096 | 4096 | 4096 |
| Time | 100 | 100 | 100 |
| Core | 2 | 4 | 8 |
| Tool | Fio RBD | Fio RBD | Fio RBD |
| Version | 2024029 | 2024029 | 2024029 |
| OPtype | Rand Write | Rand Write | Rand Write |
| CommitID | e160811c | e160811c | e160811c |
| OSD | Crimson | Crimson | Crimson |
| Store | Bluestore | Bluestore | Bluestore |
| Thread_num | 128 | 128 | 128 |
| Client_num | 4 | 4 | 4 |
| Latency(ms) | 55.38967 | 23.1957575000 | 12.9069425 |
| Bandwidth(MB/s) | 37.8617856000 | 90.47964672 | 162.6132992 |
| IOPS | 9243.6 | 22089.77 | 39700.53 |
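
As a rough cross-check of the throughput gap, the sketch below averages the IOPS figures copied from the gcc 11.4.1 runs above (three coroutine runs, two base runs) and computes the relative drop. The helper names `mean` and `relative_drop` are made up for this example, and a plain average over so few runs is only an approximation, not a statistical analysis.

```cpp
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of samples.
double mean(const std::vector<double>& xs) {
  return std::accumulate(xs.begin(), xs.end(), 0.0) /
         static_cast<double>(xs.size());
}

// Drop of `candidate` relative to `baseline`, as a fraction of baseline.
double relative_drop(double baseline, double candidate) {
  return (baseline - candidate) / baseline;
}

// IOPS values copied from the gcc 11.4.1 tables in this ticket.
const std::vector<double> coroutine_iops_2core = {8929.54, 8954.87, 9002.89};
const std::vector<double> base_iops_2core = {9187.62, 9243.6};
const std::vector<double> coroutine_iops_8core = {38602.35, 38877.21, 38292.88};
const std::vector<double> base_iops_8core = {39419.51, 39700.53};
```

With these inputs, `relative_drop(mean(base_iops_8core), mean(coroutine_iops_8core))` and the 2-core equivalent both land between 2% and 3%, which is in the same range as the ~2% slowdown quoted in the summary.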