    Build Turbine native image with PGO · 5f43fd34
    Fabian Meumertzheim authored
    The Turbine native image is now optionally built with Oracle GraalVM, making use of its profile-guided optimization capability. A script generates a PGO profile by running Turbine on a representative Java target (currently `skyframe_cluster`).
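Oracle GraalVM's PGO generally works in three steps. First, build an instrumented image with `native-image --pgo-instrument`. Second, run that image on a representative workload; by default it writes a profile named `default.iprof` in the working directory. Third, rebuild with `--pgo=<profile>`. The sketch below only assembles the two `native-image` command lines under those assumptions. It is not the script added in this commit, since Turbine is built through `rules_graalvm` in Bazel. The jar name `turbine_deploy.jar`, the output names, and the helper functions are hypothetical.

```python
from pathlib import Path


def instrumented_build_cmd(jar: Path, output: str) -> list[str]:
    """Hypothetical helper: native-image command for the instrumented build."""
    return [
        "native-image",
        "--pgo-instrument",  # binary records a profile while it runs
        "-jar", str(jar),
        "-o", output,
    ]


def optimized_build_cmd(jar: Path, output: str, profile: Path) -> list[str]:
    """Hypothetical helper: native-image command that consumes the profile."""
    return [
        "native-image",
        f"--pgo={profile}",  # optimize using the collected .iprof profile
        "-jar", str(jar),
        "-o", output,
    ]


if __name__ == "__main__":
    jar = Path("turbine_deploy.jar")  # hypothetical jar name
    # Step 1: build the instrumented binary.
    print(" ".join(instrumented_build_cmd(jar, "turbine-instrumented")))
    # Step 2 (not shown): run turbine-instrumented on a representative target,
    # e.g. skyframe_cluster; on exit it writes default.iprof.
    # Step 3: rebuild with the collected profile.
    print(" ".join(optimized_build_cmd(jar, "turbine", Path("default.iprof"))))
```

The profile only helps to the extent that the workload in step 2 resembles real builds. That is why a representative target is used.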
    
    On Linux x86_64, this reduces the overall wall time of the benchmark by ~10%:
    ```
    ===== Benchmarking prebuilt Turbine =====
    Benchmark 1: build_target
      Time (mean ± σ):     51.813 s ±  0.900 s    [User: 0.017 s, System: 0.008 s]
      Range (min … max):   50.795 s … 52.931 s    5 runs
    
    ===== Benchmarking Turbine built from HEAD =====
    Benchmark 1: build_target
      Time (mean ± σ):     45.898 s ±  0.683 s    [User: 0.012 s, System: 0.012 s]
      Range (min … max):   45.063 s … 46.776 s    5 runs
    ```
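The following sketch converts the two Linux means into a relative reduction and a speedup factor. It uses only the mean values from the output above and ignores their spread. The helper name `relative_change` is illustrative. With these means, the reduction is slightly above 11%, so "~10%" is a conservative rounding.

```python
def relative_change(before: float, after: float) -> tuple[float, float]:
    """Return (fractional wall-time reduction, speedup factor)."""
    reduction = (before - after) / before
    speedup = before / after
    return reduction, speedup


# Mean wall times in seconds from the Linux x86_64 runs above.
prebuilt_s = 51.813
head_pgo_s = 45.898

reduction, speedup = relative_change(prebuilt_s, head_pgo_s)
print(f"wall time reduced by {reduction:.1%}, speedup {speedup:.2f}x")
```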
    
    On macOS with an M3 Max, performance remains mostly unchanged. However, without PGO the binary built from HEAD is slightly slower than the prebuilt one, so the overall improvement attributable to PGO may be somewhat more noticeable than these numbers suggest:
    ```
    ===== Benchmarking prebuilt Turbine =====
    Benchmark 1: build_target
      Time (mean ± σ):     16.928 s ±  0.167 s    [User: 0.009 s, System: 0.010 s]
      Range (min … max):   16.688 s … 17.080 s    5 runs
    
    ===== Benchmarking Turbine built from HEAD =====
    Benchmark 1: build_target
      Time (mean ± σ):     16.767 s ±  0.161 s    [User: 0.009 s, System: 0.011 s]
      Range (min … max):   16.571 s … 16.994 s    5 runs
    ```
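One way to check whether the macOS difference is distinguishable from noise is Welch's t statistic, computed from each benchmark's mean, σ and run count. This sketch assumes that the reported σ is the sample standard deviation of 5 independent runs. The Linux numbers are included for comparison. For about 8 degrees of freedom, the two-sided 5% critical value of Student's t is roughly 2.31. The macOS statistic (about 1.55) falls below it, while the Linux one (about 11.7) is far above it.

```python
import math


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> tuple[float, float]:
    """Welch's t statistic and approximate degrees of freedom for two
    independent samples summarized by mean, std. dev., and sample size."""
    va = sd_a ** 2 / n_a
    vb = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# (mean s, sigma s, runs) for prebuilt vs. HEAD, from the outputs above.
macos = welch_t(16.928, 0.167, 5, 16.767, 0.161, 5)
linux = welch_t(51.813, 0.900, 5, 45.898, 0.683, 5)

print(f"macOS: t = {macos[0]:.2f}, df = {macos[1]:.1f}")
print(f"Linux: t = {linux[0]:.2f}, df = {linux[1]:.1f}")
```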
    
    With `--jobs=4`, the M3 Max also shows an improvement of about 10% in build wall time. This indicates that at the default, higher level of parallelism, Turbine performance no longer has much impact on overall wall time.
    
    When `//src/main/java/com/google/devtools/build/lib/cmdline` is built manually with each Turbine variant, the M3 Max shows a significant improvement from Oracle GraalVM and PGO:
    
    ```
    Regular java_binary:
      Time (mean ± σ):     317.1 ms ±   5.3 ms    [User: 939.3 ms, System: 202.3 ms]
      Range (min … max):   305.7 ms … 341.8 ms    40 runs
    GraalVM CE:
      Time (mean ± σ):      61.7 ms ±   0.9 ms    [User: 49.0 ms, System: 9.8 ms]
      Range (min … max):    60.4 ms …  65.6 ms    46 runs
    Oracle GraalVM, without PGO:
      Time (mean ± σ):      40.6 ms ±   0.7 ms    [User: 31.8 ms, System: 6.6 ms]
      Range (min … max):    39.2 ms …  42.9 ms    68 runs
    Oracle GraalVM, with PGO:
      Time (mean ± σ):      34.0 ms ±   0.8 ms    [User: 25.8 ms, System: 6.1 ms]
      Range (min … max):    32.4 ms …  37.6 ms    82 runs
    ```
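The sketch below restates the M3 Max numbers in relative terms by dividing each mean by the mean of the PGO build. It uses only the means from the block above and ignores their spread. With these values, PGO alone removes about 16% of the wall time compared with the non-PGO Oracle GraalVM build. The regular `java_binary` takes roughly 9.3 times as long as the PGO image.

```python
# Mean wall times in milliseconds for building the cmdline package (M3 Max).
means_ms = {
    "java_binary": 317.1,
    "GraalVM CE": 61.7,
    "Oracle GraalVM": 40.6,
    "Oracle GraalVM + PGO": 34.0,
}

baseline = means_ms["Oracle GraalVM + PGO"]
for name, ms in means_ms.items():
    # Ratio > 1 means the variant is that many times slower than the PGO build.
    print(f"{name:22s} {ms:6.1f} ms  {ms / baseline:5.2f}x")

# Fraction of wall time removed by PGO alone (Oracle GraalVM, with vs. without).
pgo_gain = 1 - means_ms["Oracle GraalVM + PGO"] / means_ms["Oracle GraalVM"]
print(f"PGO alone: {pgo_gain:.1%} less wall time")
```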
    
    This commit also bundles a few fixes for the benchmark and Turbine build:
    * Updates `apple_support`, which provides the toolchain used to build Turbine on macOS.
    * Adds a workaround for https://github.com/bazelbuild/bazel/issues/20161 to the benchmark.
    * Updates the benchmark to align its Java toolchain's source/target versions with those of Bazel. Bazel's versions were bumped after the benchmark was added, which broke the benchmark's toolchain selection.
    * Raises the number of warmup runs in the benchmark to reduce the risk of statistical outliers, which were observed during runs on macOS.
    * Removes GraalVM flags that `rules_graalvm` already sets.
    * Adds `--strict-image-heap` to opt into stricter GraalVM behavior that will become the default in the future.
    
    Closes #22197.
    
    PiperOrigin-RevId: 741057464
    Change-Id: I47e826efe04a8aa502921dbed22be2edcf406ab9