Bazel Caching Explained (pt. 4): Persistent Action Cache

Bazel Caching Explained (pt. 4): Persistent Action Cache

ยท

9 min read

Author's Note

I was on a short vacation last week so this blog post should arrive a bit later than usual. I expect the next blog post and this one to be released somewhat close to each others, but no promises.

I wanted to do some drawing for this blog post but was not able to come up with anything meaningful. Instead, there are a lot more code invovled so this is going to be a more technical blog post in this series.

Persistent Action Cache

Today we will dive into the technical details of Bazel's Persistent Action Cache.

As explained in Part 1 of this blog series, Action Cache is a store that helps Bazel de-duplicate action execution.

The more common form of Action Cache is often known to expert users via usages of --remote_cache= or --disk_cache= flags in Bazel, but that's not what we are talking about today.

Instead of diving into the optional remote cache of Bazel, we will be discussing something much more simpler: Bazel's Persistent Action Cache. This is something that is enabled by default by Bazel out-of-the-box, no flags, no configuration, no remote server needed.

But first, let's talk about how we are going to test and observe the effect of Bazel's Persistent Action Cache.

Test setup

I will be demonstrating this blog post with my favorite minimal Bazel example. This should be fairly simple to reproduce so I would encourage readers at home to try it out.

# Start our test
> mkdir test
> cd test

# Initialize an empty Bazel workspace
> touch WORKSPACE
> touch BUILD

This is my favorite setup to start exploring how Bazel is supposed to work at a fundamental level.

# Build everything inside the empty workspace
> bazel build //...
Starting local Bazel server and connecting to it...
INFO: Analyzed 0 targets (1 packages loaded, 0 targets configured).
INFO: Found 0 targets...
INFO: Elapsed time: 1.817s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

Wait, this is an empty workspace... why was there 1 action?

Good question. To find out what happened, we need to leverage the Bazel's Profile which we introduced in Part 2.

> bazel clean

> bazel build //... --profile=p.json

> grep 'action processing' p.json | jq '.'
{
  "cat": "action processing",
  "name": "BazelWorkspaceStatusAction stable-status.txt",
  "ph": "X",
  "ts": 39639,
  "dur": 1915,
  "pid": 1,
  "tid": 667
}

It seems like the only action that was executed was Bazel's default workspace status, nothing strange. Bazel would execute default workspace status to generate a stable-status.txt file before every build.

> cat bazel-out/stable-status.txt
BUILD_EMBED_LABEL
BUILD_HOST GB-AABBCCDD
BUILD_USER sngoc

Next let's take a closer look at Bazel's output_base of this empty workspace.

> bazel info output_base
/private/var/tmp/_bazel_sngoc/a98bf8c6ee980636a51260e3b7396bf0

> ls -al $(bazel info output_base)
drwxr-xr-x    - sngoc 17 Aug 19:26 action_cache
.rw-r--r--   63 sngoc 17 Aug 19:28 command.log
.rw-r--r-- 3.5k sngoc 17 Aug 19:26 command.profile.gz
.rw-r--r--   18 sngoc 17 Aug 18:51 DO_NOT_BUILD_HERE
drwxr-xr-x    - sngoc 17 Aug 19:06 execroot
drwxr-xr-x    - sngoc 17 Aug 19:06 external
lrwxr-xr-x   62 sngoc 17 Aug 18:51 install -> /var/tmp/_bazel_sngoc/install/6b64cf61be23912f883b7ed0a05b85cf
lrwxr-xr-x   59 sngoc 17 Aug 18:51 java.log -> java.log.gb-jk4pfwkwgq.sngoc.log.java.20220817-185152.14700
.rw-r--r-- 296k sngoc 17 Aug 19:28 java.log.gb-jk4pfwkwgq.sngoc.log.java.20220817-185152.14700
.rw-r--r--  523 sngoc 17 Aug 19:28 javalog.properties
.rw-r--r--   63 sngoc 17 Aug 19:28 lock
.rw-r--r--  772 sngoc 17 Aug 18:51 README
drwx------    - sngoc 17 Aug 18:51 server

> ls -al $(bazel info output_base)/action_cache
.rw-r--r-- 146 sngoc 17 Aug 19:26 action_cache_v13.blaze
.rw-r--r--  53 sngoc 17 Aug 19:06 filename_index_v13.blaze

This is where the default local Persistent Action Cache lives. Unfortunately, these files are binary contents. It does not make a lot of sense for us just to cat them out into the terminal.

Instead, Bazel provides a helpful bazel dump --action_cache command that would help us read the current content of the Persistent Action Cache

> bazel dump --action_cache
Warning: this information is intended for consumption by developers
only, and may change at any time. Script against it at your own risk!

String indexer content:

size = 1
bazel-out/stable-status.txt <==> 0

Action cache (2 records):

0, bazel-out/stable-status.txt:
      actionKey = 709e80c88487a2411e1ee4dfb9f22a861492d20c4765150c0c794abd70f8147c
      usedClientEnvKey =
      digestKey = 0c6b6bbb3ae9aba0c06d15f02eeb52022ef09fc7f1638894fb8e654728998b99

      packed_len = 106

Wait, it said 2 records but there was only 1

Yes, this is another footgun in Bazel. If we take a peek under the implementation of bazel dump --action_cache command in CompactPersistentActionCache.java, we can spot that it would always filter out Action Cache records with type VALIDATION_KEY.

  // file: src/main/java/com/google/devtools/build/lib/actions/cache/CompactPersistentActionCache.java

  // Key of the action cache record that holds information used to verify referential integrity
  // between action cache and string indexer. Must be < 0 to avoid conflict with real action
  // cache records.
  private static final int VALIDATION_KEY = -10;

  ...

    /** Dumps action cache content. */
  @Override
  public void dump(PrintStream out) {
    out.println("String indexer content:\n");
    out.println(indexer);
    out.println("Action cache (" + map.size() + " records):\n");
    for (Map.Entry<Integer, byte[]> entry : map.entrySet()) {
      if (entry.getKey() == VALIDATION_KEY) {
        continue;
      }
      String content;
      try {
        content = decode(indexer, entry.getValue()).toString();
      } catch (IOException e) {
        content = e + "\n";
      }
      out.println(
          entry.getKey()
              + ", "
              + indexer.getStringForIndex(entry.getKey())
              + ":\n"
              + content
              + "\n      packed_len = "
              + entry.getValue().length
              + "\n");
    }
  }

More Action and Implementation

Only having the default workspace status action makes our test setup quite uninteresting. So let's add in a basic action so that we have more room to play with.

# file: BUILD

genrule(
    name = "create_a",
    outs = ["a"],
    cmd = """
    touch $@
    """,
)

and re-build our workspace

> bazel build //...
INFO: Analyzed target //:create_a (5 packages loaded, 8 targets configured).
INFO: Found 1 target...
Target //:create_a up-to-date:
  bazel-bin/a
INFO: Elapsed time: 0.960s, Critical Path: 0.72s
INFO: 2 processes: 1 internal, 1 darwin-sandbox.
INFO: Build completed successfully, 2 total actions

> bazel dump --action_cache
Warning: this information is intended for consumption by developers
only, and may change at any time. Script against it at your own risk!

String indexer content:

size = 2
bazel-out/stable-status.txt <==> 0
bazel-out/darwin_arm64-fastbuild/bin/a <==> 1

Action cache (3 records):

0, bazel-out/stable-status.txt:
      actionKey = 709e80c88487a2411e1ee4dfb9f22a861492d20c4765150c0c794abd70f8147c
      usedClientEnvKey =
      digestKey = 0c6b6bbb3ae9aba0c06d15f02eeb52022ef09fc7f1638894fb8e654728998b99

      packed_len = 106

1, bazel-out/darwin_arm64-fastbuild/bin/a:
      actionKey = f4f9b6c49cb59aad2db6980439a3bc09a2ef12bb11040caee4d983bb403b75aa
      usedClientEnvKey = 0372ca79f9c9888bfd417db9ed0278cec2690a97de58e7b1c0ffd35f9d047a7d
      digestKey = a407267c91fa74d87cdb0c7bf6f9ce8f5b7000b3bc95ca06c30b85111ee84ac4

      packed_len = 138

Content

Wait I thought the Action Cache supposed to cache the action

This seems to be only hashes?

Yes. This is another footgun in Bazel.

Bazel's ActionCache implementation here, or to be more precise, the ActionCache java interface, is not meant to cache action results ๐Ÿ™ƒ. It's explained in the JavaDoc of the interface here:

// file: src/main/java/com/google/devtools/build/lib/actions/cache/ActionCache.java

/**
 * An interface defining a cache of already-executed Actions.
 *
 * <p>This class' naming is misleading; it doesn't cache the actual actions, but it stores a
 * fingerprint of the action state (ie. a hash of the input and output files on disk), so
 * we can tell if we need to rerun an action given the state of the file system.
 *
 * <p>Each action entry uses one of its output paths as a key (after conversion
 * to the string).
 */
public interface ActionCache {

So effectively, Bazel's Persistent Action Cache only tracks the current state of the input files and output files when the action was last executed. If the input files were updated, or if there are missing output files, Bazel will check these new states against the Persistent Action Cache and determine whether the action needs to be rerun or not.

By having Persistent Action Cache enabled by default, Bazel has gained similar features with make file changes detection.

Let's do a quick exercise to confirm how Action Cache works.

> cat BUILD
[
    [
        genrule(
            name = "create_{}_{}".format(name, i),
            outs = ["{}_{}.txt".format(name, i)],
            cmd = """
            echo {} > $@
            """.format(i),
        )
        for name in [
            "a",
            "b",
        ]
    ]
    for i in range(0, 1000)
]

Here is a quick way to create 2000 actions to create 2000 files in my workspace. With this, I intentionally create a really wide build graph that cannot be parallelized immediately. With such a build graph, we should be able to observe some significant impact on total build time on my M1 laptop.

# Wipe out previous build results
> bazel clean
INFO: Starting clean.

# Clean build
> bazel build //...
INFO: Analyzed 2000 targets (5 packages loaded, 2007 targets configured).
INFO: Found 2000 targets...
INFO: Elapsed time: 9.652s, Critical Path: 0.09s
INFO: 2001 processes: 1 internal, 2000 darwin-sandbox.
INFO: Build completed successfully, 2001 total actions

# Cached build
> bazel build //...
INFO: Analyzed 2000 targets (0 packages loaded, 0 targets configured).
INFO: Found 2000 targets...
INFO: Elapsed time: 0.531s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

So a clean build takes roughly 9.7 seconds and a fully cached build takes 0.5 seconds.

Now let's invalidate half of the build graph by changing b to c.

> git diff
diff --git a/BUILD b/BUILD
index 7581584..0519e57 100644
--- a/BUILD
+++ b/BUILD
@@ -9,7 +9,7 @@
         )
         for name in [
             "a",
-            "b",
+            "c",
         ]
     ]
     for i in range(0, 1000)

# Rebuild incrementally
> bazel build //...
INFO: Analyzed 2000 targets (1 packages loaded, 2000 targets configured).
INFO: Found 2000 targets...
INFO: Elapsed time: 5.083s, Critical Path: 0.15s
INFO: 1001 processes: 1 internal, 1000 darwin-sandbox.
INFO: Build completed successfully, 1001 total actions

We can see that we are correctly executing 1000 actions as expected. This is because half of the build graph previously was creating files such as b_10.txt would now have to be re-executed to create files such as c_10.txt.

Now the question is: if we were to change the BUILD file back from c -> b, what would happen?

  • Zero actions get executed since the Persistent Action Cache has cached the b actions from the first run.

  • Or 1000 actions get executed because the Persistent Action Cache only cache latest actions

Peaking into the dump output of Persistent Action Cache, we can see that there are records of b kept in there.

> bazel dump --action_cache | grep '^bazel-out.*c_.*\.txt' | wc -l
    1000
> bazel dump --action_cache | grep '^bazel-out.*a_.*\.txt' | wc -l
    1000
> bazel dump --action_cache | grep '^bazel-out.*b_.*\.txt' | wc -l
    1000

And when we revert the change we made above, then we can see that Actions were not being re-executed.

> git diff
diff --git a/BUILD b/BUILD
index 0519e57..7581584 100644
--- a/BUILD
+++ b/BUILD
@@ -9,7 +9,7 @@
         )
         for name in [
             "a",
-            "c",
+            "b",
         ]
     ]
     for i in range(0, 1000)

> bazel build //...
INFO: Analyzed 2000 targets (1 packages loaded, 2000 targets configured).
INFO: Found 2000 targets...
INFO: Elapsed time: 0.482s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

Summary

Bazel's Persistent Action Cache is boring, which is a good thing.

The implementation is relatively short and simple to read through in 1 coffee session. I would definitely recommend using Google's Code Search to browse through Bazel's source code so that you can get the references provided.

In my experience, Bazel is a complex piece of software. But like any other, it's made up of simple, sometimes cleverly made components that you could examine and learn from quite easily and quickly. It's Bazel source and test code where you would be able to get the most information out of.

Did you find this article valuable?

Support Son Luong Ngoc by becoming a sponsor. Any amount is appreciated!