Implementation insights

In this document, we provide some insights into the internal workings of the saker.java.compile() task. It is mostly for curious readers who are interested in the inner mechanisms of incremental compilation handling.

In a nutshell, the task works as follows for clean builds:

1. Determine the input source and class path files.
2. Parse the source files using the Java Compiler API.
3. If there are any annotation processors, instantiate them and run the annotation processing manually.
   - The processing is run entirely without javac.
4. Ask javac to generate the bytecode class files based on the parsed source files.
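The clean-build steps can be approximated with the public `javax.tools` and `com.sun.source.util.JavacTask` APIs. The following is a minimal sketch under our own assumptions, not the task's actual code: the class name `CleanBuildSketch`, the in-memory sources in the unnamed package, and the temporary output directory are illustrative. It disables javac's built-in annotation processing with `-proc:none`, parses the sources, and then asks javac to generate the class files.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class CleanBuildSketch {
    /** An in-memory source file for a class in the unnamed package (an assumption of this sketch). */
    static JavaFileObject source(String className, String code) {
        URI uri = URI.create("string:///" + className + JavaFileObject.Kind.SOURCE.extension);
        return new SimpleJavaFileObject(uri, JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
    }

    /** Parses all sources, then asks javac to generate the class files; returns their file names. */
    static List<String> compile(Map<String, String> sources) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null when running on a plain JRE
        List<JavaFileObject> units = new ArrayList<>();
        sources.forEach((name, code) -> units.add(source(name, code)));
        try {
            Path outDir = Files.createTempDirectory("classes");
            // -proc:none turns off javac's own annotation processing; it would be run separately.
            List<String> options = List.of("-proc:none", "-d", outDir.toString());
            JavacTask task = (JavacTask) javac.getTask(null, null, null, options, null, units);

            // Parse: these trees are what a custom annotation processing step would inspect.
            Iterable<? extends CompilationUnitTree> trees = task.parse();
            // Custom annotation processing over `trees` is omitted from this sketch.

            // Attribute the parsed trees and write the class files.
            List<String> generated = new ArrayList<>();
            for (JavaFileObject classFile : task.generate()) {
                generated.add(Path.of(classFile.toUri()).getFileName().toString());
            }
            return generated;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `CleanBuildSketch.compile(Map.of("Hello", "class Hello {}"))` should list `Hello.class` among the generated files. Keeping a single `JavacTask` alive between the `parse()` and `generate()` calls is what leaves room for a separate processing step in between.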

When compiling incrementally, additional operations are included in the above workflow:

1. Determine the input source and class path files.
2. Check for file changes, additions, and removals.
   - If any class path file changed (e.g. a JAR file was modified), then we may fall back to full recompilation.
3. Parse the changed source files using the Java Compiler API.
4. Determine if there are any structural changes in the modified source files.
   - If yes, then determine how each change affects the other source files.
   - Add the affected files to the changed source file set, and go back to step 3.
5. If there are any annotation processors, run annotation processing.
   - Any source files generated by the processors are parsed, and their changes are detected the same way as above.
6. Ask javac to generate the bytecode class files based on the parsed source files.
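The loop between parsing changed files and detecting structural changes is essentially a worklist that runs until a fixpoint is reached. The sketch below assumes a hypothetical `dependents` map recorded during the previous build and a predicate standing in for the structural comparison; neither is taken from the saker.java.compile() sources.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

class IncrementalSketch {
    /**
     * Expands the initially changed source files into the full set that has to be reparsed and recompiled.
     *
     * @param changed             files modified or added since the last build
     * @param dependents          hypothetical: for each file, the files that depend on its signatures
     * @param structurallyChanged stand-in for "reparsing this file changed its class signatures"
     */
    static Set<String> affectedFiles(Set<String> changed,
                                     Map<String, Set<String>> dependents,
                                     Predicate<String> structurallyChanged) {
        Set<String> result = new LinkedHashSet<>(changed);
        Deque<String> toParse = new ArrayDeque<>(changed);
        while (!toParse.isEmpty()) {
            String file = toParse.poll();
            // A body-only edit cannot affect other files, so propagation stops here.
            if (!structurallyChanged.test(file)) {
                continue;
            }
            for (String dependent : dependents.getOrDefault(file, Set.of())) {
                if (result.add(dependent)) { // first time seen: parse it in a later iteration
                    toParse.add(dependent);
                }
            }
        }
        return result;
    }
}
```

Because a file is only queued when it is first added to the result set, the loop terminates even if the dependency graph contains cycles.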

Communicating with `javac`

Thankfully, the Java compiler provides an API to invoke it programmatically. However, the API lacks some features, so we also need to use private parts of the Java compiler to implement our incremental compiler. We're not happy that this is necessary, but we aim to provide stability by testing compatibility with each major Java release.

When javac parses a source file, it returns a CompilationUnitTree representation of it. We can freely examine these trees, and based on them we construct a complete representation of the classes in the parsed Java source files.
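As an illustration of examining the parsed trees, the sketch below walks a `CompilationUnitTree` with `com.sun.source.util.TreeScanner` and records a textual signature for each class, field, method, and constructor, skipping method bodies and initializers. The string format (e.g. `Greeter#String greet(int)`) and the class name `SignatureScanner` are our own assumptions; a complete representation would also need modifiers, annotations, type parameters, and more.

```java
import com.sun.source.tree.BlockTree;
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class SignatureScanner extends TreeScanner<Void, Void> {
    private final Deque<String> enclosingClasses = new ArrayDeque<>();
    private final Set<String> signatures = new TreeSet<>();

    @Override
    public Void visitClass(ClassTree node, Void unused) {
        String outer = enclosingClasses.peek(); // null for a top-level class
        String name = outer == null ? node.getSimpleName().toString() : outer + "." + node.getSimpleName();
        signatures.add("class " + name);
        enclosingClasses.push(name);
        try {
            return super.visitClass(node, unused); // visits the member declarations
        } finally {
            enclosingClasses.pop();
        }
    }

    @Override
    public Void visitMethod(MethodTree node, Void unused) {
        StringJoiner params = new StringJoiner(", ", "(", ")");
        for (VariableTree param : node.getParameters()) {
            params.add(param.getType().toString());
        }
        // getReturnType() is null for constructors, whose name is <init>.
        String returnType = node.getReturnType() == null ? "" : node.getReturnType() + " ";
        signatures.add(enclosingClasses.peek() + "#" + returnType + node.getName() + params);
        return null; // the body is not part of the signature, so it is not scanned
    }

    @Override
    public Void visitVariable(VariableTree node, Void unused) {
        // Only fields reach this point: method bodies and initializer blocks are never scanned.
        signatures.add(enclosingClasses.peek() + "#" + node.getType() + " " + node.getName());
        return null; // skip the initializer expression
    }

    @Override
    public Void visitBlock(BlockTree node, Void unused) {
        return null; // static/instance initializer blocks don't contribute to the signature
    }

    /** Parses (without attributing) a single source file and returns the signatures found in it. */
    static Set<String> signaturesOf(String className, String code) {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, List.of("-proc:none"), null, List.of(source));
        SignatureScanner scanner = new SignatureScanner();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scanner.signatures;
    }
}
```

Since the trees are only parsed and not attributed, type names appear as written in the source (`String`, not `java.lang.String`), and implicit default constructors are not present yet.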

After parsing the class signatures, we determine whether there were any changes compared to the signatures from the previous compilation. Any detected change causes the dependent source files to be parsed as well, and this repeats until no more changes are detected.
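Comparing signatures between builds can be reduced to comparing per-class signature sets. In this minimal sketch, each build is assumed to be summarized as a hypothetical map from class name to the set of its member signature strings; a class counts as changed if it was added, removed, or its set differs.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class SignatureDiff {
    /** Returns the classes that were added, removed, or whose member signatures changed. */
    static Set<String> changedClasses(Map<String, Set<String>> previous, Map<String, Set<String>> current) {
        Set<String> allClasses = new HashSet<>(previous.keySet());
        allClasses.addAll(current.keySet());
        Set<String> changed = new TreeSet<>();
        for (String cls : allClasses) {
            // A missing entry on either side is null, so additions and removals also differ.
            if (!Objects.equals(previous.get(cls), current.get(cls))) {
                changed.add(cls);
            }
        }
        return changed;
    }
}
```

Edits that only touch method bodies leave the sets equal, so they don't cause dependent files to be reparsed.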

After we run the annotation processing, we ask javac to generate the bytecode for the parsed source files. Any compilation errors are reported by javac in this phase. After the bytecode is generated, we examine the parsed compilation unit trees once again to determine the dependencies of each generated class. This is the phase that mainly differentiates our solution from Gradle, as we retrieve the dependency data directly from javac rather than from the generated bytecode files. The difference is that the in-memory compilation unit trees contain more information about the source files than the generated class files do, which allows us to work with finer-grained dependency representations.
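The following sketch shows the general idea of reading dependencies from attributed trees: after `JavacTask.analyze()`, `Trees.getElement` resolves each identifier and member select to the element it refers to, and we record the top-level class that declares it. The class name `DependencyCollector`, the restriction to the unnamed package, and ignoring class path dependencies are assumptions for illustration. For instance, given `class A { static final int X = 42; }` and `class B { int y = A.X; }`, javac inlines the compile-time constant, so `B.class` holds the value `42` rather than a reference to `A.X`; the attributed tree still contains the `A.X` member select, so the sketch reports that `B` depends on `A`.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.lang.model.element.Element;
import javax.lang.model.element.PackageElement;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DependencyCollector extends TreePathScanner<Void, Set<String>> {
    private final Trees trees;

    DependencyCollector(Trees trees) {
        this.trees = trees;
    }

    @Override
    public Void visitIdentifier(IdentifierTree node, Set<String> refs) {
        recordCurrentElement(refs);
        return super.visitIdentifier(node, refs);
    }

    @Override
    public Void visitMemberSelect(MemberSelectTree node, Set<String> refs) {
        recordCurrentElement(refs); // e.g. A.X resolves to the field X declared in A
        return super.visitMemberSelect(node, refs);
    }

    /** Records the top-level class that declares the element the current tree refers to. */
    private void recordCurrentElement(Set<String> refs) {
        Element e = trees.getElement(getCurrentPath());
        while (e != null && !(e instanceof TypeElement && e.getEnclosingElement() instanceof PackageElement)) {
            e = e.getEnclosingElement(); // member or nested class -> its enclosing element
        }
        if (e != null) {
            refs.add(((TypeElement) e).getQualifiedName().toString());
        }
    }

    /** Maps each top-level class to the other classes of the same compilation set it refers to. */
    static Map<String, Set<String>> dependencies(Map<String, String> sources) {
        List<JavaFileObject> units = new ArrayList<>();
        sources.forEach((name, code) -> units.add(new SimpleJavaFileObject(
                URI.create("string:///" + name + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        }));
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, List.of("-proc:none"), null, units);
        Trees trees = Trees.instance(task);
        Map<String, Set<String>> result = new TreeMap<>();
        try {
            Iterable<? extends CompilationUnitTree> parsed = task.parse();
            task.analyze(); // attribution: identifiers now resolve to elements
            for (CompilationUnitTree unit : parsed) {
                for (Tree decl : unit.getTypeDecls()) {
                    if (!(decl instanceof ClassTree)) {
                        continue;
                    }
                    String self = ((ClassTree) decl).getSimpleName().toString();
                    Set<String> refs = new TreeSet<>();
                    new DependencyCollector(trees).scan(new TreePath(new TreePath(unit), decl), refs);
                    refs.retainAll(sources.keySet()); // class path dependencies are handled elsewhere
                    refs.remove(self);
                    result.put(self, refs);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With the two example sources, the result maps `B` to `[A]` and `A` to an empty set.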

Annotation processing

Incremental annotation processing was an unsolved problem until early 2018. It is not straightforward to implement, as processors are basically arbitrary plugins to the compilation process that can do anything they like. Because of this, it is extremely hard to correctly determine the dependencies of an annotation processor, as there's very little opportunity for them to report their dependencies to the compiler.

However, thankfully, the Filer API for annotation processors was designed in a way that allows reporting dependencies for generated resources, through the originating elements passed when a resource is created. As far as we know, annotation processor implementations haven't really bothered to pass the originating elements for generated resources, as nobody really used them. This made implementing incremental processing even harder, as the annotation processors didn't use the APIs correctly.
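The originating elements are the trailing varargs of `Filer.createSourceFile`, `Filer.createClassFile`, and `Filer.createResource`. The sketch below is a hypothetical processor (`InfoProcessor`, generating a `<Name>Info` class for every `@Deprecated` class, an arbitrary choice for the example) that passes the annotated class as the originating element, so a build tool can tell that the generated file depends only on that class. It is run through the standard `JavaCompiler.CompilationTask.setProcessors`, not through saker.java.compile().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical processor: generates a NameInfo class for every @Deprecated class. */
@SupportedAnnotationTypes("java.lang.Deprecated")
class InfoProcessor extends AbstractProcessor {
    /** "generated <- originating" pairs, kept only to make the reported dependencies visible. */
    final List<String> reported = new ArrayList<>();

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (Element element : roundEnv.getElementsAnnotatedWith(Deprecated.class)) {
            if (!(element instanceof TypeElement)) {
                continue;
            }
            TypeElement type = (TypeElement) element;
            String name = type.getSimpleName() + "Info";
            // The varargs after the name are the originating elements:
            // the generated file depends on `type`, and only on `type`.
            try (Writer writer = processingEnv.getFiler().createSourceFile(name, type).openWriter()) {
                writer.write("class " + name + " { static final String ORIGIN = \""
                        + type.getQualifiedName() + "\"; }");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            reported.add(name + " <- " + type.getQualifiedName());
        }
        return false; // don't claim @Deprecated; other processors may want it too
    }
}

class ProcessorDemo {
    /** Compiles one in-memory source with InfoProcessor and returns what it reported. */
    static List<String> run(String className, String code) {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        try {
            Path out = Files.createTempDirectory("processed");
            // -d: class files, -s: generated sources; both go to the temporary directory.
            JavaCompiler.CompilationTask task = ToolProvider.getSystemJavaCompiler().getTask(
                    null, null, null, List.of("-d", out.toString(), "-s", out.toString()), null, List.of(source));
            InfoProcessor processor = new InfoProcessor();
            task.setProcessors(List.of(processor));
            if (!task.call()) {
                throw new IllegalStateException("compilation failed");
            }
            return processor.reported;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `ProcessorDemo.run("Old", "@Deprecated class Old {}")` should generate `OldInfo` and report `OldInfo <- Old`.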

Our main goals for incremental annotation processing were the following:

- Support source retention annotations.
  - As we've previously mentioned in Feature comparison, we believe that the tooling shouldn't impose restrictions on the codebase. If we didn't support annotations with the source retention policy, then annotations would need to be part of the resulting class files even when it isn't necessary for them to be present there.
- Support partial and minimal annotation processing.
  - If a processor doesn't generate resources based on some Java elements, then changes to those elements shouldn't cause the processor to be reinvoked.
  - The processors shouldn't need information about the whole compilation set in order to produce their results, only about the elements they are interested in.
- Support arbitrary processor locations.
  - As a processor author, you may be familiar with the StandardLocation enumeration. In addition to those locations, we wanted to support arbitrary locations for processor input and output.
- Parallel annotation processing.
  - The work of annotation processors can be cleanly separated, so they should run in a multi-threaded way to complete faster.
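The minimal and parallel processing goals can be sketched together: if each processor's previous run recorded the elements it read, only processors that read a changed element need to be reinvoked, and those independent invocations can be submitted to a thread pool. The element-name bookkeeping (`readByProcessor`) and the `ProcessorScheduler` class are hypothetical; the sketch only illustrates the selection rule and concurrent execution, not how saker.java.compile() implements them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ProcessorScheduler {
    /** Selects the processors whose previous run read at least one changed element. */
    static Set<String> processorsToRerun(Map<String, Set<String>> readByProcessor, Set<String> changedElements) {
        Set<String> rerun = new TreeSet<>();
        readByProcessor.forEach((processor, read) -> {
            if (!Collections.disjoint(read, changedElements)) {
                rerun.add(processor);
            }
        });
        return rerun;
    }

    /** Runs independent processor invocations concurrently; results keep the input order. */
    static <T> List<T> runInParallel(List<Callable<T>> invocations) {
        int threads = Math.max(1, Math.min(invocations.size(), Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(invocations)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Processors whose recorded inputs don't intersect the changed elements keep their previous outputs, which is what makes the processing partial.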

If you're reading this, you can probably guess that we succeeded with these goals. However, while implementing them, we had to place some restrictions on the annotation processors. The main one is that they must use the Filer API to access all the resources they need for their operations, as accesses that bypass the Filer can't be tracked as dependencies. They mustn't use java.io.File, java.nio.file.Files, or related APIs.

In the end, we implemented our own annotation processing mechanism that completely bypasses the one javac runs. This comes with considerable maintenance and implementation cost, but the benefits are worth it.

An interesting note is that we support tracking changes in the JavaDoc of Java elements. If a processor reads the documentation of a module, package, class, method, or field, and the developer changes it, then the processor will be reinvoked. Similarly, parameter name changes are also tracked when annotation processors are used. The processors are allowed to generate resources based on documentation contents, and they will work correctly. We found this to be an important use case, for example, when generating help messages for a command line argument parser.
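Doc comments of source elements are available through `javax.lang.model.util.Elements.getDocComment`, which makes them trackable like any other input. The sketch below collects the doc comments of a class and its members after `JavacTask.analyze()`, and treats a processor as outdated if any comment it read differs in the current build. The `DocTracking` class, keying by simple name, and the map-based bookkeeping are assumptions for illustration only.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import javax.lang.model.element.Element;
import javax.lang.model.util.Elements;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DocTracking {
    /** Collects the doc comments of the analyzed classes and their members, keyed by simple name. */
    static Map<String, String> docComments(String className, String code) {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return code;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, List.of("-proc:none"), null, List.of(source));
        Map<String, String> docs = new TreeMap<>();
        try {
            Iterable<? extends Element> types = task.analyze();
            Elements elements = task.getElements();
            for (Element type : types) {
                record(elements, type, docs);
                for (Element member : type.getEnclosedElements()) {
                    record(elements, member, docs);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docs;
    }

    private static void record(Elements elements, Element element, Map<String, String> docs) {
        String doc = elements.getDocComment(element); // null when there is no doc comment
        if (doc != null) {
            docs.put(element.getSimpleName().toString(), doc);
        }
    }

    /** True if any doc comment a processor read in the previous build is different now. */
    static boolean docsChanged(Map<String, String> readByProcessor, Map<String, String> current) {
        for (Map.Entry<String, String> read : readByProcessor.entrySet()) {
            if (!Objects.equals(read.getValue(), current.get(read.getKey()))) {
                return true;
            }
        }
        return false;
    }
}
```

For example, changing `/** Prints more output. */` to `/** Prints detailed output. */` on a field named `verbose` makes `docsChanged` return `true` for a processor that read that comment.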

Also note that we currently don't support tracking the positions of elements in a class. If a developer simply reorders the methods or fields in a class, the processors won't be reinvoked. This will most likely be implemented in the future.
