Task factories represent a task that is to be executed by the build runtime. They are a specification of how the task should be executed, and what the task itself actually does.
Task factories must implement the equals(Object) and hashCode() methods.
Task factories should not have any state, and all of the data contained in them should be immutable.
Task factories can specify capabilities, which are hints for the build runtime to fine-tune the execution behaviour. They can be used to signal that a task completes quickly, that it can be executed remotely, and so on. Additional capabilities may be added to this interface in the future.
The task factories can also be used to start inner tasks. Inner tasks are handled differently by the build system,
Task factories can implement a strategy to choose a suitable build environment to execute on. This is mostly relevant when designing tasks for remote execution. See getExecutionEnvironmentSelector().
To avoid thrashing the system with an excessive level of concurrency, tasks can report the amount of computation they will use to do their work. See getRequestedComputationTokenCount().
Task factories are strongly encouraged to implement the Externalizable interface for faster serialization.
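A minimal sketch of a factory that meets the requirements above: it holds only immutable data, implements equals(Object) and hashCode(), and implements java.io.Externalizable. The class CompileFileTaskFactory and its fields are hypothetical, and the sketch deliberately leaves out the build system's own task factory interface so that it compiles with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical factory: stores only the data that determines the computation.
final class CompileFileTaskFactory implements Externalizable {
    private static final long serialVersionUID = 1L;

    // Not final, because readExternal() has to fill them in.
    // No setters are exposed, so instances are effectively immutable.
    private String sourcePath;
    private String optimizationLevel;

    // Public no-argument constructor, required by Externalizable.
    public CompileFileTaskFactory() {
    }

    CompileFileTaskFactory(String sourcePath, String optimizationLevel) {
        this.sourcePath = Objects.requireNonNull(sourcePath);
        this.optimizationLevel = Objects.requireNonNull(optimizationLevel);
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(sourcePath);
        out.writeUTF(optimizationLevel);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        sourcePath = in.readUTF();
        optimizationLevel = in.readUTF();
    }

    // Factories are equal if they describe the same computation.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof CompileFileTaskFactory)) return false;
        CompileFileTaskFactory other = (CompileFileTaskFactory) obj;
        return sourcePath.equals(other.sourcePath) && optimizationLevel.equals(other.optimizationLevel);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sourcePath, optimizationLevel);
    }

    // Serializes and deserializes a factory, simulating a transfer.
    static CompileFileTaskFactory roundTrip(CompileFileTaskFactory factory) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(factory);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CompileFileTaskFactory) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Writing the fields explicitly in writeExternal avoids the reflective field handling of default Java serialization. Because Externalizable needs a public no-argument constructor, the fields cannot be final; keeping them private without setters preserves immutability in practice.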
public static final String CAPABILITY_CACHEABLE = "saker.task.remote.cacheable"
Capability string for specifying that the results of the task can be cached and retrieved.
public static final String CAPABILITY_INNER_TASKS_COMPUTATIONAL = "saker.task.inner.tasks.computational"
Capability string for specifying a task that will start inner tasks with computation tokens.
public static final String CAPABILITY_REMOTE_DISPATCHABLE = "saker.task.remote.dispatchable"
Capability string for specifying a task that can be executed remotely on build clusters.
public static final String CAPABILITY_SHORT_TASK = "saker.task.short"
Capability string for specifying a task that is considered to be short.
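A factory might declare its capabilities by returning an immutable set of these strings. The sketch below copies the constant values from the field summary; the holder class Capabilities and the factory ObjectFileTaskFactory are hypothetical, and the method is a plain stand-in rather than an override of the build system's interface.

```java
import java.util.Set;

// Capability strings, copied from the field summary above.
final class Capabilities {
    static final String CAPABILITY_CACHEABLE = "saker.task.remote.cacheable";
    static final String CAPABILITY_INNER_TASKS_COMPUTATIONAL = "saker.task.inner.tasks.computational";
    static final String CAPABILITY_REMOTE_DISPATCHABLE = "saker.task.remote.dispatchable";
    static final String CAPABILITY_SHORT_TASK = "saker.task.short";

    private Capabilities() {
    }
}

// Hypothetical factory for a heavy, self-contained computation.
final class ObjectFileTaskFactory {
    // Set.of returns an immutable set, in line with the stateless factory requirement.
    private static final Set<String> CAPABILITIES =
            Set.of(Capabilities.CAPABILITY_CACHEABLE, Capabilities.CAPABILITY_REMOTE_DISPATCHABLE);

    Set<String> getCapabilities() {
        return CAPABILITIES;
    }
}
```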
Creates a task instance.
Checks if this task factory will execute exactly the same computations as the parameter, given the same circumstances.
public default Set<String>
Gets the capabilities of this task.
public default TaskExecutionEnvironmentSelector getExecutionEnvironmentSelector()
Gets an environment selector to determine if the task can execute in a given build environment.
public default int getRequestedComputationTokenCount()
Gets the computation token count consumed by this task during execution.
Returns a hash code value for the object.
Cacheable tasks allow the build system to retrieve the result of the execution from external sources, or publish their results to a database.
Cacheable tasks are used with build caches. Build caches are background daemon processes which provide access to the results of previously run tasks. If a task reports itself as cacheable, the build system may try to retrieve its previously computed result from a build cache configured for the current execution. After a cacheable task executes, the build system may publish its results to the configured build cache, so the outputs will be available for future reuse.
This capability serves as a hint, and the build system may decide that it won't use the build cache to retrieve the results. This may be due to performance, configuration, build environment or other arbitrary reasons.
The build system will only reuse the results of a task if the published task is applicable to the current build environment. That is, if any dependencies of the published task have changed in the current run, the published results won't be reused.
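The reuse rule can be pictured as a lookup keyed by both the task identifier and a fingerprint of the task's dependencies. This is a simplified in-memory model for illustration, not how build caches are implemented; BuildCacheModel, its methods, and the single-string dependency fingerprint are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical, simplified model of a build cache.
final class BuildCacheModel {
    // Key: task identifier plus a fingerprint of the task's dependencies.
    private static final class Key {
        final String taskId;
        final String dependencyFingerprint;

        Key(String taskId, String dependencyFingerprint) {
            this.taskId = taskId;
            this.dependencyFingerprint = dependencyFingerprint;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Key)) return false;
            Key other = (Key) obj;
            return taskId.equals(other.taskId) && dependencyFingerprint.equals(other.dependencyFingerprint);
        }

        @Override
        public int hashCode() {
            return Objects.hash(taskId, dependencyFingerprint);
        }
    }

    private final Map<Key, String> results = new HashMap<>();

    // Called after a cacheable task has executed.
    void publish(String taskId, String dependencyFingerprint, String result) {
        results.put(new Key(taskId, dependencyFingerprint), result);
    }

    // A result is only found if the dependencies are unchanged,
    // i.e. the fingerprint matches the one recorded at publication.
    Optional<String> retrieve(String taskId, String dependencyFingerprint) {
        return Optional.ofNullable(results.get(new Key(taskId, dependencyFingerprint)));
    }
}
```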
Cacheable tasks are strongly recommended to comply with the following restrictions:
- The task identifier for the task should have a stable hash code. This means that the task identifier should return the same hash code for equal objects across different executions of the Java process. This usually requires that the task identifier doesn't derive its hash code from the identity hash code, the class hash code, or any other runtime-dependent values. With that in mind, enums cannot be used as task identifiers, because their hash code is not stable.
- The task cannot wait for the result of another task; it may only retrieve already finished results. This means that the task may only use the finished result retrieval methods of other tasks. This requirement is aligned with the restrictions placed on tasks that use computation tokens.
The above restrictions are required in order to provide an efficient and sane implementation in the build system. They may be lifted in the future, but task implementations should comply with them nonetheless.
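To illustrate the stable hash code restriction: the Java platform specifies String.hashCode() purely as a function of the string's characters, so an identifier that combines string hash codes yields the same value in every JVM process. Enum.hashCode(), by contrast, is inherited from Object identity and may differ between runs. The identifier class below is a hypothetical example.

```java
import java.util.Objects;

// Hypothetical task identifier with a stable hash code.
final class CompileTaskIdentifier {
    private final String sourcePath;
    private final String targetArchitecture;

    CompileTaskIdentifier(String sourcePath, String targetArchitecture) {
        this.sourcePath = Objects.requireNonNull(sourcePath);
        this.targetArchitecture = Objects.requireNonNull(targetArchitecture);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof CompileTaskIdentifier)) return false;
        CompileTaskIdentifier other = (CompileTaskIdentifier) obj;
        return sourcePath.equals(other.sourcePath) && targetArchitecture.equals(other.targetArchitecture);
    }

    // Derived only from String hash codes, whose algorithm is fixed by the
    // String.hashCode() specification, so the value is the same in every process.
    // Mixing in getClass().hashCode(), System.identityHashCode(), or an enum
    // constant's hashCode() would make the value change between runs.
    @Override
    public int hashCode() {
        return sourcePath.hashCode() * 31 + targetArchitecture.hashCode();
    }
}
```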
As a general rule of thumb, only tasks which do more work than it takes to retrieve their results from a network cache should report this capability. That is, the time the task computation takes should outweigh the network communication times.
If a task wishes to start inner tasks that report 1 or more computation tokens, then the enclosing task must report this capability. This is in order to ensure that the proper restrictions are placed in the build system for the enclosing and inner tasks as well. See getRequestedComputationTokenCount() for the nature of restrictions.
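The rule can be expressed as a simple consistency check, for example in a task's own test suite. InnerTaskRule and its parameters are hypothetical names; the sketch does not describe how the build system itself enforces the rule.

```java
import java.util.List;
import java.util.Set;

// Hypothetical check for the inner task capability rule.
final class InnerTaskRule {
    static final String CAPABILITY_INNER_TASKS_COMPUTATIONAL = "saker.task.inner.tasks.computational";

    // Returns true if an enclosing task with the given capabilities may start
    // inner tasks that request the given computation token counts.
    static boolean isAllowed(Set<String> enclosingCapabilities, List<Integer> innerTokenCounts) {
        boolean anyComputational = innerTokenCounts.stream().anyMatch(count -> count >= 1);
        return !anyComputational || enclosingCapabilities.contains(CAPABILITY_INNER_TASKS_COMPUTATIONAL);
    }
}
```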
Remote dispatchable tasks can be transferred to remote executor instances, therefore increasing the number of concurrently executing tasks and ensuring horizontal scalability.
This capability is only used when the user configures the build execution to use at least one cluster instance.
When specifying this capability, the task will be a candidate for remote dispatching. The build runtime is not required to actually execute this task on a remote machine, but it will make efforts to properly distribute it based on current workloads.
When a task reports itself as remote dispatchable, a restriction is placed on it: it cannot wait for other tasks. This restriction is necessary, as deadlock detection is only feasible on the main executor machine. Note that this restriction is usually non-disruptive: as remote dispatchable tasks are generally used for heavily computational workloads, they usually report computation tokens to signal the amount of work done, in which case they already can't wait for other tasks.
This restriction may be lifted in the future, or may only be employed if the task is actually being run on a remote machine.
Tasks can retrieve finished results nonetheless.
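The difference between waiting for a task and retrieving its finished result can be modeled with CompletableFuture. In this hypothetical sketch, a remote dispatchable caller may not use the blocking waitFor, but may always use getFinished, which fails instead of blocking if the dependency has not completed. TaskResultHandle and its methods are assumptions, not the build system's task future API.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical handle to another task's result.
final class TaskResultHandle<T> {
    private final CompletableFuture<T> future;

    TaskResultHandle(CompletableFuture<T> future) {
        this.future = future;
    }

    // Blocking retrieval: rejected for remote dispatchable callers, since
    // deadlock detection is only feasible on the main executor machine.
    T waitFor(boolean callerIsRemoteDispatchable) {
        if (callerIsRemoteDispatchable) {
            throw new IllegalStateException("Remote dispatchable tasks cannot wait for other tasks.");
        }
        return future.join();
    }

    // Non-blocking retrieval of an already finished result: always allowed.
    T getFinished() {
        if (!future.isDone()) {
            throw new IllegalStateException("Task has not finished yet.");
        }
        return future.join();
    }
}
```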
Designing a task to be remote dispatchable can improve performance, as it will result in more utilization of overall resources available to the build system. Remote dispatchable tasks should be carefully implemented, and use the appropriate functions for avoiding performance traps. See the remote execution guide of the build system for best practices.
A good example of a remote dispatchable task is C++ compilation, where source files can be transferred to clusters, compiled, and the result returned to the main executor. For a large set of files, the compilation tasks can be distributed to multiple machines, and the overall compilation can complete much faster than if only a single machine was used.
To choose an appropriate build environment for the task, getExecutionEnvironmentSelector() can be used.
If a task reports itself as short, it is considered to be fast to execute, in the sense that executing the task takes less time than creating a separate thread and running it concurrently. As a general rule of thumb, if the execution time of a task is comparable to the time a thread takes to start, then the task should be reported as short.
Tasks which wait for no other tasks, have no dependencies, do no heavy computations, and do no I/O operations are good candidates for being short.
The following additional restrictions apply to short tasks:
- They can only wait for tasks which are also short capable.
- They cannot wait for tasks which are not yet started.
- They cannot be remote dispatchable.
- They cannot report computation tokens.
The build system can run short tasks without creating a separate thread for them. This means that starting a short task will not return control to the starter immediately; instead, the task is executed and control is returned after it finishes. This optimization can reduce unnecessary load on the OS and the build system.
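The optimization resembles the following hypothetical starter: short tasks run synchronously on the calling thread, while other tasks are submitted to a thread pool. ShortTaskStarter, its start method, and the pool size of 2 are illustrative assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical starter that runs short tasks inline.
final class ShortTaskStarter implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    <T> CompletableFuture<T> start(Supplier<T> task, boolean isShort) {
        if (isShort) {
            // Runs on the caller's thread: control returns only after the task finished.
            return CompletableFuture.completedFuture(task.get());
        }
        // Runs concurrently: control returns to the caller immediately.
        return CompletableFuture.supplyAsync(task, pool);
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```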
Every task instance is used for only one invocation.
The checks for equality should also take the execution environment selector into account.
Indicates whether some other object is "equal to" this one.
The equals method implements an equivalence relation on non-null object references:
- It is reflexive: for any non-null reference value x, x.equals(x) should return true.
- It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
- It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
- It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
- For any non-null reference value x, x.equals(null) should return false.
The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value true).
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
Returns: true if this object is the same as the obj argument; false otherwise.
Implementation should return a new instance for every invocation of this method, as TaskExecutionEnvironmentSelector is a stateful class.
If two task factories are equal, then their returned environment selectors should be equal as well.
If an environment selector fails to find a suitable environment, then an instance of TaskEnvironmentSelectionFailedException will be thrown by the build system and the build execution will abort.
The default implementation returns a selector which enables the task to use any build environment.
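A sketch of these rules, assuming a simplified selector that inspects a map of environment properties: the factory creates a fresh selector on every call, and equal factories yield equal selectors because both equalities derive from the same field. RequiredPropertySelector, isSuitable, NativeBuildTaskFactory, and the property map are hypothetical; the real TaskExecutionEnvironmentSelector API is not reproduced here.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical selector: accepts environments with a given property value.
// It keeps per-selection state, which is why a new instance is needed per call.
final class RequiredPropertySelector {
    private final String key;
    private final String value;
    private int inspectedEnvironments; // mutable state, e.g. for diagnostics

    RequiredPropertySelector(String key, String value) {
        this.key = key;
        this.value = value;
    }

    boolean isSuitable(Map<String, String> environmentProperties) {
        inspectedEnvironments++;
        return value.equals(environmentProperties.get(key));
    }

    int getInspectedEnvironments() {
        return inspectedEnvironments;
    }

    // Equality ignores the mutable counter and only considers the configuration.
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof RequiredPropertySelector)) return false;
        RequiredPropertySelector other = (RequiredPropertySelector) obj;
        return key.equals(other.key) && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key, value);
    }
}

// Hypothetical factory whose selector depends on its targetOs field.
final class NativeBuildTaskFactory {
    private final String targetOs;

    NativeBuildTaskFactory(String targetOs) {
        this.targetOs = targetOs;
    }

    // Returns a new instance on every invocation, since the selector is stateful.
    RequiredPropertySelector getExecutionEnvironmentSelector() {
        return new RequiredPropertySelector("os.name", targetOs);
    }

    // targetOs determines the selector, so factory equality takes it into account.
    @Override
    public boolean equals(Object obj) {
        return obj instanceof NativeBuildTaskFactory && targetOs.equals(((NativeBuildTaskFactory) obj).targetOs);
    }

    @Override
    public int hashCode() {
        return targetOs.hashCode();
    }
}
```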
Computation tokens are used to prevent thrashing of the execution machine when too many concurrent operations are running. A computation token represents one unit of computational work that uses one CPU thread at 100% utilization. This method returns the average number of computation tokens the task uses during its execution. The task will start to run when the requested number of tokens is available for it.
If a task returns more than 0 computation tokens, a restriction is placed on it: it can't wait for other tasks in the build system. This is in order to prevent involuntarily deadlocking the build execution.
(Reasoning: Tasks will not start execution until they can allocate the required number of computation tokens for themselves. If a task attempts to wait for another task which cannot start because it is unable to allocate enough computation tokens, the build execution will deadlock, although both tasks could probably finish if computation tokens didn't exist. Implementing active deadlock detection for this behaviour is not deemed feasible, so the above restriction is placed on tasks which require computation tokens.)
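The token accounting and the deadlock from the reasoning above can be reproduced with a java.util.concurrent.Semaphore whose permits stand for computation tokens. This models the idea only, not the build system's allocator; TokenPoolModel and the pool size of 4 are assumptions.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model: each permit is one computation token.
final class TokenPoolModel {
    private final Semaphore tokens;

    TokenPoolModel(int totalTokens) {
        this.tokens = new Semaphore(totalTokens);
    }

    // A task may only start once all of its requested tokens are allocated.
    // tryAcquire reports "cannot start" instead of blocking.
    boolean tryStart(int requestedTokens) {
        return tokens.tryAcquire(requestedTokens);
    }

    void finish(int requestedTokens) {
        tokens.release(requestedTokens);
    }

    int available() {
        return tokens.availablePermits();
    }
}
```

With 4 tokens, a task A holding 3 leaves too few for a task B requesting 2. If A waited for B at that point, B could never start and A could never finish, which is exactly the situation the restriction rules out.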
If your task really needs to wait for an input task, then we recommend waiting for it in a parent task and starting the actual computation in a sub-task with computation tokens. Dependencies on input tasks can be specified by using the finished retrieval methods of the task futures, which do not require waiting for the subject task.
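The recommended structure might look like the following sketch, with CompletableFuture standing in for the build system's task futures. The parent requests no computation tokens, so it may wait for its input; it then hands the input to the heavy computation. ParentChildPattern, waitFor, and runParent are hypothetical names, and the child is invoked directly instead of being started as a separate task.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class ParentChildPattern {
    // Models the restriction: a task holding tokens must not block on another task.
    static <T> T waitFor(CompletableFuture<T> input, int callerTokenCount) {
        if (callerTokenCount > 0) {
            throw new IllegalStateException("Tasks with computation tokens cannot wait for other tasks.");
        }
        return input.join();
    }

    // Parent task: requests 0 tokens, waits for the input, then runs the
    // computational part with the already available input.
    static <I, R> R runParent(CompletableFuture<I> inputTask, Function<I, R> childComputation) {
        I input = waitFor(inputTask, 0);
        // In the build system, this would be a separate sub-task that requests
        // computation tokens; here it is simply called.
        return childComputation.apply(input);
    }
}
```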
The default implementation returns 0, meaning no computation tokens requested.
The general contract of hashCode is:
- Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
- If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
- It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
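When a task factory overrides both methods, parts of the equals and hashCode contracts can be spot-checked on sample values with a small helper. ContractCheck is a hypothetical test utility; passing it for a few samples does not prove the contracts, it only catches obvious violations, and it does not test consistency over time.

```java
import java.util.List;

// Hypothetical test utility for spot-checking the equals/hashCode contracts.
final class ContractCheck {
    static boolean holds(List<?> samples) {
        for (Object x : samples) {
            // Reflexive, and never equal to null.
            if (!x.equals(x) || x.equals(null)) return false;
            for (Object y : samples) {
                // Symmetric.
                if (x.equals(y) != y.equals(x)) return false;
                // Equal objects must have equal hash codes.
                if (x.equals(y) && x.hashCode() != y.hashCode()) return false;
                for (Object z : samples) {
                    // Transitive.
                    if (x.equals(y) && y.equals(z) && !x.equals(z)) return false;
                }
            }
        }
        return true;
    }
}
```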