In this chapter we go more in-depth into workflow semantics and address more complex workflow examples. We also show the syntax and semantic differences between the various supported scripting languages, such as Groovy and JRuby.
Job variables are workflow variables which can be defined either statically at
the workflow level or dynamically inside a task.
They behave like a dictionary (a HashMap).
When they are defined statically, they are accessible in every task.
When they are defined dynamically, they are only accessible in child tasks.
Statically-defined job variables can also be modified dynamically inside a task; in that case, the modification is visible only in child tasks.
Some variables, such as PA_JOB_ID, PA_TASK_ID and PA_USER, are automatically set by the system.
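Inside a script task, job variables are exposed through the `variables` map binding. A minimal Groovy sketch (the variable names `my_var` and `computed` are illustrative):

```groovy
// Read a statically-defined job variable (always a String inside tasks)
def input = variables.get("my_var")

// System-provided variables are read through the same map
println "Running task " + variables.get("PA_TASK_ID") +
        " of job " + variables.get("PA_JOB_ID")

// Define (or modify) a variable dynamically; the new value
// is visible only in child tasks
variables.put("computed", input.toUpperCase())
```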
The following diagram illustrates this behavior:
When used inside tasks, Job variables are of type String.
It is possible to constrain the syntax of a variable through the Model attribute.
When the Model attribute is used, the variable remains of type String, but its value is validated against the model.
This ensures that a user cannot enter an invalid value that would make the workflow fail.
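In the workflow XML, the Model appears as an attribute of the variable definition. A hypothetical fragment, assuming the PA:INTEGER model (the variable name and value are illustrative):

```xml
<variables>
  <!-- The value stays a String inside tasks, but the workflow is
       rejected if the value does not parse as an integer -->
  <variable name="repetitions" value="10" model="PA:INTEGER"/>
</variables>
```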
For example, create a new workflow with a Groovy task and define the following job variable (unselect the task first):
Task Variables are similar to Job Variables, with the following differences:
Another way to transfer information between tasks is by using the task result.
Inside each task, it is possible to set a result by assigning a value to a variable called result.
The direct child task can access this result through another variable called results (with an "s", since a task can have multiple parents).
The exact type of the results variable is language-dependent, but it is always an aggregate type such as an array or a list, as it aggregates the results from several parent tasks.
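As a sketch, assuming the Groovy bindings exposed by ProActive (where each entry of `results` is a task-result object wrapping the parent's value):

```groovy
// Parent task script: assign a value to the 'result' binding
result = "value from parent"
```

```groovy
// Child task script: 'results' holds one entry per parent task
println results.size()        // number of parent tasks
println results[0].value()    // value produced by the first parent
```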
Let’s illustrate this with an example. Create a new job and call it Job results. Create two Groovy tasks, position them on the same line, and replace their scripts with the following:
2 Resource selection
Resource selection allows choosing specific ProActive Nodes to execute tasks. It is
useful when the Resource Manager controls different machine groups,
with different libraries installed, or even different operating systems. It is
especially useful when heterogeneous machines are connected to the
scheduler. Selection is done by writing selection scripts that determine
whether a task can be executed on a given ProActive Node.
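A selection script decides by setting the boolean `selected` binding. A minimal Groovy sketch that restricts execution to a given machine (the hostname is illustrative):

```groovy
// Accept this node only if it runs on the machine named "my-machine"
def hostname = InetAddress.getLocalHost().getHostName()
selected = hostname.equals("my-machine")
```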
Let’s show with an example how we can select a specific machine for execution. Create a new job in the Studio named Selection job with a single Groovy task.
Open the Node Selection panel and click on Add. This will open the following dialog:
3 Data management
When we create a file in a task, the file is located in the working directory
of the task. In ProActive terminology, this directory is called the Local
Space. This directory is volatile and is deleted after the task
finishes, so it is mandatory to transfer any output file it produces.
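As a sketch (a bash variant; the course example below uses a Windows cmd task), a task script that writes an output file into its Local Space could look like:

```shell
# Create a file in the task's working directory (the Local Space)
echo "task output" > output.txt
# Show where the Local Space is located and confirm the file exists
pwd
ls -l output.txt
```

Without a file-transfer step, output.txt disappears along with the Local Space when the task finishes.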
To illustrate this, let’s create a new job called LocalSpace job with a single Windows cmd task.
Replace the script content with the following and execute the job:
4 Control structures
As we saw previously with the replicate example, control structures allow
building dynamic workflows with control-flow decisions.
There are three kinds of control structures:
5 Multi-Node Tasks
We already saw briefly that a ProActive Task can reserve more than one ProActive
Node for its execution. The reason behind this feature is that not all tasks simply
execute a basic script; often a task calls an external program. That
program can be multi-threaded and use multiple cores on
the machine. In that scenario, it is important to precisely match the number of
ProActive Nodes used by our task with the machine resources actually used by the program.
Otherwise, the scheduler could dispatch more tasks on the same machine than its
resources can handle.
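Inside a multi-node task, the extra reserved nodes are exposed to the script. A sketch, assuming the Groovy `nodesurl` binding (a list holding the URLs of the additional nodes reserved beyond the one executing the script):

```groovy
// Size the external program to match the reserved resources
def totalNodes = nodesurl.size() + 1
println "Launching the external program with " + totalNodes + " workers"
```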
Open again the Multi node job, click on the task, and open the Multi-Node Execution panel. Open the Topology list; it contains many choices, but we are going to focus on the two most useful ones:
6 On-Error policies
Throughout this course, we ran many failing jobs, and each time we observed that the
scheduler tries to execute a failing task several times, then continues the job
execution with the other tasks. This is the default behavior for failing tasks, but
each workflow can define its own on-error policy.
Let’s reopen the Result job, which produced an error.
Click on the desktop outside the task to open the job parameter panel.
Click on the panel Error Handling.
Here you can see the Maximum Number of Execution Attempts (2 by default) and two other settings:
7 Fork Environment
When a ProActive Task is executed on a ProActive Node, a dedicated Java Virtual
Machine is started to execute the Task.
The forked JVM parameters are automatically configured by the ProActive Node, but sometimes it may be necessary to provide additional configuration to the JVM. This configuration can be provided through the Fork Environment settings or a Fork Environment Script.
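A Fork Environment Script receives a `forkEnvironment` object that configures the forked JVM. A sketch, assuming its `addJVMArgument` method (the argument values are illustrative):

```groovy
// Give the forked JVM more memory and pass a custom system property
forkEnvironment.addJVMArgument("-Xmx2048m")
forkEnvironment.addJVMArgument("-Dmy.custom.property=value")
```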
Let’s demonstrate this by an example. Create a bash Task containing the pwd command to display the current directory.
Execute this task, you should see in the output something like:
ProActive natively supports containers (Docker, Kubernetes, ...). As a first approach,
let’s run a simple bash command from a basic Linux container.
From the Studio, drag’n drop a Dockerfile task (Tasks->Containers->Dockerfile).
Open the Task Implementation view to see the script: it simply prints “Hello” from a freshly started Ubuntu container.
Execute the workflow. If the Docker image (here Ubuntu 18.04) is not present locally, you will see image-pull logs in the job output.
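The implementation of such a Dockerfile task might look like the following sketch (assuming Ubuntu 18.04, as in the course):

```dockerfile
# Base image; pulled automatically if absent locally
FROM ubuntu:18.04
# Print a greeting when the container runs
CMD ["echo", "Hello"]
```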
9 Generic Information
Generic Information key features:
10 Third-Party Credentials
Third-Party credentials key features: