ProActive Workflows & Scheduling (PWS)

1. Overview

1.1. What is ProActive AI Orchestration (PAIO)?

ProActive AI Orchestration (PAIO) is a complete DSML platform (Data Science and Machine Learning) including an ML Studio, AutoML, Data Science Orchestration and MLOps for the deployment, training, execution and scalability of artificial intelligence and machine learning models on any type of infrastructure. Created for data scientists and ML engineers, the solution is simple to use and accelerates the development and deployment of machine learning models.

PAIO overview

ProActive AI Orchestration platform provides a rich catalog of generic machine learning tasks that can be connected together to build either basic or advanced machine learning workflows for various use cases such as: fraud detection, text analysis, online offer recommendations, prediction of equipment failures, facial expression analysis, etc. PAIO workflows enable users to manage machine learning pipelines through the different phases of the development lifecycle and allow them to better control task parallelization by running the tasks on resources matching constraints (Multi-CPU, GPU, FPGA, data locality, libraries, etc.).

PAIO Open Studio ActiveEon

The ProActive AI Orchestration platform is an open source solution, and it can be tested online without installation on our try platform, try.activeeon.com.

PAIO also encompasses a range of interfaces that serve as centralized hubs, offering real-time monitoring, collaboration, and decision-making capabilities across various stages of the machine learning operational pipeline. By providing insights into model performance, infrastructure health, and deployment status, these interfaces empower teams to optimize and enhance the reliability of their machine learning systems.

MLOps dashboard model servers

The MLOps Dashboard (screenshot above) serves as a centralized hub for data scientists, engineers, and stakeholders involved in deploying and monitoring machine learning models. It provides a comprehensive view of the deployment pipelines, real-time performance metrics, and key indicators. Furthermore, it includes features specifically designed to manage and monitor the underlying model servers. Refer to MLOps Dashboard section for more information.

1.2. Glossary

The following terms are used throughout the documentation:

ProActive Workflows & Scheduling

The full distribution of ProActive Workflows & Scheduling. It contains the ProActive Scheduler server, the REST & Web interfaces, and the command line tools. It is the commercial product name.

ProActive Scheduler

Can refer to any of the following:

  • A complete set of ProActive components.

  • An archive that contains a released version of ProActive components, for example activeeon_enterprise-pca_server-OS-ARCH-VERSION.zip.

  • A set of server-side ProActive components installed and running on a Server Host.

Resource Manager

ProActive component that manages ProActive Nodes running on Compute Hosts.

Scheduler

ProActive component that accepts Jobs from users, orders the constituent Tasks according to priority and resource availability, and eventually executes them on the resources (ProActive Nodes) provided by the Resource Manager.

Please note the difference between Scheduler and ProActive Scheduler.

REST API

ProActive component that provides RESTful API for the Resource Manager, the Scheduler and the Catalog.

Resource Manager Web Interface

ProActive component that provides a web interface to the Resource Manager.

Scheduler Web Interface

ProActive component that provides a web interface to the Scheduler.

Workflow Studio

ProActive component that provides a web interface for designing Workflows.

ProActive AI Orchestration

PAIO component that provides a web interface for designing and composing ML Workflows with drag and drop.

Job Planner Portal

ProActive component that provides a web interface for planning Workflows, and creating Calendar Definitions

Catalog

ProActive component that provides storage and versioning of Workflows and other ProActive Objects. The Catalog can also be queried for specific Workflows through a REST API as well as a GraphQL API.

Job Planner

A ProActive component providing advanced scheduling options for Workflows.

Bucket

ProActive notion used with the Catalog to refer to a specific collection of ProActive Objects and in particular ProActive Workflows.

Server Host

The machine on which ProActive Scheduler is installed.

SCHEDULER_ADDRESS

The IP address of the Server Host.

ProActive Node

One ProActive Node can execute one Task at a time. This concept is often tied to the number of cores available on a Compute Host. We assume a Task consumes one core (more is possible), so on a 4-core machine you might want to run 4 ProActive Nodes. One (by default) or more ProActive Nodes can be executed in a Java process on the Compute Hosts and will communicate with the ProActive Scheduler to execute Tasks. We distinguish two types of ProActive Nodes:

  • Server ProActive Nodes: Nodes that are running on the same host as the ProActive server;

  • Remote ProActive Nodes: Nodes that are running on machines other than ProActive Server.

Compute Host

Any machine which is meant to provide computational resources to be managed by the ProActive Scheduler. One or more ProActive Nodes need to be running on the machine for it to be managed by the ProActive Scheduler.

PROACTIVE_HOME

The path to the extracted archive of ProActive Scheduler release, either on the Server Host or on a Compute Host.

Workflow

User-defined representation of a distributed computation. Consists of the definitions of one or more Tasks and their dependencies.

Generic Information

Additional information attached to Workflows.

Job

An instance of a Workflow submitted to the ProActive Scheduler. Sometimes also used as a synonym for Workflow.

Job Icon

An icon representing the Job and displayed in portals. The Job Icon is defined by the Generic Information workflow.icon.

Task

A unit of computation handled by ProActive Scheduler. Both Workflows and Jobs are made of Tasks.

Task Icon

An icon representing the Task and displayed in the Studio portal. The Task Icon is defined by the Task Generic Information task.icon.

ProActive Agent

A daemon installed on a Compute Host that starts and stops ProActive Nodes according to a schedule, restarts ProActive Nodes in case of failure and enforces resource limits for the Tasks.

2. Get Started

To submit your first Machine Learning (ML) workflow to ProActive Scheduler, install it in your environment (default credentials: admin/admin) or just use our demo platform try.activeeon.com.

ProActive Scheduler provides comprehensive web interfaces for designing, submitting and monitoring workflows.

We also provide a REST API and command line interfaces for advanced users.

3. Create a First Predictive Solution

Suppose you need to predict house prices based on the following information (features) provided by the estate agency:

  • CRIM per capita crime rate by town

  • ZN proportion of residential land zoned for lots over 25,000 sq.ft.

  • INDUS proportion of non-retail business acres per town

  • CHAS Charles River dummy variable

  • NOX nitric oxides concentration

  • RM average number of rooms per dwelling

  • AGE proportion of owner-occupied units built prior to 1940

  • DIS weighted distances to five Boston Employment centres

  • RAD index of accessibility to radial highways

  • TAX full-value property-tax rate per $10,000

  • PTRATIO pupil-teacher ratio by town

  • B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town

  • LSTAT % lower status of the population

  • MDEV Median value of owner-occupied homes in $1000s

Predicting house prices is a complex problem, but we can simplify it a bit for this step-by-step example. We’ll show you how you can easily create a predictive analytics solution using PAIO.

3.1. Manage the Canvas

To use PAIO, you need to select the Machine Learning preset as the main catalog in the ProActive Studio. This preset contains a set of buckets with machine learning tasks and workflows that enable you to upload and prepare data, train a model and test it.

  1. Open the ProActive Workflow Studio home page.

  2. Create a new workflow.

  3. Change palette preset to Machine Learning.

  4. Click on the ai-machine-learning catalog and pin it open; do the same for the ai-data-visualization catalog.

  5. Organize your canvas.

Changing the palette preset allows the user to visualize a different set of catalogs in the studio.

3.2. Upload Data

To upload data into the Workflow, you need to use a dataset stored in a CSV file.

  1. Once the dataset has been converted to CSV format, upload it to a cloud storage service, for example Amazon S3. For this tutorial, we will use the Boston house prices dataset available at the following link: https://s3.eu-west-2.amazonaws.com/activeeon-public/datasets/boston-houses-prices.csv

  2. Drag and drop the Import_Data task from the ai-machine-learning bucket into the ProActive AI Orchestration canvas.

  3. Click on the task, then click General Parameters on the left to change the default parameters of this task.

  4. Set the FILE_URL variable to the S3 link of your dataset.

  5. Set the other parameters according to your dataset format.

This task loads the data into the workflow so that we can use it for model training and testing.

If you want to skip these steps, you can directly use the Load_Boston_Dataset Task by a simple drag and drop.
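For illustration, here is a minimal Python sketch of what either approach comes down to: fetching the CSV from its public URL. It assumes the pandas library is available; inside PAIO, the task exposes the resulting data to downstream tasks instead of printing it.

    # A minimal sketch, assuming pandas is available, of what Import_Data
    # (or Load_Boston_Dataset) does conceptually: fetch a CSV dataset from
    # a public S3 URL and hand it to downstream tasks.
    import pandas as pd

    FILE_URL = ("https://s3.eu-west-2.amazonaws.com/activeeon-public/"
                "datasets/boston-houses-prices.csv")

    df = pd.read_csv(FILE_URL)   # pandas reads directly from an HTTP(S) URL
    print(df.shape)
    print(df.head())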


3.3. Prepare Data

This step consists of preparing the data for the training and testing of the predictive model. In this example, we will simply split our dataset into two separate datasets: one for training and one for testing.

To do this, we use the Split_Data Task from the ai-machine-learning bucket.

  1. Drag and drop the Split_Data Task into the canvas, and connect it to the Import_Data or Load_Boston_Dataset Task.

  2. By default, the ratio is 0.7, which means that 70% of the dataset will be used for training the model and 30% for testing it.

  3. Click the Split_Data Task and set the TRAIN_SIZE variable to 0.6 (see the sketch after this list).
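Conceptually, Split_Data performs the equivalent of the following scikit-learn sketch, continuing from the Import_Data example above; the random_state value is an illustrative assumption:

    # Rough scikit-learn equivalent of Split_Data with TRAIN_SIZE=0.6.
    from sklearn.model_selection import train_test_split

    train_df, test_df = train_test_split(df, train_size=0.6, random_state=42)
    print(len(train_df), len(test_df))  # about 60% / 40% of the rows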


3.4. Train a Predictive Model

Using PAIO, you can easily create different ML models in a single experiment and compare their results. This type of experimentation helps you find the best solution for your problem. You can also enrich the ai-machine-learning bucket by adding new ML algorithms and publishing or customizing existing tasks according to your requirements, as the tasks are open source.

To change the code of a task, click on it, then click Task Implementation. You can also add new variables to a specific task.

In this step, we will create two different types of models and then compare their scores to decide which algorithm is most suitable for our problem. The Boston dataset used in this example involves predicting the price of houses, a continuous label; we therefore deal with a regression problem.

To solve this problem, we have to choose a regression algorithm to train the predictive model. To see the regression algorithms available in PAIO, see the ML Regression section in the ai-machine-learning bucket.

For this example, we will use the Linear_Regression Task and the Support_Vector_Regression Task.

  1. Find the Linear_Regression Task and Support_Vector_Regression Task and drag them into the canvas.

  2. Find the Train_Model Task, drag it twice into the canvas and set its LABEL_COLUMN variable to LABEL.

  3. Connect the Split_Data Task to the two Train_Model Tasks to give them access to the training data. Then connect the Linear_Regression Task to the first Train_Model Task and the Support_Vector_Regression Task to the second Train_Model Task.

  4. To be able to download the model learned by each algorithm, drag two Download_Model Tasks and connect them to each Train_Model Task (a scikit-learn sketch of this training step follows).
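As a rough illustration of what the two algorithm/Train_Model pairs compute, here is a scikit-learn sketch; it assumes the label column is named LABEL, as the LABEL_COLUMN variable above suggests, and uses default hyperparameters:

    # Illustrative stand-ins for the Linear_Regression and
    # Support_Vector_Regression tasks feeding the two Train_Model tasks.
    from sklearn.linear_model import LinearRegression
    from sklearn.svm import SVR

    X_train = train_df.drop(columns=["LABEL"])
    y_train = train_df["LABEL"]

    lr_model = LinearRegression().fit(X_train, y_train)
    svr_model = SVR().fit(X_train, y_train)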


3.5. Test the Predictive Model

To evaluate the two learned predictive models, we will use the testing data that was separated out by the Split_Data Task to score our trained models. We can then compare the results of the two models to see which generated better results.

  1. Find the Predict_Model Task, drag and drop it twice into the canvas and set its LABEL_COLUMN variable to LABEL.

  2. Connect the first Predict_Model Task to the Train_Model Task that is connected to Support_Vector_Regression Task.

  3. Connect the second Predict_Model Task to the Train_Model Task that is connected to Linear_Regression Task.

  4. Connect both Predict_Model Tasks to the Split_Data Task.

  5. Find the Preview_Results Task in the ai-data-visualization bucket and drag and drop it twice into the canvas.

  6. Connect each Preview_Results Task to a Predict_Model Task (a scoring sketch follows this list).
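The two Predict_Model Tasks then score each trained model on the held-out split, along the lines of this sketch; R² is one reasonable comparison metric, though the metric actually reported by the platform may differ:

    # Score both models on the 40% held out by Split_Data and compare.
    X_test = test_df.drop(columns=["LABEL"])
    y_test = test_df["LABEL"]

    print("Linear Regression R^2:", lr_model.score(X_test, y_test))
    print("SVR R^2:", svr_model.score(X_test, y_test))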

If you have a pickled file (.pkl) containing a predictive model that you have learned using another platform, and you need to test it in PAIO, you can load it using the Import_Model Task.

3.6. Run the Experiment and Preview the Results

Now that the workflow is complete, let’s execute it:

  1. Click the Execute button on the menu to run the workflow.

  2. Click the Scheduling & Orchestration button to track the workflow execution progress.

  3. Click the Visualization tab and track the progress of your workflow execution (a green check mark appears on each Task when its execution is finished).

  4. Visualize the output logs by clicking on the Output tab and checking the Streaming check box.

  5. Click the Tasks tab, select a Preview_Results task and click on the Preview tab, then click either on Open in browser to preview the results on your browser or on Save as file to download the results locally.


4. Automated Machine Learning (AutoML)

The ai-auto-ml-optimization bucket contains the Distributed_Auto_ML workflow, which can be easily used to find the operating parameters of any system whose performance can be measured as a function of adjustable parameters. It is an estimator that minimizes the posterior expected value of a loss function. This bucket also comes with a set of workflow examples that demonstrate how to optimize mathematical functions, PAIO workflows and machine/deep learning algorithms from scripts using AutoML tuners. In the following subsections, several tables present the main variables that characterize the AutoML workflows. In addition to the variables mentioned below, there is a set of generic variables common to all workflows, which can be found in the subsection AI Workflows Common Variables.

AutoML 1

4.1. Distributed AutoML

The Distributed_Auto_ML workflow proposes six algorithms for distributed hyperparameter optimization. The choice of the sampling/search strategy depends strongly on the problem at hand. The Distributed_Auto_ML workflow comes with specific pipelines (parallel or sequential) and visualization tools (Visdom or TensorBoard), as described in the subsections below.

AutoML 2

Variables:

Table 1. Distributed_Auto_ML Variables

Variable name | Description | Type
TUNING_ALGORITHM | Specifies the tuner algorithm that will be used for hyperparameter optimization. | List [Bayes, Grid, Random, QuasiRandom, CMAES, MOCMAES] (default=Random)
MAX_ITERATIONS | Specifies the maximum number of iterations. It should be an integer higher than zero. Set -1 for an infinite loop. | Int (default=2)
PARALLEL_EXECUTIONS_PER_ITERATION | Specifies the number of parallel executions per iteration. It should be an integer higher than zero. | Int (default=2)
NUMBER_OF_REPETITIONS | Specifies the number of hyperparameter sampling repetitions. Ensures every experiment is repeated a given number of times. It should be an integer higher than one. Set -1 to never see repetitions. | Int (default=-1)
PAUSE_AFTER_EVERY_ITERATIONS | If higher than zero, pauses the workflow after every specified number of iterations. Set -1 to disable. | Int (default=-1)
STOP_IF_LOSS_IS_LOWER_THAN | If higher than zero, stops the workflow execution if the loss is lower than the specified value. Set -1 to disable. | Int (default=-1)
TARGET_WORKFLOW | Specifies the path of the workflow from the catalog that should be optimized. | String (default=ai-auto-ml-optimization/Himmelblau_Function)
TARGET_NATIVE_SCHEDULER | Name of the native scheduler node source to use on the target workflow tasks when deployed inside a cluster such as SLURM, LSF, etc. | String (default=empty)
TARGET_NATIVE_SCHEDULER_PARAMS | Parameters given to the native scheduler (SLURM, LSF, etc.) while requesting a ProActive node used to deploy the target workflow tasks. | String (default=empty)
TARGET_NODE_ACCESS_TOKEN | If not empty, the target workflow tasks will be run only on nodes that contain the specified token. | String (default=empty)
TARGET_NODE_SOURCE_NAME | If not empty, the target workflow tasks will be run only on nodes belonging to the specified node source. | String (default=empty)
TARGET_CONTAINER_PLATFORM | Specifies the container platform to be used for executing the target workflow tasks. | List [no-container, docker, podman, singularity] (default=empty)
TARGET_CONTAINER_IMAGE | Specifies the name of the container image that will be used to run the target workflow tasks. | List [docker://activeeon/dlm3, docker://activeeon/cuda, docker://activeeon/cuda2, docker://activeeon/rapidsai, docker://activeeon/nvidia:rapidsai, docker://activeeon/nvidia:pytorch, docker://activeeon/nvidia:tensorflow, docker://activeeon/tensorflow:latest, docker://activeeon/tensorflow:latest-gpu] (default=empty)
TARGET_CONTAINER_GPU_ENABLED | If True, activates the use of GPUs for the target workflow tasks on the selected container platform. | Boolean (default=empty)
TARGET_NVIDIA_RAPIDS_ENABLED | If True, activates the use of NVIDIA RAPIDS for the target workflow tasks on the selected container platform. | Boolean (default=empty)
VISDOM_ENABLED | If True, the Visdom service is started, allowing the user to visualize the hyperparameter optimization using the Visdom web interface. | Boolean (default=False)
VISDOM_PROXYFIED | If True, requests to Visdom are sent via a proxy server. | Boolean (default=False)
TENSORBOARD_ENABLED | If True, the TensorBoard service is started, allowing the user to visualize the hyperparameter optimization using the TensorBoard web interface. | Boolean (default=False)
TENSORBOARD_PROXYFIED | If True, requests to TensorBoard are sent via a proxy server. | Boolean (default=False)

AutoML Full

How to define the search space:

This subsection describes common building blocks to define a search space (a small sampling sketch follows the list):

  • uniform: Uniform continuous distribution.

  • quantized_uniform: Uniform discrete distribution.

  • log: Logarithmic uniform continuous distribution.

  • quantized_log: Logarithmic uniform discrete distribution.

  • choice: Uniform choice distribution between non-numeric samples.
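To make these building blocks concrete, here is a self-contained Python sketch of what each one samples; quantized_log combines the last two ideas. The variable names, ranges, and helper implementations are illustrative assumptions; the exact syntax expected by Distributed_Auto_ML is shown in the workflow examples shipped with the bucket.

    import math
    import random

    def uniform(low, high):
        # Uniform continuous distribution.
        return random.uniform(low, high)

    def quantized_uniform(low, high, step):
        # Uniform discrete distribution: a uniform draw snapped to a grid.
        return round(random.uniform(low, high) / step) * step

    def log_uniform(low, high):
        # Logarithmic uniform continuous distribution.
        return math.exp(random.uniform(math.log(low), math.log(high)))

    def choice(options):
        # Uniform choice distribution between non-numeric samples.
        return random.choice(options)

    # One hypothetical sample drawn from a search space.
    sample = {
        "LEARNING_RATE": log_uniform(1e-5, 1e-1),
        "BATCH_SIZE": quantized_uniform(32, 256, 32),
        "DROPOUT": uniform(0.0, 0.5),
        "OPTIMIZER": choice(["adam", "sgd", "rmsprop"]),
    }
    print(sample)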

Which tuner algorithm to choose?

The choice of the tuner depends on the following aspects:

  • Time required to evaluate the model.

  • Number of hyperparameters to optimize.

  • Type of variable.

  • The size of the search space.

In the following, we briefly describe the different tuners proposed by the Distributed_Auto_ML workflow:

  • Grid sampling applies when all variables are discrete and the number of possibilities is low. A grid search is a naive approach that simply tries all possibilities, making the search extremely long even for medium-sized problems.

  • Random sampling is an alternative to grid search when the number of discrete parameters to optimize is high and the time required for each evaluation is high. Random search picks points randomly from the configuration space.

  • QuasiRandom sampling ensures a much more uniform exploration of the search space than traditional pseudo-random sampling. Thus, quasi-random sampling is preferable when not all variables are discrete, the number of dimensions is high, and the time required to evaluate a solution is high.

  • Bayes search models the search space using Gaussian process regression, which allows an estimation of the loss function, and of the uncertainty on that estimate, at every point of the search space. Modeling the search space suffers from the curse of dimensionality, which makes this method more suitable when the number of dimensions is low.

  • CMAES search (Covariance Matrix Adaptation Evolution Strategy) is one of the most powerful black-box optimization algorithms. However, it requires a significant number of model evaluations (on the order of 10 to 50 times the number of dimensions) to converge to an optimal solution. This search method is more suitable when the time required for a model evaluation is relatively low.

  • MOCMAES search (Multi-Objective Covariance Matrix Adaptation Evolution Strategy) is a multi-objective algorithm that optimizes multiple tradeoffs simultaneously. To do that, MOCMAES employs several CMAES algorithms.

Here is a table that summarizes when to use each algorithm.

Algorithm | Time | Dimensions | Continuity | Conditions | Multi-objective
Grid | Low | Low | Discrete | Yes | No
Random | High | High | Discrete | Yes | No
QuasiRandom | High | High | Mixed | Yes | No
Bayes | High | Medium | Mixed | Yes | No
CMAES | Low | Low | Mixed | No | No
MOCMAES | Low | Low | Mixed | No | Yes

4.2. Objective Functions

The following workflows represent some mathematical functions that can be optimized by the Distributed_Auto_ML tuners.

Himmelblau_Function: is a multi-modal function containing four identical local minima. It’s used to test the performance of optimization algorithms. For more info, please click here.
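For reference, the function itself is simple to write down in Python; the tuners only see it as a black box returning a loss:

    # Himmelblau's function; each of its four minima (e.g. (3, 2))
    # evaluates to 0, the value the tuners try to reach.
    def himmelblau(x, y):
        return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

    print(himmelblau(3.0, 2.0))  # 0.0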


Kursawe_Multiobjective_Function: is a multiobjective function proposed by Frank Kursawe. It has two objectives (f1, f2) to minimize. For more info, please click here.
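The standard three-variable Kursawe function returns the two objective values that a multi-objective tuner such as MOCMAES minimizes simultaneously:

    import math

    def kursawe(x):
        # f1: sum of interactions between consecutive variables.
        f1 = sum(-10.0 * math.exp(-0.2 * math.sqrt(x[i]**2 + x[i + 1]**2))
                 for i in range(len(x) - 1))
        # f2: sum of a nonlinear term per variable.
        f2 = sum(abs(xi)**0.8 + 5.0 * math.sin(xi**3) for xi in x)
        return f1, f2

    print(kursawe([0.0, 0.0, 0.0]))  # (-20.0, 0.0)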


4.3. Hyperparameter Optimization

The following workflows represent some machine learning and deep learning algorithms that can be optimized. These workflows share several common variables with Distributed_Auto_ML. Some workflows are characterized by a few additional variables.

CIFAR_10_Image_Classification: trains a simple deep CNN on the CIFAR10 images dataset using the Keras library.

Table 2. CIFAR_10_Image_Classification Variables

Variable name | Description | Type
NUM_EPOCHS | The number of times data is passed forward and backward through the training algorithm. | Integer (default=3)
INPUT_VARIABLES | A set of specific variables (use-case-related) that are used in the model training process. | JSON format
SEARCH_SPACE | Specifies the representation of the search space, which has to be defined using dictionaries or by entering the path of a JSON file stored in the catalog. | JSON format
INSTANCE_NAME | Specifies the name to be provided for the instance. | String (default=tensorboard-server)
CONTAINER_LOG_PATH | Specifies the path where the docker logs are created and stored on the docker container. | String (default=/graphs/$INSTANCE_NAME)
CONTAINER_ROOTLESS_ENABLED | If True, the user will be able to run the workflow in rootless mode. | Boolean (default=True)

The following workflows share common variables with the workflows illustrated above.

CIFAR_10_Image_Classification: trains a simple deep CNN on the CIFAR10 images dataset using the Keras library.

CIFAR_100_Image_Classification: trains a simple deep CNN on the CIFAR100 images dataset using the Keras library.

Image_Object_Detection: trains a YOLO model on the COCO dataset using PAIO deep learning generic tasks.

Digits_Classification: a Python script illustrating the optimization of multiple machine learning models.

Text_Generation: trains a simple Long Short-Term Memory (LSTM) network to learn sequences of characters from 'The Alchemist', a novel by Brazilian author Paulo Coelho first published in 1988.

The following workflows contain a search space describing a set of possible neural network architectures that Distributed_Auto_ML can use to automatically find the best combination of neural architectures within the search space.

Single_Handwritten_Digit_Classification: trains a simple deep CNN on the MNIST dataset using the PyTorch library. This example allows searching for two types of neural architectures defined in the Handwritten_Digit_Classification_Search_Space.json file.

Multiple_Objective_Handwritten_Digit_Classification: trains a simple deep CNN on the MNIST dataset using the PyTorch library. This example allows optimizing multiple losses, such as accuracy, number of parameters, and memory access cost (MAC) measure.

4.5. Distributed Training

The following workflows illustrate some examples of multi-node and multi-GPU distributed learning.

TensorFlow_Keras_Multi_Node_Multi_GPU: is a TensorFlow + Keras workflow template for distributed training (multi-node, multi-GPU) with AutoML support.

TensorFlow_Keras_Multi_GPU_Horovod: is a Horovod workflow template that supports multi-GPU training and AutoML.
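As a hint of the pattern such a template wraps, here is a minimal Horovod + Keras sketch; the model, data, and learning rate are placeholder assumptions, not the template's actual content:

    import horovod.tensorflow.keras as hvd
    import tensorflow as tf

    hvd.init()  # one process per GPU; launched e.g. with horovodrun

    # Common Horovod convention: scale the learning rate by the worker count.
    opt = tf.keras.optimizers.Adam(learning_rate=0.001 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)  # adds allreduce to gradient updates

    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
    model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)

    callbacks = [
        # Sync all workers' initial weights with rank 0 before training.
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    ]
    # model.fit(x_train, y_train, callbacks=callbacks, epochs=3)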

4.6. Templates

The following workflows are task templates that can be used to implement a generic machine learning task.

Python_Task: is a simple Python task template pre-configured to run with Distributed_Auto_ML.

R_Task: is a simple R task template pre-configured to run with Distributed_Auto_ML.

5. Federated Learning (FL)

Federated Learning (FL) enables training an algorithm across multiple decentralized devices (or servers) holding local data samples, without exchanging them. The ai-federated-learning bucket contains a few examples of Federated Learning workflows that can be easily used to build a common and robust machine learning model without sharing data, thus addressing critical issues such as data privacy, data security, data access rights and access to heterogeneous data. This bucket uses the Flower library to implement federated learning workflows. The Flower library is a friendly federated learning framework that presents a unified approach to federated learning. It helps federate any workload using any ML framework and any programming language.

FlowerArchitecture
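To give an idea of the pattern the templates below wrap, here is a bare-bones Flower client/server sketch, assuming the flwr >= 1.0 API; the NumPy "weights" stand in for real PyTorch or TensorFlow parameters:

    import numpy as np
    import flwr as fl

    class DummyClient(fl.client.NumPyClient):
        def __init__(self):
            self.weights = [np.zeros(3)]  # placeholder model parameters

        def get_parameters(self, config):
            return self.weights

        def fit(self, parameters, config):
            # Pretend local training: nudge the received global weights.
            self.weights = [w + 0.1 for w in parameters]
            return self.weights, 1, {}

        def evaluate(self, parameters, config):
            return 0.0, 1, {"accuracy": 1.0}  # loss, num_examples, metrics

    # On the server host:
    #   fl.server.start_server(config=fl.server.ServerConfig(num_rounds=3))
    # On each client host:
    #   fl.client.start_numpy_client(server_address="<server>:8080",
    #                                client=DummyClient())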

5.1. PyTorch Federated Learning Tasks

The following workflows are client/server task templates that can be used to implement a Federated Learning workflow using PyTorch.

PyTorch_FL_Client_Task: is a Federated Learning Client task template using PyTorch.

PyTorch_FL_Server_Task: is a Federated Learning Server task template using PyTorch.

5.2. TensorFlow Federated Learning Tasks

The following workflows are client/server task templates that can be used to implement a Federated Learning workflow using TensorFlow/Keras.

TensorFlow_FL_Client_Task: is a Federated Learning Client task template using TensorFlow/Keras.

TensorFlow_FL_Server_Task: is a Federated Learning Server task template using TensorFlow/Keras.

5.3. Federated Learning Workflows

The following workflows use federated learning to train a deep Convolutional Neural Network (ConvNet/CNN) on the CIFAR10 images dataset using the Flower library.

PyTorch_Federated_Learning_Example: shows an example of Federated Learning workflow using PyTorch.

TensorFlow_Federated_Learning_Example: shows an example of Federated Learning workflow using TensorFlow/Keras.


6. MLOps Dashboard

In the domain of machine learning operations (MLOps), the successful deployment and continuous monitoring of machine learning models are crucial for ensuring their reliability and performance. However, in addition to managing models, it is equally important to handle the infrastructure where the models are deployed. This is where an MLOps dashboard, designed specifically for model deployment, monitoring, and model servers, becomes a powerful asset.

An MLOps dashboard serves as a centralized hub for data scientists, engineers, and stakeholders involved in deploying and monitoring machine learning models. It provides a comprehensive view of the deployment pipelines, real-time performance metrics, and key indicators. Furthermore, it includes features specifically designed to manage and monitor the underlying model servers.

Model servers, also known as serving infrastructure, are responsible for hosting the deployed machine learning models and providing predictions or inferences to applications or users. An MLOps dashboard equipped with model server management capabilities allows users to seamlessly handle the infrastructure aspect of model deployment.

The MLOps dashboard extends its monitoring capabilities by incorporating the monitoring of the underlying infrastructure’s health and performance. It provides real-time insights into server metrics, resource utilization, and availability, allowing teams to promptly identify and address any infrastructure-related issues. This comprehensive monitoring capability ensures that the model servers are performing optimally and can handle the predicted workloads efficiently.

Collaboration is also a key aspect of an MLOps dashboard. It enables seamless communication and collaboration among data scientists, engineers, and other stakeholders involved in both model deployment and server management. The dashboard allows users to share insights, discuss server performance trends, and provide feedback, fostering a collaborative environment that facilitates continuous improvement and innovation for both models and infrastructure.

To facilitate this process, the MLOps dashboard provides three distinct tabs: Model Servers Monitoring, Models Resource Usage, and Dashboard Resource Usage.

Together, they provide a comprehensive and intuitive interface for data scientists, engineers, DevOps, and all stakeholders involved in MLOps.

6.1. Model Servers Monitoring

The Model Servers Monitoring tab focuses on overseeing the health and performance of the model servers or serving infrastructure. It is divided into two main parts.

MLOps dashboard model servers

In the first part, there are 6 main widgets providing general information about the model servers. These widgets offer valuable insights into the overall performance and usage of the serving infrastructure. Here are the widgets included in this tab:

  1. Model Servers: This widget displays the number of currently running model servers. It provides a quick overview of the active instances responsible for serving machine learning models.

  2. GPUs: This widget shows the number of running GPUs. It indicates the availability and utilization of GPU resources within the model servers, which is particularly relevant for GPU-accelerated machine learning workloads.

  3. Deployed Models: This widget provides the total count of deployed models. It offers a summary of the number of machine learning models that have been successfully deployed and are currently running on the model servers.

  4. TimeSpan Inferences and Total Inferences: These widgets track the number of inferences performed within a specific timespan and the total number of inferences overall. They give insights into the workload and usage patterns of the deployed models, allowing teams to assess the level of usage and demand for the served models.

  5. Average Inference Time: This widget displays the average time taken for an inference to be processed. It provides an indication of the computational efficiency and latency of the model servers in generating predictions or inferences. Additionally, the minimum and maximum inference times help identify the performance variations.

  6. Average Inference Rate: This widget shows the average rate at which inferences are processed, indicating the throughput or number of inferences handled per unit of time (per minute). The minimum and maximum inference rates provide insights into the server’s capacity and ability to handle varying workloads.

In the second part of the Model Servers Monitoring tab, there is a table listing the model servers along with their respective specific characteristics. The table provides detailed information about each model server instance. Here are the columns representing the characteristics of the model servers:

  1. ID: This column displays the unique identifier assigned to each model server instance for easy reference and identification.

  2. Info: The Info column presents relevant variables and their corresponding values used to launch the model server. It includes details such as the Docker Image utilized, GPU Enabled status, Endpoint ID, Node Source, and whether it is Proxyfied or not.

  3. Instance Name: This column specifies the name given to the model server instance, enabling users to easily identify and differentiate between different instances.

  4. Status: The Status column indicates the current status of the model server instance. It provides important visibility into the current state of each model server instance, allowing users to quickly identify whether an instance is actively serving, has completed its task, or requires further attention due to issues encountered. It can take one of the following values:

    • Running: This status indicates that the model server instance is currently active and operational, ready to serve model predictions or inferences.

    • Finished: The "Finished" status indicates that the model server instance is no longer active and not serving predictions.

    • Finished with issues: The "Finished with issues" status indicates that the model server instance has encountered problems or issues during its operation. It suggests that the instance has completed its task, but there may have been complications or errors along the way that require attention or investigation.

  5. Start Time: The Start Time column indicates the datetime when the model server instance was initiated.

  6. Node: This column identifies the specific node or machine where the model server instance is running, providing insights into the underlying infrastructure allocation.

  7. GPUs: The GPUs column displays the number of GPUs utilized by the corresponding model server instance. It highlights the GPU resource allocation for each instance, particularly useful for GPU-accelerated workloads.

  8. Model Registry: This column indicates the location or source where the deployed models are stored, facilitating easy access and retrieval.

  9. Model Control Mode: The Model Control Mode column specifies the mode of model control for the server instance. With Poll or None, all actions can be performed, such as deploying, undeploying, activating or deactivating models. With Explicit, models can be activated and deactivated but cannot be deployed or undeployed. In these modes, models are loaded from the Model Registry, and models that cannot be loaded are marked as UNAVAILABLE.

  10. Nb of models: This column shows the count of models deployed on the specific model server instance, providing an overview of the model quantity hosted on that instance.

  11. Total Inferences: The Total Inferences column represents the total number of inferences performed on the model server instance since its start.

  12. TimeSpan Inferences: This column displays the number of inferences performed on the model server instance within a specified timespan.

  13. Average Inference Time: The Average Inference Time column indicates the average duration taken by the model server instance to process an inference.

  14. Min Inference Time: This column represents the minimum time taken for an inference on the model server instance.

  15. Max Inference Time: The Max Inference Time column displays the maximum time taken for an inference on the model server instance.

  16. Inference Rate: This column presents the rate or frequency at which inferences are processed by the model server instance, indicating the throughput or performance of the server.

  17. Min Inference Rate: The Min Inference Rate column shows the lowest inference rate observed on the model server instance.

  18. Max Inference Rate: This column represents the highest inference rate observed on the model server instance.

  19. Actions: The Actions column contains buttons that allow users to interact with the model server instance. It includes options such as (1) Deploy a new Model on a running instance, (2) Stop a running model server instance, or (3) Re-Submit a stopped model server instance.

At the top of this tab, there is a "New Model Server Instance" button that allows users to launch a new model server. When clicked, a window opens, presenting variables that need to be specified by the user. There are two types of variables: General and Advanced. The table below displays all the variables used to start a new model server.

Table 3. New_Model_Server_Instance variables

Variable name | Description | Type
Workflow variables
INSTANCE_NAME | Instance name of the new model server. | String (default=Empty) [General]
GPU_ENABLED | If True, the container will run with NVIDIA GPU support. | Boolean (default=False) [General]
MODEL_REGISTRY_PATH | Path to the model repository. | String (default="/opt/models") [General]
MODEL_CONTROL_MODE | The model control mode determines how changes to the model repository are handled by the model server. | List [none, explicit, poll] (default="explicit") [General]
NODE_SOURCE | If not empty, the workflow tasks will be run only on nodes belonging to the specified node source. | List [Default, LocalNodes] (default=Empty) [Advanced]
NODE_ACCESS_TOKEN | If not empty, the workflow tasks will be run only on nodes that contain the specified token. | List [model-server-xxx] (default=Empty) [Advanced]
NATIVE_SCHEDULER_PARAMS | Parameters given to the native scheduler (SLURM, LSF, etc.) while requesting a ProActive node used to deploy the workflow tasks. | String (default=Empty) [Advanced]
DOCKER_IMAGE | Docker image used to start the model server. | String (default="nvcr.io/nvidia/tritonserver:22.10-py3") [Advanced]

In the model servers table, when a model server is selected, a subtable appears listing all the models stored in the model registry of that specific model server.

MLOps dashboard models

This subtable provides additional information about each model. Here are the columns characterizing each model:

  1. Server Name: The Server Name column displays the name of the model server where the model is deployed, providing a clear association between the model and its corresponding server.

  2. Model Name: This column represents the name or identifier of the model deployed on the model server, allowing for easy identification and differentiation between different models.

  3. Version: The Version column indicates the version of the model deployed. It is particularly relevant when a model has multiple deployed versions. Model versioning enables reproducibility and traceability, facilitates performance monitoring and evaluation of different model versions, supports experimentation and iterative development, provides a safety net for rollbacks and recovery, and enhances collaboration and teamwork among data scientists.

  4. State: The State column specifies the current state of the model, which can be Active or Inactive. It indicates whether the model is actively serving predictions or has been deactivated.

  5. Deployment Time: The Deployment Time column denotes the timestamp or date when the model was deployed on the model server, providing visibility into the model’s deployment history.

  6. Total Inferences: This column represents the total number of inferences performed by the model since its deployment on the model server.

  7. TimeSpan Inferences: The TimeSpan Inferences column displays the number of inferences performed by the model within a specific timespan, allowing for tracking and monitoring of recent model activity.

  8. Inference Time: The Inference Time column indicates the average duration taken by the model to process an inference.

  9. Min Inference Time: This column represents the minimum time taken for an inference by the model.

  10. Max Inference Time: The Max Inference Time column displays the maximum time taken for an inference by the model.

  11. Inference Rate: This column presents the rate or frequency at which inferences are processed by the model, indicating the throughput or performance of the model.

  12. Min Inference Rate: The Min Inference Rate column shows the lowest inference rate observed for the model.

  13. Max Inference Rate: This column represents the highest inference rate observed for the model.

  14. Actions: The Actions column contains buttons that allow users to interact with the model. The available actions depend on the Model Control Mode of the corresponding model server. For model servers with Model Control Mode as Poll or None, the "Undeploy" button is available to remove the model from the server. For model servers with Model Control Mode as Poll, None, or Explicit, the "Activate" and "Deactivate" buttons are available to control the state of the model.

This tab and the other two tabs include a "Refresh" button, which, when clicked, refreshes the page and displays the last update datetime.

Additionally, all tabs offer an autorefresh feature that allows users to specify the time period for automatic refreshing. Users can choose from a list of predefined time periods, such as 15 seconds, 30 seconds, 5 minutes, 15 minutes, 30 minutes, 1 hour, etc., to determine how frequently the page should be refreshed automatically.

All the values displayed in the widgets and tables are calculated based on a selected time window. The time window can be chosen from a list of predefined options located at the top of the page, such as "Last 15 minutes," "Last 30 minutes," "Last 1 hour," "Last 24 hours," "Yesterday," "This month," "Previous month," and more. Alternatively, the user can select "Use Calendar," which activates a calendar feature. By choosing this option, the user can manually select the desired "From" and "To" dates, allowing for a custom time window selection.

6.2. Models Resource Usage

The second tab of the MLOps monitoring dashboard is dedicated to the Models Resource Usage, providing users with valuable insights into the CPU and GPU resource utilization. This tab is thoughtfully divided into two main parts to ensure a comprehensive understanding of the system’s resource consumption.

MLOps dashboard tab2

The first part features ten main widgets that offer users a wealth of general information about the CPU and GPU usage. These widgets provide real-time metrics, such as average CPU and GPU utilization, memory consumption, etc. By presenting this data in a visually appealing and easily comprehensible format, users can efficiently monitor and evaluate the overall resource consumption of their system. Whether it’s tracking performance trends or identifying potential bottlenecks, these widgets empower users to make informed decisions and optimize their resources effectively.

  1. Avg. CPU Utilization and Avg. GPU Utilization: These widgets provide users with valuable insights into the average CPU and GPU utilization across all model servers. By calculating and displaying the average utilization percentages, users can quickly evaluate the overall resource usage of their model servers. These metrics allow users to gauge the overall CPU and GPU load and monitor any potential spikes or fluctuations in resource usage.

  2. CPU Memory Consumption and GPU Memory Consumption: These widgets provide users with insights into the memory usage of both the CPU and GPU across all model servers, allowing users to monitor memory consumption patterns and identify any potential memory-related issues. This information is crucial for ensuring efficient memory allocation and optimizing the performance of the model servers.

  3. Total CPU Available Memory and Total GPU Available Memory: These widgets present the overall amount of memory that is available for CPU and GPU usage across all model servers. They provide users with numerical values in gigabytes (GB), indicating the total amount of memory that can be allocated to CPU and GPU tasks. This information allows users to understand the total capacity of CPU and GPU memory and make informed decisions regarding memory allocation for their model servers.

  4. Total CPU Memory Usage and Total GPU Memory Usage: These widgets display the memory consumption of the CPU and GPU resources across all model servers, typically measured in gigabytes (GB). These metrics allow users to monitor the overall memory usage of the CPU and GPU resources and identify any potential memory-related issues or constraints.

  5. Total CPU Free Memory and Total GPU Free Memory: These widgets present the amount of free memory available for CPU and GPU usage across all model servers, displayed as numerical values in gigabytes (GB). This information helps users understand the remaining memory capacity that can be allocated to CPU and GPU tasks.

In the second part of the Models Resource Usage tab, there are several graphs that display time series data for each model server. These graphs provide users with detailed information about various metrics related to CPU and GPU utilization, memory usage, and power consumption. The first two graphs relate to CPU resources and the remaining graphs relate to GPU resources:

  1. CPU Utilization: This graph illustrates the CPU utilization over time for each model server. It presents the percentage of CPU resources being utilized by the respective servers, allowing users to analyze trends and identify periods of high or low CPU usage.

  2. Memory Usage: This graph showcases the memory usage over time for each model server. It provides insights into the amount of memory being utilized by the servers, helping users monitor memory consumption patterns and identify any potential memory-related issues.

  3. GPU Utilization: This graph displays the GPU utilization over time for each model server. It shows the percentage of GPU resources being utilized by the servers, enabling users to track GPU usage trends and optimize resource allocation for GPU-intensive tasks.

  4. Avg. GPU Utilization per Model Server: This graph presents the average GPU utilization per model server over time. It provides a comparative view of GPU utilization across different servers, allowing users to identify variations and patterns in resource usage.

  5. GPU Used Memory: This graph visualizes the GPU memory usage over time for each model server. It illustrates the amount of GPU memory being actively used by the servers, aiding in monitoring memory consumption and optimizing GPU resource allocation.

  6. GPU Free Memory: This graph shows the GPU free memory over time for each model server. It provides information about the available free memory on the GPUs, helping users track memory availability and ensure optimal memory usage.

  7. GPU Power Usage (Watts): This graph displays the power usage of the GPUs over time for each model server. It shows the power consumption in watts, enabling users to monitor the energy usage of the GPUs and evaluate their power requirements.

6.3. Dashboard Resource Usage

The third tab in this dashboard is dedicated to Dashboard Resource Usage, providing information about the resource consumption of the entire system. It focuses on monitoring the resources utilized by the MLOps infrastructure as a whole. This tab is divided into two main parts:

MLOps dashboard tab3

The first part focuses on providing information about the overall system. This part includes five main metrics:

  1. CPU Utilization: This metric indicates the overall CPU utilization of the system. It provides information on the average or current CPU usage across all components of the MLOps infrastructure. Monitoring CPU utilization helps users assess the system’s workload and identify any potential performance issues or bottlenecks.

  2. Memory Consumption: This metric reflects the total memory consumption of the system. It provides insights into the amount of memory being used by the MLOps infrastructure as a whole. Monitoring memory consumption helps users ensure sufficient memory resources are available and identify any excessive memory usage that may impact system performance.

  3. Total Available Memory: This metric represents the total amount of memory available in the system. It provides an understanding of the overall memory capacity that can be allocated to various processes and applications within the MLOps infrastructure.

  4. Used Memory: This metric indicates the total amount of memory currently in use by the system. It helps users assess the memory usage and understand how much memory is actively being utilized by processes and applications.

  5. Free Memory: This metric reflects the amount of memory that is currently unoccupied and available for use. It helps users determine the remaining memory capacity in the system and ensure that sufficient free memory is available for optimal performance.

In the second part of the "Dashboard Resource Usage" tab, there are time series graphs that provide insights into various metrics related to CPU utilization, memory usage, disk memory, and network traffic. The specific graphs in this section include:

  1. CPU Utilization:

    • iowait: indicates the percentage of time the CPU is idle but waiting for I/O operations.

    • irq: displays the CPU utilization due to hardware interrupts.

    • nice: represents the CPU utilization by processes with a user-defined priority.

    • softirq: shows the CPU utilization due to software interrupts.

    • steal: indicates the CPU utilization stolen by other virtual machines in a virtualized environment.

    • system: reflects the CPU utilization by the system/kernel processes.

    • user: represents the CPU utilization by user processes.

  2. Memory Usage:

    • Used: displays the memory in use by the system.

    • Buffers: represents the memory used for buffering data from disk.

    • Cached: shows the memory used for caching data from disk.

    • Free: indicates the amount of free memory available in the system.

  3. Used Disk Memory:

    • Graphs for specific files or directories, such as "/etc/timezone," "/usr/share/zoneinfo/Etc/UTC," "/etc/hostname," "/etc/hosts," "/etc/resolv.conf," "/etc/prometheus," "/opt/dashboard," and "/opt/grafana/conf." These graphs provide information about the disk memory usage for each specific file or directory.

  4. Available Disk Space:

    • Graphs for specific files or directories, such as "/etc/timezone," "/usr/share/zoneinfo/Etc/UTC," "/etc/hostname," "/etc/hosts," "/etc/resolv.conf," "/etc/prometheus," "/opt/dashboard," and "/opt/grafana/conf." These graphs indicate the available disk space for each specific file or directory.

  5. Network Traffic:

    • eth0 receive: displays the network traffic received on the eth0 network interface.

    • lo receive: represents the network traffic received on the loopback interface.

    • eth0 transmit: shows the network traffic transmitted on the eth0 network interface.

    • lo transmit: indicates the network traffic transmitted on the loopback interface.

7. Model as a Service for Machine Learning (MaaS_ML)

Once a predictive model is built, tested and validated, you can easily use it in real-world production pipelines by deploying it as a REST Web Service via the MaaS_ML service. MaaS_ML is dedicated to making the deployment of lightweight machine learning (ML) models simple, portable and scalable, and to easily managing their life cycles. This will be particularly useful for engineering or business teams that want to take advantage of such models.

The life cycle of any MaaS_ML instance (i.e., from starting the generic service instance, deploying a specific AI model, to pausing or deleting the instance) can be managed in three different ways in PAIO:

  • Using the Studio Portal and more specifically the bucket ai-model-as-a-service where specific generic tasks are provided to process all the possible actions (i.e., MaaS_ML_Service_Start, MaaS_ML_Deploy_Model, MaaS_ML_Call_Prediction, MaaS_ML_Actions[Finish/Pause/Resume]). These tasks can be easily integrated to your AI pipelines/workflows as you can see in this Deployment Pipeline Example.

  • Using the Service Automation Portal by executing the different actions associated with MaaS_ML (i.e., Deploy_ML_Model, Pause_MaaS_ML, Update_MaaS_ML, Finish_MaaS_ML).

  • Using the Swagger UI which is accessible once the MaaS_ML instance is up and running.

Once a MaaS_ML instance is up and running, it can be used for:

  • AI Model Deployment or Update: the user has to provide a valid specific AI Model identifier in order to deploy the model of his/her choice.

  • Call of Predictions: when a specific AI model is running, the user can request predictions for a specific payload. The latter has to be converted into JSON data in order to get prediction values (see the sketch after this list).

  • Deploy a New Specific AI Model: the running generic service instance can be used to deploy a new specific AI model.
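As a hedged illustration of the prediction call, the sketch below posts a JSON payload to a running MaaS_ML instance; the endpoint path, token parameter name, and payload layout are assumptions to adapt to your own deployed service:

    import requests

    BASE_URL = "https://<proactive-host>:<port>/<maas-ml-endpoint>"  # instance endpoint
    SERVICE_TOKEN = "<token issued by the MaaS_ML service>"

    # One row of features, JSON-encoded, as the deployed model expects.
    payload = [[0.02731, 0.0, 7.07, 0, 0.469, 6.421, 78.9,
                4.9671, 2, 242.0, 17.8, 396.90, 9.14]]

    response = requests.post(
        f"{BASE_URL}/api/predict",            # assumed analogue of /api/deploy
        params={"api_token": SERVICE_TOKEN},
        json=payload,
    )
    print(response.json())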

Using MaaS_ML, you can easily deploy and use any machine learning model as a REST Web Service on a physical or a virtual compute host on which there is an available ProActive Node. Going through the ProActive Scheduler, you can also trigger the deployment of a specific VM using the Resource Manager elastic policies, and, eventually, deploy a Model-Service on that specific node.

In the following subsections, we will illustrate the MaaS_ML instance life cycle, from starting the generic service instance, deploying a specific model and pausing it, to deleting the instance. We will also describe the different ways in which the MaaS_ML instance life cycle can be managed in PAIO.

In the description below, multiple tables represent the main variables that characterize the MaaS_ML workflows. In addition to the variables mentioned below, there is a set of generic variables that are common between all workflows which can be found in the subsection AI Workflows Common Variables. The management of the life cycle of MaaS_ML will be detailed in the next subsections.

7.1. MaaS_ML Via Workflow Execution Portal

Click on the Submit a Job button and then search for the MaaS_ML_Service workflow as described in the image below.

MaaS ML Search

Check the service parameters and click on the Submit button to start a MaaS_ML service instance.

To get more information about the parameters of the service, please check the section Start a Generic Service Instance.

MaaS ML Submit

You can now monitor the service status, access its endpoint and execute its different actions:

  • Deploy_ML_Model : enables you to deploy a trained ML model in one click.

  • Update_MaaS_ML_Parameters : enables you to update the parameters of the service instance.

  • Finish_MaaS_ML : stops and deletes the service instance.

MaaS ML Workflow Management

When you are done with the service instance, you can terminate it by clicking on Terminate_Job_and_Service button as shown in the image below.

Terminate MaaS ML

7.2. MaaS_ML Via Studio Portal

7.2.1. Start a Generic Service Instance

Open the Studio Portal.

Create a new workflow.

Add the ai-model-as-a-service bucket by clicking in the View menu field > Add Bucket Menu to the Palette > ai-model-as-a-service.

Drag and drop the MaaS_ML_Service_Start task from the bucket.

Execute the workflow by setting the different workflow’s variables as described in the Table below.

Table 4. MaaS_ML_Service_Start variables

Variable name | Description | Type
Workflow variables
MODEL_SERVICE_INSTANCE_NAME | Service instance name. | String (default="maas_ml-${PA_JOB_ID}")
MODEL_SERVICE_PROXIFIED | Allows access to the endpoint through an HTTP(S) proxy. | Boolean (default=False)
MODEL_SERVICE_ENTRYPOINT | This entry script starts the service and defines the different functions to deploy the model, score the prediction requests based on the deployed model, and return the results. This script is specific to your model. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about this file can be found in the Customize the Service section. | String (default="ml_service")
MODEL_SERVICE_YAML_FILE | A YAML file that describes the OpenAPI Specification ver. 2 (known as Swagger Spec) of the service. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about the structure of this file can be found in the Customize the Service section. | String (default="ml_service-api")
MODEL_SERVICE_USER_NAME | A valid user name having the needed privileges to execute this action. | String (default="user")
MODEL_SERVICE_NODE_NAME | The name of the node where the service will be deployed. If empty, the service will be deployed on an available node selected randomly. | String (default=Empty)
USE_NVIDIA_RAPIDS | If True, the service will be configured to use the GPU and the NVIDIA RAPIDS library. | Boolean (default=False)
Task variables
SERVICE_ID | The name of the service. Please keep the default value for this variable. | String (default="MaaS_ML")
INSTANCE_NAME | The name of the service that will be deployed. | String (default="maas-ml-${PA_JOB_ID}")
ENGINE | Container engine. | String (default="$CONTAINER_PLATFORM")
GPU_ENABLED | If True, the service will be configured to use the GPU and the NVIDIA RAPIDS library. | Boolean (default=False)
PROXIFIED | It takes by default the value of the MODEL_SERVICE_PROXIFIED workflow variable. | String (default="$MODEL_SERVICE_PROXIFIED")
PYTHON_ENTRYPOINT | It takes by default the value of the MODEL_SERVICE_ENTRYPOINT workflow variable. | String (default="$MODEL_SERVICE_ENTRYPOINT")
YAML_FILE | It takes by default the value of the MODEL_SERVICE_YAML_FILE workflow variable. | String (default="$MODEL_SERVICE_YAML_FILE")
USER_NAME | It takes by default the value of the MODEL_SERVICE_USER_NAME workflow variable. | String (default="$MODEL_SERVICE_USER_NAME")
NODE_NAME | It takes by default the value of the MODEL_SERVICE_NODE_NAME workflow variable. | String (default="$MODEL_SERVICE_NODE_NAME")
GPU_ENABLED | If True, the service will be configured to use the GPU and the NVIDIA RAPIDS library. | Boolean (default=$USE_NVIDIA_RAPIDS)

7.2.2. Deploy a Specific ML Model

You can also deploy a specific ML model directly from the Studio Portal.

Drag and drop the MaaS_ML_Deploy_Model task from the ai-model-as-a-service bucket.

Execute the workflow and set the different workflow’s variables as follows:

Table 5. MaaS_ML_Deploy_Model variables

Variable name | Description | Type

Workflow variables:

CONTAINER_PLATFORM | Specifies the type of container platform to be used (no container, docker, singularity, or podman). | String (default=docker)
CONTAINER_GPU_ENABLED | If True, containers will run based on images containing libraries that are compatible with GPU. | Boolean (default=False)
CONTAINER_IMAGE | Specifies the name of the image that will be used to run the different workflow tasks. | String (default=Empty)
SERVICE_TOKEN | A valid token generated by the MaaS_ML service for user authentication. | String (default=Empty)

Task variables:

DEPLOY_MODEL_ENDPOINT | A URL endpoint defined by the user where the AI model was deployed. | URL (default=Empty)
API_EXTENSION | The base path to access the deployment endpoint. | String (default="/api/deploy")
MODEL_URL | A valid URL specified by the user referencing the model that needs to be deployed. | URL (default=https://activeeon-public.s3.eu-west-2.amazonaws.com/models)
SERVICE_TOKEN | A valid token generated by the MaaS_ML service for user authentication. | String (default=Empty)
DRIFT_DETECTION_WINDOW_SIZE | The size of the sample to be extracted from the old training dataset and used as baseline data for drift detection. | Integer (default=50)
MODEL_NAME | The name of the model to be deployed. | String
MODEL_VERSION | The version number of the model that will be deployed. | Integer (default=1)
BASELINE_DATA_URL | URL of the dataset to be deployed and used in the data drift detection process. | URL (default=https://activeeon-public.s3.eu-west-2.amazonaws.com/datasets/baseline_data.csv)

7.2.3. Call the Service for Prediction

Once the model is deployed, you can also call the service for prediction directly from the Studio Portal.

Drag and drop the MaaS_ML_Call_Prediction task from the ai-model-as-a-service bucket.

Execute the Workflow and set the different workflow’s variables as follows:

Table 6. MaaS_ML_Call_Prediction variables

Variable name | Description | Type

Workflow variables:

CONTAINER_PLATFORM | Specifies the type of container platform to be used (no container, docker, singularity, or podman). | String (default=docker)
CONTAINER_GPU_ENABLED | If True, containers will run based on images containing libraries that are compatible with GPU. | Boolean (default=False)
CONTAINER_IMAGE | Specifies the name of the image that will be used to run the different workflow tasks. | String (default=Empty)
SERVICE_TOKEN | A valid token generated by the MaaS_ML service for user authentication. | String (default=Empty)

Task variables:

PREDICT_MODEL_ENDPOINT | The endpoint of the started service. | URL (default=Empty)
SERVICE_TOKEN | A valid token generated by the MaaS_ML service for user authentication. | String (default=Empty)
PREDICT_EXTENSION | The base path to access the prediction endpoint. | String (default="/api/predict")
INPUT_DATA | Entry data that needs to be scored by the deployed model. | JSON (default=Empty)
LABEL_COLUMN | Name of the label column. It needs to be set if the data is labeled. | String (default=Empty)
DATA_DRIFT_DETECTOR | Name of the data drift detector to be used in the drift detection process. | List [HDDM, Page Hinkley, ADWIN] (default="HDDM")
MODEL_NAME | The name of the deployed model to be called. | String
MODEL_VERSION | The version number of the deployed model to be called. | Integer (default=1)
SAVE_PREDICTIONS | Save the resulting predictions so that they can be displayed through the data analytics dashboard. | Boolean (default=False)
DRIFT_ENABLED | True if a detector is needed to detect data drifts in the input data based on the baseline data. | Boolean (default=False)
DRIFT_NOTIFICATION | True if the user needs to be notified via ProActive when a data drift is detected. | Boolean (default=False)

7.2.4. Delete/Finish the Service

You can also delete the service instance using the Studio Portal.

Drag and drop the MaaS_ML_Actions task from the ai-model-as-a-service bucket.

Execute the Workflow and set the different workflow’s variables as follows:

Table 7. MaaS_ML_Actions variables

Variable name | Description | Type

Task variables:

ACTION | The action that will be processed regarding the service status. | List [Pause_MaaS_ML, Resume_MaaS_ML, Finish_MaaS_ML] (default="Finish_MaaS_ML")
INSTANCE_NAME | The name of the service instance that the action will be processed on. | String (default="maas-ml-${PA_JOB_ID}")
INSTANCE_ID | The service instance ID. | String (default=Empty)

7.3. MaaS_ML Via Service Automation Portal

7.3.1. Start a Generic Service Instance

Search for MaaS_ML in Services Workflows List.

Set the following variables:

Table 8. MaaS_ML variables

Variable name | Description | Type

BUILD_IMAGE_IF_NOT_EXISTS | Pull and build the Singularity image if the Singularity Image File (SIF) is not available. | Boolean (default=True)
DEBUG_ENABLED | If True, the user will be able to examine the stream of output results of each task. | Boolean (default=True)
DOCKER_IMAGE | Specifies the name of the Docker image that will be used to run the different workflow tasks. | String (default="activeeon/maas_ml")
ENDPOINT_ID | The endpoint_id that will be used if PROXYFIED is set to True. | String (default="maas-ml-gui")
ENGINE | Container engine. | List (default="docker")
GPU_ENABLED | If True, the service will be configured to use the GPU and the NVIDIA RAPIDS library. | Boolean (default=False)
HTTPS_ENABLED | True if the HTTPS protocol is needed for the defined model-service. | Boolean (default=False)
INSTANCE_NAME | The name of the service instance that will be deployed. | String (default="maas-ml")
NODE_NAME | The name of the node where the service will be deployed. If empty, the service will be deployed on a randomly selected available node. | String (default=Empty)
PROXYFIED | True if a proxy is needed to protect access to this model-service endpoint. | Boolean (default=False)
PYTHON_ENTRYPOINT | This entry script starts the service and defines the different functions to deploy the model, score the prediction requests based on the deployed model, and return the results. This script is specific to your model. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about this file can be found in the Customize the Service section. | String (default="ml_service")
SERVICE_PORT | Controls the port used to start the Model Service from the Service Automation Portal. -1 for random port allocation. | Integer (default="-1")
SINGULARITY_IMAGE_PATH | Location of the Singularity image on the node file system (this path will be used to store the Singularity image, or the image will be used directly if the file is already present). | String (default="/tmp/maas_ml.sif")
TRACE_ENABLED | True if the user wants to keep a trace of the different changes occurring in the service. | Boolean (default=True)
YAML_FILE | A YAML file that describes the OpenAPI Specification ver. 2 (known as Swagger Spec) of the service. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about the structure of this file can be found in the Customize the Service section. | String (default="ml_service-api")

Click on Execute Action and follow the progress of the service creation.


7.3.2. Deploy a Specific ML Model

Once the status of your generic model service is displayed as RUNNING on Service Automation, you can deploy your model by following the steps below:

Select and execute the Deploy_ML_Model action from Actions to deploy your model.

Set the following variables:

Table 9. Deploy_ML_Model variables

Variable name | Description | Type

BASELINE_DATA_URL | URL of the dataset to be deployed and used in the data drift detection process. | URL (default=https://activeeon-public.s3.eu-west-2.amazonaws.com/datasets/baseline_data.csv)
MODEL_NAME | The name of the model to be deployed. | String (default="iris_flowers_classifier")
MODEL_URL | A valid URL specified by the user referencing the model that needs to be deployed. | URL (default=https://activeeon-public.s3.eu-west-2.amazonaws.com/models/model.pkl)
MODEL_VERSION | The version number of the model that will be deployed. | Integer (default=1)
USER_NAME | A valid user name having the needed privileges to execute this action. | String (default="user")

Click on Execute Action and follow the progress of the model deployment.

Check that the status correctly evolves to AI_MODEL_DEPLOYED.

7.3.3. Delete/Finish or Update the Service Instance

You can delete the launched service instance directly from Service Automation Portal:

Set the action Finish under Actions and click on Execute Action.

MAAS ML Delete Service

One more action can be executed from the Service Automation Portal:

  • Update_MaaS_ML_Parameters: enables you to update the variable values associated with the MaaS_ML instance according to your new preferences.

7.3.4. MaaS_ML Analytics

When the MaaS_ML service instance is running, the user can access the MaaS_ML Analytics page, which contains four tabs, by clicking on the instance’s endpoint:

  • Audit and Traceability

  • Dataset Analytics

  • Data Drift Analytics

  • Predictions Preview

Audit and Traceability

When clicking on the endpoint, the user is redirected to a four-tab webpage. By default, the Audit and Traceability tab is opened. In this tab, the user can check the values chosen for the MaaS_ML instance variables. In addition, the MaaS_ML traceability information and warnings are listed in a table where each row represents information about the initialization, deployment, prediction, etc., at different dates/times. The figure below shows an overview of the Audit and Traceability tab.

MaaS ML traceability
Dataset Analytics

As MaaS_ML supports versioning, you can deploy multiple model versions for the same model type. When deploying different model versions, you have the possibility to associate each version with a subset of the data used to train the model, i.e., the baseline data. The main job of the baseline data is to help in detecting drifts in future input datasets. Data drift detection is detailed in the Data Drift Detection (DDD) subsection. Using the baseline datasets optionally deployed with the different model versions, you can compare the changes occurring from one model version to another, specifically regarding the datasets used to train them.

As shown in the figure below, using the three dropdowns at the top of this tab page, you can choose the model name, the feature (or column) name you would like to monitor, and the metric, which is based on a statistical function (Mean, Minimum, Maximum, Variance, Standard Deviation). Once these three values are chosen, the first graph shows the evolution of the values of the chosen feature (according to the chosen statistical function) across the different model versions. You can also monitor multiple features at the same time by choosing multiple feature names in the second dropdown, and add or remove any of the displayed graphical lines using the features dropdown. Details about the obtained values are displayed by hovering over the markers on each graphical line.

If you click on one of these markers, a histogram appears in the second graph of this tab page. The displayed histogram compares the probability density distributions of the data values of the selected feature among all the deployed model versions. By clicking on the content of the legend, you can include or exclude any of the model versions from the comparison.

MaaS ML data analytics tab
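
For reference, the statistics behind this tab are standard aggregates computed per feature over each baseline dataset. A rough, illustrative equivalent with pandas (the file name is hypothetical):

# Illustrative only: per-feature Mean, Minimum, Maximum, Variance and
# Standard Deviation over a baseline dataset
import pandas as pd

baseline = pd.read_csv("baseline_data.csv")  # hypothetical baseline dataset
stats = baseline.select_dtypes("number").agg(["mean", "min", "max", "var", "std"])
print(stats)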
Data Drift Analytics

Coming soon!

Predictions Preview

When the user calls a deployed model of a specific version to obtain predictions, he can choose to save the resulting predictions. The saved predictions can then be previewed in the Predictions Preview tab page. As shown in the figure below, you can choose the model name and the model version using the dropdowns at the top of the page. According to your choices, the predictions dataframe is previewed. The figure below shows an example of the previewed predictions.

MaaS ML predictions tab

7.4. MaaS_ML Via Swagger UI

To access the Swagger UI, click on the "GO TO SWAGGER UI" button at the top of the Traceability & Audit tab in the MaaS_ML Analytics page.

Through this Swagger UI, you are now able to:

  • Ask for an api_token

  • Deploy a model

  • List the deployed models

  • Make predictions

  • Return the stored traceability information

  • Remove a deployed model

  • Update the service parameters

7.4.1. Deploy/delete a Specific ML Model version

You can also deploy a specific ML model using the Swagger UI:

Open the Swagger UI.

Select the get_token operation and get an api_token by entering your username (default value is user).

Select the deploy operation, then set the provided token and upload the model version that needs to be deployed.

MAAS Deploy Swagger

Select list_saved_models to return the list of all already deployed models.

Select delete_deployed_model to remove a specific model version.
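
These Swagger operations can also be scripted over plain HTTP. Below is a minimal, illustrative sketch using the Python requests library; the instance endpoint URL, the get_token path and the parameter names are assumptions (only /api/deploy comes from the documented API_EXTENSION default), not a definitive client.

# Hypothetical sketch of the get_token + deploy sequence described above
import requests

MAAS_ML_ENDPOINT = "http://localhost:9090"  # placeholder: your instance endpoint

# ask for an api_token for the default user name "user" (assumed path)
token = requests.get(MAAS_ML_ENDPOINT + "/api/get_token",
                     params={"user": "user"}).text

# deploy a pickled model version (path taken from the API_EXTENSION default)
with open("model.pkl", "rb") as model_file:
    response = requests.post(MAAS_ML_ENDPOINT + "/api/deploy",
                             params={"api_token": token,
                                     "model_name": "iris_flowers_classifier",
                                     "model_version": 1},
                             files={"model_file": model_file})
print(response.status_code, response.text)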

7.4.2. Call the Service for Predictions

Once the model is deployed, you can call the service for predictions using the Swagger UI:

Open the Swagger UI.

Select the get_token operation and get an api_token by entering your username (default value is user).

Select the predict operation and set the provided token, the different parameters (drift_enabled, drift_notification, detector, model_name, model_version, etc.) and the data that you need to score.

MAAS Predict Swagger
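
The predict operation can be scripted in the same way. The sketch below is illustrative: the endpoint URL, the get_token path and the parameter names are assumptions (only /api/predict comes from the documented PREDICT_EXTENSION default).

# Hypothetical sketch of calling the predict operation with drift detection
import json
import requests

MAAS_ML_ENDPOINT = "http://localhost:9090"  # placeholder: your instance endpoint

token = requests.get(MAAS_ML_ENDPOINT + "/api/get_token",
                     params={"user": "user"}).text

rows_to_score = json.dumps([[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]])
response = requests.post(MAAS_ML_ENDPOINT + "/api/predict",
                         params={"api_token": token,
                                 "model_name": "iris_flowers_classifier",
                                 "model_version": 1,
                                 "detector": "HDDM",
                                 "drift_enabled": True},
                         data={"input_data": rows_to_score})
print(response.text)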

7.5. Deployment Pipeline Examples

You can connect the different tasks in a single workflow to get the full pipeline from the model training step to the model deployment and consumption steps. Each task will propagate the acquired variables to its children tasks. The following workflows are available on the ai-model_as_a_service bucket:

  • Diabetics_Deploy_Predict_Classifier_Model: trains a Diabetics Classifier based on a Random Forest algorithm and then deploys this classifier in a MaaS_ML service instance.

  • IRIS_Deploy_Predict_Flower_Classifier_Model_Interactive: trains an Iris Flower Classifier, starts a service instance where the trained model is deployed, and scores the input data by consuming the endpoints exposed by the MaaS_ML service. The figure below describes this workflow.

  • IRIS_Deploy_Flower_Classifier_Model: trains an Iris Flower Classifier and deploys it in a new service instance. This instance is stopped when the user triggers the signal through the Workflow Execution portal.

MAAS ML IRIS Workflow Example Interactive

7.6. Customize the Service

It is possible to customize the model as a service defined by default and adapt it to your specific needs. You can customize the following elements:

  • The file specified in the PYTHON_ENTRYPOINT variable

  • The file specified in the YAML_FILE variable

  • The docker image specified in the DOCKER_IMAGE variable

In the following, we describe in depth the content of each element:

PYTHON_ENTRYPOINT file: The following Python script refers to the ml_service.py file stored in the catalog under the ai-model_as_a_service_resources bucket. This script defines the different functions needed to deploy the model, score data and generate tokens. It is possible to edit this script to customize it to your model. The entry script must take into consideration the:

  • List of users that are allowed to consume the service endpoints

  • Format of the model expected by the deployment, and the prediction functions (e.g., pickle, joblib, etc.)

  • Format of the incoming data (e.g., JSON, Array, Matrix, etc.)

  • Data format expected by the model (e.g., JSON, Array, Matrix, etc.)

The ai-model_as_a_service_resources bucket can be found under the Catalog section in the Automation Dashboard portal.
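
To make these points concrete, here is a deliberately simplified, hypothetical skeleton of the kind of functions such an entry script defines. The real ml_service.py in the bucket is more complete; the function names and signatures below are illustrative only.

# Hypothetical, minimal skeleton of a custom entry script
import pickle

ALLOWED_USERS = {"user"}   # users allowed to consume the service endpoints
MODELS = {}                # (model_name, model_version) -> deserialized model

def get_token(user):
    # issue a (dummy) token only for allowed users
    return "token-" + user if user in ALLOWED_USERS else None

def deploy(model_name, model_version, model_file):
    # deserialize the uploaded model (pickle format assumed here)
    MODELS[(model_name, model_version)] = pickle.loads(model_file)
    return {"status": "deployed", "model": model_name, "version": model_version}

def predict(model_name, model_version, instances):
    # score the incoming data (a list of rows assumed here) with the model
    model = MODELS[(model_name, model_version)]
    return {"predictions": list(model.predict(instances))}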

YAML_FILE file: The following YAML script refers to the ml_service-api.yml file stored in the catalog under the ai-model_as_a_service_resources bucket. This script defines the OpenAPI specification describing the entire API built once a model_service is started. You can adapt and edit this script in order to customize your service.

DOCKER_IMAGE name: Choose your own image containing the different dependencies required to run your PYTHON_ENTRYPOINT script. Activeeon provides a pre-built image activeeon/model_as_a_service including different machine learning and deep learning libraries. If you need to use your own Docker image to start the service, you need to install the following libraries in your image:

# install Java
apt-get update && apt-get install -y openjdk-11-jdk
apt-get install -y ca-certificates-java && update-ca-certificates -f
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export JAVA_HOME
apt-get clean

# install Python libraries
pip install "connexion[swagger-ui]" (1)
pip install py4j (2)

# install your dependent libraries
...
1 Connexion allows you to write an OpenAPI specification, then maps the endpoints to your Python functions.
2 py4j enables Python programs running in a Python interpreter to dynamically access Java objects in a Java Virtual Machine.
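
As an illustration of how these two pieces fit together, the following sketch (assuming Connexion 2.x and the default file names mentioned above) binds the YAML specification to the Python entrypoint functions:

# Minimal Connexion bootstrap: operationIds declared in ml_service-api.yml
# are resolved to Python functions such as those defined in ml_service.py
import connexion

app = connexion.App(__name__)
app.add_api("ml_service-api.yml")  # the YAML_FILE OpenAPI v2 spec
app.run(port=9090)                 # hypothetical port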

7.7. Data Drift Detection (DDD)

Data evolves over time, which can degrade the intrinsic characteristics and behavior of the learning model. Data drift is one of the main reasons why the accuracy of a model degrades over time, so it is important that the model can adapt to these changes. Monitoring data drift allows detecting model performance drops (as in the figure below) and taking action accordingly. To deal with this problem, we have integrated a data drift detection mechanism in the Machine Learning as a Service module of ProActive.

drift

This mechanism enables the discovery of data drifts at a fine level of granularity. We have developed a DDD mechanism that not only detects that a drift has occurred, but also indicates the exact location of the drift in the input dataset: it can specify on which attributes and, more precisely, on which rows the drift occurred.

This allows the user to better manage the drift and act accordingly. The input dataset is monitored using different data drift detection methods, among which the user is free to choose the one that best suits his needs. Currently, we use three well-known methods for this purpose: HDDM, Page Hinkley and ADWIN. This list is likely to be expanded in the future with other drift detection methods. To detect drift in a new dataset, it is necessary to compare it to the old training dataset on which the model was trained. Any detected drift therefore indicates that the model is no longer the best predictor and that it should be retrained on the new dataset.

As the DDD function is part of the MaaS_ML module, it can also be launched from different ProActive portals.

7.7.1. Via Studio Portal

The data drift detection mechanism is added to the tasks and workflows of the bucket ai-model_as_a_service. This mechanism is directly linked to the deployment of the model in MaaS_ML (where the user deploys the model, and a part of the old training dataset to be used for drift detection in the new input data set), and to the call of the prediction service in MaaS_ML (where the drift detector is chosen, and the detection process is started using the chosen detector).

The workflow IRIS_Deploy_Predict_Flower_Classifier_Model, found in the ai-model_as_a_service bucket in the ProActive Studio Portal, shows an example of pipeline using the generic tasks MaaS_ML_Deploy_Model and MaaS_ML_Call_Prediction including the DDD mechanism.

In particular, in the MaaS_ML_Deploy_Model task, the user is asked to enter DRIFT_DETECTION_WINDOW_SIZE, a task variable specifying the size of the sample to be extracted from the old training dataset (the dataset on which the model was initially trained). For example, if the user chooses a value of 50 for this variable, the algorithm will randomly choose 50 rows from the old training dataset. This subset of data (called the baseline data) is saved in the service in order to be used afterward by the data drift detection process enacted in MaaS_ML_Call_Prediction.
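
A minimal sketch of what this extraction amounts to, assuming pandas and a hypothetical file name:

# randomly pick DRIFT_DETECTION_WINDOW_SIZE rows from the old training set
# to serve as baseline data for later drift detection
import pandas as pd

training_set = pd.read_csv("old_training_data.csv")        # hypothetical file
baseline_data = training_set.sample(n=50, random_state=0)  # window size = 50
baseline_data.to_csv("baseline_data.csv", index=False)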

In the MaaS_ML_Call_Prediction task, the user is asked to choose the data drift detector to be used in the drift detection process, via the task variable DATA_DRIFT_DETECTOR, which accepts one of HDDM, Page Hinkley or ADWIN. The algorithm concatenates the deployed baseline_data with the new input (to be predicted) dataset. The chosen drift detector then uses the concatenated data to extract the rows and columns where drift took place in the new data. These drift detection algorithms are enhanced in ProActive so as to detect the attributes (columns) where the drift occurred.

The obtained predictions and drifts can be viewed in the resulting output of the ProActive Scheduler Portal.
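
To illustrate the principle (not the exact ProActive implementation), the sketch below runs one of the supported detectors, ADWIN, over the concatenation of baseline data and new data, assuming the scikit-multiflow library:

# feed the concatenated stream to ADWIN and report the rows where it signals
# a change; ProActive applies the same idea per attribute (column)
import numpy as np
from skmultiflow.drift_detection import ADWIN

baseline = np.random.normal(0, 1, 200)  # stands in for the baseline data
new_data = np.random.normal(3, 1, 100)  # stands in for the incoming data
stream = np.concatenate([baseline, new_data])

detector = ADWIN()
for row, value in enumerate(stream):
    detector.add_element(value)
    if detector.detected_change():
        print("drift detected at row", row)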

7.7.2. Via Service Automation Portal and Swagger UI

As mentioned earlier in this documentation, a model can be deployed using MaaS_ML in the Service Automation section of the Automation Dashboard portal. To enable the data drift detection process, the DRIFT_ENABLED variable should be set to True. Once the service is launched, the model can be deployed by choosing the Deploy_ML_Model action. Among its variables, the user can specify the URL of the baseline data using the BASELINE_DATA_URL variable that appears in the popup window of the action. In case you need to change the baseline data, it can be updated using the BASELINE_DATA_URL variable of the Update_MaaS_ML_Parameters action of MaaS_ML.

Once the model is deployed via the Service Automation portal, the Swagger user interface can be opened via the MaaS_ML instance API, offering different endpoints that help the user manage the drift detection mechanism. This mechanism has been integrated in particular into the following endpoints:

  • In the /deploy() endpoint, a user can choose the model (using the model_file variable) and its associated baseline data (using the baseline_data variable) to be deployed.

  • In the /predict() endpoint, the user specifies the drift detection method and the input data to be predicted. Using the baseline data and the specified drift detector method, our algorithm can detect in which attributes and, more precisely, in which rows the drift took place in the input data. The results are shown in the Response Body of the /predict() endpoint and in the Traceability and Audit page.

In case a data drift has occurred, a user will receive a notification using the ProActive Notification service in the Automation Dashboard.

8. Model as a Service for Deep Learning (MaaS_DL)

MaaS_DL is a model deployment service for putting AI models into production. Compared to MaaS_ML, it comes with new capabilities enabling users to deploy deep learning models, update deployed models with new versions, and easily roll back to any previous version. It provides out-of-the-box integration with TensorFlow Serving (TFX), taking advantage of its flexible and high-performance serving system.

The life cycle of any MaaS_DL instance (i.e., from starting the generic service instance and deploying a specific AI model to pausing or deleting the instance) can be managed in three different ways in PAIO:

  • Using the Studio Portal and more specifically the bucket ai-model-as-a-service where specific generic tasks are provided to process all the possible actions (i.e., MaaS_DL_Service_Start, MaaS_DL_Deploy_Model, MaaS_DL_Actions[Finish/Pause/Resume], MaaS_DL_Undeploy_Model). These tasks can be easily integrated into your AI pipelines/workflows, as you can see in this Deployment Pipeline Example.

  • Using the Service Automation Portal by executing the different actions associated to MaaS_DL (i.e. Deploy_DL_Model, Redeploy_DL_Model, Undeploy_DL_Model)

  • Using the Swagger UI which is accessible once the MaaS_DL instance is up and running.

Using MaaS_DL, you can easily deploy and use any machine or deep learning model as a REST Web Service on a physical or a virtual compute host on which there is an available ProActive Node. Going through the ProActive Scheduler, you can also trigger the deployment of a specific VM using the Resource Manager elastic policies, and eventually, deploy a Model-Service on that specific node.

In the following subsections, we will describe the MaaS_DL instance life cycle, from starting the generic service instance and deploying a specific model, to undeploying it and deleting the instance, as well as the four different ways in which this life cycle can be managed in PAIO.

In the description below, several tables list the main variables that characterize the MaaS_DL workflows. In addition to these variables, there is a set of generic variables, common to all workflows, which can be found in the subsection AI Workflows Common Variables. The management of the MaaS_DL life cycle is detailed in the next subsections.

8.1. MaaS_DL Via Workflow Execution Portal

Click on the Submit a Job button and then search for the MaaS_DL_Service workflow as described in the image below.

MaaS DL Search

Check the service parameters and click on the Submit button to start a MaaS_DL service instance.

To get more information about the service parameters, please check the section MaaS_DL Via Service Automation Portal.

MaaS DL Submit

You can now monitor the service status, access its endpoint and execute its different actions:

  • Deploy_DL_Model: enables you to deploy a trained DL model in one click.

  • Finish_MaaS_DL: stops and deletes the service instance.

  • Redeploy_DL_Model: enables you to redeploy a DL model that was previously deployed.

  • Undeploy_DL_Model: enables you to undeploy an already deployed model.

MaaS DL Workflow Management

When you are done with the service instance, you can terminate it by clicking on the Terminate_Job_and_Service button as shown in the image below.

Terminate MaaS DL

8.2. MaaS_DL Via Studio Portal

8.2.1. Start a Generic Service Instance

Open the Studio Portal.

Create a new workflow.

Add the ai-model_as_a_service bucket to the Palette by clicking on View > Add Bucket Menu to the Palette > ai-model_as_a_service.

Drag and drop the MaaS_DL_Service_Start task from the bucket.

Execute the workflow by setting the different workflow’s variables as described in the Table below.

Table 10. MaaS_DL_Service_Start variables

Variable name | Description | Type

Workflow variables:

MODEL_SERVICE_INSTANCE_NAME | Service instance name. | String (default="maas_dl-${PA_JOB_ID}")
MODEL_SERVICE_PROXIFIED | Allows access to the endpoint through an HTTP(s) proxy. | Boolean (default=False)
MODEL_SERVICE_ENTRYPOINT | This entry script starts the service and defines the different functions to deploy the model, score the prediction requests based on the deployed model, and return the results. This script is specific to your model. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about this file can be found in the Customize the Service section. | String (default="dl_service")
MODEL_SERVICE_YAML_FILE | A YAML file that describes the OpenAPI Specification ver. 2 (known as Swagger Spec) of the service. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about the structure of this file can be found in the Customize the Service section. | String (default="dl_service-api")
MODEL_SERVICE_USER_NAME | A valid user name having the needed privileges to execute this action. | String (default="user")
MODEL_SERVICE_NODE_NAME | The name of the node where the service will be deployed. If empty, the service will be deployed on a randomly selected available node. | String (default=Empty)

Task variables:

SERVICE_ID | The name of the service. Please keep the default value for this variable. | String (default="MaaS_DL")
INSTANCE_NAME | The name of the service instance that will be deployed. | String (default="$MODEL_SERVICE_INSTANCE_NAME")
ENGINE | Container engine. | String (default="$CONTAINER_PLATFORM")
PROXIFIED | Takes by default the value of the MODEL_SERVICE_PROXIFIED workflow variable. | String (default="$MODEL_SERVICE_PROXIFIED")
PYTHON_ENTRYPOINT | Takes by default the value of the MODEL_SERVICE_ENTRYPOINT workflow variable. | String (default="$MODEL_SERVICE_ENTRYPOINT")
YAML_FILE | Takes by default the value of the MODEL_SERVICE_YAML_FILE workflow variable. | String (default="$MODEL_SERVICE_YAML_FILE")
USER_NAME | Takes by default the value of the MODEL_SERVICE_USER_NAME workflow variable. | String (default="$MODEL_SERVICE_USER_NAME")
NODE_NAME | Takes by default the value of the MODEL_SERVICE_NODE_NAME workflow variable. | String (default="$MODEL_SERVICE_NODE_NAME")

8.2.2. Deploy a Specific DL Model

You can also deploy a specific DL model directly from the Studio Portal.

Drag and drop the MaaS_DL_Deploy_Model task from the ai-model-as-a-service bucket.

Execute the workflow and set the different workflow’s variables as follows:

Table 11. MaaS_DL_Deploy_Model variables

Variable name | Description | Type

Workflow variables:

CONTAINER_PLATFORM | Specifies the type of container platform to be used (no container, docker, singularity, or podman). | String (default=docker)
CONTAINER_GPU_ENABLED | If True, containers will run based on images containing libraries that are compatible with GPU. | Boolean (default=False)
CONTAINER_IMAGE | Specifies the name of the image that will be used to run the different workflow tasks. | String (default=Empty)
SERVICE_TOKEN | A valid token generated by the MaaS_DL service for user authentication. | String (default=Empty)

Task variables:

MaaS_DL_INSTANCE_ENDPOINT | The endpoint of the started service. | URL (default=Empty)
DEPLOY_ENDPOINT | The base path to access the deployment endpoint. | String (default="/api/deploy")
MODEL_URL | A valid URL specified by the user referencing the model that needs to be deployed. | URL (default=Empty)
MODEL_VERSION | The version number of the model that will be deployed. | Integer (default=Empty)
SERVICE_TOKEN | A valid token generated by the MaaS_DL service for user authentication. | String (default=Empty)
APPEND | If True, the model will be appended to the list of already deployed models. | Boolean (default=True)

8.2.3. Undeploy a Specific DL Model

You can also undeploy a specific DL model directly from the Studio Portal.

Drag and drop the MaaS_DL_Undeploy_Model task from the ai-model-as-a-service bucket.

Execute the Workflow and set the different workflow’s variables as follows:

Table 12. MaaS_DL_Undeploy_Model variables

Variable name | Description | Type

Task variables:

MaaS_DL_INSTANCE_ENDPOINT | The endpoint of the started service. | URL (default=Empty)
UNDEPLOY_ENDPOINT | The base path to access the undeployment endpoint. | String (default="/api/undeploy_model")
MODEL_NAME | The name of the model to be undeployed. | String (default=Empty)
MODEL_VERSION | The version number of the model that will be undeployed. | Integer (default=Empty)

8.2.4. Call the Service for Prediction

Once the model is deployed, you can also call the service for prediction directly from the Studio Portal.

Drag and drop the MaaS_DL_Call_Prediction task from the ai-model-as-a-service bucket.

Execute the Workflow and set the different workflow’s variables as follows:

Table 13. MaaS_DL_Call_Prediction variables

Variable name | Description | Type

Workflow variables:

CONTAINER_PLATFORM | Specifies the type of container platform to be used (no container, docker, singularity, or podman). | String (default=docker)
CONTAINER_GPU_ENABLED | If True, containers will run based on images containing libraries that are compatible with GPU. | Boolean (default=False)
CONTAINER_IMAGE | Specifies the name of the image that will be used to run the different workflow tasks. | String (default=Empty)
SERVICE_TOKEN | A valid token generated by the MaaS_DL service for user authentication. | String (default=Empty)

Task variables:

MaaS_DL_INSTANCE_ENDPOINT | The endpoint of the started service. | URL (default=Empty)
PREDICT_ENDPOINT | The base path to access the prediction endpoint. | String (default="/api/predict")
SERVICE_TOKEN | A valid token generated by the MaaS_DL service for user authentication. | String (default=Empty)
MODEL_NAME | The name of the deployed model that will be used for predictions. | String (default=Empty)
MODEL_VERSION | The version number of the model that will be used for predictions. | Integer (default=Empty)
INSTANCES | Entry data that needs to be scored by the deployed model. | String (default=Empty)
CLASS_NAMES | Sorted class names that were used to train the deployed model. | String (default=Empty)

8.2.5. Delete/Finish the Service

You can also delete the service instance using the Studio Portal.

Drag and drop the MaaS_DL_Actions task from the ai-model-as-a-service bucket.

Execute the Workflow and set the different workflow’s variables as follows:

Table 14. MaaS_DL_Actions variables

Variable name | Description | Type

Task variables:

ACTION | The action that will be processed regarding the service status. | List [Pause_MaaS_DL, Resume_MaaS_DL, Finish_MaaS_DL] (default="Finish_MaaS_DL")
INSTANCE_NAME | The name of the service instance that the action will be processed on. | String (default="maas-dl-${PA_JOB_ID}")
INSTANCE_ID | The service instance ID. | String (default=Empty)

8.3. MaaS_DL Via Service Automation Portal

8.3.1. Start a Generic Service Instance

Search for MaaS_DL in Services Workflows List.

Set the following variables:

Table 15. MaaS_DL variables

Variable name | Description | Type

BUILD_IMAGE_IF_NOT_EXISTS | Pull and build the Singularity image if the Singularity Image File (SIF) is not available. | Boolean (default=True)
DEBUG_ENABLED | If True, the user will be able to examine the stream of output results of each task. | Boolean (default=True)
DOCKER_IMAGE | Specifies the name of the Docker image that will be used to run the different workflow tasks. | String (default="activeeon/maas_dl")
ENDPOINT_ID | The endpoint_id that will be used if PROXYFIED is set to True. | String (default="maas_dl-gui")
ENGINE | Container engine. | List (default="docker")
HTTPS_ENABLED | True if the HTTPS protocol is needed for the defined model-service. | Boolean (default=False)
INSTANCE_NAME | The name of the service instance that will be deployed. | String (default="maas_dl")
MODEL_BASE_PATH | Location of the model on the node file system (this path will be used to store the model). | String (default="/tmp")
MODELS_DEPLOYMENT_REFRESH | The interval, in seconds, at which the model configuration file is polled for updated versions of the model. | Integer (default=30)
NATIVE_SCHEDULER | Name of the Native Scheduler node source to use when the workflow tasks must be deployed inside a cluster such as SLURM, LSF, etc. | String (default=Empty)
NATIVE_SCHEDULER_PARAMS | Parameters given to the native scheduler (SLURM, LSF, etc.) while requesting a ProActive node used to deploy the workflow tasks. | String (default=Empty)
NODE_NAME | The name of the node where the service will be deployed. If empty, the service will be deployed on a randomly selected available node. | String (default=Empty)
PROXYFIED | True if a proxy is needed to protect access to this model-service endpoint. | Boolean (default=False)
PYTHON_ENTRYPOINT | This entry script starts the service and defines the different functions to deploy the model, score the prediction requests based on the deployed model, and return the results. This script is specific to your model. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about this file can be found in the Customize the Service section. | String (default="dl_service")
SERVICE_PORT | Controls the port used to start the Model Service from the Service Automation Portal. -1 for random port allocation. | Integer (default="-1")
SINGULARITY_IMAGE_PATH | Location of the Singularity image on the node file system (this path will be used to store the Singularity image, or the image will be used directly if the file is already present). | String (default="/tmp/maas_dl.sif")
TRACE_ENABLED | True if the user wants to keep a trace of the different changes occurring in the service. | Boolean (default=True)
YAML_FILE | A YAML file that describes the OpenAPI Specification ver. 2 (known as Swagger Spec) of the service. This file should be stored in the Catalog under the model_as_service_resources bucket. More information about the structure of this file can be found in the Customize the Service section. | String (default="dl_service-api")

Click on Execute Action and follow the progress of the service creation.


8.3.2. Deploy a Specific DL Model

Once the status of your generic model service is displayed as RUNNING on Service Automation, you can deploy your model by following the steps below:

Select and execute the Deploy_DL_Model action from Actions to deploy your model.

Set the following variables:

Table 16. Deploy_DL_Model variables

Variable name | Description | Type

APPEND | If True, the model will be appended to the list of already deployed models. | Boolean (default=True)
MODEL_NAME | The name of the model to be deployed. | String (default="mnist_model")
MODEL_URL | A valid URL specified by the user referencing the model that needs to be deployed. | URL (default=Empty)
MODEL_VERSION | The version number of the model that will be deployed. | Integer (default=1)
USER_NAME | A valid user name having the needed privileges to execute this action. | String (default="user")

Click on Execute Action and follow the progress of the model deployment.

8.3.3. Redeploy a Specific DL Model

It is also possible to redeploy a specific DL model version that has already been deployed and saved at least once in this service instance by following the steps below:

Select and execute the Redeploy_DL_Model action from Actions to redeploy your model.

Set the following variables:

Table 17. Redeploy_DL_Model variables

Variable name | Description | Type

APPEND | If True, the model will be appended to the list of already deployed models. | Boolean (default=True)
MODEL_NAME | The name of the model to be redeployed. | String (default="mnist_model")
MODEL_VERSION | The version number of the model that will be redeployed. | Integer (default=1)
USER_NAME | A valid user name having the needed privileges to execute this action. | String (default="user")

Click on Execute Action and follow the progress of the model deployment.

8.3.4. Delete/Finish the Service Instance

You can delete the launched service instance directly from Service Automation Portal:

Set the action Finish under Actions and click on Execute Action.

MAAS DL Delete Service

8.4. MaaS_DL Via Swagger UI

To access the Swagger UI, click on the second link at the top of the Traceability & Audit page.

Through this Swagger UI, you are now able to:

  • Ask for an api_token

  • Deploy a model

  • List the deployed models

  • List the saved models in MODELS_PATH repository

  • Delete the saved models in MODELS_PATH repository

  • Make predictions

  • Redeploy a previously deployed model

  • Return the stored traceability information

  • Remove a deployed model

  • Upload a new model config that will be used by the TensorFlow model server

  • Return the model config used by the TensorFlow model server

8.4.1. Deploy/Undeploy/Redeploy a Specific DL Model

You can also deploy a specific DL model using the Swagger UI:

Open the Swagger UI.

Open the get_token operation and get an api_token by entering your username (default value is user).

Open the deploy operation, then set the provided token and upload the model that needs to be deployed.

NEW MAAS DL Deploy Swagger

Open list_deployed_models to return the list of all already deployed models.

Open list_saved_models to return the list of saved models in MODELS_PATH repository.

Open redeploy to redeploy a previously deployed model using its token.

Open undeploy to remove a deployed model using its token.

Open clean_saved_models to delete the list of saved models in MODELS_PATH repository.

8.4.2. Call the Service for Predictions

Once the model is deployed, you can call the service for predictions using the Swagger UI:

Open the Swagger UI.

Open the get_token operation and get an api_token by entering your username (default value is user).

Open the predict operation and set the provided token and the data that you need to score.

MAAS DL Predict Swagger

8.4.3. Upload/Download Model Configuration

Once the model is deployed, you can download and/or upload the model configuration file using the Swagger UI:

Open the Swagger UI.

Open the get_token operation and get an api_token by entering your username (default value is user).

Open the download_model_config operation to return the model config used by the TensorFlow model server.

Open the upload_model_config operation and set the provided token and the new model config file that will be used by the TensorFlow model server.

MAAS DL model config

8.5. Deployment Pipeline Example

You can connect the different tasks in a single workflow to get the full pipeline from the model training step to the model deployment and consumption steps. Each task will propagate the acquired variables to its children tasks. The following workflow example is available in the ai-model_as_a_service bucket under the name MNIST_Model_Training_and_Deployment. It trains an MNIST model and starts a service instance where the trained model is deployed as a service using the MaaS_DL PSA service.

NEW MAAS DL MNIST Workflow Example

9. AutoFeat

The performance of a machine learning model depends not only on the model and its hyper-parameters but also on how the different types of variables are processed and fed to the model.

Before starting the modelling phase, various data preparation tasks are required, and encoding categorical data is one of the most crucial ones. In real life, data commonly comes with categorical string values, while most machine learning models only accept numerical variables (generally floats or integers), not strings. Preprocessing and encoding the categorical variables is therefore a crucial step to convert them into numbers that can help in predicting the results in a machine learning task.

AutoFeat provides a complete solution to assist data scientists in successfully encoding their categorical data.

In real-world problems, an encoding method usually has to be chosen for the model to work properly, and working with different encoders can influence the results of the model.

AutoFeat currently supports the following encoding methods:

  • Label: converts each value in a categorical feature into an integer value between 0 and n-1, where n is the number of distinct categories of the variable.

  • Binary: stores categories as binary bitstrings.

  • OneHot: creates a new feature for each category in the categorical variable and replaces it with either 1 (presence of the feature) or 0 (absence of the feature). The number of the new features depends on the number of categories in the categorical variable.

  • Dummy: transforms the categorical variable into a set of binary variables (also known as dummy variables). Dummy encoding is a small improvement over one-hot encoding, in that it uses n-1 features to represent n categories.

  • BaseN: encodes the categories into arrays of their base-n representation. A base of 1 is equivalent to one-hot encoding and a base of 2 is equivalent to binary encoding.

  • Target: replaces a categorical value with the mean of the target variable.

  • Hash: maps each category to an integer within a pre-determined range n_components. n_components is the number of dimensions, in other words, the number of bits to use to represent the feature. We use 8 bits by default.

Most of these methods are implemented using the Python Category Encoders library. Examples can be found in the Category Encoders Examples notebook.
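
As a quick illustration of some of the methods listed above, here is a small, self-contained example using the Category Encoders library (the column and data values are made up):

# compare three of the supported encoders on a toy categorical column
import pandas as pd
import category_encoders as ce

df = pd.DataFrame({"color": ["red", "green", "blue", "green"],
                   "target": [1, 0, 1, 1]})

onehot = ce.OneHotEncoder(cols=["color"]).fit_transform(df)  # one column per category
binary = ce.BinaryEncoder(cols=["color"]).fit_transform(df)  # base-2 bitstrings
target = ce.TargetEncoder(cols=["color"]).fit_transform(df, df["target"])  # category -> target mean
print(onehot, binary, target, sep="\n\n")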

As already mentioned, the performance of ML algorithms depends on how categorical variables are encoded, and the results produced by the model vary depending on the encoding technique used. Thus, the hardest part of categorical encoding is sometimes finding the right encoding method.

There are numerous research papers and studies dedicated to the analysis of the performance of categorical encoding approaches applied to different datasets. Based on the common factors shared by the datasets using the same encoding method, we have implemented an algorithm for finding the best suited method for your data.

To access the AutoFeat page, please follow the steps below:

  1. Open the Studio Portal.

  2. Create a new workflow.

  3. Drag and drop the Import_Data_And_Automate_Feature_Engineering task from the ai-machine-learning bucket in ProActive AI Orchestration.

  4. Click on the task and click General Parameters on the left to change the default parameters of this task.

Import Data And Automate Feature Engineering Task

  5. Put in the FILE_PATH variable the S3 link used to upload your dataset.

  6. Set the other parameters according to your dataset format.

  7. Click on the Execute button to run the workflow and start AutoFeat.

Import Data And Automate Feature Engineering Execute

To get more information about the parameters of the service, please check the section Import_Data_And_Automate_Feature_Engineering.

  8. Open the Workflow Execution Portal.

  9. You can now access the AutoFeat page by clicking on the AutoFeat endpoint as shown in the image below.

AutoFeat endpoint

You will be redirected to AutoFeat page which initially contains three tabs that we describe in the following sections.

9.1. Data Preview

AutoFeat loads data from external sources. Since the dataset could potentially be very large, only the first 10 rows are displayed initially.

The Refresh button enables users to see the latest updates made to their data.

AutoFeat data preview

9.2. Column Summaries

Whenever AutoFeat loads data from an external source, it also identifies the datatype of each column. AutoFeat does a great job at datatype recognition; each decision can be overridden manually by the user if required.

AutoFeat also computes summary statistics for each column. A table displays the missing values, minimum, maximum, mean and number of zeros for each numerical feature, and the cardinality (category counts) for each categorical feature.

AutoFeat column summaries

9.3. Data Preprocessing

A preview of the data is displayed in the Data Preprocessing as follows.

Data Preprocessing

It is possible to change a column’s information. These changes can include:

  • Column Name: There should rarely be a reason to change the field name.

  • Column Type: AutoFeat automatically recognizes the data type, so the default settings typically do not need to be changed. There are two different data types: Categorical and Numerical.

  • Category Type: Categorical variables can be divided into two categories: Ordinal, where the categories have an inherent order, and Nominal, where the categories do not have any inherent order.

  • Label Column: Only one column can be selected as the label column.

  • Coding Method: The encoding method used for converting the categorical data values into numerical values. The value is set to Auto by default, in which case the best-suited method for encoding the categorical feature is automatically identified. The data scientist still has the ability to override every decision and select another encoding method from the drop-down menu. Different methods are supported by AutoFeat, such as Label, OneHot, Dummy, Binary, BaseN, Hash and Target. Some of these methods require specifying additional encoding parameters, which vary depending on the selected method (e.g., the base and the number of components for BaseN and Hash, respectively, or the target column for the Target encoding method). Default values are used if no values are specified by the user.

Data Preprocessing

It is also possible to perform the following actions on the dataset:

  • Save, to save the last changes made on a column information.

  • Restore, to restore the original version of the dataset loaded from the external source.

  • Delete Column, to delete a column from the dataset.

  • Preview Encoded Data, to display the encoding results in a new tab.

  • Cancel and Quit, to discard any changes the user may have made and finish the workflow execution.

Once the encoding parameters are set, the user can display the encoded dataset by clicking on Preview Encoded Data. He can also check and compare different encoding methods and/or parameters based on the obtained results.

9.4. Encoded Data

This page displays the data encoding results based on the selected parameters. At this stage, the user can validate the results by clicking on the button Proceed, or erase the encoded dataset by clicking on the button Delete.

The user can also download the results as a csv file by clicking on the Download button.

AutoFeat encoded data

9.5. ML Pipeline Example

You can connect different tasks in a single workflow to get the full pipeline from data preprocessing to model training and deployment. Each task will propagate the acquired variables to its children tasks. The following workflow example, Vehicle_Type_Using_Model_Explainability, uses the Import_Data_And_Automate_Feature_Engineering task to prepare the data. It is available in the machine_learning_workflows bucket.

Vehicle Type Using Model Explainability

This workflow predicts the vehicle type based on silhouette measurements and applies ELI5 and the Kernel Explainer to understand the model’s global behavior or explain specific predictions.

10. ProActive Analytics

ProActive Analytics is a dashboard that provides an overview of executed workflows along with their input variables and results.

It offers several functionalities, including:

  • Advanced search by name, user, date, state, etc.

  • Execution metrics summary about durations, encountered issues, etc.

  • Charts to track variables and results evolution and correlation.

  • Data exportation in multiple formats for further use in analytics tools.

ProActive Analytics is very useful to compare metrics and charts of workflows that have common variables and results. For example, a ML algorithm might take different variable values and produce multiple results. It would be interesting to analyze the correlation and evolution of the algorithm results with respect to the input variation (see also a similar example with AutoML). The following sections show some key features of the dashboard and how to use them for a better understanding of your job executions.

10.1. Job Analytics

The Job Analytics page includes a search window that allows users to search for jobs based on specific criteria (see screenshot below). The job search panel allows selecting multi-value filters for the following job parameters:

  • Workflow Name(s): Jobs can be filtered by workflow name. Selecting/Typing one or more workflow names is provided by a built-in auto-complete feature that helps you search for workflows or buckets from the ProActive Catalog.

  • Project Name(s): You can also filter by one or more project names. You just have to specify the project names for the jobs you would like to analyze.

  • Job Status: You can specify the state of the jobs you are looking for. The possible job statuses are: Pending, Running, Stalled, Paused, In_Error, Finished, Canceled, Failed, and Killed. For more information about job statuses, check the documentation here. Multiple values are accepted as well.

  • User(s): This filter allows either selecting only the jobs of the connected/current user or specifying a list of users that have executed the jobs. By default, the toggle filter is activated to select only the current user’s jobs.

  • Submission Time: From the dropdown list, users can select a submission time frame (e.g., yesterday, last week, this month, etc.), or choose custom dates.

  • Variables and results: It is possible to choose whether to display or not the workflow’s variables and results. When deactivated, the charts related to variables and results evolution/correlation will not be displayed in the dashboard.

More advanced search options (highlighted in advanced search hints) can be used to provide filter values such as wildcards. For example, names that start with a specific string value are selected using value*. Other supported expressions are: *value for Ends with, *value* for Contains, !value for Not equals, and !*value* for Not contains.

Now you can hit the search button to request jobs from the scheduler database according to the provided filter values. The search bar at the top shows a summary of the active search filters.

JA search
Figure 1. JA-search

10.1.1. Execution Metrics

As shown in the screenshot below, the Job Analytics portal provides a summary of the most important job execution metrics. For instance, the dashboard shows:

  • A first panel that displays the number of total jobs that correspond to the search query. It also shows the ratio of successful jobs over the total number, and the number of jobs that are in progress and not yet finished. Please note that the number of in-progress jobs corresponds to the moment when the search query is executed and it is not automatically refreshed.

  • A second summary panel that displays the number of jobs with issues. We distinguish two types of issues: jobs that finished but encountered issues during their execution, and interrupted jobs that did not finish their execution and were stopped due to diverse causes, such as insufficient resources or manual interruption. Interrupted jobs include four statuses: In-Error, Failed, Canceled, and Killed.

  • The last metric gives an overview of the average duration of the selected jobs.

JA metrics

10.1.2. Job Charts

Job Analytics includes three types of charts:

  • Job duration chart: This chart shows durations per job. The x-axis shows the job ID and the y-axis shows the job duration. Hovering over the lines will also display the same information as a tooltip (see screenshot below). Using the duration chart helps users identify any abnormal performance behaviour among several workflow executions.

JA duration
Figure 2. JA duration
  • Job variables chart: This chart shows the evolution of all numeric-only variables of the selected jobs. It provides the ability to hide or show specific input variables by clicking on the variable name in the legend, as shown in the figure below.

  • Job results chart: This chart shows the evolution of all numeric-only results of the selected jobs. It also provides the ability to hide or show specific results by clicking on the result name in the legend, as shown in the figure below.

JA chart
Figure 3. JA results chart

All charts provide some advanced features such as "maximize" and "enlarge" to better visualize the results, and "move" to customize the dashboard layout (see the top left side of the charts). All of them support the hovering feature described previously and offer two display types: line and bar charts. Switching from one to the other is done through a toggle button located at the top right of the chart, as is showing or hiding variables and results.

10.1.3. Job Execution Table

The last element of the Job Analytics dashboard shows a summary table that contains all job executions returned by the search query. It includes the job ID, status, duration, submission time, variables, results, etc. The jobs table provides many features:

  • Filtering: users can specify filter values for every column. For instance, the picture below applies a filter on the duration column to keep only jobs that last more than 30s. For string values, string-related filters such as Contains can be applied. For dates, a calendar is displayed to help users select the right date. Please note that variable and result types are not automatically detected; therefore, users can choose either the Contains filter or the Greater than and Less than filters.

  • Sort, hide, pin left and right columns: allows users to easily handle and display data with respect to their needs.

  • Export the job data to CSV format: enables users to exploit and process job data using other analytics tools such as R, Matlab, BI tools, ML APIs, etc.

  • Clear and apply filters: When filters are applied, the displayed data is updated. Therefore, we provide a button (see apply filters to charts at the top left of the table screenshot) that allows synchronizing the charts with the filtered data in the table. Finally, it is possible to clear all filters, which automatically deactivates the synchronization.

  • Link to scheduler jobs: data in the job ID column is linked to the job executions in the scheduler. For example, if users want to access the logs of a failing job, they can click on the corresponding job ID to be redirected to the job location in the Scheduling Portal.

Note also that clicking on the issue types and charts described in the previous sections filters the table to show the corresponding jobs.

Note that the dashboard layout and search preferences are saved in the browser cache, so users keep access to their last dashboard and search settings.
JA table
Figure 4. JA table

11. ProActive Jupyter Kernel

The ActiveEon Jupyter Kernel adds a kernel backend to Jupyter. This kernel interfaces directly with the ProActive scheduler and constructs tasks and workflows to execute them on the fly.

With this interface, users can run their code locally and test it using a native Python kernel, then, by simply switching to the ProActive kernel, run it on remote public or private infrastructures without having to modify the code. See the example below:

Direct execution from Jupyter with ActiveEon Kernel
Figure 5. Direct execution from Jupyter with ActiveEon Kernel

11.1. Installation

11.1.1. Requirements

Python 2 or 3

11.1.2. Using PyPi

  • open a terminal

  • install the ProActive jupyter kernel with the following commands:

$ pip install proactive proactive-jupyter-kernel --upgrade
$ python -m proactive-jupyter-kernel.install

11.1.3. Using source code

  • open a terminal

  • clone the repository on your local machine:

$ git clone git@github.com:ow2-proactive/proactive-jupyter-kernel.git
  • install the ProActive jupyter kernel with the following commands:

$ pip install proactive-jupyter-kernel/
$ python -m proactive-jupyter-kernel.install

11.2. Platform

You can use any Jupyter platform. We recommend using JupyterLab. To launch it from your terminal after installation:

$ jupyter lab

or in daemon mode:

$ nohup jupyter lab &>/dev/null &

When opened, click on the ProActive icon to open a notebook based on the ProActive kernel.

11.3. Help

As a quick start, we recommend running the #%help() pragma:

#%help()

It prints a brief description of all the pragmas that the ProActive kernel provides.

To get a more detailed description of a given pragma, run:

#%help(pragma=PRAGMA_NAME)

11.4. Connection

11.4.1. Using connect()

If you are trying ProActive for the first time, sign up on the try platform. Once you receive your login and password, connect to the trial platform using the #%connect() pragma:

#%connect(login=YOUR_LOGIN, password=YOUR_PASSWORD)

To connect to another ProActive server host, use the same pragma as follows:

#%connect(host=YOUR_HOST, [port=YOUR_PORT], login=YOUR_LOGIN, password=YOUR_PASSWORD)
Notice that the port parameter is optional. The default connection port is 8080.

You can also connect to a distant server by providing its URL in the following way:

#%connect(url=YOUR_SERVER_URL, login=YOUR_LOGIN, password=YOUR_PASSWORD)

By providing the complete URL of the server, users can also connect through the secure HTTPS protocol.

11.4.2. Using a configuration file

For automatic sign-in, create a file named proactive_config.ini in your notebook working directory.

Fill your configuration file according to one of the following two formats:

  • By providing the server host and port:

[proactive_server]
host=YOUR_HOST
port=YOUR_PORT
[user]
login=YOUR_LOGIN
password=YOUR_PASSWORD
  • By providing the server url:

[proactive_server]
url=YOUR_SERVER_URL
[user]
login=YOUR_LOGIN
password=YOUR_PASSWORD

Save your changes and restart the ProActive kernel.

You can also force the current kernel to connect using any .ini config file through the #%connect() pragma:

#%connect(path=PATH_TO/YOUR_CONFIG_FILE.ini)

(For more information about this format, please check configParser.)

11.5. Usage

11.5.1. Creating a Python task

To create a new task, use the pragma #%task() followed by the task implementation script written in a notebook code block. At minimum, a task name has to be provided. Example:

#%task(name=myTask)
print('Hello world')

General usage:

#%task(name=TASK_NAME, [language=SCRIPT_LANGUAGE], [dep=[TASK_NAME1,TASK_NAME2,...]], [generic_info=[(KEY1,VAL1), (KEY2,VALUE2),...]], [variables=[(VAR1,VAL1), (VAR2,VALUE2),...]], [export=[VAR_NAME1,VAR_NAME2,...]], [import=[VAR_NAME1,VAR_NAME2,...]], [path=IMPLEMENTATION_FILE_PATH])

Users can also provide more information about the task using the pragma’s options. In the following, we give more details about the possible options:

Language

The language parameter is needed when the task script is not written in native Python. If not provided, Python will be selected as the default language. The supported programming languages are:

  • Linux_Bash

  • Windows_Cmd

  • DockerCompose

  • Scalaw

  • Groovy

  • Javascript

  • Jython

  • Python

  • Ruby

  • Perl

  • PowerShell

  • R

Here is an example that shows a task implementation written in Linux_Bash:

#%task(name=myTask, language=Linux_Bash)
echo 'Hello, World!'
Dependencies

One of the most important notions in workflows is the dependencies between tasks. To specify this information, use the dep parameter. Its value should be a list of all tasks on which the new task depends. Example:

#%task(name=myTask,dep=[parentTask1,parentTask2])
print('Hello world')
Variables

To specify task variables, you should provide the variables parameter. Its value should be a list of (key,value) tuples that correspond to the names and values of the task variables. Example:

#%task(name=myTask, variables=[(var1,value1),(var2,value2)])
print('Hello world')
Generic information

To specify the values of some advanced ProActive variables called Generic Information, you should provide the generic_info parameter. Its value should be a list of (key,value) tuples that correspond to the names and values of the Generic Information. Example:

#%task(name=myTask, generic_info=[(var1,value1),(var2,value2)])
print('Hello world')
Export/import variables

The export and import parameters ensure variable propagation between the different tasks of a workflow. If myTask1 variables var1 and var2 are needed in myTask2, both pragmas have to specify this information as follows:

  • myTask1 should include an export parameter with a list of these variable names,

  • myTask2 should include an import parameter with a list including the same names.

Example:

myTask1 implementation block would be:

#%task(name=myTask1, export=[var1,var2])
var1 = "Hello"
var2 = "ActiveEon!"

and myTask2 implementation block would be:

#%task(name=myTask2, dep=[myTask1], import=[var1,var2])
print(var1 + " from " + var2)
Implementation file

It is also possible to use an external implementation file to define the task implementation. To do so, the option path should be used.

Example:

#%task(name=myTask,path=PATH_TO/IMPLEMENTATION_FILE.py)

11.5.2. Importing libraries

The main difference between the ProActive and 'native language' kernels resides in the way memory is accessed during block execution. In a common native language kernel, the whole script code (all the notebook blocks) is executed locally in the same shared memory space, whereas the ProActive kernel executes each created task in an independent process. To facilitate the transition from native language kernels to the ProActive kernel, we included the pragma #%import(). This pragma lets the user declare libraries that are common to all created tasks (and thus to their distributed processes) implemented in the same native script language.

The import pragma is used as follows:

#%import([language=SCRIPT_LANGUAGE])

Example:

#%import(language=Python)
import os
import pandas
If the language is not specified, Python is used as the default language.

11.5.3. Adding a fork environment

To configure a fork environment for a task, use the #%fork_env() pragma. To do so, you have to provide the name of the corresponding task and the fork environment implementation.

Example:

#%fork_env(name=TASK_NAME)
dockerImageName = 'activeeon/dlm3'
dockerRunCommand =  'docker run '
dockerParameters = '--rm '
paHomeHost = variables.get("PA_SCHEDULER_HOME")
paHomeContainer = variables.get("PA_SCHEDULER_HOME")
proActiveHomeVolume = '-v '+paHomeHost +':'+paHomeContainer+' '
workspaceHost = localspace
workspaceContainer = localspace
workspaceVolume = '-v '+localspace +':'+localspace+' '
containerWorkingDirectory = '-w '+workspaceContainer+' '
preJavaHomeCmd = dockerRunCommand + dockerParameters + proActiveHomeVolume + workspaceVolume + containerWorkingDirectory + dockerImageName

Or, you can provide the task name and the path of a .py file containing the fork environment code:

#%fork_env(name=TASK_NAME, path=PATH_TO/FORK_ENV_FILE.py)

11.5.4. Adding a selection script

To add a selection script to a task, use the #%selection_script() pragma. To do so, you have to provide the name of the corresponding task and the selection code implementation.

Example:

#%selection_script(name=TASK_NAME)
selected = True

Or, you can provide the task name and the path of a .py file containing the selection code:

#%selection_script(name=TASK_NAME, path=PATH_TO/SELECTION_CODE_FILE.py)

11.5.5. Adding job fork environment and/or selection script

If the selection scripts and/or the fork environments are the same for all job tasks, we can add them just once using the job_selection_script and/or the job_fork_env pragmas.

Usage:

For a job selection script, please use:

#%job_selection_script([language=SCRIPT_LANGUAGE], [path=./SELECTION_CODE_FILE.py], [force=on/off])

For a job fork environment, use:

#%job_fork_env([language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py], [force=on/off])

The force parameter defines whether the pragma should overwrite any task selection script or fork environment already set.

11.5.6. Adding pre and/or post scripts

Sometimes, specific scripts have to be executed before and/or after a particular task. For that, the solution provides the pre_script and post_script pragmas.

To add a pre-script to a task, please use:

#%pre_script(name=TASK_NAME, language=SCRIPT_LANGUAGE, [path=./PRE_SCRIPT_FILE.py])

To add a post-script to a task, use:

#%post_script(name=TASK_NAME, language=SCRIPT_LANGUAGE, [path=./POST_SCRIPT_FILE.py])

11.5.7. Branch control

The branch control provides the ability to choose between two alternative task flows, with the possibility to merge back to a common one.

To add a branch control to the current workflow, four specific tasks and one control condition should be added in the following order:

  1. a branch task,

  2. the related branching condition script,

  3. an if task that should be executed if the result of the condition task is true,

  4. an else task that should be executed if the result of the condition task is false,

  5. a continuation task that should be executed after the if or the else tasks.

To add a branch task, you can rely on the following pragma:

#%branch([name=TASK_NAME], [dep=[TASK_NAME1,TASK_NAME2,...]], [generic_info=[(KEY1,VAL1), (KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])

For the branching condition script, use:

#%condition()

For an if task, please use:

#%if([name=TASK_NAME], [generic_info=[(KEY1,VAL1),(KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])

For an else task, use:

#%else([name=TASK_NAME], [generic_info=[(KEY1,VAL1),(KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])

And finally, for the continuation task:

#%continuation([name=TASK_NAME], [generic_info=[(KEY1,VAL1),(KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])
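
For illustration, here is a minimal sketch of a complete branch structure, written as successive notebook cells. It assumes that the condition script sets the standard ProActive flow variable branch to either "if" or "else"; task names are arbitrary:

#%branch(name=branching)
print('Deciding which path to take ...')

#%condition()
# assumption: the standard ProActive flow variable 'branch' selects the path
branch = "if" if 1 + 1 == 2 else "else"

#%if(name=ifTask)
print('Condition was true')

#%else(name=elseTask)
print('Condition was false')

#%continuation(name=continuationTask)
print('Merging back to a common flow')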

11.5.8. Loop control

The loop control provides the ability to repeat a set of tasks.

To add a loop control to the current workflow, two specific tasks and one control condition should be added in the following order:

  1. a start task,

  2. the related looping condition script,

  3. a loop task.

For a start task, use:

#%start([name=TASK_NAME], [dep=[TASK_NAME1,TASK_NAME2,...]], [generic_info=[(KEY1,VAL1), (KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])

For the looping condition script, use:

#%condition()

For a loop task, please use:

#%loop([name=TASK_NAME], [generic_info=[(KEY1,VAL1),(KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])
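
As a minimal sketch (assuming the looping condition script sets the standard ProActive flow variable loop to True or False):

#%start(name=startTask)
print('Loop body starts here')

#%condition()
# assumption: the standard ProActive flow variable 'loop' decides whether to iterate again
loop = False  # replace with a real stopping condition

#%loop(name=loopTask)
print('Loop body ends here')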

11.5.9. Replicate control

The replicate control allows executing a task multiple times in parallel, where the number of parallel runs can vary from one execution to another.

Through the ProActive Jupyter Kernel, users can add replicate controls in two main ways: a generic way and a straightforward way.

Generic usage

To add a replicate control to the current workflow using the generic method, three specific tasks and one control runs script should be added in the following order:

  1. a split task,

  2. the related replication runs script,

  3. a process task,

  4. a merge task.

For a split task, use:

#%split([name=TASK_NAME], [dep=[TASK_NAME1,TASK_NAME2,...]], [generic_info=[(KEY1,VAL1), (KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])

For the replication runs script, use:

#%runs()

For a process task, please use:

#%process([name=TASK_NAME], [generic_info=[(KEY1,VAL1),(KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])

And finally, for a merge task, use:

#%merge([name=TASK_NAME], [generic_info=[(KEY1,VAL1),(KEY2,VALUE2),...]], [language=SCRIPT_LANGUAGE], [path=./FORK_ENV_FILE.py])
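
As an illustration, here is a minimal sketch of the generic replication structure, assuming the runs script sets the standard ProActive flow variable runs to the number of parallel executions:

#%split(name=splitTask)
print('Preparing the data to distribute')

#%runs()
# assumption: 'runs' is the standard ProActive replication variable
runs = 3

#%process(name=processTask)
print('Processing one chunk (this task is replicated 3 times)')

#%merge(name=mergeTask)
print('Merging the partial results')
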
Straightforward usage

The straightforward method to add a replication is mostly useful when the parallelism to implement is task parallelism (the generic usage is better adapted to data parallelism).

To add a replication to a task, just add the runs control script by providing the runs option of the task pragma. Example:

#%task(name=T2,dep=[T1],runs=3)
print("This output should be displayed 3 times ...")
To construct a valid workflow, straightforwardly replicated tasks must have exactly one parent task and at most one child task. (More information about replicate validation criteria is available here.)

11.5.10. Delete a task

To delete a task from the workflow, the user should run the pragma #%delete_task() in the following way:

#%delete_task(name=TASK_NAME)

11.5.11. Create a job

To create a job, specify job variables and/or job generic information, use the #%job() pragma:

#%job(name=JOB_NAME, [generic_info=[(KEY1,VAL1), (KEY2,VALUE2),...]], [variables=[(VAR1,VAL1), (VAR2,VALUE2),...]])
It is not necessary to explicitly create and name the job. If the user does not do it, this step is implicitly performed when the job is submitted (check the section Submit your job to the scheduler for more information).

11.5.12. Visualize job

To visualize the created workflow, use the #%draw_job() pragma to plot the workflow graph that represents the job into a separate window:

#%draw_job()

Two optional parameters can be used to configure the way the kernel plots the workflow graph.

inline plotting:

If this parameter is set to off, plotting the workflow graph is done through a Matplotlib external window. The default value is on.

#%draw_job(inline=off)

save the workflow graph locally:

To save the workflow graph into a .png file, this option needs to be set to on. The default value is off.

#%draw_job(save=on)

Note that the job’s name can take one of the following possible values:

  1. The value of the name parameter, if provided

  2. The job’s name, if created

  3. The notebook’s name, if the kernel can retrieve it

  4. Unnamed_job, otherwise

General usage:

#%draw_job([name=JOB_NAME], [inline=off], [save=on])

11.5.13. Export the workflow graph in dot format

To export the created workflow into a GraphViz .dot format, use the #%write_dot() pragma:

#%write_dot(name=FILE_NAME)

11.5.14. Import a workflow from a dot file

To create a workflow according to a GraphViz .dot file, use the pragma #%import_dot():

#%import_dot(path=PATH_TO/FILE_NAME.dot)

By default, the workflow will contain Python tasks with empty implementation scripts. If you want to modify or add any information to a specific task, please use, as explained in Creating a Python task, the #%task() pragma.

11.5.15. Submit your job to the scheduler

To submit the job to the ProActive Scheduler, the user has to use the #%submit_job() pragma:

#%submit_job()

If the job has not been created or is not up-to-date, #%submit_job() creates a new job with the same name as the old one. To provide a new name, use the same pragma and provide a name as parameter:

#%submit_job([name=JOB_NAME])

If the job’s name is not set, the ProActive kernel uses the current notebook name, if possible, or gives a random one.

11.5.16. List all submitted jobs

To get all submitted job IDs and names, use the #%list_submitted_jobs() pragma as follows:

#%list_submitted_jobs()

11.5.17. Export the workflow in XML format

To export the created workflow in .xml format, use the #%export_xml() pragma:

#%export_xml([name=FILENAME])

Notice that the .xml file will be saved under one of the following names:

  1. The value of the name parameter, if provided

  2. The job’s name, if created

  3. The notebook’s name, if the kernel can retrieve it

  4. Unnamed_job, otherwise

11.5.18. Get results

After the execution of a ProActive workflow, two outputs can be obtained:

  • results: values that have been saved in the task result variable,

  • console outputs: classic outputs that have been displayed/printed.

To get task results, please use the #%get_task_result() pragma by providing the task name, and either the job ID or the job name:

#%get_task_result([job_id=JOB_ID], [job_name=JOB_NAME], task_name=TASK_NAME)

The result(s) of all the tasks of a job can be obtained with the #%get_job_result() pragma, by providing the job name or the job ID:

#%get_job_result([job_id=JOB_ID], [job_name=JOB_NAME])

To get and display console outputs of a task, you can use the #%print_task_output() pragma in the following way:

#%print_task_output([job_id=JOB_ID], [job_name=JOB_NAME], task_name=TASK_NAME)

Finally, the #%print_job_output() pragma allows printing all job outputs, by providing the job name or the job ID:

#%print_job_output([job_id=JOB_ID], [job_name=JOB_NAME])
If neither job_name nor job_id is provided, the last submitted job is selected by default.
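
Putting the main pragmas together, a minimal end-to-end session could look like the following sketch (each block is a separate notebook cell; YOUR_LOGIN and YOUR_PASSWORD are placeholders):

#%connect(login=YOUR_LOGIN, password=YOUR_PASSWORD)

#%task(name=hello)
print('Hello from ProActive')

#%job(name=demo_job)

#%submit_job()

#%print_task_output(job_name=demo_job, task_name=hello)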

11.6. Display and use ActiveEon Portals directly in Jupyter

Finally, to access more parameters and features, the user can rely on the ActiveEon Studio portals. The main ones are the Resource Manager, the Scheduling Portal and the Workflow Execution.

The example below shows how the user can directly monitor their submitted job’s execution in the scheduling portal:

Directly in Jupyter: Submit

To show the resource manager portal related to the host you are connected to, just run:

#%show_resource_manager([host=YOUR_HOST], [height=HEIGHT_VALUE], [width=WIDTH_VALUE])

For the related scheduling portal:

#%show_scheduling_portal([host=YOUR_HOST], [height=HEIGHT_VALUE], [width=WIDTH_VALUE])

To monitor your jobs with Workflow Execution inside Jupyter, use:

#%show_workflow_execution([host=YOUR_HOST], [height=HEIGHT_VALUE], [width=WIDTH_VALUE])
The parameters height and width allow the user to adjust the size of the window inside the notebook.

12. Customize the ML Bucket

12.1. Create or Update an ML Task

The Machine Learning bucket contains various open-source tasks that can be easily used by a simple drag and drop.

It is possible to enrich the ML Bucket by adding your own tasks. (see section 4.3)

It is also possible to customize the code of the generic ML tasks. In this case, you need to drag and drop the targeted task to modify its code in the Task Implementation section.

It is also possible to add and/or delete variables of each task, set your own fork environments, etc. More details are available in the ProActive User Guide.

12.2. Set the Fork Environment

A fork execution environment is a new Java Virtual Machine (JVM) which is started exclusively to execute a task. Starting a new JVM means that the task inside it will run in a new environment. This environment can be set up by the creator of the task. A new JVM is set up with a new classpath, new system properties and more customization options.

We use a Docker fork environment for all the ML tasks; activeeon/dlm3 is used as the Docker container for all of them. If your task needs to install new ML libraries which are not available in this container, use your own Docker container or an appropriate environment with the needed libraries.

The use of Docker containers is recommended so that other tasks are not affected by the change: Docker containers provide isolation, and the host machine’s software stays the same. More details are available in the ProActive User Guide.

12.3. Publish a ML Task

The Catalog menu allows a user to publish newly created and/or updated tasks inside the Machine Learning bucket: just click on the Catalog menu, then Publish current Workflow to the Catalog. Choose the machine-learning bucket to store your newly added workflow. If a task with the same name already exists in the machine-learning bucket, it will be updated. We recommend submitting tasks with a commit message for easier differentiation between the submitted versions.

More details are available in the ProActive User Guide.

12.4. Create a ML Workflow

The quickstart tutorial on try.activeeon.com shows you how to build a simple workflow using ProActive Studio.

We show below an example of a workflow created with the Studio:

ML Workflow Example

On the left side, the General Parameters of the workflow are displayed with the following information:

  • Name: the name of the workflow.

  • Project: the name of the project to which the workflow belongs.

  • Tags: the tags of the workflow.

  • Description: the textual description of the workflow.

  • Documentation: if the workflow has a Generic Information named "Documentation", then its URL value is displayed as a link.

  • Job Priority: the priority assigned to the workflow. It is by default set to NORMAL, but can be increased or decreased once the job is submitted.

The workflow represented above is available in the ai-machine-learning-workflows bucket.

13. ML Workflows Examples

PAIO provides a fast, easy and practical way to execute different workflows using the ML bucket. We present useful ML workflows for different applications in the following subsections.

To test these workflows, you need to add the ai-machine-learning-workflows bucket as main catalog in the ProActive Studio.

  1. Open ProActive AI Orchestration home page.

  2. Create a new workflow.

  3. Change palette preset to Machine Learning.

  4. Click on ai-machine-learning catalog and pin it open.

  5. Drag and drop the workflow example of your choice.

  6. Execute the chosen workflow, track its progress and preview its results.

More details about these workflows are available in ActiveEon’s AutoML blog.

13.1. Basic ML

The following workflows present some basic ML examples. These workflows are built using generic ML and data visualization tasks available in the ML and Data Visualization buckets.

Diabetics_Detection_using_K_means: trains and tests a clustering model using the K_Means algorithm.

Vehicle_Type_Using_Model_Explainability: predicts vehicle type based on silhouette measurements, and applies ELI5 and Kernel Explainer to understand the model’s global behavior or specific predictions.

Parallel_Regression_Model_Training: trains three different regression models.

Parallel_Classification_Model_Training: trains three different classification models.

Nested_Cross_Validation: trains a logistic regression model using a nested cross-validation strategy.

Iris_Flowers_Classification_using_Logistic_Regression: trains and tests a predictive model using the Logistic_Regression algorithm.

House_Price_Prediction_using_Linear_Regression: trains and tests a regression model using the Linear_Regression algorithm.

13.2. Basic AutoML

The following workflows present some basic ML examples using the AutoML generic tasks available in the ai-machine-learning bucket.

Breast_Cancer_Detection_Using_AutoSklearn_Classifier: tests several ML pipelines and selects the best model for Cancer Breast detection.

California_Housing_Prediction_Using_TPOT_Regressor: tests several ML pipelines and selects the best model for California housing prediction.

13.3. Log Analysis

The following workflows are designed to detect anomalies in log files. They are constructed using generic tasks which are available on the ai-machine-learning and ai-data-visualization buckets.

Anomaly_Detection_in_Apache_Logs: detects intrusions in Apache logs using a predictive model trained with the Support Vector Machines algorithm.

Anomaly_detection_in_HDFS_Blocks: trains and tests an anomaly detection model for detecting anomalies in HDFS blocks.

Anomaly_detection_in_HDFS_Nodes: trains and tests an anomaly detection model for detecting anomalies in HDFS nodes.

Unsupervised_Anomaly_Detection: detects anomalies using an Unsupervised One-Class SVM.

13.4. Data Analytics

The following workflows are designed for feature engineering and data fusion.

Data_Fusion_And_Encoding: fuses different data structures.

Data_Anomaly_Detection: detects anomalies on energy consumption by customers.

Diabetics_Results_Visualization_Using_Tableau: visualizes diabetics results using Tableau.

13.5. In Memory Workflows

The following workflows are designed for in-memory execution using IPython. IPython enables all types of parallel applications to be developed, executed, debugged, and monitored interactively. For more details, please visit the ipyparallel website.

In_Memory_Iris_Flowers_Classification: classifies Iris flowers using the logistic regression algorithm. This workflow uses an external IPython Engine for in-memory execution.

Start_IPython_Cluster: starts an IPython parallel computing cluster.

13.6. GPU Accelerated Workflows

The following workflows are designed to train machine learning models on GPU using NVIDIA RAPIDS. This reduces the training time from days to minutes.

Train_Classification_Model_On_GPU: trains a machine learning model for data classification on GPU using NVIDIA RAPIDS.

Train_Multiple_Classification_Models_On_GPU: trains multiple machine learning models for data classification on GPU using NVIDIA RAPIDS.

Train_Multiple_Regression_Models_On_GPU: trains multiple machine learning models for data regression on GPU using NVIDIA RAPIDS.

Train_Regression_Model_On_GPU: trains a machine learning model for data regression on GPU using NVIDIA RAPIDS.

The table below lists the algorithms that have GPU support and can be tested using the generic tasks in the ai-machine-learning bucket.

Category and algorithms:

  • 5.2 ML Classification: Support_Vector_Machines, Logistic_Regression

  • 5.3 ML Regression: Support_Vector_Regression, Linear_Regression

  • 5.4 ML Clustering: K_Means

  • 5.6 ML Ensemble Learning: XGBoost, Random_Forest

14. Deep Learning Workflows Examples

PAIO provides a fast, easy and practical way to execute deep learning workflows. In the following subsections, we present useful deep learning workflows for text and image classification and generation.

You can test these workflows by following these steps:

  1. Open ProActive AI Orchestration home page.

  2. Create a new workflow.

  3. Click on Catalog menu then Add Bucket as Extra Catalog Menu and select ai-deep-learning-workflows bucket.

  4. Open this added extra catalog menu and drag and drop the workflow example of your choice.

  5. Execute the chosen workflow, track its progress and preview its results.

14.1. Azure Cognitive Services

The following workflows present useful examples composed of pre-built Azure Cognitive Services tasks available in the ai-azure-cognitive-services bucket.

Emotion_Detection_in_Bing_News: is a mashup that searches for images of a person using Azure Bing Image Search, then performs emotion detection using the Azure Emotion API.

Sentiment_Analysis_in_Bing_News: is a mashup that searches for news related to a given search term using the Azure Bing News API, then performs a sentiment analysis using the Azure Text Analytics API.

14.2. Microsoft Cognitive Toolkit

The following workflows present useful examples for training and testing predictive models using the Microsoft Cognitive Toolkit (CNTK).

CNTK_ConvNet: trains a Convolutional Neural Network (CNN) on the CIFAR-10 dataset.

CNTK_SimpleNet: trains a 2-layer fully connected deep neural network with 50 hidden dimensions per layer.

GAN_Generate_Fake_MNIST_Images: generates fake MNIST images using a Generative Adversarial Network (GAN).

DCGAN_Generate_Fake_MNIST_Images: generates fake MNIST images using a Deep Convolutional Generative Adversarial Network (DCGAN).

14.3. Mixed Workflows

The following workflow presents an example built using pre-built Azure Cognitive Services tasks available in the ai-azure-cognitive-services bucket and custom AI tasks available in the ai-deep-learning bucket.

Custom_Sentiment_Analysis: is a mashup that searches for news related to a given search term using the Azure Bing News API, then performs a sentiment analysis using a custom deep-learning-based pre-trained model.

14.4. Training Custom AI Workflows - PyTorch library

This section presents custom AI workflows using tasks available in the ai-deep-learning bucket. Such tasks enable you to train your own AI models by a simple drag and drop of custom AI tasks.

IMDB_Sentiment_Analysis: trains a model to identify and categorize the sentiment expressed in a piece of text, in order to determine whether the opinion of IMDB users regarding specific movies is positive or negative. NOTE: Instead of training a model from scratch, a pre-trained sentiment analysis model is available on this link.

Language_Detection: builds an RNN model to perform language detection from text data.

Train_Image_Classification: trains a model to classify images of ants and bees.

Train_Image_Segmentation: trains a segmentation model using the SegNet network on the Oxford-IIIT Pet dataset.

Train_Image_Object_Detection: trains an object detection model using YOLOv3 on the COCO dataset proposed by Microsoft Research.

Deep_Model_Explainability: explains a ResNet-18 model using GradientExplainer.

Search_Train_Image_Classification: queries images from a search engine (Bing or DuckDuckGo) and trains a model to classify them.

14.5. Prediction Custom AI Workflows - PyTorch library

This section presents custom AI workflows using tasks available in the ai-deep-learning bucket. Such tasks enable you to test your own AI models by a simple drag and drop of custom AI tasks.

Image_Classification: classifies images using a pre-trained ResNet_18 model on the Ants_vs_Bees dataset. The pre-trained image classification model is available on this link.

Fake_Celebrity_Faces_Generation: generates a wild diversity of fake faces using a GAN model that was trained based on thousands of real celebrity photos. The pre-trained GAN model is available on this link.

Image_Segmentation: segments images using a pre-trained SegNet model on the Oxford-IIIT Pet dataset. The pre-trained image segmentation model is available on this link.

Image_Object_Detection: detects objects using a pre-trained YOLOv3 model on the COCO dataset proposed by Microsoft Research. The pre-trained model is available on this link.

Search_Classify_Images: queries images from a search engine (Bing or DuckDuckGo) and uses a pre-trained model to classify rocket_vs_plane images. The pre-trained image classification model is available on this link.

14.6. Templates

The following workflows represent Python templates that can be used to implement generic machine learning tasks.

Horovod_Task: is a template to implement a Horovod task with multi-GPU support.

Horovod_Docker_Task: is a template to implement a Horovod task using a Docker container with multi-GPU support.

Horovod_Slurm_Task: is a template to implement a Horovod task using a native SLURM scheduler with multi-GPU support.

TensorFlow_Task: is a simple TensorFlow task template.

Keras_Task: is a simple Keras task template.

PyTorch_Task: is a simple PyTorch task template.

It is recommended to use a GPU-enabled node to run the deep learning tasks.

15. References

15.1. AI Workflows Common Variables

In the following table, you can find the variables that are common to most of the available AI workflows in PAIO, along with their descriptions.

Variable name

Description

Type

NATIVE_SCHEDULER

Name of the Native Scheduler node source to use when the workflow tasks must be deployed inside a cluster such as SLURM, LSF, etc.

String (default=empty)

NATIVE_SCHEDULER_PARAMS

Parameters given to the native scheduler (SLURM, LSF, etc) while requesting a ProActive node used to deploy the workflow tasks.

String (default=empty)

NODE_SOURCE_NAME

If not empty, the workflow tasks will be run only on nodes belonging to the specified node source.

String (default=empty)

NODE_ACCESS_TOKEN

If not empty, the workflow tasks will be run only on nodes that contain the specified token.

String (default=empty)

WORK_DIR

Defines the working directory for the data space used to transfer files automatically between the workflow tasks.

String

CONTAINER_PLATFORM

Specifies the container platform to be used for executing the workflow tasks.

List [no-container, docker, podman, singularity] (default=docker)

CONTAINER_GPU_ENABLED

If True, it will activate the use of GPU on the selected container platform.

Boolean (default=True)

CONTAINER_IMAGE

Specifies the name of the container image that will be used to run the workflow tasks.

List [docker://activeeon/dlm3, docker://activeeon/cuda, docker://activeeon/cuda2, docker://activeeon/rapidsai, docker://activeeon/tensorflow:latest, docker://activeeon/tensorflow:latest-gpu] (default=empty)

15.2. ML Bucket

The ai-machine-learning bucket contains diverse generic ML tasks that enable you to easily compose workflows for training and testing predictive models. This bucket can be easily customized according to your needs, by adding new tasks or updating the existing ones.

All ML tasks were implemented using the Scikit-learn library.

15.2.1. Public Datasets

Load_Boston_Dataset

Task Overview: Load and return the Boston House-Prices dataset.

Table 18. Boston Dataset Description

  • Features: real, positive

  • Targets: real, 5.0 to 50.0

  • Dimensionality: 13

  • Samples total: 506

Task Variables:

Table 19. Load_Boston_Dataset_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

LIMIT_OUTPUT_VIEW

Specifies how many rows of the dataframe will be previewed in the browser to check each task results.

Int (default=-1) (-1 means preview all the rows)

Usage:

  • The Boston House-Prices dataset is a regression dataset; you can only use it with a regression algorithm, such as Linear Regression or Support Vector Regression.

  • After this task, you can use the Split_Data task to divide the dataset into training and testing sets.

More information about this dataset can be found here.
Load_Iris_Dataset

Task Overview: Load and return the iris dataset.

Table 20. Iris Dataset Description

  • Features: real, positive

  • Classes: 3

  • Dimensionality: 4

  • Samples per class: 50

  • Samples total: 150

Task Variables:

Table 21. Load_Iris_Dataset_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

LIMIT_OUTPUT_VIEW

Specifies how many rows of the dataframe will be previewed in the browser to check each task results.

Int (default=-1) (-1 means preview all the rows)

Usage:

  • The Iris dataset is a classification dataset; you can only use it with a classification algorithm, such as Support Vector Machines or Logistic Regression.

  • After this task, you can use the Split_Data task to divide the dataset into training and testing sets.

More information about this dataset can be found here.

15.2.2. Input and Output Data

Download_Model

Task Overview: Download a trained model to your computer.

Task Variables:

Table 22. Download_Model_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

Usage: It should be used after the task Train_Model.

Export_Data

Task Overview: Export the results of the predictions generated by a classification, clustering or regression algorithm.

Task Variables:

Table 23. Export_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

OUTPUT_FILE

The file format to which the prediction results are converted (TABLEAU corresponds to a HYPER file).

String [CSV, JSON, HTML, TABLEAU]

LIMIT_OUTPUT_VIEW

Specifies how many rows of the dataframe will be previewed in the browser to check each task results.

Int (default=-1) (-1 means preview all the rows)

Usage: It should be used after the task Predict_Model.

Import_Data

Task Overview: Load data from external sources and, if enabled, predict its feature types.

Task Variables:

Table 24. Import_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

IMPORT_FROM

Selects the type of data source.

List [PA:URL,PA:URI,PA:USER_FILE,PA:GLOBAL_FILE] (default=PA:URL)

FILE_PATH

Inserts a file path/name.

String

FILE_DELIMITER

Defines a delimiter to use.

String (default=;)

LABEL_COLUMN

Refers to the name of the label column.

String

LIMIT_OUTPUT_VIEW

Specifies how many rows of the dataframe will be previewed in the browser to check each task results.

Int (-1 means preview all the rows)

DATA_TYPE_IDENTIFICATION

If True, the types of the dataset features will be predicted (as numerical or categorical).

Boolean (default=False)

Your CSV file should be in a table format. See the example below.
csv file organisation
Import_Data_And_Automate_Feature_Engineering

Task Overview: This workflow provides a complete solution to assist data scientists in loading and encoding their categorical data. It currently supports different encoding methods such as Label, OneHot, Dummy, Binary, Base N, Hash and Target. It also enables:

  • Automatic identification of the best-suited method for encoding each categorical column, when no encoding method is selected (Auto mode).

  • Data type recognition: identification of the data type of each column (categorical or numerical).

  • Creation of summary statistics for each column: missing values, minimum, maximum, average, zeros, and cardinality.

  • Editing of the data structure: modification of column information (name, type, category, etc.), deletion of a column, etc.

This workflow can be used:

  • Stand-alone such that the results can be saved in the User Data Space or locally.

  • In a ML pipeline where the results will be transferred as an input for the following task in the pipeline.

For further information, please check the subsection AutoFeat.
Table 25. Import_Data_And_Automate_Feature_Engineering variables

Variable name

Description

Type

Task variables

IMPORT_FROM

Selects the method/protocol to import the data source.

List [PA:URL,PA:URI,PA:USER_FILE,PA:GLOBAL_FILE] (default=PA:URL)

FILE_PATH

Inserts the path/name of the file that contains the dataset.

String

FILE_DELIMITER

Defines a delimiter to use.

String (default=;)

LIMIT_OUTPUT_VIEW

Specifies how many rows of the encoded dataframe will be previewed in the workflow results.

Int (-1 means preview all the rows)

Import_Model

Task Overview: Load a trained model and use it to make predictions on new incoming data.

Task Variables:

Table 26. Import_Model_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

MODEL_URL

The URL to load your trained model from. default: https://s3.eu-west-2.amazonaws.com/activeeon-public/models/pima-indians-diabetes.model

String

Usage: It should be used before Predict_Model to make predictions.

Preview_Results

Task Overview: Preview the HTML results of the predictions generated by a classification, clustering or regression algorithm.

Task Variables:

Table 27. Preview_Results_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

LIMIT_OUTPUT_VIEW

Specifies how many rows of the dataframe will be previewed in the browser to check each task results.

Int (default=-1) (-1 means preview all the rows)

OUTPUT_FILE

Converts the prediction results to HTML or CSV file.

String [CSV, JSON or HTML]

Usage: It should be used after the task Predict_Model.

Log_Parser

Task Overview: Convert an unstructured raw log file into a structured one by matching a group of event patterns.

Task Variables:

Table 28. Log_Parser_Task variables

Variable name

Description

Type

LOG_FILE

Put the URL of the raw log file that you need to parse.

String

PATTERNS_FILE

Put the URL of the CSV file that contains the different RegEx expressions of each possible pattern and their corresponding variables. The CSV file must contain three columns (see the example below):

A. id_pattern: Integer. The identifier of each pattern.

B. Pattern: RegEx expression. The regex expression of each pattern.

C. Variables: String. The name of each variable included in the pattern. N.B: use the symbol ‘*’ for variables that you need to ignore (e.g., in the example below the 5th variable is ignored). N.B: all variables captured by each regex expression have to be listed in the « Variables » column in the right order (use ',' to separate the variable names).

String

STRUCTURED_LOG

Indicates the extension of the file where the resulting structured logs will be saved.

String [CSV or HTML]

pattern file

Usage: Could be connected with the tasks Query_Data and Feature_Vector_Extractor.
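
For illustration only, a hypothetical patterns file for Apache access logs could look as follows (';' as CSV delimiter; the 5th captured variable, the protocol, is ignored with '*'):

id_pattern;Pattern;Variables
1;(\S+) - - \[(.*?)\] "(\w+) (\S+) (\S+)" (\d+);ip,timestamp,method,url,*,status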

15.2.3. Data Preprocessing

Append_Data

Task Overview: Append the rows of one dataframe to the end of another, returning a new object. Columns not present in the original frame are added as new columns.

Task Variables:

Table 29. Append_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

Drop_Columns

Task Overview: Drop the columns specified in COLUMNS_NAME variable.

Task Variables:

Table 30. Drop_Columns_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

COLUMNS_NAME

The list of columns that need to be dropped. Columns names should be separated by a comma.

String

More details about the source code of this task can be found here.
Drop_NaNs

Task Overview: Replace inf values with NaNs first, then drop objects on a given axis where alternately any or all of the data are missing.

More details about the source code of this task can be found here.
Encode_Data

Task Overview: Encode the values of the columns specified in the COLUMNS_NAME variable with integer values between 0 and the number of unique values minus 1.

Task Variables:

Table 31. Encode_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

COLUMNS_NAME

The list of columns that need to be encoded. Columns names should be separated by a comma.

String

More details about the source code of this task can be found here.
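
Conceptually, this corresponds to a scikit-learn label encoding; here is a minimal standalone sketch (illustrative only, not the task's exact source code):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"city": ["Paris", "London", "Paris", "Nice"]})
# Encode the 'city' column with integers between 0 and the number of unique values - 1
df["city"] = LabelEncoder().fit_transform(df["city"])
print(df["city"].tolist())  # [2, 0, 2, 1]: classes are ordered alphabetically
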
Fill_NaNs

Task Overview: Fill NA/NaN values using the specified method.

Task Variables:

Table 32. Fill_NaNs_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

FILL_MAP

Refers to the value to use to fill holes (e.g., 0).

Integer

More details about the source code of this task can be found here.
Filter_Columns

Task Overview: Subset columns of a dataframe according to the specified list of columns in the COLUMNS_NAME variable.

Task Variables:

Table 33. Filter_Columns_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

COLUMNS_NAME

The list of columns to restrict to. Columns names should be separated by a comma.

String

More details about the source code of this task can be found here.
Merge_Data

Task Overview: Merge DataFrame objects by performing a database-style join operation based on a specific reference column specified in the REF_COLUMN variable.

Task Variables:

Table 34. Merge_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

REF_COLUMN

The reference column on which the database-style join is performed.

String

More details about the source code of this task can be found here.
Scale_Data

Task Overview: Scale a dataset based on a robust scaler or standard scaler.

Task Variables:

Table 35. Scale_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

SCALER_NAME

The name of the scaler to use for scaling the data.

List [RobustScaler, StandardScaler] (default=RobustScaler)

LIMIT_OUTPUT_VIEW

Specifies how many rows of the dataframe will be previewed in the browser to check each task results.

Int (default=-1) (-1 means preview all the rows)

COLUMNS_NAME

The list of columns that will be scaled. Column names should be separated by a comma.

String

More details about the source code of this task can be found here.
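
Both scalers come from scikit-learn; the following standalone sketch (illustrative only) shows the difference between the two options:

import pandas as pd
from sklearn.preprocessing import RobustScaler, StandardScaler

df = pd.DataFrame({"income": [20000.0, 35000.0, 40000.0, 1000000.0]})
# RobustScaler centers on the median and scales by the interquartile range,
# so it is less sensitive to outliers such as the last value
robust = RobustScaler().fit_transform(df[["income"]])
# StandardScaler centers on the mean and scales to unit variance
standard = StandardScaler().fit_transform(df[["income"]])
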
Split_Data

Task Overview: Separate data into train and test subsets.

Task Variables:

Table 36. Split_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

TRAIN_SIZE

The proportion of the dataset to include in the train split. This parameter must be a float within the range (0.0, 1.0), not including the values 0.0 and 1.0. default = 0.7

Float

Usage: It should be used before the tasks Train and Predict.

More details about the source code of this task can be found here.
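
The behaviour corresponds to scikit-learn's train_test_split; here is a minimal sketch (illustrative only) of what TRAIN_SIZE controls:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})
# TRAIN_SIZE=0.7 keeps 70% of the rows for training and 30% for testing
train, test = train_test_split(df, train_size=0.7)
print(len(train), len(test))  # 7 3
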
Rename_Columns

Task Overview: Rename the columns of a data frame.

Task Variables:

Table 37. Rename_Columns_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

COLUMNS_NAME

The list of columns that will be renamed. Column names should be separated by a comma.

String

Query_Data

Task Overview: Query the columns of your data with a boolean expression.

Task Variables:

Table 38. Query_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

QUERY

The query string to evaluate.

String

FILTERED_FILE_OUTPUT

Refers to the extension of the file where the resulting filtered data will be saved.

String [CSV or HTML]

More details about the source code of this task can be found here.
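
The QUERY variable follows the pandas query syntax; for example (illustrative only):

import pandas as pd

df = pd.DataFrame({"age": [25, 42, 37], "city": ["Paris", "Nice", "Paris"]})
# Keep only the rows matching the boolean expression
filtered = df.query("age > 30 and city == 'Paris'")
print(filtered)  # single row: age=37, city=Paris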

15.2.4. Feature Extraction

Summarize_Data

Task Overview: Calculate the histogram of a dataframe based on a reference column that needs to be specified in the REF_COLUMN variable.

Task Variables:

Table 39. Summarize_Data_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

GLOBAL_MODEL_TYPE

The model that will be used to summarize data.

List [KMeans, PolynomialFeatures] (default=KMeans)

REF_COLUMN

The column that will be used to group by the different histogram measures.

String

More details about the source code of this task can be found here.
Tsfresh_Features_Extraction

Task Overview: Calculate a comprehensive number of time series features based on the library TSFRESH.

Task Variables:

Table 40. Tsfresh_Features_Extraction_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

TIME_COLUMN

The column that contains the time values of the time series.

String

REF_COLUMN

The column that will be used to group by the different features.

String

ALL_FEATURES

False if you do not need to extract all the possible features extractable by the library TSFRESH.

Boolean (default = False)

LIMIT_OUTPUT_VIEW

Specifies how many rows of the dataframe will be previewed in the browser to check each task results.

Int (default=-1) (-1 means preview all the rows)

More details about the source code of this task can be found here.
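
Conceptually, the task relies on tsfresh's extract_features function; here is a minimal sketch (illustrative only; the 'id' and 'time' column names play the roles of REF_COLUMN and TIME_COLUMN):

import pandas as pd
from tsfresh import extract_features

# Long-format time series: one row per observation
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],              # groups observations per series (REF_COLUMN)
    "time":  [0, 1, 2, 0, 1, 2],              # ordering within each series (TIME_COLUMN)
    "value": [1.0, 2.0, 3.0, 5.0, 4.0, 3.0],
})
# One feature vector is computed per distinct 'id'
features = extract_features(df, column_id="id", column_sort="time")
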
Feature_Vector_Extractor

Task Overview: Encode structured data into numerical feature vectors whereby ML models can be applied.

Task Variables:

Table 41. Feature_Vector_Extractor_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

SESSION_COLUMN

The ID of the entity that you need to represent (to group by).

String

FILE_OUT_FEATURES

The extension of the file where the resulting features will be saved.

String [CSV or HTML]

PATTERN_COLUMN

The index of the column containing the log patterns (specific to feature extraction from logs).

String

PATTERNS_COUNT_FEATURES

True if you need to count the number of occurrences of each pattern per session.

Boolean [True or False]

STATE_VARIABLES

The different variables that need to be considered to extract features according to their content.

N.B: separate the different variables with a comma ','

String

COUNT_VARIABLES

Refers to the different variables that need to be considered to count their distinct content.

N.B: separate the different variables with a comma ','

String

STATE_COUNT_FEATURES_VARIABLES

True if you need to extract state and count features per session.

Boolean [True or False]

Usage: Could be connected with Train_Model if you need to train a model using unsupervised ML techniques.

15.2.5. AutoML

TPOT_Classifier

Task Overview: TPOT_Classifier performs an intelligent search over ML pipelines that can contain supervised classification models, preprocessors, feature selection techniques, and any other estimator or transformer that follows the scikit-learn API.

Task Variables:

Table 42. TPOT_Classifier_Task variables

Variable name

Description

Type

TASK_ENABLED

If True, this task code will be executed.

Boolean (default=True)

GENERATIONS

Number of iterations to run the pipeline optimization process.

Integer (default=3)

SCORING

Function used to evaluate the quality of a given pipeline for the classification problem.

List (default=accuracy)

CV

Cross-validation strategy used when evaluating pipelines.

Integer (default=5)

VERBOSITY

How much information TPOT communicates while it’s running. Possible inputs: 0, 1, 2, 3.

Integer (default=1)

Usage: It should be connected with Train_Model.

More information about this task can be found here.
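
The task variables map directly onto the TPOT API; here is a minimal standalone sketch (illustrative only):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7)

# GENERATIONS, SCORING, CV and VERBOSITY correspond to the task variables above
tpot = TPOTClassifier(generations=3, scoring="accuracy", cv=5, verbosity=1)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
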
AutoSklearn_Classifier

Task Overview: AutoSklearn_Classifier leverages recent advances in Bayesian optimization, meta-learning and ensemble construction to perform an intelligent search over ML classification algorithms.

Task Variables:

Table 43. AutoSklearn_Classifier_Task variables

Variable name

Description

Type

TASK_ENABLED

If True, this task code will be executed.

Boolean (default=True)

TASK_TIME

Time limit in seconds for the search of appropriate models.

Integer (default=30)

RUN_TIME

Time limit for a single call to the ML model. Model fitting will be stopped if the ML algorithm runs over the time limit.

Integer (default=27)

SAMPLING

If True, the defined resampling strategy will be applied.

Boolean (default=True)

RESAMPLING_STRATEGY

Strategy to handle overfitting.

String (default='cv')

FOLDS

Number of folds for cross-validation.

Integer (default=5)

Usage: It should be connected with Train_Model.

  1. The labels of the selected dataset have to be numerical.

  2. More information about this task can be found here.
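
The task variables map onto the auto-sklearn API roughly as follows (illustrative sketch; X_train and y_train are placeholders):

import autosklearn.classification

# TASK_TIME, RUN_TIME, RESAMPLING_STRATEGY and FOLDS correspond to the task variables above
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=30,
    per_run_time_limit=27,
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 5},
)
# automl.fit(X_train, y_train)  # labels must be numerical (see note above)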

TPOT_Regressor

Task Overview: TPOT_Regressor performs an intelligent search over ML pipelines that can contain supervised regression models, preprocessors, feature selection techniques, and any other estimator or transformer.

Task Variables:

Table 44. TPOT_Regressor_Task variables

Variable name

Description

Type

TASK_ENABLED

If True, this task code will be executed.

Boolean (default=True)

GENERATIONS

Number of iterations to run the pipeline optimization process.

Integer (default=3)

SCORING

Function used to evaluate the quality of a given pipeline for the regression problem.

List (default=neg_mean_squared_error)

CV

Cross-validation strategy used when evaluating pipelines.

Integer (default=5)

VERBOSITY

How much information TPOT communicates while it’s running. Possible inputs: 0, 1, 2, 3.

Integer (default=1)

Usage: It should be connected with Train_Model.

More information about this task can be found here.
AutoSklearn_Regressor

Task Overview: AutoSklearn_Regressor leverages recent advances in Bayesian optimization, meta-learning and ensemble construction to perform an intelligent search over ML regression algorithms.

Task Variables:

Table 45. AutoSklearn_Regressor_Task variables

Variable name

Description

Type

TASK_ENABLED

If True, this task code will be executed.

Boolean (default=True)

TASK_TIME

Time limit in seconds for the search of appropriate models.

Integer (default=120)

RUN_TIME

Time limit for a single call to the ML model. Model fitting will be stopped if the ML algorithm runs over the time limit.

Integer (default=30)

SAMPLING

If True, the defined resampling strategy will be applied.

Boolean (default=False)

RESAMPLING_STRATEGY

Strategy to handle overfitting.

String (default='cv')

FOLDS

Number of folds for cross-validation.

Integer (default=5)

Usage: It should be connected with Train_Model.

More information about this task can be found here.

15.2.6. ML Classification

Gaussian_Naive_Bayes

Task Overview: Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.

Task Variables:

Table 46. Gaussian_Naive_Bayes_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the gaussian naive bayes algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model or Predict_Model.

More information about this task can be found here.
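
The classification, regression, clustering and anomaly detection tasks in the following sections all receive their hyper-parameters through an INPUT_VARIABLES JSON string. Here is a minimal, illustrative sketch of that pattern, assuming scikit-learn as the backend (the JSON value is hypothetical):

```python
# Sketch: turning the INPUT_VARIABLES JSON string into estimator arguments.
import json
from sklearn.naive_bayes import GaussianNB

input_variables = '{"var_smoothing": 1e-8}'            # hypothetical JSON value
params = json.loads(input_variables) if input_variables else {}

model = GaussianNB(**params)   # the same pattern applies to the other ML tasks
```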
Logistic_Regression

Task Overview: Logistic Regression is a regression model where the Dependent Variable (DV) is categorical.

Task Variables:

Table 47. Logistic_Regression_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the logistic regression algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source code of this task can be found here.
Support_Vector_Machines

Task Overview: Support vector machines are supervised learning models with associated learning algorithms that analyze data used for classification.

Task Variables:

Table 48. Support_Vector_Machines_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the support vector machines algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.

15.2.7. ML Regression

Bayesian_Ridge_Regression

Task Overview: Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference.

Task Variables:

Table 49. Bayesian_Ridge_Regression_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the bayesian ridge regression algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.
Linear_Regression

Task Overview: Linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X.

Task Variables:

Table 50. Linear_Regression_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the linear regression algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.
Support_Vector_Regression

Task Overview: Support vector regression models are supervised learning models with associated learning algorithms that analyze data used for regression.

Task Variables:

Table 51. Support_Vector_Regression_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the support vector regression algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.

15.2.8. ML Anomaly Detection

Isolation_Forest

Task Overview: Isolation Forest is an outlier detection method which returns the anomaly score of each sample using the IsolationForest algorithm. The IsolationForest ‘isolates’ observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

Task Variables:

Table 52. Isolation_Forest_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the isolation forest algorithm. Check the list of parameters here.

JSON format

More information about the source of this task can be found here.
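
A minimal sketch of the anomaly scoring described above, assuming scikit-learn's IsolationForest (the data is synthetic and for illustration only):

```python
# Sketch of Isolation Forest anomaly scoring (scikit-learn assumed).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_train = rng.randn(200, 2)                        # mostly "normal" points
X_test = np.vstack([rng.randn(10, 2), [[6, 6]]])   # plus one obvious outlier

clf = IsolationForest(random_state=42).fit(X_train)
print(clf.predict(X_test))         # +1 = inlier, -1 = outlier
print(clf.score_samples(X_test))   # anomaly score of each sample
```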
One_Class_SVM

Task Overview: One-class SVM is an algorithm that learns a decision function for novelty detection: classifying new data as similar or different to the training set.

Task Variables:

Table 53. One_Class_SVM_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the one class algorithm. Check the list of parameters here.

JSON format

More information about the source of this task can be found here.

15.2.9. ML Clustering

K_Means

Task Overview: K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.

Task Variables:

Table 54. K_Means_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the K-means algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.
Mean_Shift

Task Overview: Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function.

Task Variables:

Table 55. Mean_Shift_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the Mean Shift algorithm. Check the list of parameters here.

JSON format

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.

15.2.10. ML Ensemble Learning

AdaBoost

Task Overview: AdaBoost combines multiple weak classifiers into a single strong classifier.

Task Variables:

Table 56. AdaBoost_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the AdaBoost algorithm. Check the list of parameters here.

JSON format

TYPE

Specifies the type of algorithm.

List [Classification or Regression]

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.
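
The ensemble tasks in this section expose a TYPE variable that switches between the classification and regression variant of the algorithm. A minimal sketch of that dispatch, assuming scikit-learn for AdaBoost (the variable value is an example):

```python
# Sketch of the TYPE switch used by the ensemble learning tasks.
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

algorithm_type = "Classification"   # the task's TYPE variable

if algorithm_type == "Classification":
    model = AdaBoostClassifier(n_estimators=50)
else:
    model = AdaBoostRegressor(n_estimators=50)
```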
CatBoost

Task Overview: CatBoost is a gradient boosting algorithm that helps reduce overfitting. It can be used to solve both classification and regression challenges.

Task Variables:

Table 57. CatBoost_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the CatBoost algorithm. Check the list of parameters here.

JSON format

TYPE

Specifies the type of algorithm.

List [Classification or Regression]

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.
Gradient_Boosting

Task Overview: Gradient Boosting is an algorithm for regression and classification problems. It produces a prediction model in the form of an ensemble of weak prediction models.

Task Variables:

Table 58. Gradient_Boosting_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the Gradient Boosting algorithm. Check the list of parameters here.

JSON format

TYPE

Specifies the type of algorithm.

List [Classification or Regression]

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.
Random_Forest

Task Overview: Random Forest is an algorithm for regression, classification and other tasks that operates by constructing a multitude of decision trees at training time.

Task Variables:

Table 59. Random_Forest_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the Random Forest algorithm. Check the list of parameters here.

JSON format

TYPE

Specifies the type of algorithm.

List [Classification or Regression]

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.
XGBoost

Task Overview: XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.

Task Variables:

Table 60. XGBoost_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

INPUT_VARIABLES

Specifies the parameters' values of the XGBoost algorithm. Check the list of parameters here.

JSON format

TYPE

Specifies the type of algorithm.

List [Classification or Regression]

Usage: It should be connected with Train_Model and Predict_Model.

More information about the source of this task can be found here.

15.2.11. Train

Train_Model

Task Overview: Train a model using a classification, regression or anomaly detection algorithm.

Table 61. Train_Model_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

LABEL_COLUMN

Refers to the name of the label column.

String

USE_NVIDIA_RAPIDS

Enables NVIDIA RAPIDS support.

Boolean (default=False)

  1. More information about the source of the ML Classification can be found here.

  2. More information about the source of the ML Regression can be found here.

  3. More information about the source of the ML Anomaly Detection can be found here.
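
As a hedged illustration of the training step, the sketch below splits a dataset on LABEL_COLUMN and fits an estimator standing in for the one produced by the connected algorithm task (file name and column name are hypothetical):

```python
# Illustrative sketch of Train_Model (pandas and scikit-learn assumed).
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("dataset.csv")   # hypothetical input dataset
label_column = "class"            # the task's LABEL_COLUMN variable

X = df.drop(columns=[label_column])
y = df[label_column]

model = LogisticRegression()      # stands in for the connected task's estimator
model.fit(X, y)
```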

15.2.12. Predict

Predict_Model

Task Overview: Generate predictions using a trained model.

Table 62. Predict_Model_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

LABEL_COLUMN

Refers to the name of the label column.

String

USE_NVIDIA_RAPIDS

Enables NVIDIA RAPIDS support.

Boolean (default=False)

Usage: It should be used after the task Train_Model.

15.2.13. ML Explainability

Model_Explainability

Task Overview: Explain ML models globally on all data, or locally on a specific data point using the SHAP and eli5 Python libraries.

Table 63. Model_Explainability_Task variables

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored and will not be executed.

Boolean (default=True)

LABEL_COLUMN

Refers to the name of the label column.

String

FEATURE_PARTIAL_PLOTS

Partial Dependence Plots show how a feature affects predictions.

String [e.g., distance_circularity, max_length_aspect_ratio, etc].

FEATURE_PARTIAL2D_PLOTS

2D Partial Dependence Plots show predictions for any combination of two features.

String [e.g., distance_circularity, max_length_aspect_ratio].

SHAP_ROW_SHOW

Defines the row of data to show.

Integer

Usage: It should be connected with Train_Model.

The SHAP values interpret the impact of having a certain value for a given feature in comparison to the prediction we would make if that feature took some baseline value. Feature values causing increased predictions are shown in pink, and feature values decreasing the prediction are shown in blue.
More information about the source of the SHAP and eli5 Python libraries can be found here.
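
A minimal sketch of the SHAP computation described above, assuming the shap library with a tree-based model (dataset and plots are illustrative; the actual task may use other explainers):

```python
# Sketch of global and local SHAP explanations (shap library assumed).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

shap.summary_plot(shap_values, X)   # global explanation over all data
# Local explanation of a single row (cf. the SHAP_ROW_SHOW variable):
shap.force_plot(explainer.expected_value, shap_values[0, :], X.iloc[0, :])
```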

15.3. Deep Learning Bucket

The ai-deep-learning bucket contains diverse generic deep learning tasks that enable you to easily compose workflows for training and testing predictive models. This bucket can be easily customized according to your needs, by adding new tasks or updating the existing ones.

  1. All deep learning tasks were implemented using the PyTorch library.

  2. It is recommended to use a GPU-enabled machine to run the deep learning tasks.

15.3.1. Input and Output

Import_Image_Dataset

Task Overview: Load and return an image dataset. There are some simple rules for organizing your files and folders.

  1. Image Classification Dataset: Each class must have its own folder which should contain its related images. The Figure below shows how your folders and files should be organized.

[Figure: folder layout for an image classification dataset]
You can use RGB images in JPG or PNG formats.
You can find an example of the organization of the folders at: https://s3.eu-west-2.amazonaws.com/activeeon-public/datasets/ants_vs_bees.zip
  2. Image Segmentation Dataset: Two folders are required: the first folder should contain the RGB images in JPG format and another folder should contain the corresponding annotations in PASCAL VOC format. RGB images and annotations should be organized as follows:

[Figure: folder layout for an image segmentation dataset]
You can use RGB images in JPG format (Images folder) and the groundtruth annotations (Classes folder) in the PNG format using Pascal VOC pattern.
You can find an example of the organization of the folders at: https://s3.eu-west-2.amazonaws.com/activeeon-public/datasets/oxford.zip
  3. Object Detection Dataset: Two folders are required: the first folder should contain the RGB images in JPG format and another folder should contain the corresponding annotations in XML format using the PASCAL VOC convention or in TXT format using the YOLO convention. The RGB images and annotations should be organized as follows:

[Figure: folder layout for an object detection dataset]
You can use RGB images in JPG format (Images folder) and the annotations (Classes folder) in the XML format using Pascal VOC or COCO pattern.
In these links, you can find an example of the organization of the folders using Pascal_VOC Dataset and COCO Dataset.

Task Variables:

Table 64. Import_Image_Dataset_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

IMPORT_FROM

Selects the type of data source.

List [PA:URL,PA:URI,PA:USER_FILE,PA:GLOBAL_FILE] (default=PA:URL)

DATA_URL

Inserts a file path/name.

String

TRAIN_SPLIT

Must be a float within the range (0.0, 1.0), not including the values 0.0 and 1.0.

Float (default=1)

VAL_SPLIT

Must be a float within the range (0.0, 1.0), not including the values 0.0 and 1.0.

Float (default=0.1)

TEST_SPLIT

Must be a float within the range (0.0, 1.0), not including the values 0.0 and 1.0.

Float (default=0.3)

DATASET_TYPE

Enter the type of your dataset. There are three possible types: classification, detection or segmentation.

List [Classification, Detection or Segmentation]
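
A minimal sketch of how the split variables could be applied once the folders above are loaded, assuming PyTorch/torchvision (the dataset path matches the ants_vs_bees example archive):

```python
# Sketch of the VAL_SPLIT/TEST_SPLIT logic (PyTorch and torchvision assumed).
import torch
from torchvision import datasets, transforms

dataset = datasets.ImageFolder("ants_vs_bees", transform=transforms.ToTensor())

val_split, test_split = 0.1, 0.3   # VAL_SPLIT, TEST_SPLIT defaults
n = len(dataset)
n_val, n_test = int(n * val_split), int(n * test_split)
n_train = n - n_val - n_test

train_set, val_set, test_set = torch.utils.data.random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(42))
```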

Download_Model

Task Overview: Download a trained model by a deep learning algorithm.

Task Variables:

Table 65. Download_Model_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

MODEL_TYPE

Choose the type of your model. There are two possible types: PyTorch or ONNX.

List [PyTorch or ONNX]

Not all deep networks support the ONNX format yet. You can download the ONNX model of the following networks: AlexNet, DenseNet-161, ResNet-18, VGG-16 and YOLO.
Import_Model

Task Overview: Import a trained model by a deep learning algorithm.

Task Variables:

Table 66. Import_Model_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

MODEL_URL

URL pointing to the zip folder containing the needed model.

String

Import_Text_Dataset

Task Overview: Import data from external sources. Each unique label must have its own folder which should contain its related text file. If your data is unlabeled, use the name 'unlabeled' for the folder containing your text file.

Task Variables:

Table 67. Import_Text_Dataset_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

DATASET_URL

URL pointing to the zip folder containing the needed data.

String

TRAIN_SPLIT

Must be a float within the range (0.0, 1.0), not including the values 0.0 and 1.0.

Float (default=1)

TEST_SPLIT

Must be a float within the range (0.0, 1.0), not including the values 0.0 and 1.0.

Float (default=0.3)

VAL_SPLIT

Must be a float within the range (0.0, 1.0), not including the values 0.0 and 1.0.

Float (default=0.1)

TOY_MODE

Use a subset of the data to train the model quickly.

Boolean (default=True)

TOKENIZER

Transforms the text into tokens. Different options are available (str.split, moses, spacy, revtok, subword).

List (default=str.split)

SENTENCE_SEPARATOR

Splits the text into separate paragraphs, lines, or words. Choose your own separator.

String (default=\r)

CHARSET

Encoding to be used to read the text.

String (default='utf-8')

IS_LABELED_DATA

True if data is labeled.

Boolean (default=True)

Torchtext is used to preprocess and load the text input. More information about this library can be found here.
Preview_Results

Task Overview: Preview the results of the predictions generated by the trained model.

Task Variables:

Table 68. Preview_Results_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

OUTPUT_FILE

Converts the prediction results to HTML or CSV file.

List (default='HTML')

Export_Images

Task Overview: Download a zip file of your results.

Table 69. Export_Images_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

Search_Image_Dataset

Task Overview: Search images using the Bing or DuckDuckGo search engine and return an image dataset.

Task Variables:

Table 70. Search_Image_Dataset_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

DATA_FOLDER

Specifies the path where the data should be downloaded.

String

SEARCH_TERM

Specifies a keyword to query in the search engine.

String

QUERY_SIZE

Maximum number of search results for a single query (maximum of 34 per request for the Bing search engine).

Integer

IMG_SIZE

Inserts (width, height) of the images as a tuple with 2 elements.

Integer (default=(200, 200))

SEARCH_ENGINE

Defines a source engine to query and download images.

List [Bing, DuckDuckGo] (default=DuckDuckGo)

Usage: It should be connected with Import_Image_Dataset.


15.3.2. Image Classification

AlexNet

Task Overview: AlexNet is the name of a Convolutional Neural Network (CNN), originally written with CUDA to run with GPU support, which competed in the ImageNet Large Scale Visual Recognition Challenge in 2012.

Usage: It should be connected to Train_Image_Classification_Model.

Table 71. AlexNet_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

USE_PRETRAINED_MODEL

Parameter to use a pre-trained model for training. If True, the pre-trained model with the corresponding number of layers is loaded and used for training. Otherwise, the network is trained from scratch.

Boolean (default=True)

PyTorch is used to build the model architecture based on AlexNet.
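
As a hedged illustration, the sketch below shows what USE_PRETRAINED_MODEL typically controls with torchvision (older torchvision releases use the `pretrained` argument; newer ones use `weights`):

```python
# Sketch of loading AlexNet with or without ImageNet weights (torchvision assumed).
import torchvision.models as models

use_pretrained_model = True   # the task's USE_PRETRAINED_MODEL variable

# Load pre-trained weights, or train the network from scratch.
model = models.alexnet(pretrained=use_pretrained_model)
```
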
DenseNet-161

Task Overview: Densely Connected Convolutional Network (DenseNet) is a network architecture where each layer is directly connected to every other layer in a feed-forward fashion (within each dense block).

Usage: It should be connected to Train_Image_Classification_Model.

PyTorch is used to build the model architecture based on DenseNet-161.
Table 72. DenseNet-161_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

USE_PRETRAINED_MODEL

Parameter to use a pre-trained model for training. If True, the pre-trained model with the corresponding number of layers is loaded and used for training. Otherwise, the network is trained from scratch.

Boolean (default=True)

ResNet-18

Task Overview: Deep Residual Networks (ResNet-18) is a deep convolutional neural network, trained on 1.28 million ImageNet training images, coming from 1000 classes.

Usage: It should be connected to Train_Image_Classification_Model.

PyTorch is used to build the model architecture based on ResNet-18.
Table 73. ResNet-18_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

USE_PRETRAINED_MODEL

Parameter to use a pre-trained model for training. If True, the pre-trained model with the corresponding number of layers is loaded and used for training. Otherwise, the network is trained from scratch.

Boolean (default=True)

VGG-16

Task Overview: The VGG-16 is an image classification convolutional neural network.

Usage: It should be connected to Train_Image_Classification_Model.

PyTorch is used to build the model architecture based on VGG-16.
Table 74. VGG-16_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

USE_PRETRAINED_MODEL

Parameter to use a pre-trained model for training. If True, the pre-trained model with the corresponding number of layers is loaded and used for training. Otherwise, the network is trained from scratch.

Boolean (default=True)

15.3.3. Image Segmentation

FCN

Task Overview: The FCN16 combines layers of the feature hierarchy and refines the spatial precision of the output.

Usage: It should be connected to Train_Image_Segmentation_Model.

PyTorch is used to build the model architecture based on FCN.
Table 75. FCN_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

IM_SIZE

Insert (width, height) of the images as a tuple with 2 elements.

Integer (default=(64, 64))

NUM_CLASSES

Number of classes.

Integer (default=5)

SegNet

Task Overview: SegNet is a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling.

Usage: It should be connected to Train_Image_Segmentation_Model.

PyTorch is used to build the model architecture based on SegNet.
Table 76. SegNet_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

IM_SIZE

Insert (width, height) of the images as a tuple with 2 elements.

Integer (default=(64, 64))

NUM_CLASSES

Number of classes.

Integer (default=3)

UNet

Task Overview: UNet consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.

Usage: It should be connected to Train_Image_Segmentation_Model.

PyTorch is used to build the model architecture based on UNet.
Table 77. UNet_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

IM_SIZE

Insert (width, height) of the images as a tuple with 2 elements.

Integer (default=(64, 64))

NUM_CLASSES

Number of classes.

Integer (default=3)

15.3.4. Image Object Detection

SSD

Task Overview: SSD (Single Shot MultiBox Detector) produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. For more details click on this link.

Usage: It should be connected to Train_Image_Object_Detection_Model.

PyTorch is used to build the model architecture based on SSD.
Table 78. SSD_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

START_ITERATION

Initial iteration.

Integer (default=0)

MAX_ITERATION

Maximum iteration.

Integer (default=5)

LR_STEPS

Learning steps update for SGD (Stochastic Gradient Descent).

Integer (default=5)

LR_FACTOR

Learning rate update for SGD.

Float (default= 1e-3) Range in [0, 1].

GAMMA

Gamma update for SGD.

Float (default=0.1) Range in [0, 1].

MIN_SIZES

Minimum object size for detection by specifying numerical values or reference areas on screen. Objects smaller than that are ignored.

Integer (default= [30, 60, 111, 162, 213, 264])

MAX_SIZES

Maximum object size for detection by specifying numerical values or reference areas on screen. Objects larger than that are ignored.

Integer (default= [60, 111, 162, 213, 264, 315])

LEARNING_RATE

Initial learning rate.

Float (default= 1e-8) Range in [0, 1].

MOMENTUM

Momentum value for optimization.

Float (default=0.9) Range in [0, 1].

WEIGHT_DECAY

Weight decay for SGD

Float (default= 5e-4) Range in [0, 1].

IMG_SIZE

Insert (width, height) of the images as a tuple with 2 elements.

Integer (default=(300, 300))

NUM_CLASSES

Number of classes.

Integer (default=21)

LABEL_PATH

The URL of the file containing the class names of the dataset.

String (default=https://s3.eu-west-2.amazonaws.com/activeeon-public/datasets/voc.names)

USE_PRETRAINED_MODEL

Parameter to use pre-trained model for training. If True, the pre-trained model with the corresponding number of layers is loaded and used for training. Otherwise, the network is trained from scratch.

Boolean (default=True)

The default parameters of the SSD network were set for the PASCAL VOC dataset (http://host.robots.ox.ac.uk/pascal/VOC/voc2012/). If you’d like to use another dataset, you probably need to change the default parameters.
YOLO

Task Overview: You Only Look Once (YOLO) is a single neural network that predicts bounding boxes and class probabilities. For more details click on this link.

Usage: It should be connected to Train_Image_Object_Detection_Model.

PyTorch is used to build the model architecture based on YOLO.
Table 79. YOLO_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

LEARNING_RATE

Initial learning rate

Float (default= 0.0005) Range in [0, 1].

MOMENTUM

Momentum value for optimization.

Float (default=0.9) Range in [0, 1].

WEIGHT_DECAY

Weight decay for SGD

Float (default= 5e-4) Range in [0, 1].

IMG_SIZE

Insert (width, height) of the images as a tuple with 2 elements.

Integer (default=(416, 416))

NUM_CLASSES

Number of classes.

Integer (default=81)

CONF_THRESHOLD

This parameter shows how certain it is that the predicted bounding box actually encloses some object. This score does not say anything about what kind of object is in the box, just if the shape of the box is any good.

Float (default=0.5) Range in [0, 1].

NMS_THRESHOLD

Non-maximum suppression threshold: keeps only the most accurate (highest probability) of the overlapping boxes.

Float (default=0.45) Range in [0, 1].

LABEL_PATH

The URL of the file containing the class names of the dataset.

String (default=https://s3.eu-west-2.amazonaws.com/activeeon-public/datasets/coco.names)

USE_PRETRAINED_MODEL

Parameter to use pre-trained model for training. If True, the pre-trained model with the corresponding number of layers is loaded and used for training. Otherwise, the network is trained from scratch.

Boolean (default=True)

The default parameters of the YOLO network were set for the COCO dataset (https://cocodataset.org/#home). If you’d like to use another dataset, you probably need to change the default parameters.

15.3.5. Text Classification

GRU

Task Overview: Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks.

Task Variables:

Table 80. GRU_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

EMBEDDING_DIM

The dimension of the embedding vectors used to represent words.

Integer (default=50)

HIDDEN_DIM

Hidden dimension of the neural network.

Integer (default=40)

DROPOUT

Percentage of the neurons that will be ignored during the training.

Float (default=0.5)

Usage: It should be connected to Train_Text_Classification_Model.

PyTorch is used to build the model architecture based on GRU.
LSTM

Task Overview: Long short-term memory (LSTM) units (or blocks) are a building unit for layers of a recurrent neural network (RNN).

Task Variables:

Table 81. LSTM_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

EMBEDDING_DIM

The dimension of the embedding vectors used to represent words.

Integer (default=50)

HIDDEN_DIM

Hidden dimension of the neural network.

Integer (default=40)

DROPOUT

Percentage of the neurons that will be ignored during the training.

Float (default=0.5)

Usage: It should be connected to Train_Text_Classification_Model.

PyTorch is used to build the model architecture based on LSTM.
RNN

Task Overview: A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed graph along a sequence.

Task Variables:

Table 82. RNN_Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

EMBEDDING_DIM

The dimension of the embedding vectors used to represent words.

Integer (default=50)

HIDDEN_DIM

Hidden dimension of the neural network.

Integer (default=40)

DROPOUT

Percentage of the neurons that will be ignored during the training.

Float (default=0.5)

Usage: It should be connected to Train_Text_Classification_Model.

PyTorch is used to build the model architecture based on RNN.

15.3.6. Train Model

Train_Image_Classification_Model

Task Overview: Train a model using a Convolutional Neural Network (CNN) algorithm.

Task Variables:

Table 83. Train_Image_Classification_Model variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

NUM_EPOCHS

Number of times all the training vectors are used once to update the weights.

Integer (default=1)

BATCH_SIZE

Batch size to be used.

Integer (default=4)

NUM_WORKERS

Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.

Integer (default=2)

SHUFFLE

Set to True to have the data reshuffled at every epoch.

Boolean (default=True)

Usage: Could be connected to Predict_Image_Classification_Model and Download_Model.
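
A minimal sketch of how NUM_EPOCHS, BATCH_SIZE, NUM_WORKERS and SHUFFLE typically feed a PyTorch training loop (the dataset and network below are dummy stand-ins, not the task's actual code):

```python
# Sketch of the DataLoader-based training loop (PyTorch assumed).
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins for the imported dataset and the chosen CNN.
train_set = TensorDataset(torch.randn(32, 3, 64, 64), torch.randint(0, 2, (32,)))
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))

loader = DataLoader(train_set,
                    batch_size=4,     # BATCH_SIZE
                    num_workers=2,    # NUM_WORKERS
                    shuffle=True)     # SHUFFLE
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(1):                # NUM_EPOCHS
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
```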

Train_Text_Classification_Model

Task Overview: Train a model using a Recurrent Neural Network (RNN) algorithm.

Task Variables:

Table 84. Train_Text_Classification_Model variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

LEARNING_RATE

Determines how quickly or how slowly you want to update the parameters.

Float (default=0.001)

OPTIMIZER

Choose the optimization algorithm that will be used to minimize the loss function. Different options are available (Adam, RMS, SGD, Adagrad, Adadelta).

List (default=Adam)

LOSS_FUNCTION

Choose the function that will be used to compute the loss.

List (default=NLLLoss)

EPOCHS

Number of times all the training vectors are used once to update the weights.

Integer (default=10)

TRAINABLE

True if you want to update the embedding vectors during the training process.

Boolean (default=False)

GLOVE

Choose the GloVe vectors to be used for word embedding. Different options are available (42B, 840B, twitter.27B, 6B).

List (default=6B)

USE_GPU

True if you need to execute the training task on a GPU node.

Boolean (default=True)

Usage: Could be connected to Predict_Text_Classification_Model and Download_Model.

Train_Image_Segmentation_Model

Task Overview: Train a model using an image segmentation algorithm.

Task Variables:

Table 85. Train_Image_Segmentation_Model variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

NUM_EPOCHS

Number of times all the training vectors are used once to update the weights.

Integer (default=1)

BATCH_SIZE

Batch size to be used.

Integer (default=1)

NUM_WORKERS

Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.

Integer (default=1)

SHUFFLE

Set to True to have the data reshuffled at every epoch.

Boolean (default=True)

Usage: Could be connected to Predict_Image_Segmentation_Model and Download_Model.

Train_Image_Object_Detection_Model

Task Overview: Train a model using an object detection algorithm.

Task Variables:

Table 86. Train_Image_Object_Detection_Model variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

NUM_EPOCHS

Number of times all the training vectors are used once to update the weights.

Integer (default=1)

BATCH_SIZE

Batch size to be used.

Integer (default=1)

NUM_WORKERS

Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.

Integer (default=1)

15.3.7. Predict

Predict_Image_Classification_Model

Task Overview: Generate predictions using a trained model.

Table 87. Predict_Image_Classification_Model variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

BATCH_SIZE

Batch size to be used.

Integer (default=4)

NUM_WORKERS

Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.

Integer (default=2)

SHUFFLE

Set to True to have the data reshuffled at every epoch.

Boolean (default=True)

Usage: It should be used after the tasks Train_Image_Classification_Model or Download_Model.

Predict_Text_Classification_Model

Task Overview: Generate predictions using a trained model.

Table 88. Predict_Text_Model variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

LOSS_FUNCTION

Choose the function that will be used to compute the loss.

List (default=NLLLoss)

Usage: It should be used after the tasks Train_Text_Classification_Model or Download_Model.

Predict_Image_Segmentation_Model

Task Overview: Generate predictions using a trained segmentation model.

Table 89. Predict_Image_Segmentation_Model variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

BATCH_SIZE

Batch size to be used.

Integer (default=4)

NUM_WORKERS

Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.

Integer (default=2)

SHUFFLE

Set to True to have the data reshuffled at every epoch.

Boolean (default=True)

Usage: It should be used after the tasks Train_Image_Segmentation_Model or Download_Model.

Predict_Image_Object_Detection_Model

Task Overview: Generate predictions using a trained object detection model.

Table 90. Predict_Image_Object_Detection_Model Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

BATCH_SIZE

Batch size to be used.

Integer (default=1)

NUM_WORKERS

Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process.

Integer (default=1)

SHUFFLE

Set to True to have the data reshuffled at every epoch.

Boolean (default=True)

Usage: It should be used after the tasks Train_Image_Object_Detection_Model or Download_Model.

Model_Explainability

Task Overview: Explain a deep learning model using GradientExplainer.

Table 91. Model_Explainability Task variables

Variable name

Description

Type

GPU_NODES_ONLY

If True, the tasks will be executed on GPU nodes.

Boolean (default=True)

IMG_SAMPLES

Number of samples on which to explain the model’s output.

Integer (default=2)

IMG_LIST

Choose some images to explain.

List (default=1, 4, 6, 12)

FEATURE_LAYER

Choose a layer to explain.

String (default=features[7] for a VGG16 model). For example, you can use features[0] for AlexNet, layer1[0].conv1 for ResNet, etc.

RANKED_OUTPUTS

Number of top model outputs (determined by output rank order) to explain.

Integer (default=4).

Usage: It should be used after the task Train_Image_Classification_Model.

SHAP uses GradientExplainer to explain a deep learning model. More information about this library can be found here or on the GitHub repository here.
This task requires a large amount of memory. You may receive an error message if you do not have enough memory (RAM).

15.4. Data Visualization Bucket

The ai-data-visualization catalog integrates generic tasks that can be easily used to broadcast visualizations of the analytic results provided by AI tasks. It offers a large set of plots that can be organized programmatically or through the UI. These plots are used to create dashboards for both live and real-time data, inspect results of experiments, or debug experimental code. The ai-data-visualization catalog provides a fast, easy and practical way to execute different workflows generating these diverse visualizations that are automatically cached by the TensorBoard and Visdom Server. However, other visualization libraries can be integrated as well.

15.4.1. Visdom

It provides a large set of plots that can be organized programmatically or through the UI. These plots can be used to create dashboards for both live and real-time data, inspect results of experiments, or debug experimental code.

Visdom_Service_Start

Task Overview: Bind and/or start the Visdom server.

Task Variables:

Table 92. Visdom_Service_Start_Task variables

Variable name

Description

Type

SERVICE_ID

The id of the Visdom service.

String (default="Visdom")

INSTANCE_NAME

The instance name of the server to be used to broadcast the visualization.

String (default="visdom-server")

PROXIFIED

It takes by default the value of the VISDOM_PROXYFIED workflow variable.

String (default="$VISDOM_PROXYFIED") (default=empty)

ENGINE

Container engine.

String (default="$CONTAINER_PLATFORM")

NATIVE_SCHEDULER

Name of the Native Scheduler node source to use when the workflow tasks must be deployed inside a cluster such as SLURM, LSF, etc.

String (default=empty)

NATIVE_SCHEDULER_PARAMS

Parameters given to the native scheduler (SLURM, LSF, etc) while requesting a ProActive node used to deploy the workflow tasks.

String (default=empty)

If two workflows use the same service instance name, then their generated plots will be created on the same service instance.
Visdom_Service_Actions

Task Overview: Manage the life cycle of the Visdom PSA service. It allows triggering three possible actions: Pause_Visdom, Resume_Visdom and Finish_Visdom.

Task Variables:

Table 93. Visdom_Service_Actions_Task variables

Variable name

Description

Type

INSTANCE_ID

The service instance ID.

String (default=Empty)

INSTANCE_NAME

The instance name of the server to be used to broadcast the visualization.

String

ACTION

The action that will be processed regarding the service status.

List [Pause_Visdom, Resume_Visdom and Finish_Visdom] (default="Finish_Visdom")

Visdom_Visualize_Results

Task Overview: Plot the different results obtained by a predictive model using Visdom.

Task Variables:

Variable name

Description

Type

TASK_ENABLED

If False, the task will be ignored, it will not be executed.

Boolean (default=True)

TARGETED_CLASS

The targeted class that you need to track.

String

VISDOM_ENDPOINT

The Visdom endpoint to be used.

URL

Usage: This task has to be connected to Visdom_Service_Start. The Visdom server should be up in order to be able to broadcast visualizations.
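
A minimal sketch of broadcasting a plot to the started Visdom instance, assuming the visdom Python client (server address and data are illustrative; in the workflow they come from VISDOM_ENDPOINT):

```python
# Sketch of pushing a live plot to a Visdom server (visdom client assumed).
import numpy as np
import visdom

viz = visdom.Visdom(server="http://localhost", port=8097)  # from VISDOM_ENDPOINT

# For example, a training-loss curve for the TARGETED_CLASS.
viz.line(X=np.arange(10), Y=np.random.rand(10),
         opts=dict(title="training loss"))
```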

Visdom_Plots

Task Overview: Returns numerous examples of plots covered by Visdom.

Task Variables:

Variable name

Description

Type

VISDOM_ENDPOINT

The Visdom endpoint to be used.

URL

15.4.2. Visdom Workflows

The following workflows present some examples using Visdom service to visualize the results obtained while training and testing some predictive models.

Visdom_Plots_Example: returns numerous examples of plots covered by Visdom.

Visdom_Realtime_Digit_Classification: shows an example of realtime plotting using the Visdom server for training a convolutional neural network (CNN) for MNIST digit classification.

Check_Visdom_Support: checks if the user wants (or not) to start the Visdom service.

A demo video of these workflows is available on the ActiveEon YouTube channel.

15.4.3. TensorBoard

It provides the visualization and tooling needed for machine learning experimentation, such as tracking and visualizing metrics (e.g., loss and accuracy), visualizing the model graph (ops and layers), viewing histograms of weights, biases, or other tensors as they change over time, projecting embeddings to a lower-dimensional space, and displaying images, text, and audio data.

TensorBoard_Service_Start

Task Overview: Start the TensorBoard server as a service.

Task Variables:

Table 94. Tensorboard_Service_Start variables

Variable name

Description

Type

SERVICE_ID

The id of the Tensorboard service.

String (default="Tensorboard")

INSTANCE_NAME

The instance name of the server to be used to broadcast the visualization.

String

MOUNT_LOG_PATH

Specifies the path where TensorBoard logs are created and stored on the host.

String (default=/shared/$TENSORBOARD_HOST_LOG_PATH)

ENGINE

Container engine.

String (default="$CONTAINER_PLATFORM")

PROXIFIED

It takes by default the value of the TENSORBOARD_PROXYFIED workflow variable.

String (default="$TENSORBOARD_PROXYFIED") (default=empty)

NATIVE_SCHEDULER

Name of the Native Scheduler node source to use when the workflow tasks must be deployed inside a cluster such as SLURM, LSF, etc.

String (default=empty)

NATIVE_SCHEDULER_PARAMS

Parameters given to the native scheduler (SLURM, LSF, etc) while requesting a ProActive node used to deploy the workflow tasks.

String (default=empty)

CONTAINER_ROOTLESS_ENABLED

If True, the user will be able to run the workflow in a rootless mode.

Boolean (default=False)
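
A minimal sketch of producing logs that the started TensorBoard service can display, assuming torch.utils.tensorboard (the log directory is illustrative and should mirror MOUNT_LOG_PATH):

```python
# Sketch of writing TensorBoard logs (torch.utils.tensorboard assumed).
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="/shared/tensorboard_logs")  # cf. MOUNT_LOG_PATH

for step in range(100):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```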

Tensorboard_Service_Actions

Task Overview: Manage the life cycle of the TensorBoard PSA service. It allows triggering three possible actions: Pause_Tensorboard, Resume_Tensorboard and Finish_Tensorboard.

Task Variables:

Table 95. Tensorboard_Service_Actions_Task variables

Variable name

Description

Type

INSTANCE_ID

The service instance ID.

String (default=Empty)

INSTANCE_NAME

The instance name of the server to be used to broadcast the visualization.

String

ACTION

The action that will be processed regarding the service status.

List [Pause_Tensorboard, Resume_Tensorboard and Finish_Tensorboard] (default="Finish_Tensorboard")

15.4.4. TensorBoard Workflows

The following workflows present some examples using TensorBoard service to visualize the results obtained while training and testing some predictive models.

Tensorboard_Realtime_CIFAR10_Training: shows an example of real-time graph using TensorBoard for training a CNN using CIFAR10 database.

Tensorboard_Plots_Example: shows an example exposing the different plots available in TensorBoard.

Check_Tensorboard_Support: checks if the user wants (or not) to start the TensorBoard service.

15.5. Satellite Imagery Bucket

The satellite-imagery bucket contains some tasks that enable you to search and download Earth Observation products from different providers, such as Copernicus, Creodias, Mundi, Onda, Peps, Sobloo and Wekeo.

Below are some features available:

  • Execution of multiple tasks in parallel.

  • Display download error codes and automatically restart the task if it failed.

  • Maintain traceability between requests and images downloaded in a JSON file format.

  • Inform the user if there is not enough free space on the disk.

  • Manage quotas (max number of requests over a period of time, max number of simultaneous requests) for the Copernicus task.

15.5.1. Fetch_Images_From_Satellite_Platforms

It allows downloading the metadata and images from the Copernicus, Creodias, Mundi, Onda, Peps, Sobloo and Wekeo platforms.

A behavior diagram of the Copernicus platform is presented below.

[Diagram: behavior of the Copernicus platform download process]

Task Variables:

Table 96. Fetch_Images_From_Satellite_Platforms variables

Variable name

Description

Type

SEARCH_ENGINE

Defines an engine to search and download satellite images.

List [Copernicus, Creodias, Mundi, Onda, Peps, Sobloo, Wekeo, All] (default=Peps)

IMPORT_FROM

Selects the type of data source.

List [PA:URL,PA:URI,PA:USER_FILE,PA:GLOBAL_FILE] (default=PA:GLOBAL_FILE)

FILE_PATH

Inserts a file path/name.

String

TIME_TO_RETRIEVE_IN_SECONDS

Defines the time in seconds to request an offline product from the Long Term Archive (LTA) of the Copernicus Open Access Hub.

Int (default=900 seconds)

TIME_TO_CHECK_ONLINE_IN_SECONDS

Defines the time in seconds to check if a product is online.

Int (default=1800 seconds)

WALLTIME

Defines the maximum execution time of a task.

Time (default=24:00:00)

OUTPUT_PATH

Specifies the path where the data should be downloaded.

String

REQUIRED_LICENSES

Defines the policy used by the ProActive Scheduler to determine how Jobs and Tasks are scheduled.

String

Please click here to create a new user account from EODAG (Earth Observation Data Access Gateway) website.
Please click here to create a new user account from Copernicus website.
Note that the variables TIME_TO_RETRIEVE_IN_SECONDS, TIME_TO_CHECK_ONLINE_IN_SECONDS and WALLTIME are only valid for the Copernicus platform. In addition, it is necessary to define a scheduling policy for the ProActive Scheduler; in this case, only two download tasks may be executed simultaneously.
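
As a hedged illustration of this kind of search-and-download request, here is a minimal sketch using the EODAG library referenced above (product type, area and dates are examples; the return type of `search` varies across EODAG versions):

```python
# Sketch of a satellite product search with EODAG (values are illustrative).
from eodag import EODataAccessGateway

dag = EODataAccessGateway()
results = dag.search(
    productType="S2_MSI_L1C",
    geom={"lonmin": 1, "latmin": 43, "lonmax": 2, "latmax": 44},
    start="2021-01-01",
    end="2021-01-31",
)
# Depending on the EODAG version, `results` is a SearchResult or a
# (SearchResult, count) tuple; products are then fetched with
# dag.download_all(...), which stores them under the configured output path.
```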

15.5.2. Fetch_Satellite_Images_From_PEPS

Task Overview: Load and return a PEPS dataset including a metadata folder with metadata files and an images folder containing satellite images.

Task Variables:

Table 97. Fetch_Satellite_Images_From_PEPS variables

Variable name

Description

Type

LOCATION

Defines a location name (e.g., a town or country).

String (default=Indonesia)

PLATFORM_NAME

Specifies an instrument on a Sentinel satellite.

List [S1, S2, S2ST, S3] (default=S2)

PRODUCT_TYPE

Limits the search to a Sentinel product type.

List [GRD, SLC, OCN (for S1) or S2MSI1C S2MSI2A S2MSI2Ap (for S2)] (default=S2MSI1C)

SENSOR_MODE

Limits the search to a Sentinel sensor mode.

List [EW, IW , SM, WV (for S1) or INS-NOBS, INS-RAW (for S3)] (default=INS-NOBS)

START_DATE

Defines a start date of the query in the format YYYYMMDD.

String

END_DATE

Defines an end date of the query in the format YYYYMMDD.

String

TILE

Limits the search to a tile number.

String

LATITUDE

Limits the search to a latitude in decimal degrees.

Float

LONGITUDE

Limits the search to a longitude in decimal degrees.

Float

OUTPUT_PATH

Specifies the path where the data should be downloaded.

String

Please add third party credentials (USER_NAME_PEPS and USER_PASS_PEPS) in the Scheduling & Orchestration interface or Workflow Execution → Manage Third-Party Credentials to connect to PEPS.
More information about the source of this task can be found here.

15.5.3. Fetch_Satellite_Images_From_Copernicus

Task Overview: Load and return a Copernicus dataset including a metadata folder with metadata files and an images folder containing satellite images, according to the resolution and image band selected by the user.

Task Variables:

Table 98. Fetch_Satellite_Images_From_Copernicus variables

Variable name

Description

Type

PLATFORM_NAME

Specifies an instrument on a Sentinel satellite.

List [Sentinel-1, Sentinel-2, Sentinel-3, Sentinel-4, Sentinel-5, Sentinel-5 Precursor, Sentinel-6] (default=Sentinel-2)

FOOTPRINT

Defines a geojson file with footprints of the query result.

String

PRODUCT_TYPE

Limits the search to a Sentinel product type.

List [GRD, SLC, OCN (for S1) or S2MSI1C S2MSI2A S2MSI2Ap (for S2)] (default=S2MSI1C)

START_DATE

Defines a start date of the query in the format YYYYMMDD.

String

END_DATE

Defines an end date of the query in the format YYYYMMDD.

String

USE_START_AND_END_DATE

If True, it uses the dates defined in the START_DATE and END_DATE fields, otherwise it downloads all scenes that were published in the last 24 hours.

Boolean (default=True)

SPATIAL_RESOLUTION

Defines granule dimensions for each resolution band.

List [10m, 20m, 60m] (default=10m)

IMAGE_BAND

Selects from 13 spectral bands spanning from the Visible and Near Infra-Red (VNIR) to the Short Wave Infra-Red (SWIR).

List [All, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B10, B11, B12, TCI] (default=All)

OUTPUT_PATH

Specifies the path where the data should be downloaded.

String

Please add third party credentials (USER_NAME_COP and USER_PASS_COP) in the Scheduling & Orchestration interface or Workflow Execution → Manage Third-Party Credentials to connect to Copernicus.
More information about the source of this task can be found here.
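
For illustration, an equivalent Copernicus query can be written with the sentinelsat library; this is an assumption about a comparable approach, not the task's actual code (endpoint, dates and footprint file are examples):

```python
# Hedged sketch of a Copernicus Open Access Hub query (sentinelsat assumed).
from sentinelsat import SentinelAPI, geojson_to_wkt, read_geojson

api = SentinelAPI("USER_NAME_COP", "USER_PASS_COP",
                  "https://apihub.copernicus.eu/apihub")

footprint = geojson_to_wkt(read_geojson("footprint.geojson"))  # FOOTPRINT
products = api.query(footprint,
                     date=("20210101", "20210131"),  # START_DATE, END_DATE
                     platformname="Sentinel-2",      # PLATFORM_NAME
                     producttype="S2MSI1C")          # PRODUCT_TYPE
api.download_all(products)   # saved under the configured OUTPUT_PATH
```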

Activeeon SAS, © 2007-2019. All Rights Reserved.

For more information, please contact contact@activeeon.com.