Dashlane Blog

What We’ve Learned From Tracking Machine Learning Experiments With ClearML


Co-written by Tien Duc Cao, Dashlane Senior Machine Learning Engineer, and Quentin Grail, Dashlane Machine Learning Engineer

At Dashlane, we’re working on multiple experimental projects that involve machine learning (ML) applications. To manage experiments, we use ClearML, an open-source MLOps framework that allows us to better organize all our machine learning experiments. 

When running ML pipelines, there are three crucial steps for creating a successful application:  

  1. Data management: We train many models on different datasets and on different versions of the same dataset, and we also test models on separate datasets. To accomplish this, we need a clear organization of all the data so we can list, visualize, and retrieve information as efficiently as possible.
  2. Experiment tracking: We want to be able to monitor the results of the experiments, visualize them, and compare the performance between the runs.
  3. Model storing: Once we’ve selected the best model for our task or decided to switch to a different one, we want to easily retrieve its parameters to put them into our production pipeline.

ClearML helps us perform these three tasks seamlessly.

ClearML features we found useful

Three features stand out from the rest:

  • Integration with Git: When launching a new task, ClearML automatically registers the commit ID, along with any uncommitted changes in the repository. This is very useful for reproducing an experiment and for iterating on a successful run.
  • Resource monitoring: GPU and CPU utilization and memory are tracked in the metrics. This allows us to get the best out of our hardware by optimizing the model’s hyperparameters specifically for the available GPU. Also, it helps us track the pipeline’s possible bottlenecks that should be improved.
  • Hyperparameter tracking: ClearML automatically captures all the arguments that you give to your script (through argparse, for example). Then, you can easily filter and compare the impact of these parameters.
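As an illustration, a script like the following needs no extra tracking code once `Task.init()` has been called; ClearML picks the arguments up automatically (the argument names here are our own, not from the original post):

```python
import argparse

# Once Task.init() has run, ClearML hooks into argparse and records every
# argument below in the experiment's hyperparameter tab.
parser = argparse.ArgumentParser()
parser.add_argument("--learning-rate", type=float, default=1e-3)
parser.add_argument("--batch-size", type=int, default=32)
parser.add_argument("--epochs", type=int, default=10)
# In a real script, call parser.parse_args() with no argument;
# the empty list just keeps this demo self-contained.
args = parser.parse_args([])
```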
[Figure: An example of hyperparameter tracking displayed with the model accuracy]
[Figure: Loss and accuracy monitoring graphs, shown side by side]
[Figure: A ClearML dashboard listing three completed trainings]

ClearML 101

To help you get started with ClearML, here’s some important terminology:

  • Task: We use ClearML tasks to keep track of Python scripts (or even Jupyter notebooks). There are many supported task types.
  • Artifact: We use ClearML artifacts to store tasks’ inputs and/or outputs.
  • Tag: Each ClearML task can be associated with one or more tags, which are descriptive keywords that help you filter your tasks.

ClearML integration

Integrating ClearML into your existing project is really simple.

The configuration should be done only once by setting these three environment variables:
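The original snippet is not reproduced here, but ClearML reads its server credentials from environment variables along these lines (the host shown assumes the hosted clear.ml service; replace the placeholders with your own keys):

```shell
export CLEARML_API_HOST="https://api.clear.ml"    # your ClearML server's API endpoint
export CLEARML_API_ACCESS_KEY="<your-access-key>"
export CLEARML_API_SECRET_KEY="<your-secret-key>"
```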

You could also define these variables in your CI/CD settings if you want to run your training pipelines through GitHub or GitLab pipelines.

To integrate ClearML, you only need to add these lines of code at the beginning of your Python script:

  • PROJECT_NAME: Put similar tasks (for example, training one model architecture with different parameters) into the same project. 
  • TASK_NAME: In one (short) sentence, describe the purpose of your Python script.
  • TAGS: Add your list of tags. For example, “model-cnn,exp-data-augmentation.”
  • DESCRIPTION: Put detailed explanations of your experiment here.
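A minimal sketch of that setup follows; the constant values mirror the bullets above and are purely illustrative:

```python
from clearml import Task

# Placeholder values corresponding to the four items described above
PROJECT_NAME = "X/models"                       # hypothetical project name
TASK_NAME = "Train CNN with data augmentation"  # short purpose statement
TAGS = ["model-cnn", "exp-data-augmentation"]
DESCRIPTION = "Detailed notes about this experiment."

# Task.init registers the script as a ClearML task and starts auto-logging
task = Task.init(project_name=PROJECT_NAME, task_name=TASK_NAME, tags=TAGS)
task.set_comment(DESCRIPTION)  # attaches the free-form description
```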

Your model’s training progress (for example, train loss and validation loss) is automatically tracked if you use one of the supported Python frameworks, such as PyTorch, TensorFlow, or scikit-learn.

If you want to track a model's metrics manually, such as the F1 score on the test set, you can do that with one line of code:
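For instance, with the `task` object returned by `Task.init`, reporting an F1 score could look like this (the title and series names are arbitrary choices of ours):

```python
# `task` is the object returned by Task.init(); the value shown is a placeholder
task.get_logger().report_scalar(
    title="test metrics", series="f1_score", value=0.87, iteration=0
)
```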

How we organized experiments

ClearML provides some useful mechanisms for us to better organize our ML experiments.

  • Sub-projects: For example, you could create “X/models” to keep track of all your training tasks and “X/datasets” to store different training datasets. Both sub-projects live under the “X” project.
  • Tags: These short keywords describe the purpose of your experiments. They’re helpful for grouping all similar experiments. Since there’s no restriction for tag naming, it’s up to you to determine the naming conventions that make these tags more manageable for searching and organizing your experiments. We defined some simple prefixes:
  1. “model-” for the tasks that train ML models
  2. “data-” for the tasks that perform data ingestion/processing
  3. “exp-” for the tasks that experiment with some new ideas

How we used artifacts to keep track of data and models

You can save an object as a ClearML artifact with a single line of code. You can later access this artifact by 1) manually downloading the saved object from the web interface when looking at the details of your task or 2) retrieving any job directly within your code and using the artifact in another experiment.
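Both directions can be sketched as follows (the task, project, and artifact names are made up for illustration):

```python
from clearml import Task

# In the producing task: one line registers an object as an artifact.
# `task` is the current Task; `predictions` is any picklable object.
task.upload_artifact(name="test_predictions", artifact_object=predictions)

# In a later experiment: fetch the producing task and pull the artifact back
source_task = Task.get_task(project_name="X/models", task_name="train-cnn")
predictions = source_task.artifacts["test_predictions"].get()
```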

We are mainly using artifacts for three different applications at Dashlane:

  1. To store the datasets: We register and tag our training and testing datasets as artifacts so every other job can access them. We always know which version of the data was used for each job, and we can later filter experiments by training data or compare results on a specific dataset.
  2. To store the trained models: We always save the trained model so we can download it later to evaluate it on a new dataset or use it in another application.
  3. To analyze model outputs: When evaluating a registered model on a registered dataset, we also save the model’s predictions as an artifact. Then, we can locally download the predictions and start analyzing the results.

ClearML tips

  • There’s no difference between launching your Python script on your own machine or on a remote machine. So to avoid potential problems, you could:
  1. Configure an environment variable that tells your training script to use only a small portion of the training data and train for only one epoch. This helps you spot potential bugs in your code while keeping every experiment’s inputs and outputs logged in ClearML.
  2. Launch the real training process with all available data on a remote machine after fixing all the problems found in the previous step.
  • ClearML tracks uncommitted changes, so you can quickly iterate on new ideas without creating too many small commits.
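The debug-mode switch from the first tip can be as simple as this sketch (the `ML_DEBUG` variable name is our own invention; pick whatever fits your project):

```python
import os

def training_config(dataset):
    """Return (data, epochs): a tiny slice and one epoch in debug mode."""
    # "ML_DEBUG" is a hypothetical variable name, not a ClearML setting
    if os.environ.get("ML_DEBUG", "0") == "1":
        return dataset[:100], 1  # small subset, single epoch: fast bug-spotting
    return dataset, 50           # full data and the real epoch budget

data, epochs = training_config(list(range(10_000)))
```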

ClearML helped us at Dashlane to better organize ML experiments so we could deliver more reliable ML models to Dashlane users. Perhaps it can help your organization too.

At Dashlane, we work on multiple ML-related projects, including our autofill engine presented here, and we also experiment with novel ML-powered features internally. By efficiently training and selecting the best models, we can provide a better product experience to our customers.
