How AI Powers Dashlane’s Autofill, Without Compromising Your Privacy

Updated: May 29, 2025

Dashlane’s autofill uses the power of AI to deliver both accuracy and privacy for an optimal user experience. Learn how.

Have you ever wondered how Dashlane knows which information belongs in which form field across the millions of websites out there? Or how we protect your data while still keeping your online experience seamless?

Today, we're pulling back the curtain on one of our most powerful yet privacy-conscious features: Our AI-powered autofill system.

The challenge: Intelligence without invasion

When most companies build AI systems, they follow a simple formula: Collect user data, send it to the cloud, analyze it there, and send back results. But at Dashlane, our zero-knowledge architecture means we never have access to your passwords, personal information, or even browsing history.

This creates two significant challenges for our team:

The authentication wall problem: Most forms worth filling out live behind login screens. To build a dataset to train our models, we need examples of these forms, but captchas and two-factor authentication make collecting them difficult to automate.
The privacy paradox: We need to build smart models that understand different types of forms and fields, but we can't use traditional cloud-based machine-learning approaches that would compromise user data.

So how did we solve these challenges? Let's dive in.

Data collection: The collaborative approach

Instead of collecting data from our users, we built "Vortex for Dashlaners," an internal tool our team members (Dashlaners) voluntarily run on their devices.

This tool identifies when a Dashlaner encounters a form that isn't yet in our database and invites them to contribute it (after removing all personal information).
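To make the "removing all personal information" step concrete, here is a minimal sketch of how a captured form could be stripped of typed-in values before it is stored. It is an illustration only, not the Vortex implementation: the `FormSanitizer` class, the `sanitize` helper, and the rule "drop every input value except button text" are assumptions made for this example.

```python
from html import escape
from html.parser import HTMLParser

# Input types whose "value" is visible button text rather than user-entered data.
BUTTON_TYPES = {"submit", "button", "reset"}


class FormSanitizer(HTMLParser):
    """Re-emits tags and text while dropping values a user may have typed.

    Hypothetical helper for illustration; comments and doctypes are not re-emitted.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._in_textarea = False

    def _render(self, tag, attrs, closing=">"):
        input_type = (dict(attrs).get("type") or "text").lower()
        parts = []
        for key, value in attrs:
            # Skip typed-in values, but keep button captions such as "Log in".
            if tag == "input" and key == "value" and input_type not in BUTTON_TYPES:
                continue
            parts.append(f" {key}" if value is None else f' {key}="{escape(value, quote=True)}"')
        return f"<{tag}{''.join(parts)}{closing}"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, attrs))
        if tag == "textarea":
            self._in_textarea = True

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, closing=" />"))

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._in_textarea = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside <textarea> is user content, so it is dropped.
        if not self._in_textarea:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = FormSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The structure of the form (tags, field names, labels) survives, which is the part a classifier needs, while anything a person typed does not.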

A screenshot of the Datadog login screen shows a Dashlane pop-up that says “Unknown form detected. Capture this form to improve the autofill accuracy.” Beside it is a blue button that says “Capture page.”

We even made it fun with a leaderboard and experience points, turning data collection into a collaborative effort.

A screenshot that says Vortex leaderboard shows the top 5 form-capture contributors of all time, with an all-time total of 6778 points.

Standardizing web form classification: The SAWF standard

Before training our models to automatically recognize form elements across the web, we needed to define the complete space of possible tags that could be assigned to forms and fields. This classification led us to develop the Semantically Annotated Web Forms (SAWF) standard.

SAWF introduces the data-form-type attribute as a standardized way to semantically tag forms and form fields in HTML, establishing best practices that developers can follow.

The standard precisely defines:

Form types: A comprehensive taxonomy including login, registration, payment, shipping, billing, search, and other specialized forms.
Field types: Detailed specifications for categorizing input fields across dozens of classifications, such as username, password, email, address components, and payment.
Hierarchical relationships: Parent-child structures for both forms and fields. For forms, the hierarchy can capture a multi-step process: a "login" form might ask for your email in a first "step" and for your password in the "final" step. For fields, the hierarchy organizes related inputs into a taxonomy: a general "name" field type has subtypes including first name, family name, and maiden name, while a "date" type can be structured with day, month, and year as children.
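The field hierarchy described above can be pictured as a small tree. The sketch below is one way to represent it; the identifiers (`first_name`, `family_name`, and so on) are illustrative spellings chosen for this example, not the official SAWF names.

```python
# Hypothetical slice of a field-type hierarchy: parent type -> child types.
FIELD_TAXONOMY = {
    "name": ["first_name", "family_name", "maiden_name"],
    "date": ["day", "month", "year"],
}

# Reverse lookup: child type -> parent type.
PARENT = {
    child: parent
    for parent, children in FIELD_TAXONOMY.items()
    for child in children
}


def ancestors(field_type):
    """Return the chain of parent types, nearest first."""
    chain = []
    while field_type in PARENT:
        field_type = PARENT[field_type]
        chain.append(field_type)
    return chain


def is_a(field_type, candidate):
    """True if field_type is the candidate type or one of its descendants."""
    return field_type == candidate or candidate in ancestors(field_type)
```

One possible use of such a structure: when a prediction for a specific subtype is uncertain, a system could fall back to the parent type (for example, "name") instead of guessing between siblings.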

By creating this classification standard, we've not only established clear training targets for our AI models but also provided an implementation blueprint that can improve autofill accuracy across the entire web ecosystem.
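As a rough illustration of how annotated markup could be consumed, the sketch below reads `data-form-type` attributes out of a page using Python's standard `html.parser` module. The attribute name comes from the standard as described above; the sample values ("login", "username", "password") and the `AnnotationReader` class are assumptions for this example.

```python
from html.parser import HTMLParser


class AnnotationReader(HTMLParser):
    """Collects (tag, data-form-type value) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("data-form-type")
        if value is not None:
            self.annotations.append((tag, value))


def read_annotations(html_text):
    reader = AnnotationReader()
    reader.feed(html_text)
    reader.close()
    return reader.annotations


# Sample markup with illustrative annotation values.
SAMPLE = """
<form data-form-type="login">
  <input name="u" data-form-type="username">
  <input name="p" type="password" data-form-type="password">
  <input name="remember" type="checkbox">
</form>
"""

print(read_annotations(SAMPLE))
```

Fields without the attribute, like the checkbox in the sample, are simply skipped; those are the ones a prediction model would still have to classify.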

Smart labeling with generative AI

Once we had collected thousands of forms, we faced another challenge: Understanding what each field was for. One method is to use contractors to manually identify and label each field type (like marking "this is asking for an email" or "this is for a password"). This human labeling provides the examples our AI needs to learn what different form fields look like.

However, this process requires significant time and training for contractors to understand the taxonomy. Since the web is constantly evolving with new form patterns and designs emerging regularly, waiting for contractors to complete the labeling process every time we need to update our model with fresh samples isn't feasible.

Instead, we use generative AI (GenAI) to classify form fields. Because this labeling happens completely offline on our collected form dataset (not on your personal information), we can use powerful, cutting-edge AI models without any privacy concerns.
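A minimal sketch of what offline labeling with a GenAI model could look like follows. Everything here is hypothetical: `call_model` stands in for whatever model client is used, the prompt wording is invented, and `ALLOWED_FIELD_TYPES` is a tiny stand-in for the real taxonomy. The point it illustrates is that model output is checked against the allowed labels before it becomes training data.

```python
import json

# Hypothetical subset of field types; the real taxonomy is much larger.
ALLOWED_FIELD_TYPES = {"username", "password", "email", "other"}


def build_prompt(form_html, allowed):
    labels = ", ".join(sorted(allowed))
    return (
        "Classify every input field in the HTML form below.\n"
        f"Allowed labels: {labels}.\n"
        "Answer with a JSON object mapping each field id to one label.\n\n"
        f"{form_html}"
    )


def parse_labels(response_text, allowed):
    """Split a model response into accepted labels and rejected field ids."""
    try:
        raw = json.loads(response_text)
    except json.JSONDecodeError:
        return {}, ["<unparseable response>"]
    if not isinstance(raw, dict):
        return {}, ["<unexpected response shape>"]
    accepted, rejected = {}, []
    for field_id, label in raw.items():
        if isinstance(label, str) and label in allowed:
            accepted[field_id] = label
        else:
            rejected.append(field_id)  # outside the taxonomy: needs review
    return accepted, rejected


def label_form(form_html, call_model, allowed=ALLOWED_FIELD_TYPES):
    """call_model is any function that takes a prompt string and returns text."""
    return parse_labels(call_model(build_prompt(form_html, allowed)), allowed)
```

Passing the model in as a plain function keeps the sketch independent of any particular provider and makes it easy to test with a canned response.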

Once we've labeled our entire dataset this way, we move to the last stage: Training a separate, highly optimized AI model that's specifically designed to be lightweight enough to run inside your browser extension. This specialized model distills the intelligence from the larger model into a compact form that can perform predictions in milliseconds without ever sending your data to our servers.
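To show the shape of this last stage, the sketch below trains a very small classifier on labels produced by a larger "teacher" model. It deliberately swaps in a naive Bayes text classifier written in plain Python, which is far simpler than the model described here; the training strings, the `TinyFieldClassifier` class, and the label names are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class TinyFieldClassifier:
    """Multinomial naive Bayes with add-one smoothing (a stand-in model)."""

    def __init__(self):
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for token in tokenize(text):
                self.token_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Field descriptions paired with labels assumed to come from the larger model.
teacher_labeled = [
    ("input text name login placeholder username", "username"),
    ("input email name user placeholder email address", "email"),
    ("input password name pass placeholder password", "password"),
    ("input password name pwd label your password", "password"),
    ("input text name user id label username", "username"),
]

student = TinyFieldClassifier().fit(teacher_labeled)
```

The small model never sees the teacher itself, only its labels, and prediction is a handful of dictionary lookups, which is the property that matters for running locally.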

But how do we maintain high-quality predictions with a more optimized model, you might wonder?

Finding the right signal in the web page

During our labeling process with GenAI, we can provide the complete HTML content of pages because these models can process vast amounts of information at once.

However, our production model faces different constraints. Browser extensions need compact, efficient models with much smaller "context windows"; they simply can't process entire web pages at once the way their larger counterparts can.

To bridge this gap, we employ a two-step process. First, we have a detection phase where we identify if and where the forms are in the Document Object Model (DOM) structure of the page. Our algorithm scans the page for both traditional HTML forms and "pseudo-forms" (groups of input fields that function as forms but aren't explicitly tagged as such).
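The detection idea can be sketched in a few lines. The example below is a simplified illustration, not Dashlane's algorithm: it treats every field inside a `<form>` element as part of that form and lumps all remaining fields into a single "pseudo-form", whereas a real detector would need to group stray fields by their position in the DOM. The `FormDetector` class and `detect_forms` function are hypothetical names.

```python
from html.parser import HTMLParser

FIELD_TAGS = {"input", "select", "textarea"}


class FormDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.forms = []       # explicit <form> elements, in document order
        self.orphans = []     # fields found outside any <form>
        self._current = None  # the <form> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"kind": "form", "fields": []}
            self.forms.append(self._current)
        elif tag in FIELD_TAGS and attrs.get("type") != "hidden":
            field = attrs.get("name") or attrs.get("id") or tag
            target = self._current["fields"] if self._current else self.orphans
            target.append(field)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def detect_forms(html_text):
    detector = FormDetector()
    detector.feed(html_text)
    detector.close()
    found = list(detector.forms)
    if detector.orphans:
        # Simplification: all stray fields become one pseudo-form.
        found.append({"kind": "pseudo-form", "fields": detector.orphans})
    return found


PAGE = """
<form action="/search"><input name="q"></form>
<div class="login-box">
  <input name="email" type="email">
  <input name="password" type="password">
  <button>Sign in</button>
</div>
"""

print(detect_forms(PAGE))
```

In the sample page, the search box is a traditional form, while the login box has no `<form>` tag at all and is only recognizable as a group of related inputs.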

Once we've located the forms, we extract meaningful input signals through a scraping process. We gather both the technical attributes of each field (like HTML tag types, input types, and field names) and the human-readable text that users actually see on the page (labels, placeholder text, and surrounding content).

These carefully selected clues provide the essential information our compact model needs to accurately classify each field without requiring the entire page content.
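Here is a minimal sketch of the scraping step, assuming labels are linked to fields through the standard `for`/`id` pairing. It collects the technical attributes and the visible text for each input, then flattens them into one short string a compact model could consume. The `SignalScraper` class, the chosen attributes, and the `to_model_input` format are assumptions for this example.

```python
from html.parser import HTMLParser


class SignalScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = []
        self.labels = {}        # field id -> visible <label> text
        self._label_for = None  # id targeted by the <label> being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_for = attrs.get("for")
        elif tag == "input":
            self.fields.append({
                "tag": tag,
                "type": attrs.get("type") or "text",
                "name": attrs.get("name") or "",
                "id": attrs.get("id") or "",
                "placeholder": attrs.get("placeholder") or "",
            })

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None

    def handle_data(self, data):
        text = data.strip()
        if self._label_for and text:
            previous = self.labels.get(self._label_for, "")
            self.labels[self._label_for] = f"{previous} {text}".strip()


def scrape_signals(form_html):
    scraper = SignalScraper()
    scraper.feed(form_html)
    scraper.close()
    for field in scraper.fields:
        field["label"] = scraper.labels.get(field["id"], "")
    return scraper.fields


def to_model_input(field):
    """Flatten technical and human-readable signals into one short string."""
    parts = [field["tag"], field["type"], field["name"],
             field["placeholder"], field["label"]]
    return " ".join(part for part in parts if part)
```

For a field like `<input id="u" name="login" type="text">` with the label "Username or email", the resulting string stays a few dozen characters long, regardless of how large the surrounding page is.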

On the left is a standard login screen with the username field bordered in red and “Detection” labeled underneath everything. There is an arrow pointing to the right of the login screen showing human-readable text labeled Scraping. Below that is an arrow pointing down toward a network icon that says “username” underneath and is labeled Prediction.

The Dashlane autofill cycle: Continuous improvement with privacy by design

Our solution implements a comprehensive cycle that continuously improves our autofill capabilities, as illustrated in the figure below:

Data collection: When Dashlaners navigate to unknown forms across the web, those forms are captured and stored in our Vortex database.

Smart labeling: These captured forms are then processed by a GenAI system, which tags and classifies each field based on its context and the SAWF standard.

Model training: The tagged forms serve as input to train our AI autofill model, which is then integrated into the extension and the mobile app.

A diagram starts with “Dashlaner navigates to unknown form.” Beside it is an arrow labeled “capture” that leads to the next part of the diagram, “Unknown forms in Vortex.” This is followed by an arrow that leads to “GenAI tags captured forms,” which is followed by an arrow that leads to “Forms tagged by GenAI.” This is followed by an arrow that leads to “Tagged forms serve as input to DL model,” which is followed by an arrow that leads to “DL autofill model.”

The result? A form-filling experience that feels magical but never compromises your privacy. Your sensitive data stays on your device, right where it belongs.

In conclusion, Dashlane's AI-powered autofill system offers a unique approach that prioritizes user privacy above all else. By utilizing internal data collection, a standardized form classification system (SAWF), and GenAI for smart labeling, we've built a robust and intelligent autofill engine. This engine operates entirely on your device, ensuring your sensitive information remains protected.

The continuous cycle of data collection, smart labeling, and model training allows us to consistently improve the accuracy and efficiency of our autofill capabilities, all while upholding our commitment to privacy by design.


Dr. Kaouther Ouenniche is a machine learning engineer at Dashlane with a Ph.D. in Artificial Intelligence from Institut Polytechnique de Paris. Since joining Dashlane in 2024, she's focused on integrating AI capabilities into security features while maintaining user privacy.
