How AI Powers Dashlane’s Autofill, Without Compromising Your Privacy

Have you ever wondered how Dashlane knows which information belongs in which form field across the millions of websites out there? Or how we manage to protect your data while still making your online experience seamless?
Today, we're pulling back the curtain on one of our most powerful yet privacy-conscious features: Our AI-powered autofill system.
The challenge: Intelligence without invasion
When most companies build AI systems, they follow a simple formula: Collect user data, send it to the cloud, analyze it there, and send back results. But at Dashlane, our zero-knowledge architecture means we never have access to your passwords, personal information, or even browsing history.
This creates two significant challenges for our team:
- The authentication wall problem: Most forms worth filling out live behind login screens. To build a dataset to train our models, we need examples of these forms, but CAPTCHAs and two-factor authentication make collecting them hard to automate.
- The privacy paradox: We need to build smart models that understand different types of forms and fields, but we can't use traditional cloud-based machine-learning approaches that would compromise user data.
So how did we solve these challenges? Let's dive in.
Data collection: The collaborative approach
Instead of collecting data from our users, we built "Vortex for Dashlaners," an internal tool our team members (Dashlaners) voluntarily run on their devices. This tool identifies when a Dashlaner encounters a form that isn't yet in our database and invites them to contribute it (after removing all personal information).

We even made it fun with a leaderboard and experience points, turning data collection into a collaborative effort.

Standardizing web form classification: The SAWF standard
Before training our models to automatically recognize form elements across the web, we needed to define the complete space of possible tags that could be assigned to forms and fields. This classification led us to develop the Semantically Annotated Web Forms (SAWF) standard.
SAWF introduces the data-form-type attribute as a standardized way to semantically tag forms and form fields in HTML, establishing best practices that developers can follow.
The standard precisely defines:
- Form types: A comprehensive taxonomy including login, registration, payment, shipping, billing, search, and other specialized forms.
- Field types: Detailed specifications for categorizing input fields across dozens of classifications, such as username, password, email, address components, and payment.
- Hierarchical relationships: SAWF defines parent-child structures for both forms and fields. For forms, the hierarchy can capture a multi-step process. For example, a "login" form can ask for your email in a first "step" and then for your password in the "final" step. For fields, hierarchies organize related inputs into taxonomies. For instance, a general "name" field type has subtypes, including first name, family name, and maiden name, while the "date" type can be structured with day, month, and year as children.
By creating this classification standard, we've not only established clear training targets for our AI models but also provided an implementation blueprint that can improve autofill accuracy across the entire web ecosystem.
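To make the field hierarchy concrete, here is a minimal Python sketch of a parent-child lookup. It is built only from the two examples mentioned above ("name" and "date"); the type names and the helper functions `parent_of` and `is_a` are illustrative assumptions, not the official SAWF vocabulary or API.

```python
from typing import Dict, List, Optional

# Hypothetical parent -> children map, built only from the examples in the text.
# These names are illustrative and are not the official SAWF vocabulary.
FIELD_HIERARCHY: Dict[str, List[str]] = {
    "name": ["first name", "family name", "maiden name"],
    "date": ["day", "month", "year"],
}

# Child -> parent lookup derived from the map above.
PARENT_OF: Dict[str, str] = {
    child: parent
    for parent, children in FIELD_HIERARCHY.items()
    for child in children
}


def parent_of(field_type: str) -> Optional[str]:
    """Return the parent type, or None for a top-level or unknown type."""
    return PARENT_OF.get(field_type)


def is_a(field_type: str, ancestor: str) -> bool:
    """True if field_type equals ancestor or descends from it."""
    current: Optional[str] = field_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT_OF.get(current)
    return False
```

A hierarchy like this lets a consumer fall back gracefully: if it only knows how to fill a generic "name", `is_a("first name", "name")` tells it that a more specific field still qualifies.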
Smart labeling with generative AI
Once we had collected thousands of forms, we faced another challenge: Understanding what each field was for. One method is to use contractors to manually identify and label each field type (like marking "this is asking for an email" or "this is for a password"). This human labeling provides the examples our AI needs to learn what different form fields look like.
However, this process requires significant time and training for contractors to understand the taxonomy. Since the web is constantly evolving with new form patterns and designs emerging regularly, waiting for contractors to complete the labeling process every time we need to update our model with fresh samples isn't feasible.
Instead, we use generative AI (GenAI) to classify form fields. Because this labeling happens completely offline on our collected form dataset (not on your personal information), we can use powerful, cutting-edge AI models without any privacy concerns.
Once we've labeled our entire dataset this way, we move to the last stage: Training a separate, highly optimized AI model that's specifically designed to be lightweight enough to run inside your browser extension. This specialized model distills the intelligence from the larger model into a compact form that can perform predictions in milliseconds without ever sending your data to our servers.
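The label-then-train idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `teacher_label` is a rule-based stand-in for the large offline model, the training strings are invented, and the student is a small naive Bayes classifier chosen for brevity. The post does not describe the architecture of the actual in-browser model.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def teacher_label(signal: str) -> str:
    """Hypothetical stand-in for the large offline labeling model."""
    text = signal.lower()
    if "pass" in text:
        return "password"
    if "mail" in text:
        return "email"
    return "other"


class TinyStudent:
    """Multinomial naive Bayes over word tokens, with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.label_counts: Counter = Counter()
        self.vocab: set = set()

    def fit(self, signals: List[str], labels: List[str]) -> None:
        for signal, label in zip(signals, labels):
            self.label_counts[label] += 1
            for token in signal.lower().split():
                self.word_counts[label][token] += 1
                self.vocab.add(token)

    def predict(self, signal: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in signal.lower().split():
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented, already-sanitized field descriptions (no user data).
UNLABELED = [
    "type password name pass",
    "type password name new_password",
    "type email name mail",
    "type text name user_mail placeholder email",
    "type text name city",
    "type text name zip",
]

# Step 1: the teacher labels the dataset offline.
labels = [teacher_label(s) for s in UNLABELED]

# Step 2: the compact student learns from the teacher's labels.
student = TinyStudent()
student.fit(UNLABELED, labels)
```

After training, the student answers on its own: `student.predict("type password name pwd")` no longer calls the teacher, which is what makes it possible to run the small model locally.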
But how do we maintain high-quality predictions with a more optimized model, you might wonder?
Finding the right signal in the web page
During our labeling process with GenAI, we can provide the complete HTML content of pages because these models can process vast amounts of information at once. However, our production model faces different constraints. Browser extensions need compact, efficient models with much smaller "context windows" as they simply can't process entire web pages at once like their larger counterparts.
To bridge this gap, we employ a two-step process. First, we have a detection phase where we identify if and where the forms are in the Document Object Model (DOM) structure of the page. Our algorithm scans the page for both traditional HTML forms and "pseudo-forms" (groups of input fields that function as forms but aren't explicitly tagged as such).
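As a rough illustration of the detection phase, the Python sketch below separates fields inside a real `<form>` from fields that merely share a container. The grouping rule (orphan fields are grouped by their immediate parent element) is a simplifying assumption for this example, as are the names `FormDetector` and `detect`; the post does not specify the heuristics used in production.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
FIELD_TAGS = {"input", "select", "textarea"}


class FormDetector(HTMLParser):
    """Collects fields inside <form> tags and groups the rest by container."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []                    # open elements, e.g. "div#3"
        self.counter = 0
        self.forms: Dict[str, List[str]] = {}         # real <form> elements
        self.pseudo_forms: Dict[str, List[str]] = {}  # orphan field groups

    def handle_starttag(self, tag, attrs):
        self.counter += 1
        node_id = f"{tag}#{self.counter}"
        if tag in FIELD_TAGS:
            self._record_field(dict(attrs).get("name") or node_id)
        if tag not in VOID_TAGS:
            self.stack.append(node_id)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split("#")[0] == tag:
                del self.stack[i:]
                break

    def _record_field(self, name: str) -> None:
        enclosing_form = next(
            (n for n in reversed(self.stack) if n.startswith("form#")), None
        )
        if enclosing_form:
            self.forms.setdefault(enclosing_form, []).append(name)
        else:
            # Assumption: orphan fields with the same parent form one group.
            container = self.stack[-1] if self.stack else "root"
            self.pseudo_forms.setdefault(container, []).append(name)


def detect(html: str) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (fields per real form, fields per pseudo-form)."""
    detector = FormDetector()
    detector.feed(html)
    detector.close()
    return list(detector.forms.values()), list(detector.pseudo_forms.values())


SAMPLE = """
<form><input name="q"></form>
<div class="signup">
  <input name="email" type="email">
  <input name="password" type="password">
  <button>Join</button>
</div>
"""
```

Calling `detect(SAMPLE)` reports one real form containing the field `q` and one pseudo-form containing `email` and `password`. A real page would need a more forgiving grouping rule, since fields are often wrapped in their own containers.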
Once we've located the forms, we extract meaningful input signals through a scraping process. We gather both the technical attributes of each field (like HTML tag types, input types, and field names) and the human-readable text that users actually see on the page (labels, placeholder text, and surrounding content). These carefully selected clues provide the essential information our compact model needs to accurately classify each field without requiring the entire page content.
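The scraping step might look something like the following Python sketch, which pairs each field's technical attributes with the visible text of its `<label>`. The attribute list in `SIGNAL_ATTRS`, the `extract_signals` helper, and the output format are assumptions for illustration. Note that this sketch never reads the `value` attribute, so anything typed into a field stays out of the signal.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Attributes treated as technical signals; this selection is an assumption.
SIGNAL_ATTRS = ("type", "name", "id", "placeholder", "autocomplete")


class SignalExtractor(HTMLParser):
    """Collects per-field attributes plus the text of <label for=...>."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: List[Dict[str, str]] = []
        self.labels: Dict[str, str] = {}        # field id -> label text
        self._label_for: Optional[str] = None   # id targeted by the open label

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "label":
            self._label_for = attr_map.get("for")
            if self._label_for:
                self.labels[self._label_for] = ""
        elif tag in ("input", "select", "textarea"):
            field = {"tag": tag}
            for key in SIGNAL_ATTRS:
                if attr_map.get(key):
                    field[key] = attr_map[key]
            self.fields.append(field)

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None

    def handle_data(self, data):
        # Only text inside an open <label for=...> is kept.
        if self._label_for:
            self.labels[self._label_for] += data


def extract_signals(html: str) -> List[str]:
    """Return one compact, human-readable signal string per field."""
    parser = SignalExtractor()
    parser.feed(html)
    parser.close()
    signals = []
    for field in parser.fields:
        parts = [f"{key}={value}" for key, value in field.items()]
        raw_label = parser.labels.get(field.get("id", ""), "")
        label = " ".join(raw_label.split())  # collapse whitespace
        if label:
            parts.append(f"label={label}")
        signals.append(" | ".join(parts))
    return signals


SAMPLE_FIELD = """
<label for="mail">Email address</label>
<input id="mail" type="email" name="user_email" placeholder="you@example.com">
"""
```

The resulting strings are short enough to fit a small model's input, which is the point of selecting signals instead of passing the whole page.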

The Dashlane autofill cycle: Continuous improvement with privacy by design
Our solution implements a comprehensive cycle that continuously improves our autofill capabilities, as illustrated in the figure below:
Data collection: When Dashlaners navigate to unknown forms across the web, those forms are captured and stored in our Vortex database.
Smart labeling: These captured forms are then processed by a GenAI system, which tags and classifies each field based on its context and the SAWF standard.
Model training: The tagged forms serve as training input for our AI autofill model, which is then integrated into the extension and the mobile app.

The result? A form-filling experience that feels magical but never compromises your privacy. Your sensitive data stays on your device, right where it belongs.
In conclusion, Dashlane's AI-powered autofill system offers a unique approach that prioritizes user privacy above all else. By utilizing internal data collection, a standardized form classification system (SAWF), and GenAI for smart labeling, we've built a robust and intelligent autofill engine. This engine operates entirely on your device, ensuring your sensitive information remains protected.
The continuous cycle of data collection, smart labeling, and model training allows us to consistently improve the accuracy and efficiency of our autofill capabilities, all while upholding our commitment to privacy by design.