How We Study & Improve The Accuracy of Our Machine Learning Models
At Dashlane, we use Machine Learning Models to classify HTML fields' categories (for example, username, email, address, and so on) and fill them with appropriate data from users' vaults.
It's unavoidable for the model to make incorrect predictions, and we’re looking at ways to improve our user experience by fixing these issues quickly.
During the last Dashlane Tech Week (a variation on our Hackathons focusing on technical projects), we achieved promising results with data augmentation techniques.
The idea is simple: If we find similar training examples with respect to a given incorrect prediction, we can generate variants of that example and assign the expected label to them.
Then we can add these new examples to the training dataset and retrain the model.
Identifying similar training examples
To identify similar training examples, we first need to generate embeddings (multi-dimensional vectors) that represent the semantics of our training data.
Researchers and Machine Learning practitioners usually obtain these embeddings by training self-supervised models on large amounts of textual data.
In this proof of concept, we leverage SentenceTransformers for this purpose.
After generating the embeddings, we identify similar training examples by computing the cosine similarity between:
- The embedding of the data point with the incorrect prediction
- The embeddings of the training data
Now we have access to a collection of similar training examples.
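As a rough illustration of this lookup, here is a minimal sketch in plain Python. It is not the code used in the experiments: the function names, the `top_k` parameter, and the toy 3-dimensional vectors are assumptions made for the example. In practice the vectors would come from an embedding model, for instance via the `encode` method of a SentenceTransformers model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def most_similar(query_embedding, train_embeddings, top_k=5):
    """Return (index, score) pairs for the top_k training embeddings,
    ordered from most to least similar to the query."""
    scored = [
        (i, cosine_similarity(query_embedding, emb))
        for i, emb in enumerate(train_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real sentence embeddings (hypothetical data).
train_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
# Embedding of the data point that was predicted incorrectly.
query_embedding = [1.0, 0.05, 0.0]

neighbors = most_similar(query_embedding, train_embeddings, top_k=2)
```

The returned indices point back into the training dataset, so the corresponding examples can be passed on to the augmentation step.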
To identify more relevant training examples, we experimented with TSDAE, as the built-in pre-trained sentence embeddings couldn't capture domain-specific knowledge well.
The TSDAE method allows us to fine-tune the pre-trained embeddings on our dataset in order to better represent our data.
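TSDAE (Transformer-based Sequential Denoising Auto-Encoder) needs no labels: each sentence is corrupted by deleting tokens, and an encoder-decoder is trained to reconstruct the original sentence from the embedding of the corrupted one. The sketch below covers only the data-preparation half of that idea, in plain Python. The function names, the sample field descriptions, and the 0.6 deletion ratio are assumptions for illustration; the fine-tuning itself is not shown.

```python
import random


def delete_noise(text, del_ratio=0.6, rng=None):
    """Corrupt a sentence by dropping each token with probability del_ratio,
    always keeping at least one token."""
    if rng is None:
        rng = random.Random()
    tokens = text.split()
    if not tokens:
        return text
    kept = [tok for tok in tokens if rng.random() >= del_ratio]
    if not kept:
        # Never return an empty input: fall back to one random token.
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def make_denoising_pairs(sentences, del_ratio=0.6, seed=0):
    """Build (corrupted, original) pairs. The model would be trained to
    reconstruct the second element from the first."""
    rng = random.Random(seed)
    return [(delete_noise(s, del_ratio, rng), s) for s in sentences]


# Hypothetical unlabeled field descriptions from the training data.
sentences = [
    "input type text name user_email placeholder Enter your email",
    "input type password name pwd autocomplete current-password",
]
pairs = make_denoising_pairs(sentences)
```

Because the reconstruction target is the sentence itself, any amount of unlabeled in-domain text can be used to adapt the embeddings.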
Generating additional training data through data augmentation
We experimented with some very simple methods provided by the nlpaug library:
- Swap: Swapping two random words in the input text. For example, "A B C" would become "A C B."
- Delete: Deleting random words in the input text. For example, "A B C" would become "A B." We experimented with "swap+delete," in other words, generating 50% of the examples using swap and the rest with delete.
- Synonym replacement: Replacing random words in the input text with synonyms. For example, "The quick brown fox jumps over the lazy dog" would become "The speedy brown fox jumps complete the lazy dog." We experimented with "swap+delete+synonym," in other words, generating one-third of the examples using swap, one-third with delete, and the rest with synonyms.
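To make the "swap+delete" mix concrete, here is a minimal plain-Python sketch. It is a stand-in written for this example rather than a call into nlpaug, and the function names, the sample text, and the "email" label are hypothetical. Synonym replacement is left out because it needs an external thesaurus.

```python
import random


def swap_words(text, rng):
    """Swap two randomly chosen words."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def delete_word(text, rng):
    """Delete one randomly chosen word, always keeping at least one."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)


def augment(example, expected_label, n_variants, seed=0):
    """Generate labeled variants: the first half with swap, the rest with delete.
    Every variant receives the expected (corrected) label."""
    rng = random.Random(seed)
    n_swap = n_variants // 2
    variants = []
    for k in range(n_variants):
        op = swap_words if k < n_swap else delete_word
        variants.append((op(example, rng), expected_label))
    return variants


# Hypothetical similar training example and its expected label.
new_examples = augment("Enter your email address", "email", n_variants=4)
```

The resulting (text, label) pairs can then be appended to the training dataset before retraining.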
We conducted some quick experiments with more advanced techniques (contextual word embeddings), but they fixed far fewer incorrect predictions than the simpler methods.
However, it would be interesting to experiment more in-depth with this approach in future work because the simple techniques don’t take word semantics into account.
We used these two metrics to assess the performance of our approach:
- Test F1 score: The weighted F1 score on the test dataset.
- Number of fixed predictions: The number of previously incorrect predictions that the retrained model classifies correctly.
Our results confirm that TSDAE helps identify similar training examples better than the built-in pre-trained model (all-MiniLM-L6-v2).
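The two metrics can be sketched in plain Python as follows. The definition of a "fixed" prediction used here (wrong before retraining, correct after) is an assumption, as are the function names and the toy labels. Libraries such as scikit-learn offer the weighted F1 score directly; it is written out here only to show what it computes.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support
    (its number of true instances)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


def count_fixed(y_true, pred_before, pred_after):
    """Count examples that were wrong before retraining and are right after."""
    return sum(
        1
        for t, b, a in zip(y_true, pred_before, pred_after)
        if b != t and a == t
    )


# Toy labels (hypothetical): predictions before and after retraining.
y_true = ["email", "email", "username", "address"]
pred_before = ["username", "email", "username", "email"]
pred_after = ["email", "email", "username", "email"]

f1_before = weighted_f1(y_true, pred_before)
f1_after = weighted_f1(y_true, pred_after)
n_fixed = count_fixed(y_true, pred_before, pred_after)
```

Tracking both numbers matters: the fixed-prediction count shows whether the targeted errors went away, while the test F1 score guards against the new examples degrading the model elsewhere.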
Improving & moving forward
This approach was inspired by data-centric AI, which improves the model by improving the data quality.
We’ll keep experimenting with new techniques to make our predictive models more accurate and our customers' lives easier (and ours too).
If you are curious about other technical work happening at Dashlane, feel free to check out the Dashlane GitHub account at: https://github.com/Dashlane/.