Strange Security: What to Know About Generative AI and Cybersecurity
While tech companies have long been using artificial intelligence in their products, the launch of tools like OpenAI’s ChatGPT and DALL-E means generative AI is now accessible to anyone with a computer or smartphone—including threat actors.
AI researchers and cybersecurity experts have begun to poke holes in generative AI platforms in search of security vulnerabilities. The EU’s General Data Protection Regulation (GDPR) has also set provisions around generative AI, all in the name of protecting individuals’ data and getting ahead of cyber threats.
As with any rapidly advancing technology, generative AI has opened up many questions about the safety of our data. As these tools continue to change how we operate as businesses and individuals, they’ve also created concerns around ethics and job security.
Here’s what to keep in mind while navigating this new and unfamiliar technological landscape.
What are generative AI and large language models (LLMs)?
Generative AI is a form of machine learning tech that allows users to input a plain language prompt and generate text, code, images, audio, and video in seconds. OpenAI, an artificial intelligence research lab, developed two of the most prominent generative AI tools: DALL-E, which creates images based on user prompts, and ChatGPT, the world’s most advanced chatbot. ChatGPT is capable of responding to search queries and generating text like movie scripts, poetry, and code. It’s also classified as a natural language processing model, or large language model (LLM), meaning it was trained on a massive dataset that includes websites, books, and articles. This allows ChatGPT to identify patterns in order to make predictions and take an eerily human-like approach when solving problems and answering questions.
ChatGPT is far from the only platform offering accessible generative AI. Companies like Bard and Microsoft’s Bing AI hint that the market is still evolving and becoming competitive.
What are the ethical concerns around generative AI?
Imagine: A trailer for The Lord of the Rings in the style of Wes Anderson. Drake rapping Ice Spice lyrics. Your spouse’s surprisingly succinct and clever wedding vows.
Until recently, producing these would take significant time, resources, and skill to create. Now, thanks to generative AI tools, ideas once left to die in your Notes app can be brought to life suddenly and with little effort—all you need is a text-based prompt.
Generative AI is built on “training data,” which can include any publicly available content and open-source code—uploaded artwork, Google searches, and public forums are fair game. When generative AI produces “new” content, it’s leveraging training data; in this case, original content made by artists, writers, and developers.
Some companies have recently come under fire for using content produced by generative AI for profit or even replacing content creators with generative AI tools. While this shift in industries goes largely unregulated, some creators and developers are getting the short end of the stick.
Preventing large language models from using your intellectual property is not easy. The music corporation Universal Music Group (UMG), for example, requested that streaming services no longer allow work by its artists—which include Drake, Ariana Grande, and Taylor Swift—to be used for AI training data. Platforms like Apple Music and Spotify have yet to respond, while the US Copyright Office says it will seek public input as part of its initiative to examine current copyright laws and policies surrounding AI.
Some artists are using generative AI to enhance their own work, but many have pointed out that this is equivalent to stealing from other creators. Recently, a documentary photographer sparked an industry debate when he used AI to create a photojournalism project on Cuban refugees. As advancements make it harder to discern real photographs from AI-generated images, critics question how this will affect the validity and ethics of documentary photography.
How does generative AI impact online safety?
Users have found workarounds to safety restrictions in place for generative AI tools, allowing platforms like ChatGPT to spit out harmful content or malicious code.
Recently, WIRED reported on research around “indirect prompt-injection attacks,” which are possible with LLMs. Cybersecurity and AI researchers have tested the vulnerabilities of generative AI platforms like ChatGPT and associated plugins to demonstrate the ability to link hidden, malicious instructions within user prompts so that the platform performs action unintended by the user. In one example, a researcher embedded a hidden command in a YouTube video transcript. When ChatGPT was asked to summarize the video, the bot also told a joke that was not part of the video. Though the current examples are benign, experts believe that threat actors could exploit these vulnerabilities to carry out malicious prompt-injection attacks.
Threat actors can leverage these tools to achieve more sophisticated cyberattacks. Cybersecurity experts warn against Business Email Compromise (BEC) attacks and ransomware attacks made more advanced by AI. As tools like ChatGPT are natural language processing models, they enhance a hacker’s ability to sound more human when deploying BEC attacks and phishing scams. As Protocol points out, ransomware attacks are difficult to scale, but with AI advancements, threat actors may be able to automate processes and achieve wider ransomware attacks more quickly.
Yet, just as threat actors explore ways to exploit machine learning, “ethical hackers” invest time in ramping up threat detection technology. For example, while hackers may ask ChatGPT to write a phishing email, users can also ask the bot to review emails for phishing language.
What are the privacy concerns around generative AI?
While OpenAI collects user data typical for most online platforms, security experts highlight its vague privacy policies and question its ability to protect our personal data from threat actors, especially after ChatGPT experienced a “history bug” in March, which temporarily exposed users’ payment info.
OpenAI’s FAQ page also states that their employees review conversations on ChatGPT, and use the data to train the model. European regulators claim this could violate GDPR rules, which require explicit consent from users and a legal basis for storing user data.
In the healthcare industry, ChatGPT is being integrated into Electronic Health Records (EHR) platforms. Medical professionals are also exploring how to use generative AI to simplify tasks, like translating materials for patients and summarizing patients’ medical histories. This all raises concerns around HIPAA compliance as well as confidentiality, accuracy, and accountability in healthcare.
How can you use generative AI more mindfully?
With an awareness of generative AI’s capabilities and privacy shortcomings, here are ways to mitigate risk while using the platform.
- Individuals can request to have OpenAI delete their data and opt out of the tool using their data for training purposes.
- For developers, The Stack, a dataset of permissively licensed source code, gives GitHub users can fill out a Google form “to have their code removed from the dataset upon request.”
- Use a secure login when creating an account with OpenAI and any other generative AI platform.
- Be mindful of how you use content generated from large language models, and consider potential copyright infringement, ethical factors, and the accuracy of AI-generated content.
- Never share any proprietary or personal information with ChatGPT or other generative AI platforms, including sensitive code. If you wouldn’t post it openly online, don’t share it with AI tools.
- Be mindful of how you share your data online in general: think twice before sharing certain information on public forums like social media or unsecured platforms to limit malicious or unintended use of your data.
Thanks! You're subscribed. Be on the lookout for updates straight to your inbox.