Artificial intelligence (AI) models that operate across different media types and domains, known as multimodal AI, can be used by attackers to create convincing scams. At the same time, defenders are finding multimodal AI equally useful for detecting fraudulent emails and not-safe-for-work (NSFW) documents.
A large language model (LLM) can accurately classify previously unseen samples of emails impersonating different brands with greater than 97% accuracy, as measured by the F1 score, say researchers from cybersecurity company Sophos, who presented their findings at the Virus Bulletin Conference on October 4. While existing email security and content-filtering systems can detect messages abusing previously encountered brands, multimodal AI systems can identify the latest attacks even if they have not been trained on similar email samples.
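Sophos has not published its exact pipeline in this article, but the general idea of prompting a multimodal model with both the email text and a screenshot of its rendered HTML can be sketched roughly as follows. This is a minimal, illustrative example assuming access to a multimodal chat model through the OpenAI Python client; the model name, prompt wording, and output format are assumptions, not the researchers' actual method.

```python
import base64
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_email(subject: str, body_text: str, screenshot_path: str) -> str:
    """Ask a multimodal LLM whether an email impersonates a known brand.

    Returns the model's verdict as free text (a brand name or 'none').
    """
    with open(screenshot_path, "rb") as f:
        screenshot_b64 = base64.b64encode(f.read()).decode()

    prompt = (
        "You are an email security analyst. Given the email text and a "
        "screenshot of its rendered HTML, decide whether it impersonates a "
        "well-known brand. Answer with the brand name, or 'none'."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any multimodal chat model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"{prompt}\n\nSubject: {subject}\n\n{body_text}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```

In practice, a production system would constrain the output (for example, to a fixed label set) and evaluate it against a held-out corpus to produce metrics such as the F1 score cited above.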
While this approach is unlikely to become a feature of email security products on its own, it could be used as a final filtering step by security analysts, says Ben Gelman, a senior data scientist at Sophos. The company joins other cybersecurity firms, including Google, Microsoft, and Simbian, in exploring new ways to use LLMs and other generative AI models to augment and assist security analysts and to help speed up incident response.
“AI and cybersecurity are merging, and this whole AI-generated attack/defense (approach) is going to become natural in the cybersecurity space,” he says. “It’s a force multiplier for our analysts. We have a number of projects where we support our SOC analysts with AI-based tools, and the goal is to make them more efficient and give them all this knowledge and confidence at their fingertips.”
Understanding Attacker Tactics
Attackers have also started using LLMs to improve their email lures and attack code. Microsoft, Google and OpenAI have all warned that state groups appear to be using these public LLMs for various tasks, such as creating spear phishing lures and code snippets used to scrape websites.
As part of their research, the Sophos team created a platform to automate the launch of e-commerce scam campaigns, or “scampaigns,” to understand what types of attacks might be possible with multimodal generative AI. The platform consisted of five different AI agents: a data agent for generating information about products and services, an image agent for creating images, an audio agent for any sound needs, a user-interface agent for creating custom code, and an advertising agent for creating marketing materials. The personalization potential of automated spear-phishing and ChatGPT-powered scam campaigns could lead to large-scale microtargeting, Sophos researchers said in their October 2 analysis.
“(We) can see that these techniques are particularly creepy because users may interpret the most effective microtargeting as a chance coincidence,” the researchers said. “Previously, spear phishing required dedicated manual effort, but with this new automation, it is possible to achieve personalization on a scale never seen before.”
That said, Sophos has yet to encounter this level of AI use in the wild.
Defenders should expect AI-assisted cyberattackers to have higher-quality social engineering techniques and faster innovation cycles, says Anand Raghavan, vice president of AI engineering at Cisco Security.
“It’s not just about the quality of emails, but also the ability to automate that has increased dramatically since the advent of GPT and other AI tools,” he says. “Attackers have not only improved incrementally, but exponentially.”
Beyond Keyword Matching
Using an LLM to process emails and turn them into text descriptions leads to better accuracy and can help analysts handle emails that might otherwise go unnoticed, said Younghoo Lee, principal data scientist in the Sophos AI group, in research presented at the Virus Bulletin conference.
“(O)ur multi-modal AI approach, which leverages both text and image inputs, offers a more robust solution for detecting phishing attempts, especially against unseen threats,” he wrote in the paper accompanying his presentation. “Using text and image features was most effective” when dealing with multiple brands.
The ability to process the text of an email in context, combined with a multimodal model’s ability to “understand” words and context within images, allows for a more complete understanding of an email, says Cisco’s Raghavan. The ability of LLMs to focus not only on identifying suspicious language, but also on dangerous contexts (such as emails prompting a user to take a business-critical action), makes them very useful in assisting analysis, he says.
Any attempt to compromise workflows related to money, credentials, sensitive data, or confidential processes should be flagged.
“Language as a classifier also allows us to reduce false positives by identifying what we call critical workflows,” says Raghavan. “If an attacker wants to compromise your organization, there are four critical workflow types, (and) language is the predominant indicator that allows us to determine (whether) an email is of concern or not.”
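One way to picture “language as a classifier” for those workflow types is to prompt an LLM to decide whether an email is pushing the recipient toward any of them. The sketch below is hypothetical and is not Cisco’s implementation; it reuses the same assumed OpenAI Python client and illustrative model name as the earlier example, and the category names simply echo the four workflow types Raghavan describes.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The four critical workflow types named in the quote above.
WORKFLOWS = ["money movement", "credentials", "sensitive data", "confidential processes"]


def flag_critical_workflow(email_text: str) -> str:
    """Return which critical workflow (if any) the email tries to trigger."""
    prompt = (
        "Classify the email below. If it urges the recipient to act on any of "
        f"these workflows: {', '.join(WORKFLOWS)}, reply with that workflow "
        "name; otherwise reply 'benign'.\n\n" + email_text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()


# Example: a wire-transfer lure should come back as "money movement".
print(flag_critical_workflow("Hi, please wire $48,000 to the attached account today."))
```

Restricting alerts to these workflow categories, rather than to any suspicious-sounding phrase, is what lets this kind of classifier cut down on false positives.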
So why not use LLMs everywhere? Cost, says Sophos’ Gelman.
“Relying on LLMs to do anything at scale is usually way too expensive compared to the gains you get,” he says. “One of the challenges of multimodal AI is that every time you add a mode like images, you need a lot more data, you need a lot more training time, and when the text and image models are in conflict, you need a better model and potentially better training” to decide between the two.
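To make the cost point concrete, a rough back-of-the-envelope calculation shows how quickly per-email inference adds up at mail-gateway volumes. Every number here is hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope cost estimate with deliberately hypothetical numbers,
# illustrating why per-email LLM inference gets expensive at scale.
emails_per_day = 5_000_000          # hypothetical mail-gateway volume
tokens_per_email = 1_500            # assumed average for text plus image description
price_per_million_tokens = 0.15     # USD, assumed input price for a small model

daily_cost = emails_per_day * tokens_per_email / 1_000_000 * price_per_million_tokens
print(f"~${daily_cost:,.0f} per day, ~${daily_cost * 365:,.0f} per year")
# ~$1,125 per day, ~$410,625 per year -- before adding image tokens or retries
```

Even with cheap models, the bill scales linearly with mail volume and token count, which is why a full LLM pass tends to be reserved for a final filtering stage rather than applied to every message.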