There is so far no foolproof method to protect artificial intelligence systems from misdirection, a US standards body warns, and AI developers and users should be wary of those who claim otherwise.
Caution comes from the US National Institute of Standards and Technology (NIST) in a new guidance for application developers on vulnerabilities in predictive and generative AI and machine learning (ML) systems, types attacks they can expect and approaches to mitigate them. .
“Adversaries can deliberately confuse or even “poison” artificial intelligence (AI) systems to make them malfunction – and there is no foolproof defense for their developers,” NIST explains.
The document, titled Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigation (NIST.AI.100-2), is part of NIST’s efforts to support the development of trustworthy AI. It can also help put NIST’s AI risk management framework into practice.
A major problem is that the data used to train AI systems may be unreliable, according to NIST. Data sources can be websites and public interactions. There are many opportunities for malicious actors to corrupt this data, both during the training period of an AI system and afterward, as the AI continues to refine its behaviors by interacting with the physical world. This may cause the AI to operate undesirably. Chatbots, for example, can learn to respond with abusive or racist language when their guardrails are circumvented by carefully crafted malicious prompts.
“For the most part, software developers need more people to use their product so that it can improve with exposure,” said NIST computer scientist Apostol Vassilev, one of the publication’s authors. , in a press release. “But there is no guarantee that the exposure will be good. A chatbot can spread misinformation or toxic information when prompted in carefully crafted language.
Partly because the datasets used to train AI are far too large for people to successfully monitor and filter, there is not yet a foolproof way to protect AI from misdirection . To help the developer community, the new report provides insight into the types of attacks AI products could experience and corresponding approaches to reduce the damage.
They understand
– escape attacks, which occur after an AI system is deployed, attempt to modify an input to change how the system responds to it. Examples could include adding markings to stop signs so that an autonomous vehicle misinterprets them as speed limit signs or creating confusing lane markings to cause the vehicle to swerve of the road ;
— poisoning attacks, which occur in the training phase by introducing corrupted data. An example would be inserting many instances of inappropriate language into conversation recordings, so that a chatbot interprets these instances as common enough language to use in its own interactions with customers;
– confidentiality The attacks, which occur during deployment, are attempts to obtain sensitive information about the AI or the data it was trained on for misuse. An adversary can ask a chatbot many legitimate questions, then use the answers to reverse engineer the model to find its weak points – or guess its sources. Adding unwanted examples to these online sources could cause the AI to behave inappropriately, and getting the AI to unlearn these specific unwanted examples after the fact may be difficult;
– abuse attacks, which involve inserting incorrect information into a source, such as a web page or online document, which an AI then absorbs. Unlike poisoning attacks, abuse attacks attempt to feed the AI with incorrect information from a legitimate but compromised source in order to repurpose the AI system’s intended use.
“Most of these attacks are fairly easy to mount and require minimal knowledge of the AI system and limited adversarial capabilities,” said Alina Oprea, report co-author and professor at Northeastern University. “Poisoning attacks, for example, can be staged by monitoring a few dozen training samples, which would be a very small percentage of the entire training set.”
Many mitigation measures focus on sanitizing data and models. However, the report adds, they should be combined with cryptographic techniques for attestation of the origin and integrity of AI systems. Red teaming (creating an internal team to attack a system) as part of pre-deployment testing and evaluating AI systems to identify vulnerabilities is also vital, the report said.
On the other hand, the report also admits that the lack of reliable benchmarks can be a problem in assessing the actual performance of proposed mitigation measures.
“Given the multitude of powerful attacks, designing appropriate mitigation measures is a challenge.
this needs to be addressed before deploying AI systems in critical areas,” the report said.
This challenge, he notes, is exacerbated by the lack of secure machine learning algorithms for many tasks. “This implies that at present, the design of mitigation measures is an inherently ad hoc and fallible process,” the report says.
The report also indicates that developers and buyers of AI systems will have to accept certain trade-offs: indeed, the reliability of an AI system depends on all the attributes that characterize it, notes the report. For example, an AI system that is accurate but easily susceptible to exploitation by adversaries is unlikely to be trustworthy. Conversely, an AI system optimized for adversary robustness may exhibit lower accuracy and worse fairness outcomes.
“In most cases, organizations will have to accept trade-offs between these properties and
decide which of these to prioritize based on the AI system, use case, and potentially many other considerations regarding the economic, environmental, social, cultural, political, and global implications of AI technology.
Joseph Thacker, principal AI engineer and security researcher at AppOmni, called the report “the best publication on AI security I have seen.” What’s most notable is the depth and coverage. This is the most in-depth content on adversarial attacks on AI systems that I have come across. It covers the various forms of rapid injection, elaborating and giving terminology for components that were previously not well labeled. He even references prolific real-world examples like the DAN (Do Anything Now) jailbreak and amazing rapid indirect injection works. It includes several sections covering potential mitigations, but it is clear that this is not yet a resolved issue.
“It also covers the debate between open and closed models. There is a useful glossary at the end, which I personally plan to use as additional “context” for large language models when writing or researching AI security. This will ensure that the LLM and I are working with the same definitions specific to this field. Overall, I think this is the most successful overall content covering AI security.