Introducing a contextual framework to comprehensively assess the social and ethical risks of AI systems
Generative AI systems are already being used to write books, create graphic designs, and assist medical practitioners, and they are becoming increasingly capable. Ensuring that these systems are developed and deployed responsibly requires carefully assessing the potential ethical and social risks they may pose.
In our new paper, we propose a three-layered framework for assessing the social and ethical risks of AI systems. This framework includes assessments of AI system capability, human interaction, and systemic impact.
We also map the current state of safety assessments and find three main gaps: context, specific risks, and multimodality. To help close these gaps, we call for repurposing existing assessment methods for generative AI and for a layered approach to assessment, as in our case study on misinformation. This approach combines findings, such as how likely the AI system is to provide factually incorrect information, with insights into how people use the system and in what context. Multi-layered assessments can draw conclusions beyond model capability and indicate whether harm, in this case misinformation, actually occurs and spreads.
For any technology to work as intended, both social and technical challenges must be solved. To better gauge AI system safety, these different layers of context must therefore be taken into account. Here, we build on earlier research identifying the potential risks of large-scale language models, such as privacy leaks, job automation, and misinformation, and introduce a way of comprehensively assessing these risks going forward.
Context is key to assessing AI risks
The capabilities of AI systems are an important indicator of the broader types of risk that may arise. For example, AI systems that are more likely to produce factually inaccurate or misleading output may be more prone to creating risks of misinformation, leading to problems such as eroding public trust.
Measuring these capabilities is at the heart of AI safety assessments, but these assessments alone cannot ensure that AI systems are safe. Whether downstream harm manifests, for example whether people come to hold false beliefs based on inaccurate model output, depends on context. More specifically: who uses the AI system, and for what purpose? Does the AI system function as intended? Does it create unexpected externalities? All of these questions inform the overall assessment of an AI system's safety.
To extend beyond capability assessment, we propose assessments at two additional points where downstream risks can manifest: human interaction at the point of use, and systemic impact as an AI system is embedded in broader systems and widely deployed. Integrating assessments of a given risk of harm across these layers provides a comprehensive evaluation of an AI system's safety.
Human interaction assessment centers on the experience of people using an AI system. How do people use the AI system? Does the system perform as intended at the point of use, and how do experiences differ across demographics and user groups? Can we observe any unexpected side effects from using this technology or from being exposed to its outputs?
Systemic impact assessment focuses on the broader structures into which an AI system is embedded, such as social institutions, labor markets, and the natural environment. Assessment at this layer can highlight risks of harm that become visible only once an AI system is widely adopted.
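To make the layered structure concrete, here is a minimal sketch (ours, not from the paper) of how findings for a single risk area, such as misinformation, might be recorded across the three layers; all field names and example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LayeredRiskAssessment:
    """Findings for one risk area, recorded at each of the three layers."""
    risk_area: str
    capability: dict         # model-level findings, e.g. factuality scores on benchmark prompts
    human_interaction: dict  # findings from studies at the point of use
    systemic_impact: dict    # indicators measured once the system is widely deployed

    def summary(self) -> str:
        """Combine the layer-level findings into a single readable summary."""
        return (
            f"Risk area: {self.risk_area}\n"
            f"  Capability: {self.capability}\n"
            f"  Human interaction: {self.human_interaction}\n"
            f"  Systemic impact: {self.systemic_impact}"
        )

# Illustrative values only: a misinformation assessment spanning all three layers.
misinformation = LayeredRiskAssessment(
    risk_area="misinformation",
    capability={"factual_accuracy": 0.81},              # share of benchmark answers judged correct
    human_interaction={"users_misled_in_study": 0.12},  # fraction of participants adopting a false claim
    systemic_impact={"flagged_shares_per_week": 40},    # downstream spread after wide deployment
)
print(misinformation.summary())
```

The point of the sketch is simply that no single layer tells the whole story: a strong capability score does not rule out harm at the point of use or at societal scale.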
Safety assessments are a shared responsibility
AI developers must ensure that their technologies are developed and released responsibly. Public actors, such as governments, are responsible for upholding public safety. As generative AI systems become more widely used and deployed, ensuring their safety is a responsibility shared among several stakeholders:
- AI developers are well placed to interrogate the capabilities of the systems they produce.
- Application developers and designated public authorities are able to assess the functionality of different features and applications, as well as possible externalities for different user groups.
- Broader public stakeholders are particularly well placed to forecast and assess the societal, economic, and environmental implications of new technologies such as generative AI.
The three layers of assessment in the proposed framework are a matter of degree rather than being neatly divided. While none of them is entirely the responsibility of a single actor, primary responsibility depends on who is best placed to perform assessments at each layer.
Gaps in current safety assessments of generative multimodal AI
Given the importance of this additional context for assessing the safety of AI systems, it is essential to understand the availability of such tests. To better understand the broader landscape, we made a wide-ranging effort to collate, as comprehensively as possible, the assessments that have been applied to generative AI systems.
In mapping the current state of safety assessments for generative AI, we identified three key gaps:
- Context: Most safety assessments consider the capabilities of generative AI systems in isolation. Comparatively little work has been done to assess potential risks at the point of human interaction or of systemic impact.
- Specific risk assessments: Assessments of the capabilities of generative AI systems are limited in the risk areas they cover. For many risk areas, few assessments exist. Where they do exist, they often operationalize harm narrowly. For example, representational harms are typically defined as stereotypical associations between occupations and different genders, leaving other instances of harm and other risk areas undetected.
- Multimodality: The vast majority of existing safety assessments of generative AI systems focus solely on text output; large gaps remain for assessing risks of harm in image, audio, or video modalities. This gap is only widening with the introduction of multiple modalities in a single model, such as AI systems that can take images as input or produce outputs that interweave audio, text, and video. While some text-based assessments can be applied to other modalities, new modalities introduce new ways in which risks can manifest. For example, a description of an animal is not harmful, but if the same description is applied to an image of a person, it is.
We list links to publications detailing safety assessments of generative AI systems, freely accessible via this repository. If you would like to contribute, please add assessments by filling out this form.
Putting more comprehensive assessments into practice
Generative AI systems are powering a wave of new applications and innovations. To ensure that the potential risks posed by these systems are understood and mitigated, we urgently need rigorous and comprehensive assessments of AI system safety that take into account how these systems may be used and embedded in society.
A practical first step is to repurpose existing assessments and to leverage large models themselves for assessment, although both have significant limitations. For more comprehensive assessment, we also need to develop approaches for evaluating AI systems at the point of human interaction and for their systemic impacts. For example, although the spread of misinformation via generative AI is a recent problem, we show that many existing methods for assessing public trust and credibility could be repurposed.
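As a very rough sketch of what "leveraging large models themselves for assessment" could look like at the capability layer, the snippet below uses a model as an automated rater of factual support. The query_model stub, the prompt, and the scoring rule are our own illustrative assumptions, not a method from the paper; a real rater would call an actual model API and would need validation against human judgments.

```python
def query_model(prompt: str) -> str:
    """Stand-in for a call to a large model; swap in a real API client in practice."""
    return "YES"  # trivial stub so the sketch runs end to end

def rate_factual_support(claim: str, reference: str) -> bool:
    """Ask the rater model whether a generated claim is supported by a reference text."""
    prompt = (
        "Reference:\n" + reference + "\n\n"
        "Claim:\n" + claim + "\n\n"
        "Is the claim fully supported by the reference? Answer YES or NO."
    )
    return query_model(prompt).strip().upper().startswith("YES")

def factuality_score(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (claim, reference) pairs the rater judges as supported."""
    judged = [rate_factual_support(claim, reference) for claim, reference in pairs]
    return sum(judged) / len(judged) if judged else 0.0

# Example usage with a single pair (returns 1.0 with the trivial stub above).
pairs = [("The Eiffel Tower is in Paris.", "The Eiffel Tower is a landmark in Paris, France.")]
print(factuality_score(pairs))
```

Even a well-validated automated rater of this kind only probes the capability layer; the human interaction and systemic impact layers still require studies with real users and measurements of systems in deployment.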
Ensuring the safety of widely used generative AI systems is a shared responsibility and a shared priority. AI developers, public stakeholders, and other parties must collaborate and collectively build a thriving, robust assessment ecosystem for safe AI systems.