The UK AI Safety Institute, the UK’s recently established AI safety body, has released a set of tools designed to “enhance AI safety” by making it easier for industry, research organizations and academia to develop AI assessments.
Called Inspect, the toolset, which is available under an open source license (specifically an MIT License), aims to evaluate certain capabilities of AI models, including models’ basic knowledge and ability to reason, and to generate a score based on the results.
In a press release announcing the news on Friday, the AI Safety Institute claimed that Inspect marks “the first time an AI safety testing platform, run by a state-backed organization, has been released for wider use.”
“Successful collaboration on AI safety testing means having a shared and accessible approach to assessments, and we hope Inspect can be a building block,” Ian Hogarth, chair of the AI Safety Institute, said in a statement. “We hope to see the global AI community use Inspect not only to conduct their own model safety testing, but also to help adapt and expand the open source platform so we can produce high-quality assessments at all levels.”
As we have written before, AI benchmarks are hard, not least because today’s most sophisticated AI models are black boxes whose infrastructure, training data and other key details are kept secret by the companies that create them. So how does Inspect meet this challenge? Mainly by being extensible and expandable to new testing techniques.
Inspect is made up of three basic components: datasets, solvers, and scorers. Datasets provide the samples for benchmark tests. Solvers do the work of running the tests. And scorers evaluate the work of solvers and aggregate scores from the tests into metrics.
Inspect’s built-in components can be supplemented via third-party packages written in Python.
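To make those three pieces concrete, here is a minimal sketch of what an Inspect evaluation can look like when defined in Python. It loosely follows the project’s published quickstart, but the toy task, the sample data and the exact parameter names (for instance, whether the solver chain is passed as `plan` or `solver`) are assumptions that may differ between versions of the `inspect_ai` package.

```python
# Minimal Inspect-style evaluation sketch: a dataset of samples, a solver
# that runs the model, and a scorer that turns model outputs into a metric.
# Module paths and argument names follow early Inspect documentation and
# may differ in newer releases of the inspect_ai package.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate


@task
def arithmetic_check():
    # Hypothetical toy dataset: a single question with a known answer.
    return Task(
        dataset=[Sample(input="What is 7 * 6?", target="42")],
        plan=[generate()],  # solver chain: simply ask the model to answer
        scorer=match(),     # scorer: compare the completion to the target
    )
```

An evaluation like this would then typically be run against a specific model from the command line, with something along the lines of `inspect eval arithmetic_check.py --model openai/gpt-4`, and the resulting scores aggregated into the metrics Inspect reports.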
In a post, Clément Delangue, CEO of AI startup Hugging Face, floated the idea of integrating Inspect with Hugging Face’s model library or creating a public leaderboard with the results of the toolset’s evaluations.
Inspect’s release comes after a US government agency – the National Institute of Standards and Technology (NIST) – launched NIST GenAI, a program aimed at evaluating various generative AI technologies, including text- and image-generating AI. NIST GenAI plans to publish benchmark tests, help create content authenticity detection systems, and encourage the development of software to detect false or misleading AI-generated information.
In April, the US and UK announced a partnership to jointly develop advanced testing of AI models, following commitments announced at the UK’s AI Safety Summit at Bletchley Park in November last year. As part of this collaboration, the United States intends to launch its own AI Safety Institute, which will be largely responsible for assessing risks from AI and generative AI.