Today we are announcing a Claudius 3.5 Improved Sonnetand a new model, Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet offers overall improvements over its predecessor, with particularly significant gains in coding, an area in which it already led the pack. Claude 3.5 Haiku matches the performance of Claude 3 Opus, our previous largest model, on many benchmarks for the same cost and similar speed to the previous generation of Haiku.
We’re also introducing a revolutionary new feature in public beta: computer use. Available today on the APIdevelopers can ask Claude to use computers the way people do: by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer desktop use in public beta. At this stage, it is still experimental-sometimes tedious and error prone. We are publishing PC usage early to gather developer feedback and hope that capacity will improve quickly over time.
Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun exploring these possibilities, performing tasks that require dozens, if not hundreds, of steps. For example, Replit uses Claude 3.5 Sonnet’s computer usage and user interface navigation capabilities to develop a key feature that evaluates applications as they are created for their Replit Agent product.
Claude 3.5 Sonnet enhanced version is now available for all users. Starting today, developers can build with the IT Use Beta on the Anthropic API, Amazon Bedrock, and Vertex AI from Google Cloud. The new Claude 3.5 Haiku will be released later this month.
Claude 3.5 Sonnet: Cutting-edge software engineering skills
The update Claudius 3.5 Sonnet shows wide-ranging improvements over industry benchmarks, with particularly large gains in agent coding and tool usage tasks. Concerning coding, this improves performance on Verified SWE Bench from 33.4% to 49.0%, a score higher than all publicly available models, including reasoning models like OpenAI o1-preview and specialized systems designed for agent coding. It also improves performance on TAU bencha task of using agentic tools, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more complex airline domain. The new Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor.
Early customer feedback suggests that the Claude 3.5 Sonnet upgrade represents a significant step forward for AI-driven coding. GitLab, which tested the model for DevSecOps tasks, found that it delivered stronger reasoning (up to 10% across all use cases) without additional latency, making it an ideal choice for powering multi-step software development process. Cognition uses the new Claude 3.5 Sonnet for standalone AI assessments and has seen substantial improvements in coding, planning and problem solving compared to the previous version. The Browser Company, using the model to automate web-based workflows, noted that Claude 3.5 Sonnet outperformed all previously tested models.
As part of our ongoing partnering efforts with external experts, joint pre-deployment testing of the new Claude 3.5 Sonnet model was conducted by the US AI Safety Institute (US AISI) and the UK Safety Institute (UK AISI).
We also evaluated the enhanced Sonnet Claude 3.5 for catastrophic risks and found that the ASL-2 standard, as described in our Responsible Scaling Policyremains appropriate for this model.
Claude 3.5 Haiku: state of the art meets affordability and speed
Claude 3.5 Haiku is the next generation of our fastest model. For the same cost and similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves in all skill areas and even outperforms Claude 3 Opus, the largest model of our previous generation, on many intelligence criteria. Claude 3.5 Haiku is particularly strong in coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models, including the original Claude 3.5 Sonnet and GPT-4o.
With low latency, improved instruction tracking, and more precise tool usage, Claude 3.5 Haiku is well-suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from huge volumes of data, such as purchase history, prices or inventory. records.
Claude 3.5 Haiku will be available later this month on our proprietary API, Amazon Bedrock and Google Cloud’s Vertex AI, initially as a text-only template, followed by image input.
Teaching Claude to navigate the computer responsibly
With the use of the computer we are trying something fundamentally new. Instead of creating specific tools to help Claude complete individual tasks, we teach him general IT skills, allowing them to use a wide range of standard tools and software designed for people. Developers can use this emerging capability to automate repetitive processes, build and test softwareAnd carry out open-ended tasks like research.
To make these soft skills possible, we built an API that allows Claude to perceive and interact with computer interfaces. Developers can integrate this API to allow Claude to translate instructions (for example, “use data from my computer and online to fill out this form”) into computer commands (for example, check a spreadsheet; move the cursor to open a web browser; access relevant web pages; OSWorldwhich evaluates the ability of AI models to use computers like people do, Claude 3.5 Sonnet scored 14.9% in the screenshots-only category, significantly higher than the score of 7 .8% of the second best AI system. When given more steps to complete the task, Claude scored 22.0%.
Although we expect this ability to improve rapidly over the coming months, Claude’s current ability to use computers is imperfect. Some actions that people perform effortlessly (scrolling, dragging, zooming) currently present challenges for Claude and we encourage developers to start exploring with low-risk tasks. As computer use may provide a new vector for more familiar threats such as spam, disinformation or fraud, we take a proactive approach to promote its safe deployment. We have developed new classifiers that can identify when computer use is being used and whether harm is occurring. You can learn more about the research process behind this new skill, as well as the safety measures, in our article on develop computer use.
Looking to the future
Learning from early deployments of this technology, which is still in its infancy, will help us better understand both the potential and implications of increasingly capable AI systems.
We’re excited for you to explore our new models and public beta of computer use – and welcome you share your comments with us. We think these developments will open up new possibilities in the way you work with Claude, and we can’t wait to see what you create.