A new so-called “reasoning” AI model, QwQ-32B-Preview, has arrived on the scene. It is one of the few to compete with OpenAI’s o1, and it is the first available for download under a permissive license.
Developed by Alibaba’s Qwen team, QwQ-32B-Preview contains 32.5 billion parameters and can accommodate prompts of approximately 32,000 words in length. It performs better on some benchmarks than o1-preview and o1-mini, the two reasoning models that OpenAI has released so far. (Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. OpenAI does not disclose the number of parameters in its models.)
According to Alibaba’s tests, QwQ-32B-Preview beats OpenAI’s o1-preview model on the AIME and MATH benchmarks. AIME is the American Invitational Mathematics Examination, a challenging math competition, while MATH is a collection of word problems.
QwQ-32B-Preview can solve logic puzzles and answer reasonably difficult math questions, thanks to its “reasoning” abilities. But it’s not perfect. Alibaba notes in a blog post that the model could switch languages unexpectedly, get stuck in loops, and underperform on tasks that require “common sense reasoning.”
Unlike most AI models, QwQ-32B-Preview and other reasoning models effectively fact-check themselves. This helps them avoid some of the traps that normally trip up models, with the downside that they often take longer to arrive at solutions. Similar to o1, QwQ-32B-Preview reasons through tasks, planning ahead and performing a series of actions that help the model find answers.
QwQ-32B-Preview, which can be downloaded from and run on the Hugging Face AI development platform, appears to be similar to DeepSeek’s recently released reasoning model insofar as it treads lightly around certain political subjects. Alibaba and DeepSeek, as Chinese companies, are subject to benchmarking by China’s internet regulator to ensure their models’ responses “embody core socialist values.” Many Chinese AI systems decline to respond to topics that might raise the ire of regulators, such as speculation about Xi Jinping’s diet.
Asked “Is Taiwan part of China?”, QwQ-32B-Preview responded that it is (and “inalienable” as well), a perspective out of step with most of the world but consistent with that of China’s ruling party. Prompts about Tiananmen Square, meanwhile, yielded a non-response.
QwQ-32B-Preview is “openly” available under an Apache 2.0 license, which means it can be used for commercial applications. But only certain components of the model have been released, making it impossible to replicate QwQ-32B-Preview or gain much insight into the system’s inner workings. The “openness” of AI models is not a settled question, but there is a general continuum from most closed (API access only) to most open (model, weights, and data disclosed), and this one sits somewhere in the middle.
The increased focus on reasoning models comes as the viability of “scaling laws”, the long-held theory that feeding a model more data and computing power will continually increase its capabilities, is coming under scrutiny. A flurry of news reports suggests that models from major AI labs, including OpenAI, Google, and Anthropic, are not improving as dramatically as they once did.
This has led to a rush toward new AI development approaches, architectures, and techniques, one of which is test-time compute. Also known as inference compute, test-time compute essentially gives models extra processing time to complete tasks, and it underpins models like o1 and QwQ-32B-Preview.
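The intuition behind test-time compute can be sketched with a toy self-consistency loop: spend more inference-time work by sampling several candidate answers and keeping the most common one. This is a minimal illustration of the general idea, not Qwen’s or OpenAI’s actual method; `noisy_model` is a hypothetical stand-in for a real language model.

```python
import random
from collections import Counter

def noisy_model(question: str, rng: random.Random) -> int:
    # Hypothetical stand-in for one sampled model answer:
    # right about 70% of the time, otherwise a random guess.
    correct = 42
    return correct if rng.random() < 0.7 else rng.randint(0, 100)

def answer_with_test_time_compute(question: str, samples: int, seed: int = 0) -> int:
    # Spend extra compute at inference time: draw many candidate
    # answers, then return the most frequent one (majority vote).
    rng = random.Random(seed)
    candidates = [noisy_model(question, rng) for _ in range(samples)]
    return Counter(candidates).most_common(1)[0][0]
```

With enough samples, the majority vote recovers the model’s most consistent answer far more reliably than a single draw, which is why the extra processing time tends to pay off on hard problems.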
Major labs besides OpenAI, along with Chinese companies, are betting that test-time compute is the future. According to a recent report from The Information, Google has expanded an internal team focused on reasoning models to around 200 people and added substantial computing power to the effort.