We’ve made great strides in the foundations of ML, with extensive work on algorithms, efficiency, data, and privacy. We’ve improved ML efficiency with pioneering techniques that reduce LLM inference times, which have been implemented across Google products and adopted throughout the industry. Our research on cascades presents a method for leveraging smaller models for “easy” outputs, while our novel speculative decoding algorithm computes multiple tokens in parallel, speeding up output generation by approximately 2x–3x without affecting quality. As a result, LLMs powering conversational products can generate responses much faster, which translates to a significantly improved user experience and makes AI more compute- and energy-efficient. We built on this work with draft refinement and block verification. We also explored new ways to improve LLM reasoning through pause tokens; increased reasoning power could make smaller models more capable, leading to significant efficiency gains. We investigated the algorithmic efficiency of transformers and designed PolySketchFormer, HyperAttention, and Selective Attention, three new attention mechanisms, to address computational challenges and bottlenecks in language model deployment and to improve model quality.
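To make the idea concrete, here is a minimal sketch of (greedy) speculative decoding. The two models are hypothetical deterministic stand-ins, not real LLMs; only the draft/verify/accept loop reflects the actual technique.

```python
# Toy stand-ins for the two models; both names and the tiny vocabulary
# are illustrative assumptions, not real models.
VOCAB = ["the", "cat", "sat", "on", "mat"]

def target_model(context):
    # Stands in for the large, high-quality model (one token per call).
    return VOCAB[sum(len(t) for t in context) % len(VOCAB)]

def draft_model(context):
    # Stands in for the small, fast model: usually agrees with the
    # target, but diverges on some contexts (here: length % 3 == 0).
    if len(context) % 3 == 0:
        return VOCAB[0]
    return VOCAB[sum(len(t) for t in context) % len(VOCAB)]

def speculative_step(context, k=4):
    """One round of speculative decoding: draft k tokens cheaply, have
    the target model verify them (on an accelerator, in a single
    parallel pass), keep the longest agreeing prefix, then emit one
    target token so progress is guaranteed even if nothing is accepted."""
    ctx, drafts = list(context), []
    for _ in range(k):                      # 1) cheap draft phase
        drafts.append(draft_model(ctx))
        ctx.append(drafts[-1])
    ctx, accepted = list(context), []
    for tok in drafts:                      # 2) parallel verify phase
        if target_model(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_model(ctx))      # 3) guaranteed progress
    return accepted
```

Each step can yield up to k+1 tokens for the cost of one sequential target-model pass, which is the source of the 2x–3x speedup: the draft model is cheap, and the target model's checks on the k drafted positions are independent and thus parallelizable.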
Our teams have made considerable additional progress, notably in research on principled deferral algorithms with multiple experts and a general two-stage deferral algorithm. Our RL-based imitation learning algorithm for compiler optimization delivers significant savings by reducing binary file sizes; our research on multi-objective reinforcement learning from human feedback, the Conditional Language Policy framework, provided a principled solution with a key trade-off between quality and factuality and significant computational savings; and our work on in-context learning provided a sample-efficient mechanism for sparse retrieval tasks.
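The simplest instance of a deferral rule is confidence-based routing: a cheap model handles the "easy" inputs and defers to a costly expert when its confidence is low. The sketch below is a hypothetical illustration of that idea under assumed toy models and an assumed threshold, not the algorithm from the research above.

```python
# Minimal confidence-based deferral sketch. `cheap_model`,
# `expert_model`, and the threshold are illustrative assumptions.

def cheap_model(x):
    # Returns (prediction, confidence in [0, 1]) for a toy sign task.
    return ("positive" if x > 0 else "negative", min(abs(x), 1.0))

def expert_model(x):
    # Expensive but accurate model; invoked only on deferred inputs.
    return "positive" if x >= 0 else "negative"

def predict_with_deferral(x, threshold=0.5):
    """Route the input: keep the cheap prediction when confidence is
    high, otherwise pay for the expert. Returns (label, which model)."""
    pred, conf = cheap_model(x)
    if conf >= threshold:
        return pred, "cheap"
    return expert_model(x), "expert"
```

The research versions go further, e.g. learning the routing rule itself and handling multiple experts in two stages, but the cost/quality trade-off they optimize has this same shape.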
Data is another essential element of ML. To support ML research across the ecosystem, we have published and contributed to various datasets. Croissant, for example, is a metadata format designed for the specific needs of ML data, which we developed in collaboration with industry and academia. We developed sensitivity sampling, a data sampling technique for foundation models, and proved that it is an optimal data sampling strategy for classic clustering problems such as k-means. We advanced our research on scalable clustering algorithms and open-sourced a parallel graph clustering library that delivers state-of-the-art results on graphs with billions of edges on a single machine. The rapid proliferation of domain-specific machine learning models highlights a key challenge: although these models excel in their respective domains, their performance often varies significantly across applications. To solve this problem, our research developed a principled algorithm by framing the problem as a multi-source domain adaptation task.
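For intuition on cost-proportional sampling for clustering, here is a simplified sketch in the spirit of sensitivity sampling for k-means: points are sampled with probability proportional to their contribution to the clustering cost (squared distance to the nearest center), and each sampled point receives an inverse-probability weight so the weighted sample estimates the true cost. This is a hypothetical toy version, not the paper's algorithm, and the data in the comments is assumed.

```python
import random

def sensitivity_sample(points, centers, m):
    """Sample m (point, weight) pairs with probability proportional to
    each point's k-means cost contribution. Points that coincide with a
    center contribute nothing and are never sampled."""
    costs = []
    for p in points:
        d2 = min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers)
        costs.append(d2)
    total = sum(costs) or 1.0
    probs = [c / total for c in costs]
    sample = random.choices(range(len(points)), weights=probs, k=m)
    # Weight = 1 / (m * p_i) makes the weighted cost an unbiased
    # estimate of the full dataset's cost.
    return [(points[i], 1.0 / (m * probs[i])) for i in sample]
```

The appeal of such schemes is that a small weighted subsample (a coreset) can stand in for a massive dataset when fitting or evaluating cluster centers.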
Google Research is deeply engaged in privacy research and has made important contributions in this area. Our work on differentially private model training highlights the importance of rigorous analysis and implementation of privacy-preserving ML algorithms to ensure robust protection of user data. We complemented these analyses with more efficient algorithms for training and new methods for auditing implementations, which we open-sourced for the community. In our research on learning from aggregated data, we introduced a new approach to constructing aggregation datasets and explored various algorithmic aspects of learning models from aggregated data, obtaining optimal sample complexity rates in this setting. We also designed new methods to generate differentially private synthetic data — artificial data that offers strong privacy protection while retaining the characteristics required for training predictive models.
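The core step of differentially private training (in the DP-SGD style) is easy to sketch: clip each per-example gradient to bound any single user's influence, then add calibrated Gaussian noise to the aggregate. The clipping norm and noise multiplier below are illustrative assumptions, and this omits the accounting that turns these parameters into a formal privacy guarantee.

```python
import math
import random

def dp_average_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.0):
    """Clip each per-example gradient to L2 norm <= clip_norm, sum them,
    add Gaussian noise scaled to noise_multiplier * clip_norm, and
    return the noisy average. A toy sketch of the DP-SGD update step."""
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])
    n, dim = len(clipped), len(clipped[0])
    noisy_avg = []
    for j in range(dim):
        s = sum(g[j] for g in clipped)
        # Noise proportional to the sensitivity (clip_norm) hides any
        # single example's contribution.
        s += random.gauss(0.0, noise_multiplier * clip_norm)
        noisy_avg.append(s / n)
    return noisy_avg
```

Auditing methods like those mentioned above probe implementations of exactly this step, checking empirically that no single training example influences the model more than the claimed privacy budget allows.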
As we push the boundaries of what can be achieved in optimization, the work has significant implications for the global economy. Take linear programming (LP), a fundamental computational method that informs data-driven decision-making and has many applications in areas such as manufacturing and transportation. We introduced PDLP, which requires less memory, is more compatible with modern computing techniques, and greatly scales up LP-solving capabilities. It received the prestigious Beale–Orchard-Hays Prize and is now available as part of Google’s open-source OR-Tools. We announced our Shipping Network Design API, a great example of a PDLP use case, to optimize the shipping of goods. This enables greener and more cost-effective solutions to supply chain challenges, with the ability for shipping networks to deliver 13% more containers with 15% fewer vessels. We also introduced TimesFM for more accurate time-series forecasting, a type of forecasting widely used in fields such as retail, manufacturing, and finance. This decoder-only foundation model was pre-trained on 100 billion real-world time points, largely drawn from Google Trends and Wikipedia page views, and outperformed even powerful deep learning models that were trained on the target time series.
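PDLP itself is a sophisticated solver (it adds restarts, preconditioning, and presolve), but the basic scheme it builds on, primal-dual hybrid gradient (PDHG), fits in a few lines. The sketch below is a hypothetical toy, not PDLP; it solves the tiny LP "minimize x1 + 2·x2 subject to x1 + x2 = 1, x ≥ 0", whose optimum puts all mass on the cheaper variable.

```python
import math

def pdhg_lp(c, A, b, iters=5000):
    """Solve min c^T x s.t. A x = b, x >= 0 with plain PDHG. The method
    is matrix-free, needing only products with A and A^T, which is why
    this family of solvers is far lighter on memory than simplex or
    interior-point methods."""
    n, m = len(c), len(b)
    # Step sizes below the classic stability bound 1 / ||A||_2
    # (crudely estimated here via the Frobenius norm).
    norm_a = math.sqrt(sum(A[i][j] ** 2 for i in range(m) for j in range(n)))
    tau = sigma = 0.9 / norm_a
    x, y = [0.0] * n, [0.0] * m
    for _ in range(iters):
        # Primal step: gradient descent on c - A^T y, projected onto x >= 0.
        grad = [c[j] - sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        x_new = [max(0.0, x[j] - tau * grad[j]) for j in range(n)]
        # Dual step: ascent using the extrapolated point 2*x_new - x.
        x_bar = [2 * x_new[j] - x[j] for j in range(n)]
        y = [y[i] + sigma * (b[i] - sum(A[i][j] * x_bar[j] for j in range(n)))
             for i in range(m)]
        x = x_new
    return x

solution = pdhg_lp(c=[1.0, 2.0], A=[[1.0, 1.0]], b=[1.0])
# solution converges toward [1.0, 0.0].
```

Because every iteration is just matrix-vector products and projections, the approach parallelizes well on modern hardware, which is the compatibility with modern computing techniques noted above.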