H2O.ai’s h2oGPTe Agent has once again taken the top position on the General AI Assistants (GAIA) benchmark, achieving a record 79.7% accuracy and reinforcing its suitability for complex enterprise tasks.
GAIA is a rigorous evaluation framework that measures the effectiveness of AI agents across more than 300 real-world tasks, including research, data analysis, document handling, and advanced reasoning. The benchmark is regarded as a key indicator of an AI system’s readiness for enterprise deployment because it assesses the ability to perform high-effort, skilled tasks typically carried out by humans. Human performance on GAIA is measured at 92%. By comparison, general-purpose models from Google and Microsoft have scored below 50% on the benchmark. H2O.ai first reached the top GAIA ranking in December 2024 and has continued to lead in production-grade, sovereign AI solutions.
The improved performance of H2O.ai’s agent technology is attributed to several enhancements: advanced browser navigation for precise information extraction, unified search across multiple sources such as Google and Wikipedia, and the integration of large language models such as Google’s Gemini 2.5 Pro and Anthropic’s Claude 3.7 Sonnet. The platform also features GitHub integration for navigating codebases, as well as real-time source attribution, which makes the research process more transparent.
“GAIA is fast becoming the barometer of enterprise intelligence, and at 79.7%, our agents aren’t just accurate, they’re adaptable,” said Sri Ambati, CEO and Founder of H2O.ai. “Gemini sharpened our vision and multimodal skills, Claude boosted our reasoning and code understanding, and now we’re building toward an auto-agentic future, a framework where planning agents coordinate a series of task-specific power tools.” Ambati also pointed to practical applications: “DeepResearch already gave hedge funds an edge in volatile markets, and in today’s shifting geopolitical landscape, scenario planning is not a luxury, it’s a necessity. Delivering all of this, on-prem, inside sovereign AI environments for governments and public institutions, that’s a game changer.”
H2O.ai’s agents are currently deployed in highly regulated environments to support mission-critical, task-specific operations. Global banks use them to streamline regulatory reporting and improve fraud detection, telecommunications providers use them to optimize call center operations, and public agencies use them to manage complex document workflows. The company offers a growing portfolio of vertical agents, prebuilt for industries such as banking, telecom, and government, alongside a flexible agent builder framework for creating custom agents on private data and internal systems.
H2O.ai’s platform is built on a multi-agent architecture in which planning agents coordinate specialized sub-agents across departments, enabling structured, fast, and scalable operations. With integrated human-in-the-loop review, continuous learning, and built-in auditability, H2O.ai’s agents are designed to meet stringent compliance requirements while accelerating decision-making and improving return on investment. As enterprises move from AI pilots to full-scale production, H2O.ai provides modular, hardware-agnostic solutions that run securely on private clouds, on-premises infrastructure, or in air-gapped environments.
Founded in 2012, H2O.ai is an agentic AI company that combines generative and predictive AI to help enterprises and public sector agencies build purpose-built GenAI applications on their private data. The company focuses on sovereign AI: secure, compliant, and infrastructure-flexible deployments that meet high standards of data privacy and control. H2O.ai’s open-source technology is used by more than 20,000 organizations globally, including more than half of the Fortune 500. The company is backed by $256 million in funding from investors such as Commonwealth Bank, NVIDIA, Goldman Sachs, Wells Fargo, Capital One, Nexus Ventures, and New York Life.