Deepgram, a real-time API platform for the Voice AI economy, announced it has raised $130 million in Series C funding, reaching a valuation of $1.3 billion. The round, led by AVP, an independent global investment platform, is intended to support the company’s efforts to deliver advanced Voice AI models and infrastructure.
Major existing investors, including Alkeon, In-Q-Tel, Madrona, Tiger, Wing, Y Combinator, and funds and accounts managed by BlackRock, participated in the round. New investors include Alumni Ventures, Princeville Capital, Twilio, ServiceNow Ventures, SAP, and Citi Ventures. Academic institutions such as the University of Michigan and Columbia University also invested, joining existing academic investor Stanford University.
Scott Stephenson, CEO and Co-Founder of Deepgram, highlighted the emergence of the Voice AI economy and Deepgram’s role in powering it. He added that the company’s vision involves pioneering end-to-end deep learning for voice and achieving the Audio Turing Test at scale by 2026.
Elizabeth de Saint-Aignan, General Partner at AVP, stated that AVP chose to lead the round due to its expertise in scaling category-defining companies globally and its ability to support Deepgram’s international expansion, particularly into Europe and other key markets. She drew a parallel between Deepgram’s potential to underpin the emerging trillion-dollar B2B Voice AI economy and Stripe’s role in the payments economy.
Deepgram’s APIs currently power Voice AI functionality for over 1,300 organizations, serving as a foundational infrastructure layer for real-time speech understanding, speech generation, analytics, orchestration, and fully autonomous voice agents. Andy O’Dower, Vice President of Product Management for Voice and Video at Twilio, noted that the combination of Twilio’s orchestration capabilities and Deepgram APIs delivers low-latency AI agent experiences crucial for the Voice AI market.
Deepgram’s product suite includes Aura-2, a text-to-speech model; Nova-3, a speech-to-text model; Flux, a Conversational Speech Recognition model designed for handling interruptions in voice agents; Voice Agent API, a conversational AI API; and Saga, a Voice OS. These models can be customized for specific terminology and acoustic environments and deployed as cloud APIs or through self-hosted and on-premises options.
In conjunction with the funding, Deepgram announced the acquisition of OfOne, an AI-native voice platform developed for restaurants and the quick-service drive-thru market. OfOne had demonstrated over 95% containment with high employee satisfaction scores for national QSR brands. The OfOne team has joined Deepgram, and its technology forms the basis for Deepgram for Restaurants, an offering aimed at improving customer experience, increasing order accuracy, and supporting staff with real-time AI assistance.
The new funding will also accelerate the expansion of Deepgram’s intellectual property, building on a patent portfolio filed since 2016. Several U.S. patents are expected to be granted in 2025, including US 12,380,880 for End-to-End Automatic Speech Recognition With Transformer, US 12,334,075 for Hardware-Efficient Automatic Speech Recognition, and US 12,499,875 for Deep Learning Internal State Index-Based Search and Classification. These patents are intended to reinforce Deepgram’s position in deep learning architecture, representation learning, and deployment efficiency.
Deepgram is also establishing a new Voice AI Collaboration Hub in San Francisco. This space is designed to foster collaboration with customers, partners, and developers through working sessions, demonstrations, briefings, meetups, and hackathons.
AVP, established in 2016, manages over €2.5 billion in assets across venture, early growth, growth, and fund of funds strategies, with investments in more than 60 technology companies and over 60 funds.
Deepgram’s platform offers speech-to-text, text-to-speech, and full speech-to-speech capabilities, utilized by over 200,000 developers for its accuracy, low latency, and pricing. The company states it has processed over 50,000 years of audio and transcribed over 1 trillion words.