Shapefin

Protege Secures $30 Million Series A Extension Led by Andreessen Horowitz to Scale AI Data Platform


Protege, an AI data platform that provides access to trusted, real-world datasets at scale, has announced a $30 million extension of its Series A funding round, led by Andreessen Horowitz (a16z). The new capital adds to the company's initial $25 million Series A from August 2025 and brings its total funding to $65 million since its founding in 2024. Returning investors in the round include Footwork, CRV, Bloomberg Beta, Flex Capital, and Shaper Capital.

Protege’s platform is designed to address the growing demand for real-world data while navigating the challenges of data fragmentation. The company serves as a trusted source of curated, AI-ready data, simultaneously enabling new revenue streams for data providers through licensing agreements. Protege offers technical expertise for curating, creating, and optimizing datasets for AI applications, supporting access to private and proprietary data across various domains and formats, such as media content, audio recordings, de-identified health records, and medical imaging.

Bobby Samuels, CEO and co-founder of Protege, stated, "Across industries, we're seeing demand for real-world data grow faster than the market's ability to supply it responsibly. At the same time, data is highly fragmented, and neither data holders nor AI builders are set up to operationalize it at scale. Protege serves as a trusted source of curated and AI-ready data while unlocking new revenue streams for data providers. Partnering with Andreessen Horowitz allows us to scale this model and deliver high-quality, use-case-specific data that AI research teams can trust."

Protege collaborates with AI companies and institutions globally, including a majority of the "Magnificent Seven," to support training and evaluation workflows for next-generation AI systems. In 2025, the company expanded its data partner network to hundreds of organizations, giving AI developers aggregated access to new data sources and formats. Protege curates datasets from this network to meet AI development requirements and makes revenue-share payouts to data partners each time their data is used.

Travis May, Chairman and co-founder of Protege, who previously served as CEO of Datavant and LiveRamp, highlighted the market need. “Access to data is the biggest bottleneck to the advancement of AI,” May commented. “The next phase of AI will be driven by real-world, proprietary data generated through everyday human activity. Protege is pioneering ways to safely access this information across data sources and compensate data owners to unlock AI’s potential.”

Daisy Wolf, Partner at Andreessen Horowitz, acknowledged Protege’s market position. “The next era of AI will be shaped by who can responsibly unlock access to the world’s most valuable data,” Wolf noted. “Protege has built a platform that respects the complexity of real-world data across industries while making it usable for modern AI development. Their momentum reflects a broader shift in the market, and we’re proud to support the team as they scale this critical layer of the AI ecosystem.”

The new capital will be allocated to accelerate product development, significantly expand Protege’s data network into additional domains and data formats, deepen existing partnerships with leading institutions, and scale the team and infrastructure necessary to deliver AI-ready and rights-protected access to real-world data.
