Researching AI Infrastructure Startups

It’s already 2025, and I’m just starting to research AI infrastructure providers. It feels a bit late. My main purpose in this research is to see if there are opportunities to participate in this AI wave.
AI infrastructure services fall into four categories: technical services and solutions, IaaS (offering GPUs), PaaS, and SaaS.
Providers offering technical services and solutions need strong marketing and sales teams, whereas those selling IaaS, PaaS, and SaaS products can get by with far fewer marketing and sales resources.
Major cloud providers are already well-established in offering AI-related services, but the emergence of new startups suggests new market opportunities. I am more interested in AI infrastructure startups catering to researchers, small businesses, and developers.
These companies can be classified into those offering IaaS services, SaaS services, and Proxy-based services.
1 Companies Focused on IaaS Services
Runpod:
- Rents CPUs and GPUs with hourly billing and offers one-click deployment of serverless endpoints (also billed hourly). Users can choose between Secure Cloud (high-reliability data centers) and Community Cloud (P2P community resources) when renting GPUs; a minimal call sketch follows after this entry.
- Model: IaaS, PaaS
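To give a sense of how the serverless product is consumed, here is a minimal sketch of calling a Runpod serverless endpoint over HTTP. The endpoint ID and input payload are placeholders specific to whatever worker you deploy; the exact request shape should be checked against Runpod's current docs.

```python
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: ID of a deployed serverless endpoint
API_KEY = os.environ["RUNPOD_API_KEY"]

# /runsync blocks until the job finishes; there is also an async /run variant.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "a photo of a red panda"}},  # schema depends on your worker
    timeout=300,
)
resp.raise_for_status()
print(resp.json().get("output"))
```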
Hyperbolic:
- Offers GPU rentals, Serverless services, and a P2P GPU marketplace.
- Model: IaaS, SaaS
Vast.ai:
- The world’s largest P2P GPU marketplace; also offers serverless services (these require integrating your code with its autoscaler service).
- Model: IaaS, PaaS
Lambda Labs:
- Provides GPU cloud services and an Inference API (which it claims is the world’s cheapest); also sells GPU hardware along with technical support and services.
- Model: IaaS, SaaS
2 Companies Focused on SaaS Services
Replicate:
- Provides Inference API services and custom model deployment; see the usage sketch after this entry.
- Model: PaaS, SaaS
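Replicate's ease of use (noted in the summary table below) comes from its one-call Python client. A minimal sketch; the model name is illustrative, and some models must be pinned with an explicit `:version` suffix:

```python
import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from the environment

# Illustrative model ID; some models must be pinned as "owner/model:<version-hash>".
output = replicate.run(
    "stability-ai/sdxl",
    input={"prompt": "an astronaut riding a horse"},
)
print(output)
```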
Hugging Face:
- Offers a model repository (similar to GitHub), the Transformers library, dataset hosting, AI application development tools, an Inference API, custom model deployment, and model fine-tuning; a small Inference API sketch follows after this entry.
- Model: PaaS, SaaS
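For the Inference API, the `huggingface_hub` client is the usual entry point. A minimal sketch, assuming the model is currently hosted on the serverless Inference API (the model ID and its availability are assumptions):

```python
import os
from huggingface_hub import InferenceClient  # pip install huggingface_hub

# Model ID is illustrative; serverless hosting availability varies by model.
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.environ.get("HF_TOKEN"),
)
print(client.text_generation("Explain LPUs in one sentence.", max_new_tokens=60))
```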
Deepinfra:
- Provides Inference API services and custom model deployment.
- Model: PaaS, SaaS
Groq:
- Offers GroqRack solutions (hardware plus software) and LPU-based cloud services (Inference API).
- Model: SaaS
Fireworks AI:
- Provides Inference API services, custom model deployment, and fine-tuning services.
- Model: PaaS, SaaS
Lepton AI:
- Offers Inference API services, custom model deployment, Dev Pods, and training task execution.
- Model: PaaS, SaaS
Fal.ai:
- Provides Inference API services and custom model deployment.
- Model: PaaS, SaaS
3 Proxy-Based Services
3.1 API Proxy
This type of platform does not provide Inference API services directly but routes user requests to different Inference API providers.
OpenRouter:
- Unified API: Offers an OpenAI-compatible interface.
- Smart routing and load balancing: Automatically routes requests to the best provider based on price, performance, and availability (see the usage sketch below).
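Because OpenRouter speaks the OpenAI wire protocol, the stock OpenAI SDK works with just the base URL and key swapped out. A minimal sketch; the model ID is illustrative (OpenRouter uses vendor-prefixed names):

```python
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # an OpenRouter key, not an OpenAI key
)
resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # illustrative vendor-prefixed model ID
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```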
3.2 Deployment Proxy
NVIDIA Brev:
- A cloud-based development environment platform for AI and machine learning developers, simplifying GPU instance setup and usage.
- Launchables: Provides pre-configured compute and software environments, enabling one-click deployment of AI tasks across various cloud platforms (AWS, GCP, Lambda Labs).
- GPU Sandbox: Offers a complete virtual machine environment.
- CLI: Connects local tools to cloud instances.
SkyPilot:
- An open-source AI and batch workload management framework; a minimal Python-API sketch follows this list.
- Unified execution: Runs AI and batch tasks on any cloud or Kubernetes cluster without modifying code.
- Cost optimization: Automatically selects the cheapest region, cloud, or instance type (including Spot instances).
- High availability: Ensures access to scarce resources like GPU/TPU through automatic failover and resource scheduling.
- Lifecycle support: Covers the entire AI development process, including training, fine-tuning, inference, and online services.
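A minimal sketch of the cost-optimization workflow using SkyPilot's Python API (the equivalent YAML file plus `sky launch` is the more common flow, and API details may differ across versions):

```python
import sky  # pip install skypilot

# SkyPilot picks the cheapest cloud/region/instance that satisfies the
# resource request; use_spot=True lets it use cheaper spot capacity.
task = sky.Task(
    setup="pip install -r requirements.txt",
    run="python train.py",
)
task.set_resources(sky.Resources(accelerators="A100:1", use_spot=True))
sky.launch(task, cluster_name="train-dev")
```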
4 Summary
AI infrastructure startups primarily focus on providing cloud servers, Inference API services, model development, and fine-tuning services, with some also offering proxy-based inference services. There are also MLOps and model-training platforms (not covered above), which cater more to large enterprises and research institutions.
4.1 Competitive Advantages of Different Providers
| Platform | Advantages |
|---|---|
| Runpod | Low cost, high flexibility, community cloud support |
| Hyperbolic | Cost-effective GPUs, open-source model support, GPU rental |
| Vast.ai | Ultra-low prices, decentralized GPU leasing |
| Lambda Labs | High-performance GPUs, fast cold starts |
| Replicate | High ease of use, quick deployment of open-source models |
| Hugging Face | Large model ecosystem, easy inference hosting |
| Deepinfra | Cloud-hosted large models, simple management |
| Groq | Ultra-fast inference speed, LPU hardware optimization |
| Fireworks AI | Multi-modal support, high-speed inference |
| Lepton AI | Fast deployment, low-cost inference |
| Fal.ai | Generative media, ultra-fast inference |
| OpenRouter | Multi-model routing, no subscription fees, pay-as-you-go |
5 How to Enter the Market
This sector is already highly competitive. I have analyzed the feasibility of building an Inference API service on public cloud resources, and even breaking even is challenging. The only ways to reduce costs are to build a private data center or to leverage idle P2P GPUs. The startups above are all raising capital, burning VC money to buy GPUs and build data centers.
As an individual without significant capital to acquire GPU resources, the only way to enter the AI infrastructure space is through software optimization—either optimizing costs, improving performance, or identifying niche demand to differentiate from competitors.
Building an Inference API service on public cloud infrastructure is possible, but public cloud GPU prices are high and performance is suboptimal, so out-optimizing competitors on cost is extremely difficult; the back-of-the-envelope sketch below shows why. Replicate’s pricing is significantly higher than its peers’, which suggests it is built on public cloud services.
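To make the break-even problem concrete, here is a rough sketch. Every number below is a hypothetical assumption chosen only to show the shape of the calculation, not a quote from any provider:

```python
# Back-of-the-envelope break-even check -- all numbers are HYPOTHETICAL.
gpu_cost_per_hour = 2.50    # assumed public-cloud price for one inference GPU
tokens_per_second = 1500    # assumed sustained throughput for a mid-size model
utilization = 0.40          # paid traffic rarely keeps a GPU fully busy

tokens_per_hour = tokens_per_second * 3600 * utilization
cost_per_million_tokens = gpu_cost_per_hour / (tokens_per_hour / 1e6)
print(f"${cost_per_million_tokens:.2f} per 1M tokens just to break even")
# ~= $1.16/M tokens here, before bandwidth, storage, and engineering time,
# while providers running their own data centers can price well below that.
```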
Thus, the proxy model is the lowest-cost and easiest way to enter this sector; a minimal sketch follows below. However, many Inference API providers restrict payment options for Chinese users, even blocking virtual credit cards. The technical difficulty is not high; the biggest challenges are overseas payments and marketing. Solving overseas payments may require establishing a foreign company and sorting out billing addresses and credit cards.
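To back up the claim that the technical difficulty is low: a pass-through proxy for the OpenAI chat-completions protocol fits in a handful of lines. A minimal sketch, assuming FastAPI and httpx; the upstream URL is illustrative, and a real service would add billing, streaming, rate limiting, and retries:

```python
# Minimal OpenAI-compatible pass-through proxy (run with: uvicorn proxy:app)
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
UPSTREAM = "https://openrouter.ai/api/v1/chat/completions"  # illustrative upstream

@app.post("/v1/chat/completions")
async def proxy(request: Request) -> JSONResponse:
    body = await request.json()
    async with httpx.AsyncClient(timeout=120) as client:
        upstream = await client.post(
            UPSTREAM,
            json=body,
            # Forward the caller's key; a real proxy would swap in its own
            # upstream key and bill the caller separately.
            headers={"Authorization": request.headers.get("authorization", "")},
        )
    return JSONResponse(status_code=upstream.status_code, content=upstream.json())
```

The forwarding itself is trivial; the defensible work is in billing, payment rails, and provider selection.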
Another potential approach is to build a community ecosystem to attract users—for example, creating open-source AI tools that non-professionals can use or integrating plugins for GitHub and AWS to allow enterprises to seamlessly integrate AI into existing workflows.
I am familiar with cloud-native infrastructure but lack expertise in machine learning, deep learning, and large models. Applying cloud-native infrastructure to AI infrastructure, with cost and performance optimization, could be my key breakthrough points. Leveraging the open-source ecosystem is another opportunity. Without capital, the only way forward is technological innovation.