Researching AI Infrastructure Startups

It’s already 2025, and I’m just starting to research AI infrastructure providers. It feels a bit late. My main purpose in this research is to see if there are opportunities to participate in this AI wave.
AI infrastructure services fall into four categories: technical services and solutions, IaaS (offering GPUs), PaaS, and SaaS.
Providers offering technical services and solutions need strong marketing and sales teams, whereas those selling IaaS, PaaS, and SaaS products can get by with far fewer marketing and sales resources.
Major cloud providers are already well-established in offering AI-related services, but the emergence of new startups suggests new market opportunities. I am more interested in AI infrastructure startups catering to researchers, small businesses, and developers.
These companies can be classified into those offering IaaS services, SaaS services, and Proxy-based services.
1 Companies Focused on IaaS Services
Runpod:
- Rents CPUs and GPUs with hourly billing and offers one-click deployment of serverless endpoints (also billed hourly). Users can choose between Secure Cloud (high-reliability data centers) and Community Cloud (P2P community resources) when renting GPUs; a minimal call sketch follows after this entry.
- Model: IaaS, PaaS
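To give a sense of how the serverless product is consumed, here is a minimal sketch of calling a Runpod serverless endpoint over HTTP. The endpoint ID and input payload are placeholders specific to whatever worker you deploy; the exact request shape should be checked against Runpod's current docs.

```python
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: ID of a deployed serverless endpoint
API_KEY = os.environ["RUNPOD_API_KEY"]

# /runsync blocks until the job finishes; there is also an async /run variant.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "a photo of a red panda"}},  # schema depends on your worker
    timeout=300,
)
resp.raise_for_status()
print(resp.json().get("output"))
```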
Hyperbolic:
- Offers GPU rentals, Serverless services, and a P2P GPU marketplace.
- Model: IaaS, SaaS
Vast.ai:
- The world’s largest P2P GPU marketplace; also offers serverless services (these require integrating your code with its autoscaler service).
- Model: IaaS, PaaS
Lambda Labs:
- Provides GPU cloud services and an Inference API (which it claims is the world’s cheapest); also sells GPU hardware along with technical support and services.
- Model: IaaS, SaaS
2 Companies Focused on SaaS Services
Replicate:
- Provides Inference API services and custom model deployment; see the usage sketch after this entry.
- Model: PaaS, SaaS
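Replicate's ease of use (noted in the summary table below) comes from its one-call Python client. A minimal sketch; the model name is illustrative, and some models must be pinned with an explicit `:version` suffix:

```python
import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from the environment

# Illustrative model ID; some models must be pinned as "owner/model:<version-hash>".
output = replicate.run(
    "stability-ai/sdxl",
    input={"prompt": "an astronaut riding a horse"},
)
print(output)
```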
Hugging Face:
- Offers a model repository (similar to GitHub), the Transformers library, dataset hosting, AI application development tools, an Inference API, custom model deployment, and model fine-tuning; a small Inference API sketch follows after this entry.
- Model: PaaS, SaaS
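For the Inference API, the `huggingface_hub` client is the usual entry point. A minimal sketch, assuming the model is currently hosted on the serverless Inference API (the model ID and its availability are assumptions):

```python
import os
from huggingface_hub import InferenceClient  # pip install huggingface_hub

# Model ID is illustrative; serverless hosting availability varies by model.
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.environ.get("HF_TOKEN"),
)
print(client.text_generation("Explain LPUs in one sentence.", max_new_tokens=60))
```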
Deepinfra:
- Provides Inference API services and custom model deployment.
- Model: PaaS, SaaS
Groq:
- Offers GroqRack solutions (hardware plus software) and LPU-based cloud services (Inference API).
- Model: SaaS
Fireworks AI:
- Provides Inference API services, custom model deployment, and fine-tuning services.
- Model: PaaS, SaaS
Lepton AI:
- Offers Inference API services, custom model deployment, Dev Pods, and training task execution.
- Model: PaaS, SaaS
Fal.ai:
- Provides Inference API services and custom model deployment.
- Model: PaaS, SaaS
3 Proxy-Based Services
3.1 API Proxy
This type of platform does not provide Inference API services directly but routes user requests to different Inference API providers.
OpenRouter:
- Unified API: Offers an OpenAI-compatible interface.
- Smart routing and load balancing: Automatically routes requests to the best provider based on price, performance, and availability (see the usage sketch below).
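Because OpenRouter speaks the OpenAI wire protocol, the stock OpenAI SDK works with just the base URL and key swapped out. A minimal sketch; the model ID is illustrative (OpenRouter uses vendor-prefixed names):

```python
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # an OpenRouter key, not an OpenAI key
)
resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # illustrative vendor-prefixed model ID
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```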
3.2 Deployment Proxy
NVIDIA Brev:
- A cloud-based development environment platform for AI and machine learning developers, simplifying GPU instance setup and usage.
- Launchables: Provides pre-configured compute and software environments, enabling one-click deployment of AI tasks across various cloud platforms (AWS, GCP, Lambda Labs).
- GPU Sandbox: Offers a complete virtual machine environment.
- CLI: Connects local tools to cloud instances.
SkyPilot:
- An open-source AI and batch workload management framework; a minimal Python-API sketch follows this list.
- Unified execution: Runs AI and batch tasks on any cloud or Kubernetes cluster without modifying code.
- Cost optimization: Automatically selects the cheapest region, cloud, or instance type (including Spot instances).
- High availability: Ensures access to scarce resources like GPU/TPU through automatic failover and resource scheduling.
- Lifecycle support: Covers the entire AI development process, including training, fine-tuning, inference, and online services.
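A minimal sketch of the cost-optimization workflow using SkyPilot's Python API (the equivalent YAML file plus `sky launch` is the more common flow, and API details may differ across versions):

```python
import sky  # pip install skypilot

# SkyPilot picks the cheapest cloud/region/instance that satisfies the
# resource request; use_spot=True lets it use cheaper spot capacity.
task = sky.Task(
    setup="pip install -r requirements.txt",
    run="python train.py",
)
task.set_resources(sky.Resources(accelerators="A100:1", use_spot=True))
sky.launch(task, cluster_name="train-dev")
```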
4 Summary
AI infrastructure startups primarily focus on providing cloud servers, Inference API services, model development, and fine-tuning services, with some also offering proxy-based inference services. There are also MLOps and model-training platforms (not covered above), which cater more to large enterprises and research institutions.
4.1 Competitive Advantages of Different Providers
| Platform | Advantages |
|---|---|
| Runpod | Low cost, high flexibility, community cloud support |
| Hyperbolic | Cost-effective GPUs, open-source model support, GPU rental |
| Vast.ai | Ultra-low prices, decentralized GPU leasing |
| Lambda Labs | High-performance GPUs, fast cold starts |
| Replicate | High ease of use, quick deployment of open-source models |
| Hugging Face | Large model ecosystem, easy inference hosting |
| Deepinfra | Cloud-hosted large models, simple management |
| Groq | Ultra-fast inference speed, LPU hardware optimization |
| Fireworks AI | Multi-modal support, high-speed inference |
| Lepton AI | Fast deployment, low-cost inference |
| Fal.ai | Generative media, ultra-fast inference |
| OpenRouter | Multi-model routing, no subscription fees, pay-as-you-go |
5 How to Enter the Market
This sector is already highly competitive. I have analyzed the feasibility of building an Inference API service on public cloud resources, and even breaking even is challenging. The only ways to reduce costs are to build a private data center or to leverage idle P2P GPUs. The startups above are all raising capital, burning VC money to buy GPUs and build data centers.
As an individual without significant capital to acquire GPU resources, the only way to enter the AI infrastructure space is through software optimization—either optimizing costs, improving performance, or identifying niche demand to differentiate from competitors.
Building an Inference API service on public cloud infrastructure is possible, but public cloud GPU prices are high and performance is suboptimal, so out-optimizing competitors on cost is extremely difficult; the back-of-the-envelope sketch below shows why. Replicate’s pricing is significantly higher than its peers’, which suggests it is built on public cloud services.
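To make the break-even problem concrete, here is a rough sketch. Every number below is a hypothetical assumption chosen only to show the shape of the calculation, not a quote from any provider:

```python
# Back-of-the-envelope break-even check -- all numbers are HYPOTHETICAL.
gpu_cost_per_hour = 2.50    # assumed public-cloud price for one inference GPU
tokens_per_second = 1500    # assumed sustained throughput for a mid-size model
utilization = 0.40          # paid traffic rarely keeps a GPU fully busy

tokens_per_hour = tokens_per_second * 3600 * utilization
cost_per_million_tokens = gpu_cost_per_hour / (tokens_per_hour / 1e6)
print(f"${cost_per_million_tokens:.2f} per 1M tokens just to break even")
# ~= $1.16/M tokens here, before bandwidth, storage, and engineering time,
# while providers running their own data centers can price well below that.
```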
Thus, the proxy model is the lowest-cost and easiest way to enter this sector; a minimal sketch follows below. However, many Inference API providers restrict payment options for Chinese users, even blocking virtual credit cards. The technical difficulty is not high; the biggest challenges are overseas payments and marketing. Solving overseas payments may require establishing a foreign company and sorting out billing addresses and credit cards.
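To back up the claim that the technical difficulty is low: a pass-through proxy for the OpenAI chat-completions protocol fits in a handful of lines. A minimal sketch, assuming FastAPI and httpx; the upstream URL is illustrative, and a real service would add billing, streaming, rate limiting, and retries:

```python
# Minimal OpenAI-compatible pass-through proxy (run with: uvicorn proxy:app)
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
UPSTREAM = "https://openrouter.ai/api/v1/chat/completions"  # illustrative upstream

@app.post("/v1/chat/completions")
async def proxy(request: Request) -> JSONResponse:
    body = await request.json()
    async with httpx.AsyncClient(timeout=120) as client:
        upstream = await client.post(
            UPSTREAM,
            json=body,
            # Forward the caller's key; a real proxy would swap in its own
            # upstream key and bill the caller separately.
            headers={"Authorization": request.headers.get("authorization", "")},
        )
    return JSONResponse(status_code=upstream.status_code, content=upstream.json())
```

The forwarding itself is trivial; the defensible work is in billing, payment rails, and provider selection.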
Another potential approach is to build a community ecosystem to attract users—for example, creating open-source AI tools that non-professionals can use or integrating plugins for GitHub and AWS to allow enterprises to seamlessly integrate AI into existing workflows.
I am familiar with cloud-native infrastructure but lack expertise in machine learning, deep learning, and large models. Applying cloud-native infrastructure to AI infrastructure, with cost and performance optimization, could be my key breakthrough points. Leveraging the open-source ecosystem is another opportunity. Without capital, the only way forward is technological innovation.