Researching AI Infrastructure Startups

It’s already 2025, and I’m only now starting to research AI infrastructure providers, which feels a bit late. My main goal in this research is to find opportunities to participate in this AI wave.

AI infrastructure services fall into four categories: technical services and solutions, IaaS (renting out GPUs), PaaS, and SaaS.

Providers of technical services and solutions need strong marketing and sales teams, whereas those selling IaaS, PaaS, and SaaS products need far fewer marketing and sales resources.

Major cloud providers are already well established in AI-related services, but the steady emergence of startups suggests the market still has openings. I am most interested in AI infrastructure startups that cater to researchers, small businesses, and developers.

These companies can be classified into those offering IaaS, SaaS, and proxy-based services.

Runpod:

  • Provides hourly-billed CPU and GPU instances, plus one-click deployment of Serverless endpoints, which are also billed by the hour (see the example below). For GPU rentals, users can choose between Secure Cloud (high-reliability data centers) and Community Cloud (P2P community resources).
  • Model: IaaS, PaaS
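
As a concrete illustration, here is a minimal sketch of invoking a Runpod Serverless endpoint over its HTTP API. The endpoint ID, API key, and input payload are placeholders; the `/runsync` route reflects Runpod's documented synchronous invocation pattern, but treat the details as assumptions rather than a verified integration.

```python
import requests

# Placeholders: substitute your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

# /runsync blocks until the worker returns a result;
# /run would instead return a job ID for async polling.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello from a serverless worker"}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"id": ..., "status": "COMPLETED", "output": ...}
```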

Hyperbolic:

  • Offers GPU rentals, Serverless services, and a P2P GPU marketplace.
  • Model: IaaS, SaaS

Vast.ai:

  • The world’s largest P2P GPU marketplace; it also offers Serverless services, which require integrating your code with its autoscaler.
  • Model: IaaS, PaaS

Lambda Labs:

  • Provides GPU cloud services and an Inference API (which it claims is the world’s cheapest). Also sells GPU hardware, technical support, and professional services.
  • Model: IaaS, SaaS

Replicate:

  • Provides Inference API services and custom model deployment.
  • Model: PaaS, SaaS

Hugging Face:

  • Offers a model repository (similar to GitHub), the Transformers library, dataset hosting, AI application development tools, an Inference API (see the example below), custom model deployment, and model fine-tuning.
  • Model: PaaS, SaaS
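
As a quick illustration of the Inference API, here is a minimal sketch using the official `huggingface_hub` client. The model ID and prompt are arbitrary examples, and the token is a placeholder for an HF access token with inference permissions.

```python
from huggingface_hub import InferenceClient

# Any hosted model ID works here; this one is just an example.
client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token="hf_your_access_token",  # placeholder
)

# text_generation sends the prompt to Hugging Face's hosted inference.
reply = client.text_generation(
    "Explain IaaS vs PaaS in one sentence.",
    max_new_tokens=100,
)
print(reply)
```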

Deepinfra:

  • Provides Inference API services and custom model deployment.
  • Model: PaaS, SaaS

Groq:

  • Offers GroqRack solutions (including software and hardware) as well as LPU-based cloud services (an Inference API).
  • Model: SaaS

Fireworks AI:

  • Provides Inference API services, custom model deployment, and fine-tuning services.
  • Model: PaaS, SaaS

Lepton AI:

  • Offers Inference API services, custom model deployment, Dev Pods, and training task execution.
  • Model: PaaS, SaaS

Fal.ai:

  • Provides Inference API services and custom model deployment.
  • Model: PaaS, SaaS

The platforms in this final, proxy-based category do not provide Inference API services themselves; instead, they route user requests (or entire workloads) to other providers.

OpenRouter:

  • Unified API interface: Offers an OpenAI API-compatible interface, so existing OpenAI SDK code works with only a base-URL change (see the sketch below).
  • Smart routing and load balancing: Automatically routes requests to the best provider based on price, performance, and availability.
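
Because the interface is OpenAI-compatible, the standard `openai` Python SDK can talk to OpenRouter just by overriding the base URL. A minimal sketch (the model ID is one example from OpenRouter's catalog; the API key is a placeholder):

```python
from openai import OpenAI

# Point the OpenAI SDK at OpenRouter's compatible endpoint.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-your-openrouter-key",  # placeholder
)

resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-8b-instruct",  # example model ID
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```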

NVIDIA Brev:

  • A cloud-based development environment platform for AI and machine learning developers, simplifying GPU instance setup and usage.
  • Launchables: Provides pre-configured compute and software environments, enabling one-click deployment of AI tasks across various cloud platforms (AWS, GCP, Lambda Labs).
  • GPU Sandbox: Offers a complete virtual machine environment.
  • CLI: Connects local tools to cloud instances.

SkyPilot:

  • An open-source AI and batch workload management framework.
  • Unified execution: Runs AI and batch tasks on any cloud or Kubernetes cluster without modifying code.
  • Cost optimization: Automatically selects the cheapest region, cloud, or instance type (including Spot instances).
  • High availability: Ensures access to scarce resources like GPU/TPU through automatic failover and resource scheduling.
  • Lifecycle support: Covers the entire AI development process, including training, fine-tuning, inference, and online serving (a minimal task sketch follows this list).
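
To make the "run on any cloud without modifying code" claim concrete, here is a minimal sketch using SkyPilot's Python API. The resource spec and training command are illustrative assumptions; SkyPilot also accepts equivalent YAML task files via its CLI.

```python
import sky

# Define a task: what to run and what resources it needs.
task = sky.Task(
    setup="pip install torch",         # runs once when the cluster is provisioned
    run="python train.py --epochs 1",  # hypothetical training script
)
task.set_resources(sky.Resources(accelerators="A100:1"))

# SkyPilot picks the cheapest cloud/region that can satisfy this,
# provisions the VM, syncs the working directory, and runs the task.
sky.launch(task, cluster_name="demo-train")
```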

AI infrastructure startups primarily focus on providing cloud servers, Inference API services, model development, and fine-tuning services, with some also offering proxy-based services. In addition (not covered above), there are MLOps platforms and model-training platforms, which cater more to large enterprises and research institutions.

| Platform | Advantages |
| --- | --- |
| Runpod | Low cost, high flexibility, Community Cloud support |
| Hyperbolic | Cost-effective GPUs, open-source model support, GPU rentals |
| Vast.ai | Ultra-low prices, decentralized GPU leasing |
| Lambda Labs | High-performance GPUs, fast cold starts |
| Replicate | High ease of use, quick deployment of open-source models |
| Hugging Face | Large model ecosystem, easy inference hosting |
| Deepinfra | Cloud-hosted large models, simple management |
| Groq | Ultra-fast inference, LPU hardware optimization |
| Fireworks AI | Multimodal support, high-speed inference |
| Lepton AI | Fast deployment, low-cost inference |
| Fal.ai | Generative media, ultra-fast inference |
| OpenRouter | Multi-model routing, no subscription fees, pay-as-you-go |

This sector is already highly competitive. I have analyzed the feasibility of building an Inference API service on public cloud resources, and even breaking even is challenging (a back-of-envelope sketch follows). The only ways to reduce costs are to build a private data center or to tap idle P2P GPUs. These startups are all raising capital, burning VC money to buy GPUs and build data centers.
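
To show why break-even is hard, here is a back-of-envelope calculation. Every number is an assumption for illustration only: an on-demand public-cloud A100 at roughly $4/hour, sustained throughput around 1,000 tokens/second for a small model, and a market price near $0.20 per million tokens for comparable hosted models.

```python
# Back-of-envelope unit economics for a public-cloud Inference API.
# Every number below is an assumption for illustration only.
gpu_cost_per_hour = 4.00          # assumed on-demand A100 price, USD
throughput_tok_per_sec = 1000     # assumed sustained throughput, small model
utilization = 0.5                 # real traffic is bursty; 100% is unrealistic

tokens_per_hour = throughput_tok_per_sec * 3600 * utilization
cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000
market_price_per_million = 0.20   # assumed competitor price, USD

print(f"Cost:   ${cost_per_million_tokens:.2f} per 1M tokens")
print(f"Market: ${market_price_per_million:.2f} per 1M tokens")
# With these assumptions, cost (~$2.22/1M) far exceeds the market price,
# so a public-cloud-based service cannot break even.
```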

As an individual without significant capital to acquire GPU resources, the only way to enter the AI infrastructure space is through software: optimizing costs, improving performance, or serving niche demand that differentiates from competitors.

Building an Inference API service on public cloud infrastructure is possible, but public cloud GPU prices are high and performance is suboptimal. Optimizing costs to gain a competitive advantage is extremely difficult. Replicate’s pricing is significantly higher than others, which suggests it is built on public cloud services.

Thus, the proxy model is the cheapest and easiest way to enter this sector (a minimal sketch appears below). However, many Inference API providers restrict payment options for Chinese users, even blocking virtual credit cards. The technical difficulty is not high; the biggest challenges are overseas payments and marketing. Resolving overseas payments might require establishing a foreign company and sorting out billing addresses and credit cards.
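
For reference, the core of such a proxy is small. Below is a minimal sketch, assuming FastAPI and httpx, that forwards OpenAI-style chat completion requests to an upstream provider; the upstream URL and key are placeholders, and a real service would add caller authentication, metering, and billing on top.

```python
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

# Placeholders: the upstream provider you resell access to.
UPSTREAM_URL = "https://api.example-provider.com/v1/chat/completions"
UPSTREAM_KEY = "upstream-api-key"

@app.post("/v1/chat/completions")
async def proxy_chat(request: Request):
    body = await request.json()
    # A real proxy would authenticate the caller and record usage
    # here, since metering is how the margin is earned.
    async with httpx.AsyncClient(timeout=60) as client:
        upstream = await client.post(
            UPSTREAM_URL,
            headers={"Authorization": f"Bearer {UPSTREAM_KEY}"},
            json=body,
        )
    return JSONResponse(status_code=upstream.status_code, content=upstream.json())
```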

Another potential approach is to build a community ecosystem to attract users, for example by creating open-source AI tools that non-specialists can use, or by building plugins for GitHub and AWS so that enterprises can integrate AI into their existing workflows seamlessly.

I am familiar with cloud-native infrastructure but lack expertise in machine learning, deep learning, and large models. Using cloud-native technology to build AI infrastructure, combined with cost and performance optimization, could be my key point of entry. Leveraging the open-source ecosystem is another opportunity. Without capital, the only way forward is technological innovation.