Vision: "Make AI easy to use, understand, access, and operationalize."

Serverless AI/ML Platform

Run AI/ML workloads without managing infrastructure.

Run training, inference, fine-tuning, agents, RAG pipelines, and batch ML jobs from a single workload definition. Specify outcomes, not clusters: Nexplane automatically plans, schedules, executes, and recovers workloads across GPUs, TPUs, and cloud environments.

$ nexplane run --workload llama-sft

Workloads: train, infer, rag, agent
Plan generated:
  accelerator: 1x L4
  estimated cost: $24.80
  estimated runtime: 3h 10m
  reliability: high
  risk: low

Proceed? y

Status: running
  [train]  step 500/20000  loss 3.21
  [rag]    indexed 48k docs → vectorstore
  [agent]  142 tool calls  98% success

One platform for every AI/ML workload

From model training to production agents and RAG pipelines.

  • Training
  • Fine-tuning
  • Inference
  • AI agents
  • RAG pipelines
  • Vector indexing
  • Batch scoring
  • Evaluation
  • Data preprocessing

Agent-planned workloads

Nexplane inspects each job — train, infer, RAG, or agent run — and recommends accelerator, runtime, checkpointing, cost, and reliability settings.

Serverless execution

Submit workloads with a few essential knobs. Nexplane handles orchestration details, infrastructure, logs, and results.

Failure-aware recovery

Nexplane detects failures and preemptions, retries automatically, and restarts from saved progress so that workloads run to completion.
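As a rough illustration of what automatic retrying can look like, here is a minimal Python sketch of retry with exponential backoff. It is not Nexplane's implementation: `TransientError`, `run_with_retries`, and every parameter name are hypothetical and exist only for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a preempted node."""


def run_with_retries(
    job: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `job`, retrying transient failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Only errors marked as transient are retried; anything else propagates immediately, so a genuine bug in the job is not hidden behind repeated attempts. The `sleep` argument is injectable so the delay schedule can be inspected without actually waiting.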

Expert-level control

Researchers keep full control over hyperparameters, model config, batch settings, and evaluation — for any workload type.

Runtime agnostic

PyTorch, JAX, Ray, vLLM, LangChain, LlamaIndex — across clouds and accelerators.

Three modes. One platform.

Simple defaults for most users. Full control for advanced users.

Simple
For developers
  • Workload type, model, tools, dataset, budget, deadline
  • Nexplane agent chooses the rest
Expert
For researchers & ML engineers
  • Full config for any workload (train, infer, agent, RAG)
  • Nexplane handles infra, recovery, artifacts
Agent-assisted
For the best product experience
  • Full config + agent pre-flight review
  • Memory, cost, runtime estimates per workload
  • Accept or override recommendations

Training & fine-tuning

Pretraining, SFT, LoRA, and RLHF with checkpoint recovery and cost caps.
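To make "checkpoint recovery" concrete, the sketch below shows one common pattern: save progress periodically, and on restart resume from the last saved step instead of step 0. It is an assumption-laden illustration, not how Nexplane stores checkpoints. The file layout and the `load_step`, `save_step`, and `train` helpers are hypothetical, and the optimizer step is replaced by a counter.

```python
import json
import os
import tempfile
from typing import Optional


def load_step(path: str) -> int:
    """Return the last saved step, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]


def save_step(path: str, step: int) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp, path)


def train(path: str, max_iters: int, save_every: int,
          fail_at: Optional[int] = None) -> int:
    """Run from the last checkpoint to max_iters; optionally simulate a crash."""
    step = load_step(path)
    while step < max_iters:
        step += 1  # stand-in for one real optimizer step
        if step % save_every == 0:
            save_step(path, step)
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated preemption")
    return step


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "ckpt.json")
    try:
        train(ckpt, max_iters=100, save_every=10, fail_at=37)
    except RuntimeError:
        pass  # the run died at step 37; the last checkpoint was written at step 30
    resumed_from = load_step(ckpt)
    final_step = train(ckpt, max_iters=100, save_every=10)
```

In this toy run the second call to `train` picks up at step 30, so only the seven steps after the last checkpoint are repeated. A real checkpoint would also hold model weights and optimizer state, not just a step counter.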

Inference & serving

Batch inference, embedding jobs, and model export at scale.

AI agents

Run tool-using agents with retries, timeouts, and observability — without managing runtime infra.
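One way to picture bounded, observable agent runs is a loop that enforces a step limit and counts tool-call outcomes. The Python sketch below is illustrative only: `ToolStats` and `run_agent` are hypothetical names, and each tool call is modeled as a plain callable rather than any real agent framework's API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ToolStats:
    calls: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        """Fraction of tool calls that completed without raising."""
        return self.successes / self.calls if self.calls else 0.0


def run_agent(actions: Iterable[Callable[[], object]], max_steps: int) -> ToolStats:
    """Execute tool calls in order, stopping once the step limit is reached."""
    stats = ToolStats()
    for step, action in enumerate(actions):
        if step >= max_steps:
            break  # step limit reached: stop even if actions remain
        stats.calls += 1
        try:
            action()
            stats.successes += 1
        except Exception:
            # A failed tool call is counted and the agent moves on.
            pass
    return stats
```

The step limit keeps a misbehaving agent from looping indefinitely, and the counters give the kind of "tool calls / success rate" summary a status view could display.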

RAG pipelines

Ingest and embed documents, build vector indexes, and serve retrieval-augmented generation at scale.
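The ingest step usually begins by splitting documents into chunks before embedding. As a hedged example, here is a minimal fixed-size chunker with overlap in Python. The function name `chunk_tokens` and its `size` and `overlap` parameters are assumptions made for this sketch, not Nexplane settings.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Ten single-character "tokens", windows of 4 with 1 token of overlap.
chunks = chunk_tokens(list("abcdefghij"), size=4, overlap=1)
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of embedding some tokens twice. The final chunk may be shorter than `size` when the document length is not a multiple of the stride.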

Evaluation & benchmarks

Scheduled eval runs, metric tracking, and comparison across checkpoints.

Data & preprocessing

ETL, chunking, tokenization, and feature pipelines on elastic compute.

Example workload spec

Describe what you want to run. Nexplane plans compute, orchestration, and recovery from your workload definition.

{
  "name": "ml-pipeline",
  "mode": "expert",
  "type": "training",
  "model": "s3://models/llama-sft/checkpoint-final",
  "dataset": "s3://data",
  "constraints": {
    "budgetUsd": 50,
    "deadlineHours": 4,
    "reliability": "high"
  },
  "training": {
    "batchSize": 32,
    "blockSize": 256,
    "learningRate": 0.0003,
    "maxIters": 20000,
    "nLayer": 6,
    "nHead": 6,
    "nEmbd": 384
  },
  "inference": {
    "batchSize": 64,
    "outputUri": "s3://predictions"
  }
}

You control
  • Workload type, model, tools, and dataset
  • Architecture, hyperparameters, and optimizer settings
  • Agent prompts, tool configs, and step limits
  • RAG chunking, embedding models, and retrieval settings
Nexplane handles
  • Accelerator selection and compute provisioning
  • Runtime setup, preemptions, and retries
  • Failure recovery and restart from saved progress
  • Logs, artifact storage, and cost tracking
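To show how the `constraints` block of the example spec might relate to a generated plan, the Python sketch below checks a plan's estimates against `budgetUsd` and `deadlineHours`. The `Plan` class and `violations` function are hypothetical. The numbers come from the spec and the sample plan shown earlier on this page ($24.80, 3h 10m, 1x L4), and the actual planning logic is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    accelerator: str
    estimated_cost_usd: float
    estimated_runtime_hours: float


def violations(plan: Plan, constraints: dict) -> list[str]:
    """Return a human-readable message for every constraint the plan breaks."""
    problems: list[str] = []
    budget = constraints.get("budgetUsd")
    if budget is not None and plan.estimated_cost_usd > budget:
        problems.append(
            f"cost ${plan.estimated_cost_usd:.2f} exceeds budget ${budget:.2f}"
        )
    deadline = constraints.get("deadlineHours")
    if deadline is not None and plan.estimated_runtime_hours > deadline:
        problems.append(
            f"runtime {plan.estimated_runtime_hours:.2f}h exceeds deadline {deadline}h"
        )
    return problems


constraints = {"budgetUsd": 50, "deadlineHours": 4, "reliability": "high"}
plan = Plan("1x L4", estimated_cost_usd=24.80, estimated_runtime_hours=3 + 10 / 60)
issues = violations(plan, constraints)  # empty list: the plan fits both limits
```

A check of this kind is what would let an agent-assisted pre-flight review flag a plan before any compute is provisioned, leaving the user to accept it or adjust the budget and deadline.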