Vision: "Make AI easy to use, understand, access, and operationalize."
Run AI/ML workloads without managing infrastructure.
Run training, inference, fine-tuning, agents, RAG pipelines, and batch ML jobs without managing infrastructure. Specify outcomes, not clusters—Nexplane automatically plans, schedules, executes, and recovers workloads across GPUs, TPUs, and cloud environments.
$ nexplane run --workload llama-sft
Workloads: train, infer, rag, agent
Plan generated:
  accelerator: 1x L4
  estimated cost: $24.80
  estimated runtime: 3h 10m
  reliability: high
  risk: low
Proceed? y
Status: running
[train] step 500/20000 loss 3.21
[rag] indexed 48k docs → vectorstore
[agent] 142 tool calls 98% success
One platform for every AI/ML workload
From model training to production agents and RAG pipelines.
Agent-planned workloads
Nexplane inspects each job — train, infer, RAG, or agent run — and recommends accelerator, runtime, checkpointing, cost, and reliability settings.
Serverless execution
Submit workloads with a few essential knobs. Nexplane handles orchestration details, infrastructure, logs, and results.
Failure-aware recovery
Detect failures, retry intelligently, and optimize for workload completion.
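As a rough illustration of what retrying after a detected failure can look like, here is a minimal sketch of capped exponential backoff with jitter. It is not Nexplane's implementation: `TransientError`, `backoff_delays`, and `run_with_retries` are hypothetical names invented for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a preempted node)."""


def backoff_delays(max_retries, base=1.0, cap=60.0):
    """Return capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def run_with_retries(job, max_retries=3, base=1.0, cap=60.0, sleep=time.sleep):
    """Call job(attempt) until it succeeds or the retries are exhausted.

    Only TransientError is retried; any other exception propagates at once.
    """
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return job(attempt)
        except TransientError:
            if attempt == max_retries:
                raise  # out of retries: surface the failure to the caller
            # Jitter: wait a random time up to the capped delay so that
            # many failed jobs do not all retry at the same moment.
            sleep(random.uniform(0, delays[attempt]))


print(backoff_delays(5, base=1.0, cap=10.0))  # [1.0, 2.0, 4.0, 8.0, 10.0]
```

A real scheduler would likely also resume from the latest checkpoint instead of restarting the job. The sketch covers only the retry timing.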
Expert-level control
Researchers keep full control over hyperparameters, model config, batch settings, and evaluation — for any workload type.
Runtime agnostic
PyTorch, JAX, Ray, vLLM, LangChain, LlamaIndex — across clouds and accelerators.
Three modes. One platform.
Simple defaults for most users. Full control for advanced users.
- Workload type, model, tools, dataset, budget, deadline
- Nexplane agent chooses the rest
- Full config for any workload (train, infer, agent, RAG)
- Nexplane handles infra, recovery, artifacts
- Full config + agent pre-flight review
- Memory, cost, runtime estimates per workload
- Accept or override recommendations
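To make the idea of a pre-flight memory estimate concrete, here is a back-of-the-envelope sketch using the model dimensions from the example workload spec on this page (`nLayer` 6, `nEmbd` 384). It assumes a GPT-style transformer with a 4x MLP expansion, fp32 training, and an Adam-style optimizer. Both function names are hypothetical, and this is not how Nexplane computes its estimates.

```python
def estimate_block_params(n_layer, n_embd):
    """Approximate parameter count of the transformer blocks.

    Per layer: about 4*d^2 for attention plus 8*d^2 for a 4x-wide MLP,
    so 12 * n_layer * d^2 overall. Embeddings, biases, and layer norms
    are ignored.
    """
    return 12 * n_layer * n_embd ** 2


def estimate_training_bytes(n_params, bytes_per_param=16):
    """Rough training-state memory, assuming fp32 weights (4 bytes),
    gradients (4 bytes), and two Adam moment buffers (8 bytes).
    Activation memory is NOT included.
    """
    return n_params * bytes_per_param


params = estimate_block_params(n_layer=6, n_embd=384)
state_bytes = estimate_training_bytes(params)

print(params)                # 10616832
print(state_bytes / 2**20)   # 162.0 (MiB)
```

Activation memory depends on batch size and sequence length and can dominate for larger models, so a figure like this is a lower bound at best.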
Training & fine-tuning
Pretrain, SFT, LoRA, and RLHF with checkpoint recovery and cost caps.
Inference & serving
Batch inference, embedding jobs, and model export at scale.
AI agents
Run tool-using agents with retries, timeouts, and observability — without managing runtime infra.
RAG pipelines
Ingest documents, embed, index vector stores, and serve retrieval-augmented generation at scale.
Evaluation & benchmarks
Scheduled eval runs, metric tracking, and comparison across checkpoints.
Data & preprocessing
ETL, chunking, tokenization, and feature pipelines on elastic compute.
Example workload spec
Describe what you want to run. Nexplane plans compute, orchestration, and recovery from your workload definition.
{
  "name": "ml-pipeline",
  "mode": "expert",
  "type": "training",
  "model": "s3://models/llama-sft/checkpoint-final",
  "dataset": "s3://data",
  "constraints": {
    "budgetUsd": 50,
    "deadlineHours": 4,
    "reliability": "high"
  },
  "training": {
    "batchSize": 32,
    "blockSize": 256,
    "learningRate": 0.0003,
    "maxIters": 20000,
    "nLayer": 6,
    "nHead": 6,
    "nEmbd": 384
  },
  "inference": {
    "batchSize": 64,
    "outputUri": "s3://predictions"
  }
}

- Workload type, model, tools, and dataset
- Architecture, hyperparameters, and optimizer settings
- Agent prompts, tool configs, and step limits
- RAG chunking, embedding models, and retrieval settings
- Accelerator selection and compute provisioning
- Runtime setup, preemptions, and retries
- Failure recovery and progress restart
- Logs, artifact storage, and cost tracking
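The `constraints` block of the example spec can be checked against a generated plan with simple arithmetic. The sketch below takes the estimates from the CLI example at the top of this page ($24.80, 3h 10m) and compares them with `budgetUsd` and `deadlineHours`. The helpers `parse_runtime_hours` and `check_plan` are hypothetical and only illustrate the comparison. They are not part of any Nexplane API.

```python
import json

# Subset of the example workload spec shown above.
SPEC_JSON = """
{
  "name": "ml-pipeline",
  "constraints": {"budgetUsd": 50, "deadlineHours": 4, "reliability": "high"}
}
"""


def parse_runtime_hours(text):
    """Convert a string such as '3h 10m' into fractional hours."""
    hours = 0.0
    for part in text.split():
        if part.endswith("h"):
            hours += float(part[:-1])
        elif part.endswith("m"):
            hours += float(part[:-1]) / 60
        else:
            raise ValueError(f"unrecognized runtime component: {part!r}")
    return hours


def check_plan(constraints, est_cost_usd, est_runtime):
    """Return a list of constraint violations; an empty list means the plan fits."""
    problems = []
    if est_cost_usd > constraints["budgetUsd"]:
        problems.append(
            f"cost ${est_cost_usd:.2f} exceeds budget ${constraints['budgetUsd']}"
        )
    runtime_hours = parse_runtime_hours(est_runtime)
    if runtime_hours > constraints["deadlineHours"]:
        problems.append(
            f"runtime {runtime_hours:.2f}h exceeds deadline {constraints['deadlineHours']}h"
        )
    return problems


spec = json.loads(SPEC_JSON)
print(check_plan(spec["constraints"], 24.80, "3h 10m"))  # []
```

With these numbers the plan uses about half the budget and finishes roughly 50 minutes before the deadline, so no violations are reported.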