
DISTRIBUTED AI ORCHESTRATION

CLUSTER YOUR FLOCK. SATURATE YOUR HARDWARE.

ClusterFlock unifies your heterogeneous GPU fleet - NVIDIA, Apple Silicon, DGX, consumer cards - into a single AI backend. Load models, run inference, and launch autonomous missions across every device from one command.

WHAT IT DOES

SMART ALLOCATION
Auto-detects VRAM, profiles hardware, and bin-packs the best models onto each GPU. No manual config. Self-adapting and self-healing, even while the mission runs. (See the packing sketch after this list.)
MIXTURE OF AGENTS
A showrunner LLM coordinates a flock of worker models - dispatching tasks, evaluating results, iterating until done.
AUTONOMOUS MISSIONS
Describe a goal. ClusterFlock spins up sandboxed containers, assigns agents, and builds the solution autonomously.
REAL-TIME TELEMETRY
Live GPU utilization, VRAM, model status, and tokens/sec from every node.
MULTI-BACKEND
Native support for llama.cpp, LM Studio, Metal, and CUDA. DGX Spark, consumer GPUs, and Mac - all in one cluster.
OPENAI-COMPATIBLE API
Drop-in replacement. Point any OpenAI SDK, LangChain app, or curl command at port 1919 and go. ClusterFlock routes requests intelligently across your network and saturates your AI hardware with work.
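
A minimal sketch of what VRAM-aware bin-packing can look like, assuming a best-fit-decreasing strategy. The GPU names, model names, sizes, and function names below are illustrative assumptions, not ClusterFlock's actual scheduler.

# Illustrative only: greedy, VRAM-aware best-fit-decreasing packing.
# GPU specs and model sizes are made-up assumptions.
from dataclasses import dataclass, field

@dataclass
class GPU:
    name: str
    vram_gb: float
    assigned: list = field(default_factory=list)

    def free_gb(self) -> float:
        return self.vram_gb - sum(size for _, size in self.assigned)

def pack(models: list[tuple[str, float]], gpus: list[GPU]) -> list[GPU]:
    # Place the largest models first, each onto the GPU that leaves
    # the least VRAM unused while still fitting the model.
    for name, size in sorted(models, key=lambda m: m[1], reverse=True):
        candidates = [g for g in gpus if g.free_gb() >= size]
        if not candidates:
            continue  # model fits nowhere; skip it
        best = min(candidates, key=lambda g: g.free_gb() - size)
        best.assigned.append((name, size))
    return gpus

# Hypothetical fleet and model catalog (sizes in GB of VRAM needed):
fleet = [GPU("dgx-spark", 128), GPU("rtx-4090", 24), GPU("m3-max", 96)]
catalog = [("llama-70b-q4", 42.0), ("qwen-32b-q5", 24.5), ("phi-3-mini", 4.2)]
for gpu in pack(catalog, fleet):
    print(gpu.name, gpu.assigned, f"{gpu.free_gb():.1f} GB free")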

ONE ENDPOINT. FULL CLUSTER.

POST http://your-cluster:1919/v1/chat/completions
{
  "model": "clusterflock",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
Three routing modes:
FANOUT - broadcast to all endpoints, synthesize the best response
SPEED  - fastest single endpoint
MANUAL - pick your model
Works with OpenAI SDK · LangChain · LiteLLM · curl
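
For example, with the OpenAI Python SDK you only change the base URL. The host name and API key below are placeholders; whether the cluster requires a key is not specified here.

# Point the standard OpenAI Python SDK at the cluster endpoint.
# "your-cluster" and the api_key value are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://your-cluster:1919/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="clusterflock",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(resp.choices[0].message.content)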

AVAILABLE SOON!

We're putting the finishing touches on it.
- your friends at Notum Robotics