DISTRIBUTED AI ORCHESTRATION
// OVERVIEW
CLUSTER YOUR FLOCK. SATURATE YOUR HARDWARE.
ClusterFlock unifies your heterogeneous GPU fleet - NVIDIA, Apple Silicon, DGX, consumer cards - into a single AI backend. Load models, run inference, and launch autonomous missions across every device from one command.
// CAPABILITIES
WHAT IT DOES
SMART ALLOCATION
Auto-detects VRAM, profiles hardware, and bin-packs the best models onto each GPU. No manual configuration. Self-adapting and self-healing, even while the mission runs.
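The allocation step can be pictured as a greedy bin-pack: place the largest models first, each onto the GPU with the most free VRAM that can hold it. This is a minimal sketch for illustration, not ClusterFlock's actual algorithm; the `Gpu` type, the model/VRAM figures, and the first-fit-decreasing strategy are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Gpu:
    name: str
    free_vram_gb: float
    models: list = field(default_factory=list)

def bin_pack(models: dict, gpus: list) -> dict:
    """Greedy first-fit-decreasing: largest models first,
    each onto the GPU with the most free VRAM that fits."""
    placements = {}
    for name, vram in sorted(models.items(), key=lambda m: -m[1]):
        candidates = [g for g in gpus if g.free_vram_gb >= vram]
        if not candidates:
            continue  # model fits nowhere; skip it
        best = max(candidates, key=lambda g: g.free_vram_gb)
        best.free_vram_gb -= vram
        best.models.append(name)
        placements[name] = best.name
    return placements

# Hypothetical fleet: one consumer card, one Mac with unified memory.
gpus = [Gpu("rtx4090", 24.0), Gpu("m2-ultra", 96.0)]
placements = bin_pack({"llama-70b": 40.0, "qwen-32b": 20.0, "llama-8b": 6.0}, gpus)
```

Sorting largest-first avoids stranding small leftover VRAM slices that only small models could use.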
MIXTURE OF AGENTS
A showrunner LLM coordinates a flock of worker models - dispatching tasks, evaluating results, iterating until done.
AUTONOMOUS MISSIONS
Describe a goal. ClusterFlock spins up sandboxed containers, assigns agents, and builds the solution autonomously.
REAL-TIME TELEMETRY
Live GPU utilization, VRAM, model status, and tokens/sec from every node.
MULTI-BACKEND
Native support for llama.cpp, LM Studio, Metal, and CUDA. DGX Spark, consumer GPUs, and Mac - all in one cluster.
OPENAI-COMPATIBLE API
Drop-in replacement. Point any OpenAI SDK, LangChain app, or curl command at port 1919 and go. ClusterFlock intelligently routes across your network and saturates your AI hardware with work.
// API
ONE ENDPOINT. FULL CLUSTER.
POST http://your-cluster:1919/v1/chat/completions
{
  "model": "clusterflock",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
Three routing modes:
FANOUT
broadcast to all, synthesize best
SPEED
fastest single endpoint
MANUAL
pick your model
Works with OpenAI SDK · LangChain · LiteLLM · curl
// OPEN SOURCE (MIT)
AVAILABLE SOON!
We're putting on the final touches.
- your friends at Notum Robotics