A drop-in replacement for the OpenAI and Anthropic APIs, running near-SOTA open weights on our own GPUs in Finland. €20 a month, flat — no per-token billing, no data retention. We pin one model, measure how it behaves, and publish the results — failures included. Point your existing tools at it and keep working.
Everything a developer needs, priced like a utility — and built for teams that care where their data lives.
Prompts and completions live only in GPU memory. Nothing is written to disk or logged.
Twenty euros a month, used as much as fair-use allows. No per-token meter, and no surprise on the invoice at the end of the month.
The same endpoints your tools already speak: OpenAI SDKs, Cursor, Claude Code, Continue, aider. Change the base URL, nothing else.
Entire codebases, long histories, and large documents in one session. Hybrid attention keeps it practical, and flat pricing means a long context costs the same as a short one.
A single B300 holds sub-second time-to-first-token even under concurrent load. Speculative decoding and KV caching keep decode fast as users pile on.
Tokens stream as they're generated over server-sent events. No polling, no batch waits.
Published benchmarks from the HuggingFace model card, V4 Flash in Max reasoning mode (no external tools) against the leading closed-source models of 2026. Source: DeepSeek V4 Flash.
| Benchmark | V4 Flash (Max) | Opus 4.8 | GPT-5.5 |
|---|---|---|---|
| LiveCodeBench | 91.6 | 88.8 | — |
| GPQA Diamond | 88.1 | 93.6 | 93.6 |
| HLE | 34.8 | 49.8 | 41.4 |
| SWE Verified | 79.0 | 88.6 | — |
V4 Flash leads on code generation: its LiveCodeBench score (91.6) tops both Opus 4.8 and GPT-5.5. On the broader reasoning benchmarks it trails the newest frontier models, but it's a 284B-parameter mixture-of-experts with just 13B active, so it runs on a single GPU and costs a fraction per token.
It's MIT-licensed with open weights, so there's no vendor lock-in: the same model that runs DeepSeek's own API, served from our GPUs in Finland instead of theirs.
We benchmarked a single B300 against DeepSeek's own API on identical weights:
Measured on a single NVIDIA B300, June 2026. Full config and data →
The US hosts ~75% of the world's AI supercomputer capacity; the EU holds under 5% (Epoch AI, 2025). America's largest clusters are scaling past a gigawatt; xAI's Colossus alone runs 200,000 H100s at 300 MW. Meanwhile, OpenAI raised $122 billion in a single round (March 2026) — more than the EU's entire InvestAI target, most of which is reallocated from existing programmes. In May 2026, Mistral's CEO told the French National Assembly Europe has two years before it becomes a US "vassal state." Training models from scratch is a game Europe already lost. The opening is deployment: run the best open-weight models on European GPUs and compete on operations, price, and trust.
Per-token pricing turns a developer tool into a budget line that gets capped and cut. Teams are rationing prompts and restricting access after burning through annual budgets in months, and some startups now exist only to track and reduce token spend. Inference should be priced like a utility, not metered like a luxury.
On June 12, 2026, the US issued its first-ever export control on AI models, forcing Anthropic to disable Claude Fable 5 and Mythos 5 for all foreign users overnight. This wasn't a chip restriction — it was a deployed model pulled by government order. Europe already relies on non-EU providers for over 80% of its digital infrastructure. Any application built on US-hosted AI is one directive away from going dark. If your inference runs outside the EU, you don't control it.
Everything included. No surprises.
Volume pricing for engineering teams.
One email when we launch. That's it.
hi@affordableai.euAffordableAI is operated by a privately held Dutch company (BV). Bootstrapped and independently funded, with no outside investors. The infrastructure is built and run in Europe.