NVIDIA RTX 4090, RTX 5090, and H100 SXM5 GPU servers for AI training, inference, and image and video generation. CUDA 12 and cuDNN come preinstalled, plus PyTorch / ComfyUI / Ollama image presets ready to SSH into. H100 tiers run on Threadripper Pro hosts for the full PCIe Gen 5 lane count. Available in 4 offshore jurisdictions, with no KYC and crypto-only payment across 14 chains, including Monero.
Same NVIDIA hardware across every jurisdiction with unlimited bandwidth on every plan. Pricing varies by jurisdiction — Iceland is the lowest-carbon option, Moldova the cheapest.
Iceland
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| IS-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $299/mo |
| IS-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $479/mo |
| IS-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1849/mo |
| IS-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3499/mo |
Moldova
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| MD-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $249/mo |
| MD-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $399/mo |
| MD-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1699/mo |
| MD-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3199/mo |
Romania
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| RO-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $269/mo |
| RO-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $429/mo |
| RO-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1749/mo |
| RO-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3299/mo |
Netherlands
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| NL-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $279/mo |
| NL-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $449/mo |
| NL-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1799/mo |
| NL-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3399/mo |
GPU hosting is available in 4 jurisdictions at launch (Iceland, Netherlands, Romania, Moldova). Russia is excluded by export controls on NVIDIA hardware; Switzerland and Panama remain Linux-only for now.
CUDA 12.4/12.6 + cuDNN preinstalled. Boot, ssh in, run nvidia-smi.
From paid order to nvidia-smi output in under 60 seconds.
Up to 4 TB NVMe SSD, paired with DDR5 RAM for fast dataset I/O.
Full root SSH, plus pre-bound JupyterLab on port 8888 with token auth.
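A minimal first-login sanity check, assuming the default PyTorch image preset (the tensor sizes are arbitrary):

```python
# First-login sanity check: confirm the GPU, VRAM, and the CUDA build
# PyTorch was compiled against. Assumes the PyTorch 2.5 image preset.
import torch

assert torch.cuda.is_available(), "No CUDA device visible - check nvidia-smi"
print(torch.cuda.get_device_name(0))        # e.g. "NVIDIA H100 80GB HBM3"
print(torch.version.cuda)                   # CUDA toolkit PyTorch was built with
print(torch.cuda.get_device_properties(0).total_memory // 2**30, "GiB VRAM")

x = torch.randn(4096, 4096, device="cuda")
print((x @ x).norm().item())                # quick matmul to exercise the card
```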
Llama, Mistral, Qwen, DeepSeek finetuning with LoRA / QLoRA / full FT on H100. Or self-hosted inference with vLLM / TGI / Ollama for production model serving.
Stable Diffusion, FLUX.1, SDXL with ComfyUI or Forge. Train your own LoRA, batch-generate at scale, or self-host an inference endpoint.
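If you prefer scripting to the ComfyUI / Forge UIs, a minimal diffusers sketch along these lines works, assuming the FLUX.1 [schnell] weights are pre-cached (see the model table below):

```python
# Minimal FLUX.1-schnell generation with diffusers; assumes the model was
# pre-downloaded at order time so it loads straight from the local cache.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()   # keeps peak VRAM within a 24 GB RTX 4090

image = pipe(
    "studio photo of a geothermal datacenter in Iceland",
    guidance_scale=0.0,           # schnell is guidance-distilled; no CFG
    num_inference_steps=4,        # 4 steps is the schnell sweet spot
    height=1024, width=1024,
).images[0]
image.save("out.png")
```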
OpenSora, CogVideoX, Wan-2.1, AnimateDiff. Video generation needs serious VRAM — start at RTX 5090 (32 GB) or H100 (80 GB).
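For scripted video generation, a sketch of the lightest option, CogVideoX-5B via diffusers (prompt and frame count are illustrative):

```python
# Low-VRAM text-to-video with CogVideoX-5B; CPU offload and VAE tiling
# keep it inside a single RTX 4090's 24 GB.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "zai-org/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()   # trades speed for VRAM headroom
pipe.vae.enable_tiling()

frames = pipe(
    "a drone shot over a fjord at sunrise",
    num_frames=49,                # ~6 s at 8 fps
    num_inference_steps=50,
).frames[0]
export_to_video(frames, "out.mp4", fps=8)
```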
Deploy fine-tuned models behind your own API. Predictable costs, no per-token fees, no data leaving your jurisdiction. JupyterLab + FastAPI included.
Tick any of these at order time and your GPU server boots with the stack already installed, configured, and started via systemd. Add pre-downloaded models below to also skip the 30-60 minute HuggingFace download.
Production-grade LLM serving with continuous batching and paged attention. Exposes a /v1/completions endpoint compatible with the OpenAI SDK.
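Any OpenAI SDK client can point at that endpoint; a minimal sketch, where the server IP and model name are placeholders for your own deployment:

```python
# Query the pre-started vLLM server with the stock OpenAI SDK.
# Host, port, and model name are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://YOUR_SERVER_IP:8000/v1",  # vLLM's default port
    api_key="not-needed",       # vLLM accepts any key unless --api-key is set
)
resp = client.completions.create(
    model="Qwen/Qwen3-8B",      # whichever model the server was launched with
    prompt="Explain paged attention in one sentence:",
    max_tokens=64,
)
print(resp.choices[0].text)
```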
Self-hosted ChatGPT-style web UI. Pulls Ollama-native quantized weights; easiest path to "talk to my LLM in a browser".
Gradio UI with broad backend support — Transformers, ExLlamaV2, llama.cpp, AWQ, GPTQ. Power-user choice for benchmarking quantizations.
HuggingFace Text Generation Inference — production server with token streaming, tensor parallelism, paged attention.
YAML-config driven finetuning. Supports LoRA, QLoRA, full FT, DPO, ORPO. Pre-cloned to /opt/axolotl with starter configs for Llama / Qwen / Mistral.
2× faster + 70% less VRAM finetuning via custom Triton kernels. Ideal for budget runs on RTX 4090. Pre-installed in /opt/unsloth.
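A skeleton of what a QLoRA run with Unsloth's Python API looks like — the base model and hyperparameters here are illustrative defaults, not a tuned recipe:

```python
# QLoRA finetune skeleton with Unsloth on a 24 GB RTX 4090.
# Model name and LoRA hyperparameters are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",  # 4-bit base fits 24 GB
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here, hand `model` to trl's SFTTrainer as in the Unsloth docs.
```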
WebUI-driven finetuning platform. SFT / RLHF / DPO / KTO. Good entry point for non-coders who want to finetune from a UI.
Node-graph image-gen interface, ships with FLUX.1-schnell + Kontext workflows. Power-user image generation pipeline.
The mainstream Stable Diffusion WebUI. Stable Diffusion 3.5 plus the extensions ecosystem. Familiar UI for users coming from Civitai.
A1111 fork optimized for FLUX, faster sampling, lower VRAM. Drop-in replacement for users coming from Auto1111.
GUI for training Stable Diffusion / FLUX LoRA, DreamBooth, textual inversion. Trains a custom-style LoRA on RTX 4090 in 30-90 min.
ComfyUI with video-gen workflows preloaded — Wan 2.2 T2V, HunyuanVideo, LTX-Video. Needs 40+ GB VRAM for usable speed at 720p.
Lightweight video workflows — CogVideoX-5B, Wan 2.1 1.3B, LTX-Video. Runs on a single RTX 4090.
OpenAI Whisper Large v3 Turbo with faster-whisper backend behind a /transcribe HTTP API. 8× faster than v3, 99 langs, real-time on any GPU.
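If you'd rather call faster-whisper directly than go through the /transcribe API, a minimal sketch (the audio file name is a placeholder; the turbo checkpoint needs a recent faster-whisper release):

```python
# Local transcription with the faster-whisper backend. "large-v3" is the
# safe model id; swap in the turbo checkpoint if your version ships it.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("meeting.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:6.2f} -> {seg.end:6.2f}] {seg.text}")
```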
Multi-model TTS endpoint serving Kokoro 82M (54 voices, 8 langs) and Sesame CSM-1B (conversational with context). REST + WebSocket streaming.
Always installed. PyTorch 2.5 + CUDA 12.4 + Transformers + diffusers + accelerate + bitsandbytes + xformers + flash-attn. The universal AI dev baseline.
VSCode running in your browser, full Python/IPython/extensions. For users who prefer IDE workflow over notebooks.
Combine multiple stacks on the same GPU — the deploy script resolves dependency conflicts and assigns non-clashing ports.
Tick the models you need at order time and they're cached in /root/.cache/huggingface before you first log in. 🔒 Gated models (Llama, Mistral, Gemma, FLUX-dev, SD 3.5) require your HuggingFace token (also collected at order time).
| Model | HuggingFace | Size | Min VRAM | Min GPU tier | Type |
|---|---|---|---|---|---|
| Llama 3.3 70B Instruct 🔒 Gated | meta-llama/Llama-3.3-70B-Instruct | 140 GB | 160 GB | GPU-L | LLM |
| Qwen3 32B | Qwen/Qwen3-32B | 64 GB | 80 GB | GPU-L | LLM |
| Qwen3 14B | Qwen/Qwen3-14B | 28 GB | 32 GB | GPU-S | LLM |
| Qwen3 8B | Qwen/Qwen3-8B | 16 GB | 20 GB | GPU-S | LLM |
| DeepSeek-R1 Distill Qwen 32B | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 64 GB | 80 GB | GPU-S | LLM |
| DeepSeek-R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 140 GB | 160 GB | GPU-S | LLM |
| Mistral Small 3.2 24B (multimodal) | mistralai/Mistral-Small-3.2-24B-Instruct-2506 | 48 GB | 60 GB | GPU-S | LLM |
| Gemma 3 27B (multimodal) 🔒 Gated | google/gemma-3-27b-it | 54 GB | 64 GB | GPU-L | LLM |
| Gemma 3 12B (multimodal) 🔒 Gated | google/gemma-3-12b-it | 24 GB | 28 GB | GPU-S | LLM |
| Phi-4 (14B) | microsoft/phi-4 | 28 GB | 32 GB | GPU-S | LLM |
| Phi-4 Mini Instruct (3.8B) | microsoft/Phi-4-mini-instruct | 8 GB | 10 GB | GPU-S | LLM |
| FLUX.1 [dev] 🔒 Gated | black-forest-labs/FLUX.1-dev | 24 GB | 24 GB | GPU-S | Image |
| FLUX.1 [schnell] | black-forest-labs/FLUX.1-schnell | 24 GB | 24 GB | GPU-S | Image |
| FLUX.1 Kontext [dev] (image editing) 🔒 Gated | black-forest-labs/FLUX.1-Kontext-dev | 24 GB | 24 GB | GPU-S | Image |
| Stable Diffusion 3.5 Large 🔒 Gated | stabilityai/stable-diffusion-3.5-large | 16 GB | 18 GB | GPU-S | Image |
| Stable Diffusion 3.5 Medium 🔒 Gated | stabilityai/stable-diffusion-3.5-medium | 5 GB | 10 GB | GPU-S | Image |
| HiDream-I1 Full | HiDream-ai/HiDream-I1-Full | 34 GB | 40 GB | GPU-S | Image |
| Wan 2.2 T2V A14B | Wan-AI/Wan2.2-T2V-A14B | 28 GB | 40 GB | GPU-S | Video |
| Wan 2.1 T2V 1.3B (low VRAM) | Wan-AI/Wan2.1-T2V-1.3B | 3 GB | 8 GB | GPU-S | Video |
| HunyuanVideo 1.5 (8.3B) | tencent/HunyuanVideo-1.5 | 17 GB | 24 GB | GPU-S | Video |
| LTX-Video 0.9.8 13B | Lightricks/LTX-Video | 26 GB | 24 GB | GPU-S | Video |
| CogVideoX-5B | zai-org/CogVideoX-5b | 10 GB | 16 GB | GPU-S | Video |
| Whisper Large v3 Turbo | openai/whisper-large-v3-turbo | 2 GB | 4 GB | GPU-S | Audio |
| Whisper Large v3 | openai/whisper-large-v3 | 3 GB | 6 GB | GPU-S | Audio |
| Kokoro 82M (TTS) | hexgrad/Kokoro-82M | 1 GB | 2 GB | GPU-S | Audio |
| Sesame CSM-1B (conversational TTS) | sesame/csm-1b | 2 GB | 6 GB | GPU-S | Audio |
| Stable Audio Open 1.0 🔒 Gated | stabilityai/stable-audio-open-1.0 | 3 GB | 8 GB | GPU-S | Audio |
Sizes are FP16 weights. For RTX 4090 (24 GB VRAM) on 70B models, the AWQ-quantized variant is auto-downloaded in parallel.
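To confirm a ticked model really was pre-cached, load it with local_files_only=True — a quick check, assuming Qwen3 8B was selected at order time:

```python
# Verify a pre-downloaded model loads entirely from the local HF cache
# (/root/.cache/huggingface) without touching the network.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    local_files_only=True,   # raises immediately if weights aren't cached
)
out = model.generate(
    **tok("Hello", return_tensors="pt").to(model.device), max_new_tokens=16
)
print(tok.decode(out[0], skip_special_tokens=True))
```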
Crypto-only checkout, native Monero, token-only signup, pre-installed AI stacks, pre-downloaded HuggingFace models, encrypted HF tokens, automatic Let's Encrypt endpoints, unlimited bandwidth, and 100% renewable energy in Iceland — read the column labelled "ServPrivacy" and judge for yourself.
| Feature | ServPrivacy | Vast.ai | RunPod | Paperspace | Lambda | TensorDock |
|---|---|---|---|---|---|---|
| Crypto-only checkout | ✅ 14 chains | ⚠️ BTC | ⚠️ Gateway | ❌ | ❌ | ⚠️ BTC/ETH/USDT |
| Native Monero (XMR) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| No KYC, no email signup | ✅ Token-only | ⚠️ Email + ID for trust | ⚠️ Email + payment | ❌ Full KYC | ❌ Enterprise KYC | ⚠️ Email + light KYC |
| Pre-installed AI stacks | ✅ 17 templates | ⚠️ Docker BYO | ✅ 100+ | ⚠️ Notebooks only | ⚠️ Lambda Stack only | ⚠️ Docker BYO |
| Pre-downloaded models at order | ✅ 27 models | ❌ | ❌ | ❌ | ❌ | ❌ |
| HuggingFace token at order | ✅ Encrypted, used once | ❌ | ❌ | ❌ | ❌ | ❌ |
| SSH key at order | ✅ | ✅ | ✅ | ⚠️ | ✅ | ⚠️ |
| Auto-shutdown timer | ✅ 6h-7d | ✅ | ⚠️ Spot only | ❌ | ❌ | ❌ |
| Public HTTPS endpoint (Let's Encrypt) | ✅ Auto | ⚠️ Manual | ✅ Pods | ✅ | ❌ | ⚠️ Manual |
| Unlimited bandwidth | ✅ | ⚠️ Per host | ⚠️ Capped | ⚠️ Capped | ⚠️ Capped | ⚠️ Per host |
| Renewable-energy datacenter | ✅ Iceland 100% geo+hydro | ❌ Variable | ⚠️ US grid | ⚠️ US grid | ⚠️ US grid | ⚠️ Variable |
| Offshore jurisdiction | ✅ IS / NL / RO / MD | ❌ Distributed P2P | ❌ US-centric | ❌ US | ❌ US-only | ⚠️ Multi-region |
| Sandbox dry-run mode | ✅ ?dry_run=1 | ⚠️ Trial credit | ⚠️ Limited | ⚠️ Free GPU tier | ❌ | ❌ |
| AI-agent / MCP first | ✅ MCP + REST + x402 | ⚠️ REST | ⚠️ REST | ⚠️ REST | ⚠️ REST | ⚠️ REST |
| Entry RTX 4090 / mo | $249 | ~$216 spot | ~$396 on-demand | n/a | n/a | ~$252 spot |
Comparison data is sourced from competitors' public 2026-05 pricing pages and signup flows. The ServPrivacy entry RTX 4090 is $249/mo (Moldova); competitor spot prices are average rates for equivalent hardware.
Full hardware passthrough. You get the entire physical NVIDIA card with direct VRAM access — not a vGPU slice, not a time-shared MIG partition. nvidia-smi inside your VM shows the same numbers as the bare-metal host. Full driver access, full CUDA, full PyTorch / TensorFlow stack — no SR-IOV reservations.
Default image: Ubuntu 22.04 + CUDA 12.4 + cuDNN 9 + NVIDIA driver 550. Other ready-to-go images: Ubuntu 24.04 + CUDA 12.6, Ubuntu 22.04 + PyTorch 2.5, Ubuntu 22.04 + ComfyUI + FLUX, and Ubuntu 22.04 + Ollama + Open WebUI. Vanilla Ubuntu / Debian / AlmaLinux / Rocky images are also offered if you want to install your own stack. With full root, you can switch driver versions at any time.
Yes. Many of our GPU customers run public inference APIs on top of vLLM / TGI / FastAPI. The GPU servers come with full root, predictable monthly billing (no per-token surprises) and a fixed jurisdictional IP. Bandwidth is unlimited on every GPU plan, so you can serve high-traffic public endpoints without watching meters or paying overage fees.
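One common pattern is a thin FastAPI layer in front of the local vLLM server, published through the auto-provisioned HTTPS endpoint. A sketch — the route, upstream port, and model name are illustrative assumptions, not a fixed layout:

```python
# Illustrative thin public API in front of the local vLLM server.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
VLLM = "http://127.0.0.1:8000/v1/completions"  # vLLM's default local port

class Ask(BaseModel):
    prompt: str
    max_tokens: int = 128

@app.post("/ask")
async def ask(req: Ask):
    # Forward the request to vLLM and return just the generated text.
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(VLLM, json={
            "model": "Qwen/Qwen3-8B",
            "prompt": req.prompt,
            "max_tokens": req.max_tokens,
        })
    return {"text": r.json()["choices"][0]["text"]}
```

Run it with uvicorn behind the Let's Encrypt endpoint and you have a fixed-cost public API with no per-token fees.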
NVIDIA H100, A100 and high-end RTX cards (4090 and above) are subject to US Department of Commerce export controls (15 CFR Part 744) and EU dual-use regulations that prohibit shipment to Russian datacenters. We do not provision them in Russia to stay compliant with the controls that apply to our supply chain. If you need offshore Linux VPS or Dedicated in Russia, those product lines are unaffected.
Iceland datacenters run on 100% renewable geothermal and hydroelectric power, and the cold ambient temperature meaningfully reduces the cooling overhead on H100 boxes that draw 700W each under sustained load. The end result is the lowest-carbon offshore GPU compute on the market. The premium price covers the higher datacenter cost in Iceland and the cleaner energy sourcing — for ESG-conscious AI teams, this is the only credible offshore answer.
Yes — the GPU-XL tier is 2× H100 SXM5 with NVLink interconnect inside one box, ideal for FSDP / DeepSpeed Zero-3 / DDP on the same machine. For multi-node training you can rent multiple GPU-XL servers in the same datacenter and connect them over the 10 Gbps uplink. We do not yet ship 8× H100 cluster nodes — contact us if your training run needs more scale.
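A minimal DDP skeleton for the 2× H100 tier — the model is a stand-in for your own; launch it with torchrun:

```python
# Minimal 2-GPU DDP skeleton for the GPU-XL (2x H100 NVLink) tier.
# Launch with: torchrun --nproc_per_node=2 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")      # NCCL rides NVLink for the all-reduce
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(rank), device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):               # stand-in for a real training loop
        x = torch.randn(32, 4096, device=rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        if rank == 0 and step % 5 == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```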
Pick your jurisdiction, pick your NVIDIA GPU, pay in any of 14 cryptos. Live JupyterLab in under 60 seconds. No KYC, no email, no phone — just a token.
View GPU Plans