RTX 4090 · RTX 5090 · H100 SXM5 · CUDA 12

Offshore GPU AI Hosting

NVIDIA RTX 4090, RTX 5090 and H100 SXM5 GPU servers for AI training, inference, and image and video generation. CUDA 12 and cuDNN come preinstalled, plus PyTorch / ComfyUI / Ollama image presets ready to SSH into. H100 tiers run on Threadripper Pro hosts for the full PCIe Gen 5 lane count. Available in 4 offshore jurisdictions — no KYC, crypto-only payment in 14 chains including Monero.

No KYC
Crypto Only
CUDA 12
NVMe SSD
Full Root
Monero accepted
All GPU Plans

GPU Plans by Location

Same NVIDIA hardware across every jurisdiction with unlimited bandwidth on every plan. Pricing varies by jurisdiction — Iceland is the lowest-carbon option, Moldova the cheapest.

Iceland Free Speech Haven

Plan · GPU · VRAM · CPU · RAM · NVMe · Bandwidth · Price
IS-S 1× NVIDIA RTX 4090 24 GB GDDR6X 12 vCPU 64 GB DDR5 1 TB NVMe Unlimited $299/mo Order
IS-M Popular 1× NVIDIA RTX 5090 32 GB GDDR7 16 vCPU 96 GB DDR5 1.5 TB NVMe Unlimited $479/mo Order
IS-L 1× NVIDIA H100 SXM5 80 GB HBM3 24 vCPU 192 GB DDR5 2 TB NVMe Unlimited $1849/mo Order
IS-XL 2× NVIDIA H100 SXM5 160 GB HBM3 32 vCPU 384 GB DDR5 4 TB NVMe Unlimited $3499/mo Order

Moldova Budget Offshore

Plan · GPU · VRAM · CPU · RAM · NVMe · Bandwidth · Price
MD-S 1× NVIDIA RTX 4090 24 GB GDDR6X 12 vCPU 64 GB DDR5 1 TB NVMe Unlimited $249/mo Order
MD-M Popular 1× NVIDIA RTX 5090 32 GB GDDR7 16 vCPU 96 GB DDR5 1.5 TB NVMe Unlimited $399/mo Order
MD-L 1× NVIDIA H100 SXM5 80 GB HBM3 24 vCPU 192 GB DDR5 2 TB NVMe Unlimited $1699/mo Order
MD-XL 2× NVIDIA H100 SXM5 160 GB HBM3 32 vCPU 384 GB DDR5 4 TB NVMe Unlimited $3199/mo Order

Romania Anti-Retention

Plan · GPU · VRAM · CPU · RAM · NVMe · Bandwidth · Price
RO-S 1× NVIDIA RTX 4090 24 GB GDDR6X 12 vCPU 64 GB DDR5 1 TB NVMe Unlimited $269/mo Order
RO-M Popular 1× NVIDIA RTX 5090 32 GB GDDR7 16 vCPU 96 GB DDR5 1.5 TB NVMe Unlimited $429/mo Order
RO-L 1× NVIDIA H100 SXM5 80 GB HBM3 24 vCPU 192 GB DDR5 2 TB NVMe Unlimited $1749/mo Order
RO-XL 2× NVIDIA H100 SXM5 160 GB HBM3 32 vCPU 384 GB DDR5 4 TB NVMe Unlimited $3299/mo Order

Netherlands Best Peering

Plan · GPU · VRAM · CPU · RAM · NVMe · Bandwidth · Price
NL-S 1× NVIDIA RTX 4090 24 GB GDDR6X 12 vCPU 64 GB DDR5 1 TB NVMe Unlimited $279/mo Order
NL-M Popular 1× NVIDIA RTX 5090 32 GB GDDR7 16 vCPU 96 GB DDR5 1.5 TB NVMe Unlimited $449/mo Order
NL-L 1× NVIDIA H100 SXM5 80 GB HBM3 24 vCPU 192 GB DDR5 2 TB NVMe Unlimited $1799/mo Order
NL-XL 2× NVIDIA H100 SXM5 160 GB HBM3 32 vCPU 384 GB DDR5 4 TB NVMe Unlimited $3399/mo Order

GPU hosting is available in 4 jurisdictions at launch (Iceland, Netherlands, Romania, Moldova). Russia is excluded due to NVIDIA export sanctions; Switzerland and Panama are kept Linux-only for now.

Included on Every GPU Server

CUDA 12

CUDA 12.4/12.6 + cuDNN preinstalled. Boot, SSH in, run nvidia-smi.

60-second deploy

From paid order to nvidia-smi output in under 60 seconds.

NVMe SSD

Up to 4 TB NVMe SSD, paired with DDR5 RAM for fast dataset I/O.

SSH + Jupyter

Full root SSH, plus pre-bound JupyterLab on port 8888 with token auth.

Use cases

What GPU AI Hosting is Used For

LLM finetuning & inference

Llama, Mistral, Qwen, DeepSeek finetuning with LoRA / QLoRA / full FT on H100. Or self-hosted inference with vLLM / TGI / Ollama for production model serving.

Image generation

Stable Diffusion, FLUX.1, SDXL with ComfyUI or Forge. Train your own LoRA, batch-generate at scale, or self-host an inference endpoint.

AI video generation

OpenSora, CogVideoX, Wan-2.1, AnimateDiff. Video generation needs serious VRAM — start at RTX 5090 (32 GB) or H100 (80 GB).

Production inference

Deploy fine-tuned models behind your own API. Predictable costs, no per-token fees, no data leaving your jurisdiction. JupyterLab + FastAPI included.

1-click deploy

Pre-installed AI templates

Tick any of these at order time and your GPU server boots with the stack already installed, configured and started via systemd. Add pre-downloaded models below to skip the 30–60 minute HuggingFace download as well.

LLM Inference

vLLM (OpenAI-compatible)

Production-grade LLM serving with continuous batching and paged attention. Exposes a /v1/completions endpoint compatible with the OpenAI SDK.

LLM · OpenAI API · production
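As a minimal sketch of how an OpenAI-compatible endpoint is consumed — assuming vLLM's default port 8000 and a model name of your choosing (both are assumptions, not quoted configuration) — a stdlib-only client looks like this:

```python
import json
import urllib.request


def completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build the JSON payload shape accepted by an OpenAI-style /v1/completions endpoint."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def complete(base_url: str, model: str, prompt: str) -> str:
    """POST the payload and return the first completion's text."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(completion_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]


# On the server itself (port 8000 is vLLM's default, adjust to your deploy):
# print(complete("http://localhost:8000", "Qwen/Qwen3-8B", "Hello"))
```

The same payload works through the official OpenAI SDK by pointing its `base_url` at your server.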
LLM Inference

Ollama + Open WebUI

Self-hosted ChatGPT-style web UI. Pulls Ollama-native quantized weights; easiest path to "talk to my LLM in a browser".

LLM · chat UI · beginner
LLM Inference

text-generation-webui (Oobabooga)

Gradio UI with broad backend support — Transformers, ExLlamaV2, llama.cpp, AWQ, GPTQ. Power-user choice for benchmarking quantizations.

LLM · multi-backend · power user
LLM Inference

HuggingFace TGI

HuggingFace Text Generation Inference — production server with token streaming, tensor parallelism, paged attention.

LLM · production · HuggingFace
Finetuning

Axolotl (LLM finetuning)

YAML-config driven finetuning. Supports LoRA, QLoRA, full FT, DPO, ORPO. Pre-cloned to /opt/axolotl with starter configs for Llama / Qwen / Mistral.

finetune · LoRA · QLoRA
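To illustrate the YAML-driven workflow, here is a hedged sketch of what a QLoRA config looks like — the base model, dataset path and output directory are placeholders, and you should check field names against Axolotl's own documentation and the starter configs in /opt/axolotl:

```yaml
# Illustrative QLoRA finetune config (field values are placeholders)
base_model: meta-llama/Llama-3.1-8B
adapter: qlora
load_in_4bit: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05

datasets:
  - path: ./data/train.jsonl   # your own dataset
    type: alpaca

sequence_len: 4096
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs/qlora-llama
```

Launching is then a single command against the config file, per Axolotl's usual CLI workflow.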
Finetuning

Unsloth (2× faster finetune)

2× faster + 70% less VRAM finetuning via custom Triton kernels. Ideal for budget runs on RTX 4090. Pre-installed in /opt/unsloth.

finetune · fast · low VRAM
Finetuning

LLaMA-Factory

WebUI-driven finetuning platform. SFT / RLHF / DPO / KTO. Good entry point for non-coders who want to finetune on a UI.

finetune · GUI
Image Generation

ComfyUI + FLUX.1

Node-graph image-gen interface, ships with FLUX.1-schnell + Kontext workflows. Power-user image generation pipeline.

image · FLUX · workflow
Image Generation

Automatic1111 + SD 3.5

The mainstream Stable Diffusion WebUI. Stable Diffusion 3.5 + extensions ecosystem. Familiar UI for users coming from civitai.

image · SD 3.5
Image Generation

Forge (faster A1111)

A1111 fork optimized for FLUX, faster sampling, lower VRAM. Drop-in replacement for users coming from Auto1111.

image · FLUX · fast
Image Generation

Kohya SS (LoRA training)

GUI for training Stable Diffusion / FLUX LoRA, DreamBooth, textual inversion. Trains a custom-style LoRA on RTX 4090 in 30-90 min.

LoRA training · image · GUI
AI Video

ComfyUI + Wan 2.2 / HunyuanVideo

ComfyUI with video-gen workflows preloaded — Wan 2.2 T2V, HunyuanVideo, LTX-Video. Needs 40+ GB VRAM for usable speed at 720p.

video · Wan 2.2 · HunyuanVideo
AI Video

ComfyUI Video Lite (CogVideoX / LTX)

Lightweight video workflows — CogVideoX-5B, Wan 2.1 1.3B, LTX-Video. Runs on a single RTX 4090.

video · CogVideoX · low VRAM
Audio

Whisper Large v3 Turbo server

OpenAI Whisper Large v3 Turbo with faster-whisper backend behind a /transcribe HTTP API. 8× faster than v3, 99 langs, real-time on any GPU.

audio · speech-to-text · API
Audio

TTS server (Kokoro + CSM-1B)

Multi-model TTS endpoint serving Kokoro 82M (54 voices, 8 langs) and Sesame CSM-1B (conversational with context). REST + WebSocket streaming.

audio · text-to-speech · API
Notebooks & Dev

JupyterLab + PyTorch baseline

Always installed. PyTorch 2.5 + CUDA 12.4 + Transformers + diffusers + accelerate + bitsandbytes + xformers + flash-attn. The universal AI dev baseline.

notebook · baseline · always-on
Notebooks & Dev

code-server (VSCode in browser)

VSCode running in your browser, full Python/IPython/extensions. For users who prefer IDE workflow over notebooks.

IDE · VSCode · dev

Combine multiple stacks on the same GPU — the deploy script resolves dependency conflicts and assigns non-clashing ports.
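The actual deploy script's port logic is not published, but the idea of "non-clashing ports" can be sketched in a few lines — each stack gets its default port unless another stack already claimed it, in which case it walks up to the next free one (the default port numbers below are illustrative assumptions):

```python
def assign_ports(stacks: list[str], defaults: dict[str, int], start: int = 8000) -> dict[str, int]:
    """Assign each stack its preferred port, bumping to the next free port on a clash.

    Illustrative only -- the real deploy script may use a different scheme.
    """
    taken: set[int] = set()
    out: dict[str, int] = {}
    for name in stacks:
        port = defaults.get(name, start)
        while port in taken:        # walk upward until a free port is found
            port += 1
        taken.add(port)
        out[name] = port
    return out


# Two stacks defaulting to 8000: the second is bumped to 8001.
# assign_ports(["vllm", "jupyter", "tgi"], {"vllm": 8000, "jupyter": 8888, "tgi": 8000})
# -> {"vllm": 8000, "jupyter": 8888, "tgi": 8001}
```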

Skip the download

Pre-downloaded open-weight models

Tick the models you need at order time and they're cached in /root/.cache/huggingface before you first log in. 🔒 Gated models (Llama, Mistral, Gemma, FLUX-dev, SD 3.5) require your HuggingFace token, which is also requested at order time.
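If you want to verify a cached model after logging in, the huggingface_hub cache uses a predictable layout under the hub/ subdirectory — repo ids have their slash replaced by "--" and are prefixed with "models--". A small helper to compute where a repo lands (the example repo id is one from the table below):

```python
from pathlib import Path


def hub_cache_dir(repo_id: str, cache_root: str = "/root/.cache/huggingface") -> Path:
    """Return the directory where huggingface_hub caches a model repo.

    Layout: <cache_root>/hub/models--<org>--<name>
    """
    return Path(cache_root) / "hub" / ("models--" + repo_id.replace("/", "--"))


# hub_cache_dir("Qwen/Qwen3-8B")
# -> /root/.cache/huggingface/hub/models--Qwen--Qwen3-8B
```

Libraries like transformers and diffusers pick models up from this cache automatically, so a pre-downloaded model loads with no network traffic.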

Model · HuggingFace repo · Size · Min VRAM · Min GPU tier · Type
Llama 3.3 70B Instruct 🔒 Gated meta-llama/Llama-3.3-70B-Instruct 140 GB 160 GB GPU-L LLM
Qwen3 32B Qwen/Qwen3-32B 64 GB 80 GB GPU-L LLM
Qwen3 14B Qwen/Qwen3-14B 28 GB 32 GB GPU-S LLM
Qwen3 8B Qwen/Qwen3-8B 16 GB 20 GB GPU-S LLM
DeepSeek-R1 Distill Qwen 32B deepseek-ai/DeepSeek-R1-Distill-Qwen-32B 64 GB 80 GB GPU-S LLM
DeepSeek-R1 Distill Llama 70B deepseek-ai/DeepSeek-R1-Distill-Llama-70B 140 GB 160 GB GPU-S LLM
Mistral Small 3.2 24B (multimodal) mistralai/Mistral-Small-3.2-24B-Instruct-2506 48 GB 60 GB GPU-S LLM
Gemma 3 27B (multimodal) 🔒 Gated google/gemma-3-27b-it 54 GB 64 GB GPU-L LLM
Gemma 3 12B (multimodal) 🔒 Gated google/gemma-3-12b-it 24 GB 28 GB GPU-S LLM
Phi-4 (14B) microsoft/phi-4 28 GB 32 GB GPU-S LLM
Phi-4 Mini Instruct (3.8B) microsoft/Phi-4-mini-instruct 8 GB 10 GB GPU-S LLM
FLUX.1 [dev] 🔒 Gated black-forest-labs/FLUX.1-dev 24 GB 24 GB GPU-S Image
FLUX.1 [schnell] black-forest-labs/FLUX.1-schnell 24 GB 24 GB GPU-S Image
FLUX.1 Kontext [dev] (image editing) 🔒 Gated black-forest-labs/FLUX.1-Kontext-dev 24 GB 24 GB GPU-S Image
Stable Diffusion 3.5 Large 🔒 Gated stabilityai/stable-diffusion-3.5-large 16 GB 18 GB GPU-S Image
Stable Diffusion 3.5 Medium 🔒 Gated stabilityai/stable-diffusion-3.5-medium 5 GB 10 GB GPU-S Image
HiDream-I1 Full HiDream-ai/HiDream-I1-Full 34 GB 40 GB GPU-S Image
Wan 2.2 T2V A14B Wan-AI/Wan2.2-T2V-A14B 28 GB 40 GB GPU-S Video
Wan 2.1 T2V 1.3B (low VRAM) Wan-AI/Wan2.1-T2V-1.3B 3 GB 8 GB GPU-S Video
HunyuanVideo 1.5 (8.3B) tencent/HunyuanVideo-1.5 17 GB 24 GB GPU-S Video
LTX-Video 0.9.8 13B Lightricks/LTX-Video 26 GB 24 GB GPU-S Video
CogVideoX-5B zai-org/CogVideoX-5b 10 GB 16 GB GPU-S Video
Whisper Large v3 Turbo openai/whisper-large-v3-turbo 2 GB 4 GB GPU-S Audio
Whisper Large v3 openai/whisper-large-v3 3 GB 6 GB GPU-S Audio
Kokoro 82M (TTS) hexgrad/Kokoro-82M 1 GB 2 GB GPU-S Audio
Sesame CSM-1B (conversational TTS) sesame/csm-1b 2 GB 6 GB GPU-S Audio
Stable Audio Open 1.0 🔒 Gated stabilityai/stable-audio-open-1.0 3 GB 8 GB GPU-S Audio

Sizes are FP16 weights. For RTX 4090 (24 GB VRAM) on 70B models, the AWQ-quantized variant is auto-downloaded in parallel.
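The sizing rule behind the table is simple arithmetic: FP16 stores two bytes per parameter, so a checkpoint weighs roughly 2 GB per billion parameters. A quick sanity check against the rows above:

```python
def fp16_weight_gb(params_billion: float) -> float:
    """Approximate FP16 checkpoint size: 2 bytes per parameter, in decimal GB."""
    return params_billion * 2  # 2 GB per billion parameters


# 70B -> 140 GB (Llama 3.3 70B row); 8B -> 16 GB (Qwen3 8B row).
assert fp16_weight_gb(70) == 140
assert fp16_weight_gb(8) == 16
```

Quantized variants scale accordingly: 4-bit weights are roughly a quarter of the FP16 figure, before overhead.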

How we compare

ServPrivacy vs Vast.ai · RunPod · Paperspace · Lambda Labs · TensorDock

Crypto-only checkout, native Monero, token-only signup, pre-installed AI stacks, pre-downloaded HuggingFace models, encrypted HF tokens, automatic Let's Encrypt endpoints, unlimited bandwidth and 100% renewable energy in Iceland — read the ServPrivacy row and judge for yourself.

Feature · ServPrivacy · Vast.ai · RunPod · Paperspace · Lambda · TensorDock
Crypto-only checkout · ✅ 14 chains · ⚠️ BTC · ⚠️ Gateway · ⚠️ BTC/ETH/USDT
Native Monero (XMR)
No KYC, no email signup · ✅ Token-only · ⚠️ Email + ID for trust · ⚠️ Email + payment · ❌ Full KYC · ❌ Enterprise KYC · ⚠️ Email + light KYC
Pre-installed AI stacks · ✅ 17 templates · ⚠️ Docker BYO · ✅ 100+ · ⚠️ Notebooks only · ⚠️ Lambda Stack only · ⚠️ Docker BYO
Pre-downloaded models at order · ✅ 27 models
HuggingFace token at order · ✅ Encrypted, used once
SSH key at order · ⚠️ · ⚠️
Auto-shutdown timer · ✅ 6h-7d · ⚠️ Spot only
Public HTTPS endpoint (Let's Encrypt) · ✅ Auto · ⚠️ Manual · ✅ Pods · ⚠️ Manual
Unlimited bandwidth · ⚠️ Per host · ⚠️ Capped · ⚠️ Capped · ⚠️ Capped · ⚠️ Per host
Renewable-energy datacenter · ✅ Iceland 100% geo+hydro · ❌ Variable · ⚠️ US grid · ⚠️ US grid · ⚠️ US grid · ⚠️ Variable
Offshore jurisdiction · ✅ IS / NL / RO / MD · ❌ Distributed P2P · ❌ US-centric · ❌ US · ❌ US-only · ⚠️ Multi-region
Sandbox dry-run mode · ✅ ?dry_run=1 · ⚠️ Trial credit · ⚠️ Limited · ⚠️ Free GPU tier
AI-agent / MCP first · ✅ MCP + REST + x402 · ⚠️ REST · ⚠️ REST · ⚠️ REST · ⚠️ REST · ⚠️ REST
Entry RTX 4090 / mo · $249 · ~$216 spot · ~$396 on-demand · n/a · n/a · ~$252 spot

Comparison data sourced from competitors' public 2026-05 pricing pages and signup flows. ServPrivacy entry RTX 4090 = $249/mo Moldova; competitor "spot" prices are average rates for equivalent hardware.

FAQ

GPU AI Hosting FAQ

01 Is the GPU passed through with full hardware access, or is it shared / vGPU sliced?

Full hardware passthrough. You get the entire physical NVIDIA card with direct VRAM access — not a vGPU slice, not a MIG partition shared with other tenants. nvidia-smi inside your VM shows the same numbers as the bare-metal host. Full driver access, full CUDA, full PyTorch / TensorFlow stack — no SR-IOV reservations.

02 Which CUDA / driver versions are preinstalled?

Default image: Ubuntu 22.04 + CUDA 12.4 + cuDNN 9 + NVIDIA driver 550. Other ready-to-go images: Ubuntu 24.04 + CUDA 12.6, Ubuntu 22.04 + PyTorch 2.5, Ubuntu 22.04 + ComfyUI + FLUX, Ubuntu 22.04 + Ollama + Open WebUI. Vanilla Ubuntu / Debian / AlmaLinux / Rocky images are also offered if you want to install your own stack. With full root you can switch driver versions at any time.

03 Can I run my own AI startup's production inference on these GPUs?

Yes. Many of our GPU customers run public inference APIs on top of vLLM / TGI / FastAPI. The GPU servers come with full root, predictable monthly billing (no per-token surprises) and a fixed jurisdictional IP. Bandwidth is unlimited on every GPU plan, so you can serve high-traffic public endpoints without watching meters or paying overage fees.
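The flat-rate vs per-token trade-off is easy to put numbers on. As a back-of-envelope sketch — the per-million-token API rate below is a hypothetical figure for illustration, not a quoted price from any provider:

```python
def breakeven_tokens_per_month(server_usd_per_month: float, api_usd_per_million_tokens: float) -> float:
    """Monthly token volume at which a flat-rate GPU server costs the same as
    per-token API pricing. Above this volume, the flat rate wins."""
    return server_usd_per_month / api_usd_per_million_tokens * 1_000_000


# MD-L at $1699/mo vs a hypothetical $2.00 per 1M tokens:
# breakeven_tokens_per_month(1699, 2.0) -> 849,500,000 tokens/month
```

Past the break-even volume every additional token is effectively free on the flat-rate server, which is the "no per-token surprises" point above in numbers.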

04 Why is Russia excluded from GPU locations?

NVIDIA H100, A100 and high-end RTX cards (4090 and above) are subject to US Department of Commerce export controls (15 CFR Part 744) and EU dual-use regulations that prohibit shipment to Russian datacenters. We do not provision them in Russia to stay compliant with the controls that apply to our supply chain. If you need offshore Linux VPS or Dedicated in Russia, those product lines are unaffected.

05 Why is Iceland positioned as the premium GPU location?

Iceland datacenters run on 100% renewable geothermal and hydroelectric power, and the cold ambient temperature meaningfully reduces the cooling overhead on H100 boxes that draw 700W each under sustained load. The end result is the lowest-carbon offshore GPU compute on the market. The premium price covers the higher datacenter cost in Iceland and the cleaner energy sourcing — for ESG-conscious AI teams, this is the only credible offshore answer.

06 Can I use multiple GPUs in distributed training (DDP / FSDP)?

Yes — the GPU-XL tier is 2× H100 SXM5 with NVLink interconnect inside one box, ideal for FSDP / DeepSpeed Zero-3 / DDP on the same machine. For multi-node training you can rent multiple GPU-XL servers in the same datacenter and connect them over the 10 Gbps uplink. We do not yet ship 8× H100 cluster nodes — contact us if your training run needs more scale.
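Whether a model fits under full parameter sharding is again simple arithmetic: FSDP's FULL_SHARD mode gives each rank 1/N of the 2-bytes-per-parameter FP16 weights. A hedged lower-bound estimator (gradients, optimizer state and activations come on top, so real usage is higher):

```python
def fsdp_weight_gb_per_gpu(params_billion: float, num_gpus: int) -> float:
    """Per-GPU FP16 weight memory under full parameter sharding.

    Lower bound only: gradients, optimizer states and activations add to this,
    so a result near the VRAM limit means you need LoRA/QLoRA or more GPUs.
    """
    return params_billion * 2 / num_gpus


# A 70B model across 2x H100 SXM5 (80 GB each): 70 GB of weights per GPU.
# Weights alone fit, but full finetuning needs optimizer state too -- hence
# LoRA/QLoRA for 70B on the GPU-XL tier.
assert fsdp_weight_gb_per_gpu(70, 2) == 70
```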

Deploy Your Offshore GPU Server

Pick your jurisdiction, pick your NVIDIA GPU, pay in any of 14 cryptos. Live JupyterLab in under 60 seconds. No KYC, no email, no phone — just a token.

View GPU Plans