NVIDIA RTX 4090, RTX 5090, and H100 SXM5 GPU servers for AI training, inference, and image and video generation. CUDA 12 and cuDNN come preinstalled, plus PyTorch / ComfyUI / Ollama image presets ready to SSH into. H100 tiers run on Threadripper Pro hosts for the full PCIe Gen 5 lane count. Available in 4 offshore jurisdictions, with no KYC and crypto-only payment across 14 chains, including Monero.
Same NVIDIA hardware across every jurisdiction with unlimited bandwidth on every plan. Pricing varies by jurisdiction — Iceland is the lowest-carbon option, Moldova the cheapest.
Iceland
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| IS-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $299/mo |
| IS-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $479/mo |
| IS-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1849/mo |
| IS-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3499/mo |
Moldova
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| MD-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $249/mo |
| MD-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $399/mo |
| MD-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1699/mo |
| MD-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3199/mo |
Romania
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| RO-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $269/mo |
| RO-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $429/mo |
| RO-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1749/mo |
| RO-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3299/mo |
Netherlands
| Plan | GPU | VRAM | CPU | RAM | NVMe | Bandwidth | Price |
|---|---|---|---|---|---|---|---|
| NL-S | 1× NVIDIA RTX 4090 | 24 GB GDDR6X | 12 vCPU | 64 GB DDR5 | 1 TB NVMe | Unlimited | $279/mo |
| NL-M (Popular) | 1× NVIDIA RTX 5090 | 32 GB GDDR7 | 16 vCPU | 96 GB DDR5 | 1.5 TB NVMe | Unlimited | $449/mo |
| NL-L | 1× NVIDIA H100 SXM5 | 80 GB HBM3 | 24 vCPU | 192 GB DDR5 | 2 TB NVMe | Unlimited | $1799/mo |
| NL-XL | 2× NVIDIA H100 SXM5 | 160 GB HBM3 | 32 vCPU | 384 GB DDR5 | 4 TB NVMe | Unlimited | $3399/mo |
GPU hosting is available in 4 jurisdictions at launch (Iceland, Netherlands, Romania, Moldova). Russia is excluded by export controls on NVIDIA hardware; Switzerland and Panama remain Linux-only for now.
CUDA 12.4/12.6 + cuDNN preinstalled. Boot, ssh in, run nvidia-smi.
From paid order to nvidia-smi output in under 60 seconds.
Up to 4 TB NVMe SSD, paired with DDR5 RAM for fast dataset I/O.
Full root SSH, plus pre-bound JupyterLab on port 8888 with token auth.
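A minimal first-login sanity check, assuming the default PyTorch image preset (the tensor sizes are arbitrary):

```python
# First-login sanity check: confirm the GPU, VRAM, and the CUDA build
# PyTorch was compiled against. Assumes the PyTorch 2.5 image preset.
import torch

assert torch.cuda.is_available(), "No CUDA device visible - check nvidia-smi"
print(torch.cuda.get_device_name(0))        # e.g. "NVIDIA H100 80GB HBM3"
print(torch.version.cuda)                   # CUDA toolkit PyTorch was built with
print(torch.cuda.get_device_properties(0).total_memory // 2**30, "GiB VRAM")

x = torch.randn(4096, 4096, device="cuda")
print((x @ x).norm().item())                # quick matmul to exercise the card
```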
Llama, Mistral, Qwen, DeepSeek finetuning with LoRA / QLoRA / full FT on H100. Or self-hosted inference with vLLM / TGI / Ollama for production model serving.
Stable Diffusion, FLUX.1, SDXL with ComfyUI or Forge. Train your own LoRA, batch-generate at scale, or self-host an inference endpoint.
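If you prefer scripting to the ComfyUI / Forge UIs, a minimal diffusers sketch along these lines works, assuming the FLUX.1 [schnell] weights are pre-cached (see the model table below):

```python
# Minimal FLUX.1-schnell generation with diffusers; assumes the model was
# pre-downloaded at order time so it loads straight from the local cache.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()   # keeps peak VRAM within a 24 GB RTX 4090

image = pipe(
    "studio photo of a geothermal datacenter in Iceland",
    guidance_scale=0.0,           # schnell is guidance-distilled; no CFG
    num_inference_steps=4,        # 4 steps is the schnell sweet spot
    height=1024, width=1024,
).images[0]
image.save("out.png")
```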
OpenSora, CogVideoX, Wan-2.1, AnimateDiff. Video generation needs serious VRAM — start at RTX 5090 (32 GB) or H100 (80 GB).
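For scripted video generation, a sketch of the lightest option, CogVideoX-5B via diffusers (prompt and frame count are illustrative):

```python
# Low-VRAM text-to-video with CogVideoX-5B; CPU offload and VAE tiling
# keep it inside a single RTX 4090's 24 GB.
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "zai-org/CogVideoX-5b", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()   # trades speed for VRAM headroom
pipe.vae.enable_tiling()

frames = pipe(
    "a drone shot over a fjord at sunrise",
    num_frames=49,                # ~6 s at 8 fps
    num_inference_steps=50,
).frames[0]
export_to_video(frames, "out.mp4", fps=8)
```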
Deploy fine-tuned models behind your own API. Predictable costs, no per-token fees, no data leaving your jurisdiction. JupyterLab + FastAPI included.
Tick any of these at order time and your GPU server boots with the stack already installed, configured, and started via systemd. Add pre-downloaded models below to also skip the 30-60 minute HuggingFace download.
Production-grade LLM serving with continuous batching and paged attention. Exposes a /v1/completions endpoint compatible with the OpenAI SDK.
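Any OpenAI SDK client can point at that endpoint; a minimal sketch, where the server IP and model name are placeholders for your own deployment:

```python
# Query the pre-started vLLM server with the stock OpenAI SDK.
# Host, port, and model name are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://YOUR_SERVER_IP:8000/v1",  # vLLM's default port
    api_key="not-needed",       # vLLM accepts any key unless --api-key is set
)
resp = client.completions.create(
    model="Qwen/Qwen3-8B",      # whichever model the server was launched with
    prompt="Explain paged attention in one sentence:",
    max_tokens=64,
)
print(resp.choices[0].text)
```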
Self-hosted ChatGPT-style web UI. Pulls Ollama-native quantized weights; easiest path to "talk to my LLM in a browser".
Gradio UI with broad backend support — Transformers, ExLlamaV2, llama.cpp, AWQ, GPTQ. Power-user choice for benchmarking quantizations.
HuggingFace Text Generation Inference — production server with token streaming, tensor parallelism, paged attention.
YAML-config driven finetuning. Supports LoRA, QLoRA, full FT, DPO, ORPO. Pre-cloned to /opt/axolotl with starter configs for Llama / Qwen / Mistral.
2× faster + 70% less VRAM finetuning via custom Triton kernels. Ideal for budget runs on RTX 4090. Pre-installed in /opt/unsloth.
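A skeleton of what a QLoRA run with Unsloth's Python API looks like — the base model and hyperparameters here are illustrative defaults, not a tuned recipe:

```python
# QLoRA finetune skeleton with Unsloth on a 24 GB RTX 4090.
# Model name and LoRA hyperparameters are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",  # 4-bit base fits 24 GB
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here, hand `model` to trl's SFTTrainer as in the Unsloth docs.
```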
WebUI-driven finetuning platform. SFT / RLHF / DPO / KTO. Good entry point for non-coders who want to finetune from a UI.
Node-graph image-gen interface, ships with FLUX.1-schnell + Kontext workflows. Power-user image generation pipeline.
The mainstream Stable Diffusion WebUI. Stable Diffusion 3.5 plus the extensions ecosystem. Familiar UI for users coming from Civitai.
A1111 fork optimized for FLUX, faster sampling, lower VRAM. Drop-in replacement for users coming from Auto1111.
GUI for training Stable Diffusion / FLUX LoRA, DreamBooth, textual inversion. Trains a custom-style LoRA on RTX 4090 in 30-90 min.
ComfyUI with video-gen workflows preloaded — Wan 2.2 T2V, HunyuanVideo, LTX-Video. Needs 40+ GB VRAM for usable speed at 720p.
Lightweight video workflows — CogVideoX-5B, Wan 2.1 1.3B, LTX-Video. Runs on a single RTX 4090.
OpenAI Whisper Large v3 Turbo with faster-whisper backend behind a /transcribe HTTP API. 8× faster than v3, 99 langs, real-time on any GPU.
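If you'd rather call faster-whisper directly than go through the /transcribe API, a minimal sketch (the audio file name is a placeholder; the turbo checkpoint needs a recent faster-whisper release):

```python
# Local transcription with the faster-whisper backend. "large-v3" is the
# safe model id; swap in the turbo checkpoint if your version ships it.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("meeting.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for seg in segments:
    print(f"[{seg.start:6.2f} -> {seg.end:6.2f}] {seg.text}")
```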
Multi-model TTS endpoint serving Kokoro 82M (54 voices, 8 langs) and Sesame CSM-1B (conversational with context). REST + WebSocket streaming.
Always installed. PyTorch 2.5 + CUDA 12.4 + Transformers + diffusers + accelerate + bitsandbytes + xformers + flash-attn. The universal AI dev baseline.
VSCode running in your browser, full Python/IPython/extensions. For users who prefer IDE workflow over notebooks.
Combine multiple stacks on the same GPU — the deploy script resolves dependency conflicts and assigns non-clashing ports.
Tick the models you need at order time and they're cached in /root/.cache/huggingface before you first log in. 🔒 Gated models (Llama, Mistral, Gemma, FLUX-dev, SD 3.5) require your HuggingFace token (also collected at order time).
| Model | HuggingFace | Size | Min VRAM | Min GPU tier | Type |
|---|---|---|---|---|---|
| Llama 3.3 70B Instruct 🔒 Gated | meta-llama/Llama-3.3-70B-Instruct | 140 GB | 160 GB | GPU-L | LLM |
| Qwen3 32B | Qwen/Qwen3-32B | 64 GB | 80 GB | GPU-L | LLM |
| Qwen3 14B | Qwen/Qwen3-14B | 28 GB | 32 GB | GPU-S | LLM |
| Qwen3 8B | Qwen/Qwen3-8B | 16 GB | 20 GB | GPU-S | LLM |
| DeepSeek-R1 Distill Qwen 32B | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 64 GB | 80 GB | GPU-S | LLM |
| DeepSeek-R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 140 GB | 160 GB | GPU-S | LLM |
| Mistral Small 3.2 24B (multimodal) | mistralai/Mistral-Small-3.2-24B-Instruct-2506 | 48 GB | 60 GB | GPU-S | LLM |
| Gemma 3 27B (multimodal) 🔒 Gated | google/gemma-3-27b-it | 54 GB | 64 GB | GPU-L | LLM |
| Gemma 3 12B (multimodal) 🔒 Gated | google/gemma-3-12b-it | 24 GB | 28 GB | GPU-S | LLM |
| Phi-4 (14B) | microsoft/phi-4 | 28 GB | 32 GB | GPU-S | LLM |
| Phi-4 Mini Instruct (3.8B) | microsoft/Phi-4-mini-instruct | 8 GB | 10 GB | GPU-S | LLM |
| FLUX.1 [dev] 🔒 Gated | black-forest-labs/FLUX.1-dev | 24 GB | 24 GB | GPU-S | Image |
| FLUX.1 [schnell] | black-forest-labs/FLUX.1-schnell | 24 GB | 24 GB | GPU-S | Image |
| FLUX.1 Kontext [dev] (image editing) 🔒 Gated | black-forest-labs/FLUX.1-Kontext-dev | 24 GB | 24 GB | GPU-S | Image |
| Stable Diffusion 3.5 Large 🔒 Gated | stabilityai/stable-diffusion-3.5-large | 16 GB | 18 GB | GPU-S | Image |
| Stable Diffusion 3.5 Medium 🔒 Gated | stabilityai/stable-diffusion-3.5-medium | 5 GB | 10 GB | GPU-S | Image |
| HiDream-I1 Full | HiDream-ai/HiDream-I1-Full | 34 GB | 40 GB | GPU-S | Image |
| Wan 2.2 T2V A14B | Wan-AI/Wan2.2-T2V-A14B | 28 GB | 40 GB | GPU-S | Video |
| Wan 2.1 T2V 1.3B (low VRAM) | Wan-AI/Wan2.1-T2V-1.3B | 3 GB | 8 GB | GPU-S | Video |
| HunyuanVideo 1.5 (8.3B) | tencent/HunyuanVideo-1.5 | 17 GB | 24 GB | GPU-S | Video |
| LTX-Video 0.9.8 13B | Lightricks/LTX-Video | 26 GB | 24 GB | GPU-S | Video |
| CogVideoX-5B | zai-org/CogVideoX-5b | 10 GB | 16 GB | GPU-S | Video |
| Whisper Large v3 Turbo | openai/whisper-large-v3-turbo | 2 GB | 4 GB | GPU-S | Audio |
| Whisper Large v3 | openai/whisper-large-v3 | 3 GB | 6 GB | GPU-S | Audio |
| Kokoro 82M (TTS) | hexgrad/Kokoro-82M | 1 GB | 2 GB | GPU-S | Audio |
| Sesame CSM-1B (conversational TTS) | sesame/csm-1b | 2 GB | 6 GB | GPU-S | Audio |
| Stable Audio Open 1.0 🔒 Gated | stabilityai/stable-audio-open-1.0 | 3 GB | 8 GB | GPU-S | Audio |
Sizes are FP16 weights. For RTX 4090 (24 GB VRAM) on 70B models, the AWQ-quantized variant is auto-downloaded in parallel.
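To confirm a ticked model really was pre-cached, load it with local_files_only=True — a quick check, assuming Qwen3 8B was selected at order time:

```python
# Verify a pre-downloaded model loads entirely from the local HF cache
# (/root/.cache/huggingface) without touching the network.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    local_files_only=True,   # raises immediately if weights aren't cached
)
out = model.generate(
    **tok("Hello", return_tensors="pt").to(model.device), max_new_tokens=16
)
print(tok.decode(out[0], skip_special_tokens=True))
```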
Crypto-only checkout, native Monero, token-only signup, pre-installed AI stacks, pre-downloaded HuggingFace models, encrypted HF tokens, automatic Let's Encrypt endpoints, unlimited bandwidth, and 100% renewable energy in Iceland — read the column labelled "ServPrivacy" and judge for yourself.
| Feature | ServPrivacy | Vast.ai | RunPod | Paperspace | Lambda | TensorDock |
|---|---|---|---|---|---|---|
| Crypto-only checkout | ✅ 14 chains | ⚠️ BTC | ⚠️ Gateway | ❌ | ❌ | ⚠️ BTC/ETH/USDT |
| Native Monero (XMR) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| No KYC, no email signup | ✅ Token-only | ⚠️ Email + ID for trust | ⚠️ Email + payment | ❌ Full KYC | ❌ Enterprise KYC | ⚠️ Email + light KYC |
| Pre-installed AI stacks | ✅ 17 templates | ⚠️ Docker BYO | ✅ 100+ | ⚠️ Notebooks only | ⚠️ Lambda Stack only | ⚠️ Docker BYO |
| Pre-downloaded models at order | ✅ 27 models | ❌ | ❌ | ❌ | ❌ | ❌ |
| HuggingFace token at order | ✅ Encrypted, used once | ❌ | ❌ | ❌ | ❌ | ❌ |
| SSH key at order | ✅ | ✅ | ✅ | ⚠️ | ✅ | ⚠️ |
| Auto-shutdown timer | ✅ 6h-7d | ✅ | ⚠️ Spot only | ❌ | ❌ | ❌ |
| Public HTTPS endpoint (Let's Encrypt) | ✅ Auto | ⚠️ Manual | ✅ Pods | ✅ | ❌ | ⚠️ Manual |
| Unlimited bandwidth | ✅ | ⚠️ Per host | ⚠️ Capped | ⚠️ Capped | ⚠️ Capped | ⚠️ Per host |
| Renewable-energy datacenter | ✅ Iceland 100% geo+hydro | ❌ Variable | ⚠️ US grid | ⚠️ US grid | ⚠️ US grid | ⚠️ Variable |
| Offshore jurisdiction | ✅ IS / NL / RO / MD | ❌ Distributed P2P | ❌ US-centric | ❌ US | ❌ US-only | ⚠️ Multi-region |
| Sandbox dry-run mode | ✅ ?dry_run=1 | ⚠️ Trial credit | ⚠️ Limited | ⚠️ Free GPU tier | ❌ | ❌ |
| AI-agent / MCP first | ✅ MCP + REST + x402 | ⚠️ REST | ⚠️ REST | ⚠️ REST | ⚠️ REST | ⚠️ REST |
| Entry RTX 4090 / mo | $249 | ~$216 spot | ~$396 on-demand | n/a | n/a | ~$252 spot |
Comparison data is sourced from competitors' public 2026-05 pricing pages and signup flows. The ServPrivacy entry RTX 4090 is $249/mo (Moldova); competitor spot prices are average rates for equivalent hardware.
Full hardware passthrough. You get the entire physical NVIDIA card with direct VRAM access — not a vGPU slice, not a time-shared MIG partition. nvidia-smi inside your VM shows the same numbers as the bare-metal host. Full driver access, full CUDA, full PyTorch / TensorFlow stack — no SR-IOV reservations.
Default image: Ubuntu 22.04 + CUDA 12.4 + cuDNN 9 + NVIDIA driver 550. Other ready-to-go images: Ubuntu 24.04 + CUDA 12.6, Ubuntu 22.04 + PyTorch 2.5, Ubuntu 22.04 + ComfyUI + FLUX, and Ubuntu 22.04 + Ollama + Open WebUI. Vanilla Ubuntu / Debian / AlmaLinux / Rocky images are also offered if you want to install your own stack. With full root, you can switch driver versions at any time.
Yes. Many of our GPU customers run public inference APIs on top of vLLM / TGI / FastAPI. The GPU servers come with full root, predictable monthly billing (no per-token surprises) and a fixed jurisdictional IP. Bandwidth is unlimited on every GPU plan, so you can serve high-traffic public endpoints without watching meters or paying overage fees.
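One common pattern is a thin FastAPI layer in front of the local vLLM server, published through the auto-provisioned HTTPS endpoint. A sketch — the route, upstream port, and model name are illustrative assumptions, not a fixed layout:

```python
# Illustrative thin public API in front of the local vLLM server.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
VLLM = "http://127.0.0.1:8000/v1/completions"  # vLLM's default local port

class Ask(BaseModel):
    prompt: str
    max_tokens: int = 128

@app.post("/ask")
async def ask(req: Ask):
    # Forward the request to vLLM and return just the generated text.
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(VLLM, json={
            "model": "Qwen/Qwen3-8B",
            "prompt": req.prompt,
            "max_tokens": req.max_tokens,
        })
    return {"text": r.json()["choices"][0]["text"]}
```

Run it with uvicorn behind the Let's Encrypt endpoint and you have a fixed-cost public API with no per-token fees.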
NVIDIA H100, A100 and high-end RTX cards (4090 and above) are subject to US Department of Commerce export controls (15 CFR Part 744) and EU dual-use regulations that prohibit shipment to Russian datacenters. We do not provision them in Russia to stay compliant with the controls that apply to our supply chain. If you need offshore Linux VPS or Dedicated in Russia, those product lines are unaffected.
Iceland datacenters run on 100% renewable geothermal and hydroelectric power, and the cold ambient temperature meaningfully reduces the cooling overhead on H100 boxes that draw 700W each under sustained load. The end result is the lowest-carbon offshore GPU compute on the market. The premium price covers the higher datacenter cost in Iceland and the cleaner energy sourcing — for ESG-conscious AI teams, this is the only credible offshore answer.
Yes — the GPU-XL tier is 2× H100 SXM5 with NVLink interconnect inside one box, ideal for FSDP / DeepSpeed Zero-3 / DDP on the same machine. For multi-node training you can rent multiple GPU-XL servers in the same datacenter and connect them over the 10 Gbps uplink. We do not yet ship 8× H100 cluster nodes — contact us if your training run needs more scale.
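A minimal DDP skeleton for the 2× H100 tier — the model is a stand-in for your own; launch it with torchrun:

```python
# Minimal 2-GPU DDP skeleton for the GPU-XL (2x H100 NVLink) tier.
# Launch with: torchrun --nproc_per_node=2 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")      # NCCL rides NVLink for the all-reduce
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(rank), device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):               # stand-in for a real training loop
        x = torch.randn(32, 4096, device=rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        if rank == 0 and step % 5 == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```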
Pick your jurisdiction, pick your NVIDIA GPU, pay in any of 14 cryptos. Live JupyterLab in under 60 seconds. No KYC, no email, no phone — just a token.
View GPU Plans