Local AI Voice Assistant with Home Assistant, Ollama, and Nvidia GPU on Proxmox
Fully local AI-powered voice assistant using Home Assistant Voice Preview Edition, Ollama running on a GTX 1080 Ti in a Proxmox LXC, with weather forecast capabilities. No cloud, no ongoing costs.
Architecture Overview
+---------------------+ speech (Wyoming) +---------------------+ inference +---------------------+
| | | | | |
| HA Voice PE | -------------------------> | Home Assistant | -----------------> | Ollama LXC |
| (ESP32 device) | Whisper STT / Piper TTS | Assist Pipeline | HTTP :11434 | GTX 1080 Ti |
| | | | | qwen3:4b-instruct |
+---------------------+ +---------------------+ +---------------------+
|
| tool calls
+-------+-------+
| |
+------+------+ +------+------+
| | | |
| llm_intents | | Open-Meteo |
| Brave Search| | Weather |
+-------------+ +-------------+
Background
Home Assistant’s built-in Assist handles simple home control commands well, but falls apart on anything conversational — general knowledge questions, weather forecasts, or anything requiring real-world context. The solution is to route unrecognized commands to a locally-hosted LLM via Ollama, keeping everything on-premises with no data leaving the network.
The challenge with Proxmox is getting Nvidia GPU passthrough working in an LXC container. Unlike a full VM passthrough, LXC device passthrough lets the GPU be shared with other containers (Jellyfin, Frigate etc.) while still giving Ollama near-native CUDA performance. The driver version on the host and inside the LXC must match exactly — this is the most common failure point and the one that requires the most troubleshooting.
Stack
| Component | Role |
|---|---|
| Home Assistant Voice Preview Edition | Microphone / speaker / wake word |
| Faster Whisper (HA add-on) | Local speech-to-text |
| Piper (HA add-on) | Local text-to-speech |
| Ollama (Proxmox LXC) | Local LLM inference server |
| qwen3:4b-instruct | Conversation model — fast, tool-capable |
| GTX 1080 Ti (11GB VRAM) | GPU acceleration |
| llm_intents + Brave Search API | Web search tool for Ollama |
| Open-Meteo | Free weather integration, no API key |
Implementation
01 — Install Ollama on Proxmox via Community Script
Run from the Proxmox host shell. Review the script source before executing — it runs as root.
bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/ct/ollama.sh)"
When prompted, select a privileged container. Nvidia GPU passthrough requires privileged mode due to device node permissions — unprivileged containers remap UIDs/GIDs in a way that breaks CUDA device access. Allocate at minimum 4 cores, 8GB RAM, and 40GB disk (models are large).
02 — Install Nvidia Drivers on the Proxmox Host
Add non-free repositories and install the driver:
# Add non-free to all three repo lines
nano /etc/apt/sources.list
# Append: non-free non-free-firmware to each deb line

apt update
apt install -y pve-headers-$(uname -r) nvidia-driver nvidia-modprobe
Blacklist nouveau to prevent it from claiming the GPU before the Nvidia driver loads:
echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
If the GPU was previously configured for VM passthrough, vfio-pci will have claimed it. Check and remove any vfio bindings:
# Check which driver is bound
lspci -k | grep -A3 "09:00.0"

# If Kernel driver in use: vfio-pci, remove the binding
# Comment out vfio ids line in /etc/modprobe.d/vfio.conf
# Remove vfio entries from /etc/modules
# Remove blacklist nvidia from /etc/modprobe.d/blacklist.conf and pve-blacklist.conf
update-initramfs -u
reboot
03 — Driver Version Matching
The host and LXC must run the exact same Nvidia driver version. The community script installs the LXC with Ubuntu 24.04 (noble), which may pull a newer driver version than what Debian’s repos offer for the host.
If there is a mismatch (Failed to initialize NVML: Driver/library version mismatch), the cleanest fix is to install the matching runfile driver directly from Nvidia on the host:
# Remove apt-managed driver first (quote the glob so the shell doesn't expand it)
apt remove --purge 'nvidia*' -y
apt autoremove -y

# Install kernel headers
apt install -y pve-headers-$(uname -r)

# Download and install matching runfile (replace version as needed)
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.288.01/NVIDIA-Linux-x86_64-535.288.01.run
sh NVIDIA-Linux-x86_64-535.288.01.run --no-questions --ui=none
reboot
Verify both sides match after reboot:
# Host
nvidia-smi | grep "Driver Version"

# Inside LXC
pct enter <CTID>
nvidia-smi | grep "Driver Version"
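Since a mismatch only surfaces as an NVML error at runtime, a small comparison script can catch it early. This is a sketch: the helper itself is pure string comparison, and the commented lines show how to feed it real values (`--query-gpu=driver_version` is a standard `nvidia-smi` query; the container ID `105` is hypothetical).

```shell
# Compare host and LXC driver versions; fails loudly on any difference.
check_driver_match() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "MISMATCH: host=$1 lxc=$2"
    return 1
  fi
}

# host_ver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
# lxc_ver="$(pct exec 105 -- nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
# check_driver_match "$host_ver" "$lxc_ver"
check_driver_match "535.288.01" "535.288.01"
```

Running this after every host or container driver update takes seconds and avoids the silent CPU fallback described under Limitations.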
04 — Create Nvidia Device Nodes and Persist Them
On a headless host, the /dev/nvidia* device nodes are not created automatically at boot. Create a systemd service that initializes them before the LXC starts:
nano /etc/systemd/system/nvidia-modprobe.service
[Unit]
Description=NVIDIA modprobe
After=multi-user.target
Before=pve-guests.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-modprobe -u -c=0
ExecStart=/bin/mkdir -p /dev/nvidia-caps
ExecStart=/bin/bash -c 'mknod /dev/nvidia-caps/nvidia-cap1 c 236 1 2>/dev/null || true'
ExecStart=/bin/bash -c 'mknod /dev/nvidia-caps/nvidia-cap2 c 236 2 2>/dev/null || true'
ExecStart=/bin/chmod 666 /dev/nvidia-caps/nvidia-cap1
ExecStart=/bin/chmod 666 /dev/nvidia-caps/nvidia-cap2

[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable nvidia-modprobe.service
systemctl start nvidia-modprobe.service
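Before starting the container, it is worth confirming that every node the service is supposed to create actually exists. A minimal sketch (the node list mirrors the passthrough entries in step 05):

```shell
# Report any missing device node; prints "all nodes present" when complete.
check_nodes() {
  missing=0
  for n in "$@"; do
    if [ ! -e "$n" ]; then
      echo "missing: $n"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all nodes present"
  fi
}

check_nodes /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools \
            /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2
```

If anything is reported missing after a reboot, check `systemctl status nvidia-modprobe.service` before touching the LXC config.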
05 — LXC Configuration
The community script automatically adds the correct device passthrough entries to the LXC config using Proxmox’s dev syntax. Verify /etc/pve/lxc/<CTID>.conf contains:
unprivileged: 0
dev0: /dev/nvidia0,gid=44
dev1: /dev/nvidiactl,gid=44
dev2: /dev/nvidia-uvm,gid=44
dev3: /dev/nvidia-uvm-tools,gid=44
dev4: /dev/nvidia-caps/nvidia-cap1,gid=44
dev5: /dev/nvidia-caps/nvidia-cap2,gid=44
Note: nvidia-modeset is not required for Ollama inference and can be omitted if the device node is not present (common with runfile-installed drivers on headless systems).
06 — Pull a Model and Verify GPU Acceleration
Inside the LXC:
ollama pull qwen3:4b-instruct
ollama run qwen3:4b-instruct "how many quarts in a gallon"
While it runs, monitor GPU usage on the host:
watch -n 1 nvidia-smi
Memory-Usage should jump to ~3GB and GPU-Util should spike during generation. Response time for short answers should be 1–3 seconds.
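GPU use can also be confirmed from the client side: a short completion against Ollama's HTTP API (`/api/generate` with `"stream": false` returns a single JSON response) should come back within the same 1–3 second window. A sketch; the IP placeholder is your LXC's address, and the timing line uses curl's built-in write-out variable.

```shell
# Build the request body once so it can be inspected before sending.
payload='{"model":"qwen3:4b-instruct","prompt":"how many quarts in a gallon","stream":false}'
echo "$payload"

# Send it and print total round-trip time (uncomment with your LXC IP):
# curl -s -o /dev/null -w 'total: %{time_total}s\n' \
#      -d "$payload" http://<LXC_IP>:11434/api/generate
```

A total time well above a few seconds on this hardware usually means the model fell back to CPU, which `nvidia-smi` on the host will confirm.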
07 — Connect Ollama to Home Assistant
In Home Assistant, go to Settings → Devices & Services → Add Integration → Ollama and enter the LXC’s IP:
http://<LXC_IP>:11434
Configure the integration:
| Setting | Value |
|---|---|
| Model | qwen3:4b-instruct |
| Keep alive | 300 (seconds) |
| Context window | 8192 |
| Control Home Assistant | Off (initially) |
| Think before responding | Off |
Set the system prompt to something voice-optimized:
You are a voice assistant for Home Assistant.
Answer questions about the world truthfully.
Answer in plain text only. No markdown, no bullet points, no asterisks.
Keep answers short and conversational, as your response will be spoken aloud.
The current time is {{ now() }}.
The location is [Your City], [Your State].
When asked about sensors or devices, always use available tools to look up
current state before answering. Never guess the state of a device or sensor.
08 — Configure Voice Pipeline
In Settings → Voice Assistants, set the conversation agent to Ollama, speech-to-text to Faster Whisper, and text-to-speech to Piper. Assign the assistant to the Voice Preview Edition device.
09 — Web Search via llm_intents
Install via HACS by adding the custom repository https://github.com/skye-harris/llm_intents. After installing and restarting HA, add the integration and configure it with a Brave Search API key (free tier: 1,000 searches/month). Enable the search tool in the Ollama integration’s tool settings. The model must support tool use — qwen3:4b-instruct does.
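If searches fail silently inside Home Assistant, testing the key directly helps isolate the problem. A sketch against Brave's documented web search endpoint; `BRAVE_API_KEY` is your own key and the query is just an example.

```shell
# URL-encode a test query (spaces only, which is enough for this sketch).
q="current weather berlin"
url="https://api.search.brave.com/res/v1/web/search?q=$(echo "$q" | sed 's/ /%20/g')"
echo "$url"

# Uncomment to run the real request with your key:
# curl -s "$url" -H "Accept: application/json" -H "X-Subscription-Token: $BRAVE_API_KEY"
```

A 401 response points at the key; a valid JSON result means the problem is in the integration or tool configuration instead.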
10 — Weather Forecasts
Add the Open-Meteo integration (no API key required). Expose the weather.home entity via Settings → Voice Assistants → Expose. For richer forecast responses, the weather entity state and forecast data can be passed to Ollama via automations using the weather.get_forecasts action.
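The same forecast data an automation would pass to Ollama can also be pulled manually through HA's REST API, which is useful for checking what the model will actually see. A sketch: the HA address, the long-lived access token, and the `return_response` query parameter (supported in recent HA releases for services that return data) are the moving parts.

```shell
# Request body for the weather.get_forecasts service call.
body='{"entity_id":"weather.home","type":"daily"}'
echo "$body"

# Uncomment and fill in your HA address and long-lived token:
# curl -s -X POST "http://<HA_IP>:8123/api/services/weather/get_forecasts?return_response" \
#      -H "Authorization: Bearer $HA_TOKEN" \
#      -H "Content-Type: application/json" \
#      -d "$body"
```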
Limitations
Nvidia GPU passthrough in an LXC is more fragile than a full VM passthrough. The host and container driver versions must match exactly, and this breaks silently on driver updates — apt upgrade inside the LXC can pull a newer nvidia-utils version that mismatches the host, causing Ollama to silently fall back to CPU. Pinning the nvidia packages in the LXC with apt-mark hold prevents this.
The runfile-installed driver on the host is not managed by apt, meaning Proxmox kernel updates may require reinstalling the driver manually if the kernel module no longer matches the running kernel.
The 4b model is fast but makes more mistakes than larger models on complex reasoning or ambiguous entity lookups. Keeping the exposed entity list lean and using natural aliases improves reliability significantly.
Outcomes
| Metric | Result |
|---|---|
| Inference | GPU-accelerated, 1–3 second response time for short answers |
| Privacy | Fully local — no data leaves the network |
| Cost | $0/month: Brave Search API free tier, everything else self-hosted |
| Model | qwen3:4b-instruct — fast, tool-capable, fits in 11GB VRAM |
| GPU | GTX 1080 Ti, ~3GB VRAM used during inference |
| Voice hardware | Home Assistant Voice Preview Edition |