Your AI lives on
a USB stick
Plug in. Chat locally. Unplug. Zero footprint.
No install. No cloud. No SSD space wasted.
# plug in your USB, run one command
$ ./launch.sh
PocketLLM v0.1.0
OS: Darwin (arm64) · USB: /Volumes/MY_USB
✓ Ollama server → port 11434
✓ Web UI → localhost:8080
✓ Model loaded: gemma4:e2b
>>> What is quantum computing?
Quantum computing uses qubits that can exist in superposition…
Process
3 Steps. Zero Installs.
Plug In
Insert the USB into any Mac or Linux machine. Runtime, models, and chat UI are all on the drive.
Launch
Run ./launch.sh — Ollama starts, loads your model into RAM, and opens the chat UI.
Unplug
Ctrl+C to stop. Pull the USB. Nothing installed, cached, or saved on the host.
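The three steps above could be wired together with a script along these lines. This is a minimal sketch using stock Ollama and the bin/ and models/ layout from the diagram below; the function and variable names are illustrative, not PocketLLM's actual launch.sh:

```shell
#!/usr/bin/env sh
# Sketch of a portable launch script. Assumes the drive carries
# bin/ollama and models/ alongside this file; names are illustrative.
set -eu

# The directory the script lives in doubles as the USB mount point.
usb_root() { cd "$(dirname "$1")" && pwd; }

launch() {
  root="$(usb_root "$1")"
  export OLLAMA_MODELS="$root/models"     # weights stay on the drive, not in ~/.ollama

  "$root/bin/ollama" serve &              # bundled server, nothing installed on host
  server=$!
  trap 'kill "$server" 2>/dev/null || true' INT TERM EXIT

  sleep 2                                 # crude wait for the port to come up
  open "http://localhost:8080" 2>/dev/null \
    || xdg-open "http://localhost:8080"   # macOS / Linux browser open

  wait "$server"                          # Ctrl+C lands in the trap above
}
```

In the real script the final line would call `launch "$0"`; Ctrl+C then falls through to the trap, so nothing keeps running after you pull the drive.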
┌─────────────────────────────────────────────┐
│ USB DRIVE │
│ │
│ bin/ models/ context/ │
│ ollama GGUF weights history.db │
│ + GPU libs (your models) (encrypted) │
│ │
│ ui/ config/ launch.sh │
│ web chat settings (entry point)│
└──────────────────┬──────────────────────────┘
│ USB 3.x
▼
┌─────────────────────────────────────────────┐
│ HOST MACHINE (borrowed) │
│ │
│ CPU/GPU runs inference │
│ RAM loads model weights via mmap │
│ Browser opens localhost:8080 │
│ │
│ Nothing installed. Nothing saved to disk. │
└─────────────────────────────────────────────┘
Why
The Problem with Local LLMs
They eat your disk, chain you to one machine, and leave traces everywhere. PocketLLM fixes all of it.
Truly Private
No data leaves the machine. No cloud. No API keys. Unplug and every trace vanishes.
Save Your SSD
5–50 GB of model weights live on the USB. Your laptop’s drive stays clean.
Zero Install
No Ollama, no brew, no Docker needed. The USB bundles the full runtime and GPU libraries.
Machine to Machine
Carry AI between laptop, desktop, and office machine. Same models, same history, one USB.
Performance
USB = SSD Speed
After the one-time model load, inference is identical. Tested on MacBook Pro M4, 16 GB RAM.
Cold Start — Model Load
One-time cost on first chat
| Model | SSD | USB |
|---|---|---|
| gemma4:e2b | 7.0s | 47.7s |
| llama3.1:8b | 11.0s | 29.8s |
Slower on USB, but it only happens once; the model stays in RAM afterward.
Inference — After Load
Where it matters
| Model | SSD | USB |
|---|---|---|
| gemma4:e2b | 53.9 tok/s | 54.0 tok/s |
| llama3.1:8b | 21.2 tok/s | 21.4 tok/s |
Identical within measurement noise.
The only bottleneck is initial model load. With a USB 3.2 SSD enclosure, even that gap shrinks. After loading, the real limit is your machine’s RAM.
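To reproduce these numbers on your own hardware, `ollama run --verbose` prints a timing summary after each reply, including load duration (the cold-start column) and eval rate (the tokens/s column). A small helper, sketched here, pulls the final tokens/s figure out of that summary:

```shell
# Extract the final "eval rate: NN.NN tokens/s" figure from
# the timing summary that `ollama run <model> --verbose` prints.
eval_rate() { grep 'eval rate' | tail -n 1 | grep -o '[0-9][0-9.]*' | head -n 1; }

# Usage: run once with models on the SSD, once on the USB, then compare:
#   ollama run llama3.1:8b --verbose "Explain mmap briefly." 2>&1 | eval_rate
```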
Features
Everything on the Drive
1 USB. Full AI workspace.
Any Ollama Model
Gemma, Llama, Mistral, Phi — if Ollama runs it, PocketLLM runs it.
Web Chat UI
Markdown, code blocks, streaming, history. All in-browser.
Terminal Chat
Interactive CLI chat on launch. Like your own Claude Code.
Chat History
Conversations saved to USB. Carry history between machines.
Skills System
Extend UI with drop-in JavaScript plugins.
1-Command Launch
./launch.sh — auto-detects OS, models, starts server, opens browser.
Models
Pick Your Model
Any Ollama-compatible model works.
| Model | Disk | RAM |
|---|---|---|
| Phi-3 mini (3.8B) | ~2.3 GB | ~3 GB |
| Mistral 7B | ~4.1 GB | ~5 GB |
| Llama 3.1 8B | ~4.7 GB | ~6 GB |
| Gemma 4 e2b (recommended) | ~7.2 GB | ~9 GB |
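With stock Ollama you can try the same layout yourself: the `OLLAMA_MODELS` environment variable redirects where weights are stored, so pulls land on the drive instead of the host disk. The mount point below is the one from the demo at the top; substitute your own.

```shell
# Point Ollama's model store at the drive so pulls land on the USB,
# not in ~/.ollama on the host. Adjust the mount point for your drive.
export OLLAMA_MODELS=/Volumes/MY_USB/models

# Then, with the server running:
#   ollama pull llama3.1:8b   (~4.7 GB, written straight to the USB)
#   ollama run llama3.1:8b    (served from the drive, inference in host RAM)
```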
Compatibility
Platform Support
1 USB drive, multiple machines.
| Platform | Hardware | Status |
|---|---|---|
| macOS | Intel + Apple Silicon | Supported |
| Linux | Ubuntu 20+, x86_64 | Supported |
| Windows | x86_64 | Coming Soon |
Requirements
What You Need
PocketLLM brings the software. Your machine brings the compute.
Minimum
- USB 3.0 drive, 64 GB
- 8 GB RAM (small models)
- macOS 12+ or Ubuntu 20+
Recommended
- USB 3.2 Gen 2, 128 GB+
- 16 GB+ RAM (7B+ models)
- macOS with Apple Silicon
Ready to carry your AI in your pocket?
Open source. Free forever. 5 minutes to set up.