Your AI lives on
a USB stick
Plug in. Chat locally. Unplug. Zero footprint.
No install. No cloud. No SSD space wasted.
# plug in your USB, run one command
$ ./launch.sh
PocketLLM v0.1.0
OS: Darwin (arm64) · USB: /Volumes/MY_USB
✓ Ollama server → port 11434
✓ Web UI → localhost:8080
✓ Model loaded: gemma4:e2b
>>> What is quantum computing?
Quantum computing uses qubits that can exist in superposition…
Process
3 Steps. Zero Installs.
Plug In
Insert the USB into any Mac or Linux machine. Runtime, models, and chat UI are all on the drive.
Launch
Run ./launch.sh — Ollama starts, loads your model into RAM, and opens the chat UI.
Unplug
Ctrl+C to stop. Pull the USB. Nothing installed, cached, or saved on the host.
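The three steps above could be wired together with a script along these lines. This is a minimal sketch using stock Ollama and the bin/ and models/ layout from the diagram below; the function and variable names are illustrative, not PocketLLM's actual launch.sh:

```shell
#!/usr/bin/env sh
# Sketch of a portable launch script. Assumes the drive carries
# bin/ollama and models/ alongside this file; names are illustrative.
set -eu

# The directory the script lives in doubles as the USB mount point.
usb_root() { cd "$(dirname "$1")" && pwd; }

launch() {
  root="$(usb_root "$1")"
  export OLLAMA_MODELS="$root/models"     # weights stay on the drive, not in ~/.ollama

  "$root/bin/ollama" serve &              # bundled server, nothing installed on host
  server=$!
  trap 'kill "$server" 2>/dev/null || true' INT TERM EXIT

  sleep 2                                 # crude wait for the port to come up
  open "http://localhost:8080" 2>/dev/null \
    || xdg-open "http://localhost:8080"   # macOS / Linux browser open

  wait "$server"                          # Ctrl+C lands in the trap above
}
```

In the real script the final line would call `launch "$0"`; Ctrl+C then falls through to the trap, so nothing keeps running after you pull the drive.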
┌─────────────────────────────────────────────┐
│ USB DRIVE │
│ │
│ bin/ models/ context/ │
│ ollama GGUF weights history.db │
│ + GPU libs (your models) (encrypted) │
│ │
│ ui/ config/ launch.sh │
│ web chat settings (entry point)│
└──────────────────┬──────────────────────────┘
│ USB 3.x
▼
┌─────────────────────────────────────────────┐
│ HOST MACHINE (borrowed) │
│ │
│ CPU/GPU runs inference │
│ RAM loads model weights via mmap │
│ Browser opens localhost:8080 │
│ │
│ Nothing installed. Nothing saved to disk. │
└─────────────────────────────────────────────┘
Why
The Problem with Local LLMs
They eat your disk, chain you to one machine, and leave traces everywhere. PocketLLM fixes all of it.
Truly Private
No data leaves the machine. No cloud. No API keys. Unplug and every trace vanishes.
Save Your SSD
5–50 GB of model weights live on the USB. Your laptop’s drive stays clean.
Zero Install
No Ollama, no brew, no Docker needed. The USB bundles the full runtime and GPU libraries.
Machine to Machine
Carry AI between laptop, desktop, and office machine. Same models, same history, one USB.
Performance
USB = SSD Speed
After the one-time model load, inference is identical. Tested on MacBook Pro M4, 16 GB RAM.
Cold Start — Model Load
One-time cost on first chat
| Model | SSD | USB |
|---|---|---|
| gemma4:e2b | 7.0s | 47.7s |
| llama3.1:8b | 11.0s | 29.8s |
Slower on USB, but it only happens once; the model stays in RAM afterward.
Inference — After Load
Where it matters
| Model | SSD | USB |
|---|---|---|
| gemma4:e2b | 53.9 tok/s | 54.0 tok/s |
| llama3.1:8b | 21.2 tok/s | 21.4 tok/s |
Identical within measurement noise.
The only bottleneck is initial model load. With a USB 3.2 SSD enclosure, even that gap shrinks. After loading, the real limit is your machine’s RAM.
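To reproduce these numbers on your own hardware, `ollama run --verbose` prints a timing summary after each reply, including load duration (the cold-start column) and eval rate (the tokens/s column). A small helper, sketched here, pulls the final tokens/s figure out of that summary:

```shell
# Extract the final "eval rate: NN.NN tokens/s" figure from
# the timing summary that `ollama run <model> --verbose` prints.
eval_rate() { grep 'eval rate' | tail -n 1 | grep -o '[0-9][0-9.]*' | head -n 1; }

# Usage: run once with models on the SSD, once on the USB, then compare:
#   ollama run llama3.1:8b --verbose "Explain mmap briefly." 2>&1 | eval_rate
```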
Features
Everything on the Drive
1 USB. Full AI workspace.
Any Ollama Model
Gemma, Llama, Mistral, Phi — if Ollama runs it, PocketLLM runs it.
Web Chat UI
Markdown, code blocks, streaming, history. All in-browser.
Terminal Chat
Interactive CLI chat on launch. Like your own Claude Code.
Chat History
Conversations saved to USB. Carry history between machines.
Skills System
Extend UI with drop-in JavaScript plugins.
1-Command Launch
./launch.sh — auto-detects OS, models, starts server, opens browser.
Models
Pick Your Model
Any Ollama-compatible model works.
| Model | Disk | RAM |
|---|---|---|
| Phi-3 mini (3.8B) | ~2.3 GB | ~3 GB |
| Mistral 7B | ~4.1 GB | ~5 GB |
| Llama 3.1 8B | ~4.7 GB | ~6 GB |
| Gemma 4 e2b (recommended) | ~7.2 GB | ~9 GB |
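With stock Ollama you can try the same layout yourself: the `OLLAMA_MODELS` environment variable redirects where weights are stored, so pulls land on the drive instead of the host disk. The mount point below is the one from the demo at the top; substitute your own.

```shell
# Point Ollama's model store at the drive so pulls land on the USB,
# not in ~/.ollama on the host. Adjust the mount point for your drive.
export OLLAMA_MODELS=/Volumes/MY_USB/models

# Then, with the server running:
#   ollama pull llama3.1:8b   (~4.7 GB, written straight to the USB)
#   ollama run llama3.1:8b    (served from the drive, inference in host RAM)
```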
Compatibility
Platform Support
1 USB drive, multiple machines.
| Platform | Hardware | Status |
|---|---|---|
| macOS | Intel + Apple Silicon | Supported |
| Linux | Ubuntu 20+, x86_64 | Supported |
| Windows | x86_64 | Coming Soon |
Requirements
What You Need
PocketLLM brings the software. Your machine brings the compute.
Minimum
- USB 3.0 drive, 64 GB
- 8 GB RAM (small models)
- macOS 12+ or Ubuntu 20+
Recommended
- USB 3.2 Gen 2, 128 GB+
- 16 GB+ RAM (7B+ models)
- macOS with Apple Silicon
Ready to carry your AI in your pocket?
Open source. Free forever. 5 minutes to set up.