Open Source · MIT License

Your AI lives on
a USB stick

Plug in. Chat locally. Unplug. Zero footprint. No install. No cloud. No SSD space wasted.

Process

3 Steps. Zero Installs.

01

Plug In

Insert the USB into any Mac or Linux machine. Runtime, models, and chat UI are all on the drive.

02

Launch

Run ./launch.sh — Ollama starts, loads your model into RAM, and opens the chat UI.

03

Unplug

Ctrl+C to stop. Pull the USB. Nothing installed, cached, or saved on the host.
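The three steps above map onto a small launcher script. Here is a minimal sketch of what such a launcher could look like — the real launch.sh is not shown on this page, so the bin/ layout, paths, and function names below are illustrative assumptions (OLLAMA_MODELS, however, is Ollama's real environment variable for relocating model storage):

```shell
#!/usr/bin/env bash
# Sketch of a PocketLLM-style launcher (illustrative, not the actual script).
set -euo pipefail

# Resolve the USB mount point from the script's own location.
DRIVE_ROOT="$(cd "$(dirname "$0")" && pwd)"

# Redirect Ollama's model store away from ~/.ollama so weights
# stay on the drive, not the host.
export OLLAMA_MODELS="$DRIVE_ROOT/models"

# Pick the bundled runtime for the host OS.
detect_os() {
  case "$(uname -s)" in
    Darwin) echo macos ;;
    Linux)  echo linux ;;
    *)      echo unsupported ;;
  esac
}

OS="$(detect_os)"
if [ "$OS" = "unsupported" ]; then
  echo "Unsupported platform: $(uname -s)" >&2
  exit 1
fi

# From here the launcher would start the bundled server and chat UI, e.g.:
#   "$DRIVE_ROOT/bin/$OS/ollama" serve &
# Ctrl+C (SIGINT) stops it; nothing is written to the host.
```

Because everything resolves relative to `DRIVE_ROOT`, the same script works no matter where the USB mounts.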

Why

The Problem with Local LLMs

They eat your disk, chain you to one machine, and leave traces everywhere. PocketLLM fixes all of it.

Truly Private

No data leaves the machine. No cloud. No API keys. Unplug and every trace vanishes.

Save Your SSD

5–50 GB of model weights live on the USB. Your laptop’s drive stays clean.

Zero Install

No Ollama, no brew, no Docker needed. The USB bundles the full runtime and GPU libraries.

Machine to Machine

Carry AI between laptop, desktop, and office machine. Same models, same history, one USB.

Performance

USB = SSD Speed

After the one-time model load, inference is identical. Tested on MacBook Pro M4, 16 GB RAM.

Cold Start — Model Load

One-time cost on first chat

Model          SSD      USB
gemma4:e2b     7.0 s    47.7 s
llama3.1:8b    11.0 s   29.8 s

Slower on USB — but it only happens once. The model stays in RAM after.

Inference — After Load

Where it matters

Model          SSD          USB
gemma4:e2b     53.9 tok/s   54.0 tok/s
llama3.1:8b    21.2 tok/s   21.4 tok/s

Effectively identical — the differences are within measurement noise.

The only bottleneck is initial model load. With a USB 3.2 SSD enclosure, even that gap shrinks. After loading, the real limit is your machine’s RAM.
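As a back-of-envelope check, the load times above imply the effective read speed of the USB drive in this test — for llama3.1:8b, roughly 4.7 GB loaded in 29.8 s:

```shell
# Effective read speed implied by the cold-start table:
# ~4.7 GB (llama3.1:8b) loaded from USB in 29.8 s.
awk 'BEGIN { printf "%.0f MB/s\n", 4.7 * 1024 / 29.8 }'
# prints "162 MB/s"
```

That is well below what a USB 3.2 SSD enclosure can sustain, which is why a faster enclosure shrinks the cold-start gap.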

Features

Everything on the Drive

1 USB. Full AI workspace.

Any Ollama Model

Gemma, Llama, Mistral, Phi — if Ollama runs it, PocketLLM runs it.

Web Chat UI

Markdown, code blocks, streaming, history. All in-browser.

Terminal Chat

Interactive CLI chat on launch. Like your own Claude Code.

Chat History

Conversations saved to USB. Carry history between machines.

Skills System

Extend UI with drop-in JavaScript plugins.

1-Command Launch

./launch.sh — auto-detects OS and models, starts the server, and opens the browser.

Models

Pick Your Model

Any Ollama-compatible model works.

Model                        Disk      RAM
Phi-3 mini (3.8B)            ~2.3 GB   ~3 GB
Mistral 7B                   ~4.1 GB   ~5 GB
Llama 3.1 8B                 ~4.7 GB   ~6 GB
Gemma 4 e2b (recommended)    ~7.2 GB   ~9 GB
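Adding one of these models to the drive could look like the snippet below. OLLAMA_MODELS is Ollama's standard variable for relocating model storage; the /Volumes mount path is an example, and the snippet is guarded so it is a no-op on machines without Ollama on the PATH:

```shell
# Point Ollama's model store at the USB (example mount path).
export OLLAMA_MODELS=/Volumes/POCKETLLM/models

# Download weights directly onto the drive instead of ~/.ollama.
# Guarded so this is a no-op where Ollama is not installed.
if command -v ollama >/dev/null 2>&1; then
  ollama pull llama3.1:8b
fi
```

Once the weights are on the drive, any machine the USB is plugged into sees the same model list.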

Compatibility

Platform Support

1 USB drive, multiple machines.

macOS

Intel + Apple Silicon

Supported

Linux

Ubuntu 20+, x86_64

Supported

Windows

x86_64

Coming Soon

Requirements

What You Need

PocketLLM brings the software. Your machine brings the compute.

Minimum

  • USB 3.0 drive, 64 GB
  • 8 GB RAM (small models)
  • macOS 12+ or Ubuntu 20+

Recommended

  • USB 3.2 Gen 2, 128 GB+
  • 16 GB+ RAM (7B+ models)
  • macOS with Apple Silicon

Ready to carry your AI in your pocket?

Open source. Free forever. 5 minutes to set up.