Local GPT

Local GPT allows you to run AI language models directly on your infrastructure with complete data privacy. Your conversations and queries never leave your network, ensuring full compliance with data protection requirements while maintaining the power of modern language models.

  • Complete Privacy: All conversations are processed entirely on-premise
  • Multiple Models: Choose from a variety of models based on your hardware capabilities
  • Reasoning Support: Some models include advanced reasoning/thinking capabilities
  • Conversation History: Save and manage chat conversations (with optional Privacy Mode)
  • Markdown Export: Export conversations for documentation or sharing
  • Real-time Streaming: See responses as they are generated
  • RAM Management: Automatic detection of system resources and model compatibility (see the sketch below)
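For illustration, this kind of compatibility check usually amounts to comparing each model's RAM requirement against free memory before loading. A minimal sketch using the third-party psutil package follows; the model names and requirements mirror the tables below, but the app's actual detection mechanism may differ:

```python
import psutil

# Hypothetical subset of the RAM requirements documented below (in GB).
MODEL_RAM_GB = {
    "Llama 3.2 1B": 2,
    "Qwen3 8B": 8,
    "GPT-OSS 20B (OpenAI)": 20,
}

def compatible_models() -> list[str]:
    """Return the models whose RAM requirement fits in currently free memory."""
    available_gb = psutil.virtual_memory().available / 1024**3
    return [name for name, need in MODEL_RAM_GB.items() if need <= available_gb]

print(compatible_models())
```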

Local GPT requires the llama-cpp-python library to run models locally. On first launch, you’ll be prompted to install it automatically.
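If the automatic installation fails, the library can usually be installed manually with pip:

```
pip install llama-cpp-python
```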

  1. Open Local GPT from the Enterprise applications menu
  2. Click Settings to open the Model Manager
  3. Choose a model based on your available RAM:
    • Basic (2-4 GB RAM): Good for testing and simple tasks
    • Standard (6-8 GB RAM): Balanced performance for general use
    • High (8-10 GB RAM): Better quality responses
    • Super-High (20+ GB RAM): Best quality for complex analysis
  4. Click the download icon to download the model
  5. Once downloaded, click the play icon to load the model
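Under the hood, loading a GGUF model with llama-cpp-python looks roughly like the sketch below; the model filename and parameters are illustrative, not the app's actual configuration:

```python
from llama_cpp import Llama

# Illustrative path: the app stores downloaded models under _user_packages/_ai_models/.
llm = Llama(
    model_path="_user_packages/_ai_models/llama-3.2-1b-instruct-q4_k_m.gguf",
    n_ctx=4096,   # context window size
    n_threads=4,  # CPU threads used for inference
)

# Simple one-shot chat completion once the model is loaded.
reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(reply["choices"][0]["message"]["content"])
```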
Basic models (2-4 GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| Qwen 2.5 0.5B | 0.4 GB | 2 GB | Testing (tends to hallucinate) |
| Llama 3.2 1B | 0.7 GB | 2 GB | Basic tasks, testing |
| Qwen 2.5 1.5B | 1.0 GB | 3 GB | General tasks, light coding |
| Gemma 2 2B | 1.6 GB | 4 GB | Conversations, creative writing |
| Llama 3.2 3B | 1.8 GB | 4 GB | General tasks, conversations |
| Qwen 2.5 3B | 2.0 GB | 4 GB | General tasks, light coding |
| Qwen3 4B | 2.5 GB | 4 GB | General tasks, reasoning |

Standard models (6-8 GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| Phi-3.5 Mini 3.8B | 2.2 GB | 6 GB | Reasoning, math, science |
| Qwen3 4B Thinking | 2.5 GB | 6 GB | Deep reasoning, complex problems |
| Qwen3 8B | 5.0 GB | 8 GB | General tasks, coding, reasoning |
| Qwen 2.5 Coder 7B | 4.4 GB | 8 GB | Programming, code review |

High models (8-10 GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| DeepSeek R1 0528 8B | 5.0 GB | 8 GB | Deep reasoning, math, coding |
| Llama 3.1 8B | 4.7 GB | 10 GB | General tasks, coding, analysis |

Super-high models (20+ GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| GPT-OSS 20B (OpenAI) | 11.7 GB | 20 GB | Advanced reasoning, complex analysis |
More models will be added over time; check back regularly for updates.

Some models support reasoning/thinking capabilities, indicated by a pink “REASONING” badge in the Model Manager. These models show their thought process before providing a final answer, which can be expanded or collapsed in the chat interface.

Reasoning models include:

  • Qwen3 4B
  • Qwen3 4B Thinking
  • Qwen3 8B
  • DeepSeek R1 0528 8B
  • GPT-OSS 20B
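These models typically wrap their thought process in <think>...</think> tags in the raw output. The sketch below shows one way to split the reasoning from the final answer; it is an illustration, not the app's actual parser:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) using <think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print(reasoning)  # -> 2 + 2 is 4.
print(answer)     # -> The answer is 4.
```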

Enable Privacy Mode by clicking the shield icon in the chat header. When enabled:

  • Conversations are not saved to disk
  • No history entry is created
  • The conversation exists only in memory during your session
  • Ideal for sensitive discussions that shouldn’t be logged

Privacy Mode is indicated by a green shield icon. When disabled, conversations are automatically saved to your local storage.

Conversations are automatically saved after each message exchange (unless Privacy Mode is enabled). Each conversation includes:

  • Full message history
  • Model used for each response
  • Timestamps for all messages
  • Reasoning/thinking content (if applicable)
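The exact on-disk schema is internal to the app, but a saved conversation in logs/localgpt/ might look roughly like this hypothetical example (field names and values are illustrative only):

```python
import json
import os

# Hypothetical structure covering the fields listed above;
# the real schema used by Local GPT may differ.
conversation = {
    "title": "Quarterly report summary",
    "messages": [
        {
            "role": "user",
            "content": "Summarize the attached notes.",
            "timestamp": "2024-01-15T10:32:00",
        },
        {
            "role": "assistant",
            "content": "Here is a summary...",
            "model": "Qwen3 8B",
            "reasoning": "The user wants a short summary...",
            "timestamp": "2024-01-15T10:32:08",
        },
    ],
}

os.makedirs("logs/localgpt", exist_ok=True)
with open("logs/localgpt/example.json", "w") as f:
    json.dump(conversation, f, indent=2)
```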

Export any conversation to Markdown format:

  1. Hover over a conversation in the sidebar
  2. Click the export icon
  3. The file will be downloaded with the conversation title and date

Or export the current conversation using the export button in the header.

  • Model Format: GGUF (quantized models from Hugging Face)
  • Inference Engine: llama-cpp-python
  • Streaming: Server-Sent Events (SSE)
  • Storage: Models stored in _user_packages/_ai_models/
  • Conversations: Stored as JSON in logs/localgpt/
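For reference, token-by-token streaming with llama-cpp-python works as sketched below; presumably these incremental chunks are what the app relays to the browser over SSE. The model path and prompt are illustrative:

```python
from llama_cpp import Llama

# Illustrative model path; any downloaded GGUF model works here.
llm = Llama(model_path="_user_packages/_ai_models/llama-3.2-1b-instruct-q4_k_m.gguf")

# stream=True yields incremental chunks instead of one final response.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
```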