Local GPT
Local GPT allows you to run AI language models directly on your infrastructure with complete data privacy. Your conversations and queries never leave your network, ensuring full compliance with data protection requirements while maintaining the power of modern language models.
Features
- Complete Privacy: All conversations are processed entirely on-premise
- Multiple Models: Choose from a variety of models based on your hardware capabilities
- Reasoning Support: Some models include advanced reasoning/thinking capabilities
- Conversation History: Save and manage chat conversations (with optional Privacy Mode)
- Markdown Export: Export conversations for documentation or sharing
- Real-time Streaming: See responses as they are generated
- RAM Management: Automatic detection of system resources and model compatibility
Getting Started
Prerequisites
Local GPT requires the llama-cpp-python library to run models locally. On first launch, you’ll be prompted to install it automatically.
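If you prefer to install the dependency manually, the equivalent is roughly the following (a sketch assuming a standard pip environment; the automatic prompt is the supported path):

```python
import importlib.util
import subprocess
import sys

# Install llama-cpp-python only if it is missing, then verify the import.
if importlib.util.find_spec("llama_cpp") is None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "llama-cpp-python"])

from llama_cpp import Llama  # raises ImportError if the install failed
```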
Loading a Model
1. Open Local GPT from the Enterprise applications menu
2. Click Settings to open the Model Manager
3. Choose a model based on your available RAM:
   - Basic (2-4 GB RAM): Good for testing and simple tasks
   - Standard (6-8 GB RAM): Balanced performance for general use
   - High (8-10 GB RAM): Better quality responses
   - Super-High (20+ GB RAM): Best quality for complex analysis
4. Click the download icon to download the model
5. Once downloaded, click the play icon to load the model
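Behind the play icon, loading a model with llama-cpp-python amounts to constructing a `Llama` instance. A rough sketch (the file name and parameter values below are illustrative):

```python
from llama_cpp import Llama

# Illustrative path; downloaded models live under _user_packages/_ai_models/
llm = Llama(
    model_path="_user_packages/_ai_models/llama-3.2-3b-instruct-q4_k_m.gguf",
    n_ctx=4096,    # context window in tokens
    n_threads=8,   # CPU threads used for inference
)
```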
Available Models
Basic Quality
| Model | Size | RAM Required | Best For |
|---|---|---|---|
| Qwen 2.5 0.5B | 0.4 GB | 2 GB | Testing (tends to hallucinate) |
| Llama 3.2 1B | 0.7 GB | 2 GB | Basic tasks, testing |
| Qwen 2.5 1.5B | 1.0 GB | 3 GB | General tasks, light coding |
| Gemma 2 2B | 1.6 GB | 4 GB | Conversations, creative writing |
| Llama 3.2 3B | 1.8 GB | 4 GB | General tasks, conversations |
| Qwen 2.5 3B | 2.0 GB | 4 GB | General tasks, light coding |
| Qwen3 4B | 2.5 GB | 4 GB | General tasks, reasoning |
Standard Quality
| Model | Size | RAM Required | Best For |
|---|---|---|---|
| Phi-3.5 Mini 3.8B | 2.2 GB | 6 GB | Reasoning, math, science |
| Qwen3 4B Thinking | 2.5 GB | 6 GB | Deep reasoning, complex problems |
| Qwen3 8B | 5.0 GB | 8 GB | General tasks, coding, reasoning |
| Qwen 2.5 Coder 7B | 4.4 GB | 8 GB | Programming, code review |
High Quality
| Model | Size | RAM Required | Best For |
|---|---|---|---|
| DeepSeek R1 0528 8B | 5.0 GB | 8 GB | Deep reasoning, math, coding |
| Llama 3.1 8B | 4.7 GB | 10 GB | General tasks, coding, analysis |
Super-High Quality
| Model | Size | RAM Required | Best For |
|---|---|---|---|
| GPT-OSS 20B (OpenAI) | 11.7 GB | 20 GB | Advanced reasoning, complex analysis |
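If you want to estimate fit programmatically, you can compare the RAM figures from the tables above with what the machine reports. A sketch using psutil (an assumption; Local GPT's built-in detection may work differently):

```python
import psutil

# RAM requirements (GB) copied from the tables above; extend as needed.
RAM_REQUIRED_GB = {
    "Llama 3.2 3B": 4,
    "Qwen3 8B": 8,
    "Llama 3.1 8B": 10,
    "GPT-OSS 20B": 20,
}

def fits_in_ram(model_name: str) -> bool:
    """Return True if available memory meets the model's listed requirement."""
    available_gb = psutil.virtual_memory().available / 1e9
    return available_gb >= RAM_REQUIRED_GB[model_name]

print(fits_in_ram("Qwen3 8B"))
```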
Reasoning Models
Some models support reasoning/thinking capabilities, indicated by a pink “REASONING” badge in the Model Manager. These models show their thought process before providing a final answer, which can be expanded or collapsed in the chat interface.
Reasoning models include:
- Qwen3 4B
- Qwen3 4B Thinking
- Qwen3 8B
- DeepSeek R1 0528 8B
- GPT-OSS 20B
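In the raw model output, reasoning is typically delimited before the final answer; the GGUF chat templates for Qwen3 and DeepSeek R1 commonly wrap it in `<think>` tags. A sketch of separating the two parts under that assumption (not Local GPT's actual parser):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw output into (reasoning, answer), assuming <think> delimiters."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()
```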
Privacy Mode
Enable Privacy Mode by clicking the shield icon in the chat header. When enabled:
- Conversations are not saved to disk
- No history entry is created
- The conversation exists only in memory during your session
- Ideal for sensitive discussions that shouldn’t be logged
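Conceptually, Privacy Mode acts as a guard in front of every persistence step. A minimal sketch with illustrative names (not Local GPT's actual code):

```python
def on_message_exchange(conversation: dict, privacy_mode: bool) -> None:
    # With Privacy Mode on, the conversation exists only in process memory:
    # nothing is written to disk and no history entry appears in the sidebar.
    if privacy_mode:
        return
    save_conversation(conversation)  # hypothetical helper, sketched below
```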
Conversation Management
Saving Conversations
Conversations are automatically saved after each message exchange (unless Privacy Mode is enabled). Each conversation includes:
- Full message history
- Model used for each response
- Timestamps for all messages
- Reasoning/thinking content (if applicable)
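The on-disk schema isn’t specified here, but given the fields listed above, a save routine might look roughly like this (file layout and field names are illustrative):

```python
import json
import time
from pathlib import Path

def save_conversation(conversation: dict) -> None:
    """Illustrative persistence sketch; the real on-disk format may differ."""
    record = {
        "title": conversation["title"],
        "messages": [
            {
                "role": m["role"],                # "user" or "assistant"
                "content": m["content"],
                "model": m.get("model"),          # model used for each response
                "timestamp": m["timestamp"],
                "reasoning": m.get("reasoning"),  # thinking content, if any
            }
            for m in conversation["messages"]
        ],
    }
    path = Path("logs/localgpt") / f"conversation-{int(time.time())}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2))
```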
Exporting Conversations
Export any conversation to Markdown format:
1. Hover over a conversation in the sidebar
2. Click the export icon
3. The file is downloaded, named with the conversation title and date
Or export the current conversation using the export button in the header.
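The exported Markdown presumably mirrors the saved record; an illustrative rendering under the same assumed schema (not the exact output format):

```python
def to_markdown(conversation: dict) -> str:
    """Render a conversation record (see the save sketch above) as Markdown."""
    lines = [f"# {conversation['title']}", ""]
    for m in conversation["messages"]:
        speaker = "You" if m["role"] == "user" else m.get("model") or "Assistant"
        lines.append(f"**{speaker}:**")
        lines.append(m["content"])
        lines.append("")
    return "\n".join(lines)
```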
Technical Details
- Model Format: GGUF (quantized models from Hugging Face)
- Inference Engine: llama-cpp-python
- Streaming: Server-Sent Events (SSE)
- Storage: Models stored in _user_packages/_ai_models/
- Conversations: Stored as JSON in logs/localgpt/
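For reference, token streaming at the llama-cpp-python layer looks like the following; the app presumably relays these chunks to the UI over SSE. The model path is illustrative:

```python
from llama_cpp import Llama

llm = Llama(model_path="_user_packages/_ai_models/qwen2.5-3b-instruct-q4_k_m.gguf")

# create_chat_completion(stream=True) yields OpenAI-style chunks as they arrive.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF in one sentence."}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
```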