Local GPT

Local GPT allows you to run AI language models directly on your infrastructure with complete data privacy. Your conversations and queries never leave your network, ensuring full compliance with data protection requirements while maintaining the power of modern language models.

  • Complete Privacy: All conversations are processed entirely on-premise
  • Multiple Models: Choose from a variety of models based on your hardware capabilities
  • Reasoning Support: Some models include advanced reasoning/thinking capabilities
  • Conversation History: Save and manage chat conversations (with optional Privacy Mode)
  • Markdown Export: Export conversations for documentation or sharing
  • Real-time Streaming: See responses as they are generated
  • RAM Management: Automatic detection of system resources and model compatibility (see the sketch below)
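For illustration, this kind of compatibility check usually amounts to comparing each model's RAM requirement against free memory before loading. A minimal sketch using the third-party psutil package follows; the model names and requirements mirror the tables below, but the app's actual detection mechanism may differ:

```python
import psutil

# Hypothetical subset of the RAM requirements documented below (in GB).
MODEL_RAM_GB = {
    "Llama 3.2 1B": 2,
    "Qwen3 8B": 8,
    "GPT-OSS 20B (OpenAI)": 20,
}

def compatible_models() -> list[str]:
    """Return the models whose RAM requirement fits in currently free memory."""
    available_gb = psutil.virtual_memory().available / 1024**3
    return [name for name, need in MODEL_RAM_GB.items() if need <= available_gb]

print(compatible_models())
```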

Local GPT requires the llama-cpp-python library to run models locally. On first launch, you’ll be prompted to install it automatically.
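If the automatic installation fails, the library can usually be installed manually with pip:

```
pip install llama-cpp-python
```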

  1. Open Local GPT from the Enterprise applications menu
  2. Click Settings to open the Model Manager
  3. Choose a model based on your available RAM:
    • Basic (2-4 GB RAM): Good for testing and simple tasks
    • Standard (6-8 GB RAM): Balanced performance for general use
    • High (8-10 GB RAM): Better quality responses
    • Super-High (20+ GB RAM): Best quality for complex analysis
  4. Click the download icon to download the model
  5. Once downloaded, click the play icon to load the model
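Under the hood, loading a GGUF model with llama-cpp-python looks roughly like the sketch below; the model filename and parameters are illustrative, not the app's actual configuration:

```python
from llama_cpp import Llama

# Illustrative path: the app stores downloaded models under _user_packages/_ai_models/.
llm = Llama(
    model_path="_user_packages/_ai_models/llama-3.2-1b-instruct-q4_k_m.gguf",
    n_ctx=4096,   # context window size
    n_threads=4,  # CPU threads used for inference
)

# Simple one-shot chat completion once the model is loaded.
reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(reply["choices"][0]["message"]["content"])
```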
Basic models (2-4 GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| Qwen 2.5 0.5B | 0.4 GB | 2 GB | Testing (tends to hallucinate) |
| Llama 3.2 1B | 0.7 GB | 2 GB | Basic tasks, testing |
| Qwen 2.5 1.5B | 1.0 GB | 3 GB | General tasks, light coding |
| Gemma 2 2B | 1.6 GB | 4 GB | Conversations, creative writing |
| Llama 3.2 3B | 1.8 GB | 4 GB | General tasks, conversations |
| Qwen 2.5 3B | 2.0 GB | 4 GB | General tasks, light coding |
| Qwen3 4B | 2.5 GB | 4 GB | General tasks, reasoning |

Standard models (6-8 GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| Phi-3.5 Mini 3.8B | 2.2 GB | 6 GB | Reasoning, math, science |
| Qwen3 4B Thinking | 2.5 GB | 6 GB | Deep reasoning, complex problems |
| Qwen3 8B | 5.0 GB | 8 GB | General tasks, coding, reasoning |
| Qwen 2.5 Coder 7B | 4.4 GB | 8 GB | Programming, code review |

High models (8-10 GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| DeepSeek R1 0528 8B | 5.0 GB | 8 GB | Deep reasoning, math, coding |
| Llama 3.1 8B | 4.7 GB | 10 GB | General tasks, coding, analysis |

Super-high models (20+ GB RAM):

| Model | Size | RAM Required | Best For |
|-------|------|--------------|----------|
| GPT-OSS 20B (OpenAI) | 11.7 GB | 20 GB | Advanced reasoning, complex analysis |
More models will be added over time; check back regularly for updates.

Some models support reasoning/thinking capabilities, indicated by a pink “REASONING” badge in the Model Manager. These models show their thought process before providing a final answer, which can be expanded or collapsed in the chat interface.

Reasoning models include:

  • Qwen3 4B
  • Qwen3 4B Thinking
  • Qwen3 8B
  • DeepSeek R1 0528 8B
  • GPT-OSS 20B
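These models typically wrap their thought process in <think>...</think> tags in the raw output. The sketch below shows one way to split the reasoning from the final answer; it is an illustration, not the app's actual parser:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) using <think> tags."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print(reasoning)  # -> 2 + 2 is 4.
print(answer)     # -> The answer is 4.
```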

Enable Privacy Mode by clicking the shield icon in the chat header. When enabled:

  • Conversations are not saved to disk
  • No history entry is created
  • The conversation exists only in memory during your session
  • Ideal for sensitive discussions that shouldn’t be logged

Privacy Mode is indicated by a green shield icon. When disabled, conversations are automatically saved to your local storage.

Conversations are automatically saved after each message exchange (unless Privacy Mode is enabled). Each conversation includes:

  • Full message history
  • Model used for each response
  • Timestamps for all messages
  • Reasoning/thinking content (if applicable)
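The exact on-disk schema is internal to the app, but a saved conversation in logs/localgpt/ might look roughly like this hypothetical example (field names and values are illustrative only):

```python
import json
import os

# Hypothetical structure covering the fields listed above;
# the real schema used by Local GPT may differ.
conversation = {
    "title": "Quarterly report summary",
    "messages": [
        {
            "role": "user",
            "content": "Summarize the attached notes.",
            "timestamp": "2024-01-15T10:32:00",
        },
        {
            "role": "assistant",
            "content": "Here is a summary...",
            "model": "Qwen3 8B",
            "reasoning": "The user wants a short summary...",
            "timestamp": "2024-01-15T10:32:08",
        },
    ],
}

os.makedirs("logs/localgpt", exist_ok=True)
with open("logs/localgpt/example.json", "w") as f:
    json.dump(conversation, f, indent=2)
```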

Export any conversation to Markdown format:

  1. Hover over a conversation in the sidebar
  2. Click the export icon
  3. The file will be downloaded with the conversation title and date

Or export the current conversation using the export button in the header.

  • Model Format: GGUF (quantized models from Hugging Face)
  • Inference Engine: llama-cpp-python
  • Streaming: Server-Sent Events (SSE)
  • Storage: Models stored in _user_packages/_ai_models/
  • Conversations: Stored as JSON in logs/localgpt/
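For reference, token-by-token streaming with llama-cpp-python works as sketched below; presumably these incremental chunks are what the app relays to the browser over SSE. The model path and prompt are illustrative:

```python
from llama_cpp import Llama

# Illustrative model path; any downloaded GGUF model works here.
llm = Llama(model_path="_user_packages/_ai_models/llama-3.2-1b-instruct-q4_k_m.gguf")

# stream=True yields incremental chunks instead of one final response.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
```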