# DevOps Architecture

### Servers and GPU Infrastructure

We operate a **self-hosted Large Language Model (LLM) ecosystem**, selecting models such as **DeepSeek R3** and **Mistral** based on task-specific requirements. These models power:

* Conversational interactions
* Complex reasoning operations
* Advanced classification tasks

Our infrastructure runs on **RunPod’s scalable GPU resources**, within a **robust Docker environment**. This setup allows us to:

* Dynamically adjust GPU instance count in response to real-time demand.
* Ensure optimal performance by efficiently allocating resources.
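
As a minimal sketch of demand-based scaling, the function below derives a target GPU worker count from the current request load. The `ScalingPolicy` fields, their default values, and the `desired_workers` helper are illustrative assumptions, not our production settings or a RunPod API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # All values are illustrative assumptions, not production settings.
    requests_per_worker: int = 8  # concurrent requests one GPU worker can serve
    min_workers: int = 1          # never scale below this floor
    max_workers: int = 16         # never scale above this ceiling


def desired_workers(in_flight: int, queued: int, policy: ScalingPolicy) -> int:
    """Return the GPU worker count needed to cover current demand."""
    demand = in_flight + queued
    needed = math.ceil(demand / policy.requests_per_worker)
    # Clamp the result to the configured floor and ceiling.
    return max(policy.min_workers, min(policy.max_workers, needed))


if __name__ == "__main__":
    policy = ScalingPolicy()
    # 33 outstanding requests at 8 per worker rounds up to 5 workers.
    print(desired_workers(in_flight=20, queued=13, policy=policy))
```

Clamping to a floor and ceiling keeps at least one worker warm and caps spend during traffic spikes.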

For tasks requiring **extensive context processing** (e.g., analyzing lengthy articles), we **selectively integrate centralized LLMs** (OpenAI's o3 and Anthropic's models). We use this approach specifically for the large-scale data collected through our Anatomy of Luigi **scraping system**.

### Embedding Models

We prioritize **data privacy and security** by running our **own suite of embedding models** rather than sending user data to external providers such as OpenAI. These models include:

* Text embeddings
* Contextual embeddings
* Reranking models

Our **primary embedding framework** is built using models from **Nomic AI**, ensuring high-quality vector representations while maintaining **full control over data processing**.
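
To illustrate how embeddings are typically compared, the sketch below ranks documents by cosine similarity to a query vector. The vectors are toy values rather than real model output, and the helper names are hypothetical. Nomic's `nomic-embed-text` models document task prefixes such as `search_query` and `search_document`; whether a given deployment uses them is an assumption here.

```python
import math
from typing import Dict, List, Sequence


def with_task_prefix(text: str, task: str) -> str:
    """Prepend a task prefix, e.g. 'search_query: ...' (assumed convention)."""
    return f"{task}: {text}"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]
) -> List[str]:
    """Return document ids ordered from most to least similar to the query."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy two-dimensional vectors standing in for real embeddings.
    docs = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.5, 0.5]}
    print(with_task_prefix("how do refunds work?", "search_query"))
    print(rank_by_similarity([1.0, 0.0], docs))
```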

### Retrieval-Augmented Generation (RAG) Operations

To support **scalable data retrieval and indexing**, we operate a **distributed TimescaleDB infrastructure** across multiple **geographical regions**. This architecture ensures:

* High availability and redundancy
* Optimized performance for AI-driven data queries

By leveraging **TimescaleDB’s seamless integration with PostgreSQL**, our RAG pipeline supports:

* Automated document embeddings generation
* Advanced reranking model processing
* In-database execution of complex LLM queries

This integrated approach significantly enhances data retrieval efficiency and query performance.
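
A minimal sketch of the retrieval step, assuming the `pgvector` extension is installed in the PostgreSQL/TimescaleDB instance. The `document_chunks` table and its `id`, `content`, and `embedding` columns are hypothetical names, and the query is only built here, not executed. The `<=>` operator is pgvector's cosine-distance operator, and the named placeholders follow the style accepted by psycopg.

```python
from typing import Dict, Sequence, Tuple, Union


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.100000,0.200000]'."""
    return "[" + ",".join(f"{x:.6f}" for x in vec) + "]"


def build_retrieval_query(
    query_vec: Sequence[float], top_k: int = 5
) -> Tuple[str, Dict[str, Union[str, int]]]:
    """Return (sql, params) for a nearest-neighbour search by cosine distance."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Table and column names are hypothetical placeholders.
    sql = (
        "SELECT id, content, embedding <=> %(q)s::vector AS distance "
        "FROM document_chunks "
        "ORDER BY embedding <=> %(q)s::vector "
        "LIMIT %(k)s"
    )
    params = {"q": to_pgvector_literal(query_vec), "k": top_k}
    return sql, params


if __name__ == "__main__":
    sql, params = build_retrieval_query([0.1, 0.2, 0.3], top_k=3)
    print(sql)
    print(params)
```

Passing the vector as a bound parameter rather than interpolating it into the SQL string avoids injection issues and lets the driver handle quoting.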

### Conversational History Management

We utilize a tiered data storage approach to balance speed, scalability, and privacy:

* **Redis Instances** – Handle real-time processing of user interactions, ensuring ultra-fast response times.
* **MongoDB Atlas** – Provides optimized long-term storage, supporting efficient indexing and retrieval.
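
The two tiers can be sketched as a write-through, read-through pattern. This is an illustration under assumptions, not our actual implementation: `InMemoryStore` is a stand-in for real Redis and MongoDB clients, and the `TieredHistory` class and its method names are hypothetical.

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Stand-in for a real Redis or MongoDB client (assumption for the sketch)."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def get(self, conversation_id: str) -> Optional[List[str]]:
        messages = self._data.get(conversation_id)
        return list(messages) if messages is not None else None

    def put(self, conversation_id: str, messages: List[str]) -> None:
        self._data[conversation_id] = list(messages)


class TieredHistory:
    """Hot tier answers reads first; cold tier is the durable copy."""

    def __init__(self, hot: InMemoryStore, cold: InMemoryStore) -> None:
        self.hot = hot
        self.cold = cold

    def load(self, conversation_id: str) -> List[str]:
        cached = self.hot.get(conversation_id)
        if cached is not None:
            return cached
        # Cache miss: fall back to the cold tier and repopulate the hot tier.
        stored = self.cold.get(conversation_id) or []
        self.hot.put(conversation_id, stored)
        return stored

    def append(self, conversation_id: str, message: str) -> None:
        messages = self.load(conversation_id)
        messages.append(message)
        # Write-through: update both tiers so they stay consistent.
        self.hot.put(conversation_id, messages)
        self.cold.put(conversation_id, messages)


if __name__ == "__main__":
    history = TieredHistory(hot=InMemoryStore(), cold=InMemoryStore())
    history.append("conv-1", "Hello")
    history.append("conv-1", "How can I help?")
    print(history.load("conv-1"))
```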

**Privacy-First Approach**

* **User data is stored as heuristic fingerprints**: you exist as an encrypted ID within our system.
* **Conversations remain private unless explicitly shared** via our secure link-sharing feature.

This **privacy-first architecture** ensures robust **data protection** while maintaining a seamless user experience.
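
One common way to replace a raw user identifier with an opaque ID is a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library. It is an assumed technique chosen for illustration, not a description of our actual fingerprinting scheme, and the `pseudonymous_id` helper and the example key are hypothetical.

```python
import hashlib
import hmac


def pseudonymous_id(user_identifier: str, secret_key: bytes) -> str:
    """Derive a stable, opaque ID from a user identifier with HMAC-SHA256."""
    # Normalize so the same user always maps to the same ID.
    normalized = user_identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Example key only; a real key would come from a secrets manager.
    key = b"example-key-not-for-production"
    print(pseudonymous_id("Alice@example.com", key))
```

Because the hash is keyed, the mapping cannot be rebuilt from a list of known identifiers without access to the secret.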

### Agent Monitoring & Performance Analytics

Our AI agents are continuously monitored through a **self-hosted Elasticsearch cluster**, enabling comprehensive real-time analytics on:

* System health indicators
* Token processing efficiency
* Term frequency and usage patterns

By maintaining **detailed agent telemetry**, we can:

* Identify performance bottlenecks early
* Optimize model efficiency and responsiveness
* Continuously fine-tune our self-hosted models for long-term improvement
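
As a rough sketch of what one telemetry record might look like before it is indexed, the function below builds a document with a token-throughput metric. Every field name, the `build_telemetry_doc` helper, and the example values are assumptions for illustration. No Elasticsearch client call is shown.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_telemetry_doc(
    agent_id: str,
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    latency_seconds: float,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build one telemetry record; field names are hypothetical."""
    if latency_seconds <= 0:
        raise ValueError("latency_seconds must be positive")
    timestamp = now or datetime.now(timezone.utc)
    return {
        "@timestamp": timestamp.isoformat(),
        "agent_id": agent_id,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_seconds": latency_seconds,
        # Throughput: generated tokens per second of wall-clock latency.
        "tokens_per_second": round(completion_tokens / latency_seconds, 2),
    }


if __name__ == "__main__":
    doc = build_telemetry_doc(
        agent_id="agent-42",
        model="example-model",
        prompt_tokens=350,
        completion_tokens=120,
        latency_seconds=2.5,
    )
    print(doc)
```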

This observability-driven infrastructure ensures our AI ecosystem remains **scalable, efficient, and highly reliable**.
