Astra DB & Cassandra Integration Guide

Overview

Astra DB is DataStax’s cloud-native Cassandra service that provides distributed vector storage for the Prompted Forge platform. This guide covers the complete integration, from setup to advanced query patterns.

Architecture Integration

flowchart TB
    subgraph "Application Layer"
        APP[Prompted Forge Server]
        COLLECTOR[Collector Service]
        CACHE[Cache Layer]
    end

    subgraph "Astra DB Cloud"
        PROXY[Astra Proxy]
        COORD[Coordinator Nodes]
        DATA[Data Nodes]
        STORAGE[Storage Nodes]
    end

    subgraph "Security Layer"
        BUNDLE[Secure Bundle]
        TLS[TLS Encryption]
        AUTH[Authentication]
    end

    subgraph "Vector Operations"
        EMBED[Embedding Generation]
        INDEX[Vector Indexing]
        SEARCH[Similarity Search]
    end

    APP --> BUNDLE
    BUNDLE --> TLS
    TLS --> AUTH
    AUTH --> PROXY

    PROXY --> COORD
    COORD --> DATA
    DATA --> STORAGE

    COLLECTOR --> EMBED
    EMBED --> INDEX
    INDEX --> DATA

    APP --> SEARCH
    SEARCH --> COORD
    COORD --> CACHE

Secure Bundle Management

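The Astra DB secure connect bundle is a ZIP archive containing the TLS certificates and connection configuration required for secure communication with the database. Authentication credentials (the application token) are supplied separately.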

Bundle Download Process

stateDiagram-v2
    [*] --> CheckBundle
    CheckBundle --> ValidateBundle : Bundle exists
    CheckBundle --> DownloadBundle : Bundle missing

    ValidateBundle --> ConnectionTest : Valid bundle
    ValidateBundle --> DownloadBundle : Invalid bundle

    DownloadBundle --> DirectURL : URL provided
    DownloadBundle --> AstraAPI : Use API

    DirectURL --> ValidateDownload
    AstraAPI --> ValidateDownload

    ValidateDownload --> ConnectionTest : Success
    ValidateDownload --> RetryDownload : Failure

    RetryDownload --> DownloadBundle : Retry < 3
    RetryDownload --> [*] : Max retries

    ConnectionTest --> Ready : Success
    ConnectionTest --> [*] : Connection failed

    Ready --> [*]
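The flow above can be sketched in Python as follows. This is a hedged illustration, not the platform's implementation. `ensure_bundle`, `bundle_is_valid`, and the caller-supplied `download` function are hypothetical. The set of expected archive members is an assumption based on typical secure connect bundles, so verify it against a bundle you actually download. "Retry < 3" is read here as at most three download attempts, and the `ConnectionTest` step is left to the database driver.

```python
import os
import zipfile
from typing import Callable

# Assumed contents of a secure connect bundle; verify against a real bundle.
REQUIRED_MEMBERS = {"config.json", "ca.crt", "cert", "key"}
MAX_ATTEMPTS = 3  # interpretation of "Retry < 3" in the state diagram


def bundle_is_valid(path: str) -> bool:
    """ValidateBundle / ValidateDownload: a readable ZIP with the expected members."""
    if not os.path.isfile(path) or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return REQUIRED_MEMBERS.issubset(zf.namelist())


def ensure_bundle(path: str, download: Callable[[str], None]) -> str:
    """Return the path of a valid bundle, downloading it with retries if needed.

    `download` (hypothetical) writes the bundle to `path`, either from a direct
    URL or through the Astra API. Network failures should surface as OSError
    (urllib's URLError is a subclass); any other exception propagates.
    """
    if bundle_is_valid(path):  # CheckBundle -> ValidateBundle: reuse it
        return path
    for _ in range(MAX_ATTEMPTS):
        try:
            download(path)  # DownloadBundle (DirectURL or AstraAPI)
        except OSError:
            continue  # RetryDownload
        if bundle_is_valid(path):  # ValidateDownload
            return path
    raise RuntimeError(f"no valid bundle after {MAX_ATTEMPTS} download attempts")
```

Injecting `download` keeps the retry and validation logic testable without network access.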

Environment Configuration

# Astra DB configuration - Production
ASTRA_DB_APPLICATION_TOKEN="AstraCS:token:value"
ASTRA_DB_ENDPOINT="https://database-id-region.apps.astra.datastax.com"
ASTRA_DB_BUNDLE_URL="https://datastax-cluster-config.s3.amazonaws.com/bundle.zip"
ASTRA_DB_KEYSPACE="prompted_forge"

# Alternative: download the bundle through the Astra API
ASTRA_DB_DATABASE_ID="database-uuid"
ASTRA_DB_REGION="us-east-1"
ASTRA_DB_APPLICATION_TOKEN="token-for-api-access"
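A minimal sketch of loading and checking these variables at startup, assuming a configured `ASTRA_DB_BUNDLE_URL` takes precedence over the API route when both are set. `AstraConfig` and `load_astra_config` are hypothetical names. The keyspace check reflects Cassandra's rule that keyspace names contain only letters, digits, and underscores, up to 48 characters.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# Cassandra keyspace names: letters, digits, and underscores, at most 48 characters.
KEYSPACE_RE = re.compile(r"[A-Za-z0-9_]{1,48}")


@dataclass(frozen=True)
class AstraConfig:
    token: str
    keyspace: str
    endpoint: Optional[str] = None
    bundle_url: Optional[str] = None
    database_id: Optional[str] = None
    region: Optional[str] = None

    @property
    def bundle_source(self) -> str:
        """'direct' when a bundle URL is configured, otherwise 'api'."""
        return "direct" if self.bundle_url else "api"


def load_astra_config(env: Optional[Mapping[str, str]] = None) -> AstraConfig:
    """Read ASTRA_DB_* variables and fail fast on missing or malformed values."""
    env = os.environ if env is None else env

    def get(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None  # treat empty strings as unset

    token = get("ASTRA_DB_APPLICATION_TOKEN")
    keyspace = get("ASTRA_DB_KEYSPACE")
    if token is None or keyspace is None:
        raise ValueError("ASTRA_DB_APPLICATION_TOKEN and ASTRA_DB_KEYSPACE are required")
    if not KEYSPACE_RE.fullmatch(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")

    cfg = AstraConfig(
        token=token,
        keyspace=keyspace,
        endpoint=get("ASTRA_DB_ENDPOINT"),
        bundle_url=get("ASTRA_DB_BUNDLE_URL"),
        database_id=get("ASTRA_DB_DATABASE_ID"),
        region=get("ASTRA_DB_REGION"),
    )
    # The API route needs both the database ID and the region.
    if cfg.bundle_source == "api" and not (cfg.database_id and cfg.region):
        raise ValueError("set ASTRA_DB_BUNDLE_URL, or both ASTRA_DB_DATABASE_ID and ASTRA_DB_REGION")
    return cfg
```

Validating once at startup surfaces a misconfigured deployment before any connection attempt is made.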

Vector Schema Design

Keyspace and Table Structure

-- Create the keyspace (self-managed Cassandra only; on Astra DB Serverless,
-- create keyspaces in the Astra Portal or via the DevOps API instead of CQL).
-- Replace 'datacenter1' with the name of your actual datacenter.
CREATE KEYSPACE IF NOT EXISTS prompted_forge
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
};

-- Main document embeddings table (one row per document chunk)
CREATE TABLE IF NOT EXISTS prompted_forge.document_embeddings (
    id UUID PRIMARY KEY,
    workspace_id UUID,
    document_id UUID,
    chunk_index INT,
    content TEXT,
    embedding VECTOR<FLOAT, 1536>,  -- 1536 dimensions, matching OpenAI text-embedding-ada-002
    metadata MAP<TEXT, TEXT>,
    title TEXT,
    source TEXT,
    doc_path TEXT,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

-- Vector similarity index (Storage-Attached Index)
-- 'source_model' is optional; verify its accepted values in the Astra DB documentation.
CREATE CUSTOM INDEX IF NOT EXISTS embedding_index
ON prompted_forge.document_embeddings (embedding)
USING 'StorageAttachedIndex'
WITH OPTIONS = {
    'similarity_function': 'cosine',
    'source_model': 'ada-002'
};

-- Workspace filtering index (SAI; Astra DB Serverless uses SAI rather than legacy secondary indexes)
CREATE CUSTOM INDEX IF NOT EXISTS workspace_idx
ON prompted_forge.document_embeddings (workspace_id)
USING 'StorageAttachedIndex';

-- Document reference index
CREATE CUSTOM INDEX IF NOT EXISTS document_idx
ON prompted_forge.document_embeddings (document_id)
USING 'StorageAttachedIndex';

-- Metadata table for hybrid queries (one partition per workspace)
CREATE TABLE IF NOT EXISTS prompted_forge.document_metadata (
    workspace_id UUID,
    document_id UUID,
    title TEXT,
    author TEXT,
    created_date DATE,
    tags SET<TEXT>,
    file_type TEXT,
    file_size BIGINT,
    PRIMARY KEY (workspace_id, document_id)
);
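With `'similarity_function': 'cosine'`, the SAI vector index ranks rows by cosine similarity and answers approximate nearest-neighbour (ANN) queries of the form `SELECT ... WHERE workspace_id = ? ORDER BY embedding ANN OF ? LIMIT 5`. For unit tests, or for sanity-checking ANN results on a small sample, an exact brute-force equivalent is short. A minimal sketch, assuming rows are held in memory as dicts with hypothetical keys `id`, `workspace_id`, and `embedding`:

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def exact_top_k(rows: List[Dict], query: Vector, workspace_id: str, k: int) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbour search, filtered to one workspace first."""
    candidates = [r for r in rows if r["workspace_id"] == workspace_id]
    scored = [(r["id"], cosine_similarity(r["embedding"], query)) for r in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]
```

Because the index is approximate, its results can occasionally differ from this exact ranking; small discrepancies on large data sets are expected rather than a sign of corruption.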

Vector Data Model

erDiagram
    WORKSPACE ||--o{ DOCUMENT_EMBEDDINGS : contains
    DOCUMENT ||--o{ DOCUMENT_EMBEDDINGS : chunked_into
    WORKSPACE ||--o{ DOCUMENT_METADATA : catalogs
    DOCUMENT ||--o| DOCUMENT_METADATA : described_by

    DOCUMENT_EMBEDDINGS {
        uuid id PK
        uuid workspace_id FK
        uuid document_id FK
        int chunk_index
        text content
        vector embedding
        map metadata
        text title
        text source
        text doc_path
        timestamp created_at
        timestamp updated_at
    }

    DOCUMENT_METADATA {
        uuid workspace_id PK
        uuid document_id PK
        text title
        text author
        date created_date
        set tags
        text file_type
        bigint file_size
    }

    WORKSPACE {
        uuid id PK
        text name
        text description
    }

    DOCUMENT {
        uuid id PK
        text filename
        text original_path
    }

Data Operations

Vector Insertion Pipeline

sequenceDiagram
    participant App as Application
    participant Embed as Embedding Service
    participant Astra as Astra DB
    participant Index as Vector Index

    Note over App,Index: Document Processing
    App->>Embed: Generate embeddings
    Embed-->>App: Vector embeddings

    Note over App,Astra: Batch Insert
    loop For each chunk
        App->>Astra: INSERT embedding
        Astra->>Index: Update vector index
        Index-->>Astra: Index updated
        Astra-->>App: Insert confirmed
    end

    Note over App,Index: Verification
    App->>Astra: Query inserted vectors
    Astra-->>App: Vector count
    App->>Index: Verify index consistency
    Index-->>App: Index status OK
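The insert loop and verification step above can be sketched in Python under stated assumptions. `ingest_document`, `insert_row`, and `count_rows` are hypothetical hooks: in practice they would wrap a Cassandra driver session, with `count_rows` counting this document's rows through the `document_id` index. Rows are written individually and concurrently rather than in one CQL `BATCH`, because each chunk has its own `id` partition key and multi-partition batches add coordinator load without improving throughput.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, Tuple

Chunk = Tuple[int, str, Sequence[float]]  # (chunk_index, content, embedding)


def ingest_document(
    chunks: Iterable[Chunk],
    insert_row: Callable[[int, str, Sequence[float]], None],
    count_rows: Callable[[], int],
    max_in_flight: int = 16,
) -> int:
    """Insert every chunk, then verify that the stored row count matches."""
    chunk_list: List[Chunk] = list(chunks)
    # The pool size caps the number of concurrent writes at max_in_flight.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(insert_row, *chunk) for chunk in chunk_list]
        for future in futures:
            future.result()  # re-raises the first failed insert, if any
    stored = count_rows()  # verification step
    if stored != len(chunk_list):
        raise RuntimeError(f"expected {len(chunk_list)} rows, found {stored}")
    return stored
```

Bounding in-flight writes keeps a large document from overwhelming the coordinator, while the final count catches silently dropped writes.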