Astra DB is DataStax’s cloud-native Cassandra service that provides distributed vector storage for the Prompted Forge platform. This guide covers the complete integration, from setup to advanced query patterns.
flowchart TB
subgraph "Application Layer"
APP[Prompted Forge Server]
COLLECTOR[Collector Service]
CACHE[Cache Layer]
end
subgraph "Astra DB Cloud"
PROXY[Astra Proxy]
COORD[Coordinator Nodes]
DATA[Data Nodes]
STORAGE[Storage Nodes]
end
subgraph "Security Layer"
BUNDLE[Secure Bundle]
TLS[TLS Encryption]
AUTH[Authentication]
end
subgraph "Vector Operations"
EMBED[Embedding Generation]
INDEX[Vector Indexing]
SEARCH[Similarity Search]
end
APP --> BUNDLE
BUNDLE --> TLS
TLS --> AUTH
AUTH --> PROXY
PROXY --> COORD
COORD --> DATA
DATA --> STORAGE
COLLECTOR --> EMBED
EMBED --> INDEX
INDEX --> DATA
APP --> SEARCH
SEARCH --> COORD
COORD --> CACHE
The Astra DB secure bundle contains connection credentials and certificates required for secure communication.
stateDiagram-v2
[*] --> CheckBundle
CheckBundle --> ValidateBundle : Bundle exists
CheckBundle --> DownloadBundle : Bundle missing
ValidateBundle --> ConnectionTest : Valid bundle
ValidateBundle --> DownloadBundle : Invalid bundle
DownloadBundle --> DirectURL : URL provided
DownloadBundle --> AstraAPI : Use API
DirectURL --> ValidateDownload
AstraAPI --> ValidateDownload
ValidateDownload --> ConnectionTest : Success
ValidateDownload --> RetryDownload : Failure
RetryDownload --> DownloadBundle : Retry < 3
RetryDownload --> [*] : Max retries
ConnectionTest --> Ready : Success
ConnectionTest --> [*] : Connection failed
Ready --> [*]
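The bundle lifecycle above can be sketched in a few lines. This is a minimal illustration, not the platform's actual loader: `ensure_bundle`, `is_valid_bundle`, and the injected `download` callable are hypothetical names, "valid" is assumed to mean "a readable zip archive", and the limit of three download attempts is one reading of the `Retry < 3` transition.

```python
import zipfile
from pathlib import Path
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: one interpretation of "Retry < 3" in the diagram


def is_valid_bundle(path: Path) -> bool:
    """Treat a bundle as valid if it exists and is a readable zip archive."""
    return path.is_file() and zipfile.is_zipfile(path)


def ensure_bundle(path: Path, download: Callable[[Path], None]) -> bool:
    """Return True once a valid bundle is on disk, False when attempts run out.

    `download` is a hypothetical callable that fetches the bundle (from a
    direct URL or the Astra API) and writes it to `path`.
    """
    # CheckBundle -> ValidateBundle: reuse an existing, valid bundle.
    if is_valid_bundle(path):
        return True
    # DownloadBundle -> ValidateDownload, looping through RetryDownload.
    for _attempt in range(MAX_ATTEMPTS):
        try:
            download(path)
        except OSError:
            continue  # network or filesystem failure: retry
        if is_valid_bundle(path):
            return True
    return False  # maximum attempts reached
```

The connection test that follows a successful validation is left out here; a caller would run it only when `ensure_bundle` returns `True`.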
# Astra DB Configuration - Production
ASTRA_DB_APPLICATION_TOKEN="AstraCS:token:value"
ASTRA_DB_ENDPOINT="https://database-id-region.apps.astra.datastax.com"
ASTRA_DB_BUNDLE_URL="https://datastax-cluster-config.s3.amazonaws.com/bundle.zip"
ASTRA_DB_KEYSPACE="prompted_forge"

# Alternative: Direct API download
ASTRA_DB_DATABASE_ID="database-uuid"
ASTRA_DB_REGION="us-east-1"
ASTRA_DB_APPLICATION_TOKEN="token-for-api-access"
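As a rough illustration of how these variables might be consumed, the sketch below reads them into a typed object and opens a session with the DataStax Python driver. `AstraConfig`, `load_astra_config`, and `connect` are hypothetical names, and the rule that either a bundle URL or a database ID plus region must be present is an assumption drawn from the two alternatives in the configuration. The `Cluster(cloud=...)` and `PlainTextAuthProvider("token", ...)` calls follow the driver's documented Astra connection pattern and require the `cassandra-driver` package.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AstraConfig:
    token: str
    keyspace: str
    endpoint: Optional[str] = None
    bundle_url: Optional[str] = None
    database_id: Optional[str] = None
    region: Optional[str] = None


def load_astra_config(env: Mapping[str, str] = os.environ) -> AstraConfig:
    """Read Astra settings from a mapping and fail early on missing values."""
    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "")
    keyspace = env.get("ASTRA_DB_KEYSPACE", "")
    if not token or not keyspace:
        raise ValueError("ASTRA_DB_APPLICATION_TOKEN and ASTRA_DB_KEYSPACE are required")
    bundle_url = env.get("ASTRA_DB_BUNDLE_URL")
    database_id = env.get("ASTRA_DB_DATABASE_ID")
    region = env.get("ASTRA_DB_REGION")
    # Assumption: the bundle comes from a direct URL or from the API path.
    if not bundle_url and not (database_id and region):
        raise ValueError(
            "set ASTRA_DB_BUNDLE_URL, or both ASTRA_DB_DATABASE_ID and ASTRA_DB_REGION"
        )
    return AstraConfig(token, keyspace, env.get("ASTRA_DB_ENDPOINT"),
                       bundle_url, database_id, region)


def connect(config: AstraConfig, bundle_path: str):
    """Open a session using a downloaded secure bundle (needs cassandra-driver)."""
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    cluster = Cluster(
        cloud={"secure_connect_bundle": bundle_path},
        # Astra expects the literal username "token" with the application token.
        auth_provider=PlainTextAuthProvider("token", config.token),
    )
    return cluster.connect(config.keyspace)
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without touching the real process environment.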
-- Create keyspace for Prompted Forge
-- Keyspace names cannot contain spaces, so prompted_forge is used throughout.
-- Note: on Astra DB, keyspaces are normally created from the Astra console
-- or DevOps API rather than through CQL.
CREATE KEYSPACE IF NOT EXISTS prompted_forge
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
};

-- Main document embeddings table
CREATE TABLE IF NOT EXISTS prompted_forge.document_embeddings (
    id UUID PRIMARY KEY,
    workspace_id UUID,
    document_id UUID,
    chunk_index INT,
    content TEXT,
    embedding VECTOR<FLOAT, 1536>, -- OpenAI ada-002 dimensions
    metadata MAP<TEXT, TEXT>,
    title TEXT,
    source TEXT,
    doc_path TEXT,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

-- Vector similarity index
CREATE CUSTOM INDEX IF NOT EXISTS embedding_index
ON prompted_forge.document_embeddings(embedding)
USING 'StorageAttachedIndex'
WITH OPTIONS = {
    'similarity_function': 'cosine',
    'source_model': 'ada002'
};

-- Workspace filtering index
CREATE INDEX IF NOT EXISTS workspace_idx
ON prompted_forge.document_embeddings(workspace_id);

-- Document reference index
CREATE INDEX IF NOT EXISTS document_idx
ON prompted_forge.document_embeddings(document_id);

-- Metadata search table for hybrid queries
CREATE TABLE IF NOT EXISTS prompted_forge.document_metadata (
    workspace_id UUID,
    document_id UUID,
    title TEXT,
    author TEXT,
    created_date DATE,
    tags SET<TEXT>,
    file_type TEXT,
    file_size BIGINT,
    PRIMARY KEY (workspace_id, document_id)
);
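To show how the vector index and the workspace index might be used together, here is a small sketch that builds a workspace-scoped similarity query for the `document_embeddings` table. `build_similarity_query` is a hypothetical helper, the statement is meant to be prepared with the driver (hence the `?` markers), and it assumes the workspace index can be combined with `ORDER BY ... ANN OF`, which generally requires a storage-attached index on the filtered column.

```python
import re
from typing import Sequence, Tuple
from uuid import UUID

EMBEDDING_DIM = 1536  # matches VECTOR<FLOAT, 1536> in the table definition
_KEYSPACE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,47}$")


def build_similarity_query(
    keyspace: str,
    workspace_id: UUID,
    query_vector: Sequence[float],
    limit: int = 5,
) -> Tuple[str, tuple]:
    """Return (cql, params) for an ANN search restricted to one workspace."""
    # The keyspace is interpolated into the statement, so validate it strictly.
    if not _KEYSPACE_RE.match(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(query_vector)}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    cql = (
        "SELECT id, document_id, chunk_index, content "
        f"FROM {keyspace}.document_embeddings "
        "WHERE workspace_id = ? "
        "ORDER BY embedding ANN OF ? "
        "LIMIT ?"
    )
    # Parameter order follows the order of the bind markers above.
    return cql, (workspace_id, list(query_vector), limit)
```

With a live session, the result would typically be used as `session.execute(session.prepare(cql), params)`.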
erDiagram
DOCUMENT_EMBEDDINGS }o--|| WORKSPACE : belongs_to
DOCUMENT_EMBEDDINGS }o--|| DOCUMENT : references
DOCUMENT_EMBEDDINGS ||--|| EMBEDDING_VECTOR : contains
DOCUMENT_EMBEDDINGS {
uuid id PK
uuid workspace_id FK
uuid document_id FK
int chunk_index
text content
vector embedding
map metadata
text title
text source
text doc_path
timestamp created_at
timestamp updated_at
}
DOCUMENT_METADATA {
uuid workspace_id PK
uuid document_id PK
text title
text author
date created_date
set tags
text file_type
bigint file_size
}
WORKSPACE {
uuid id PK
text name
text description
}
DOCUMENT {
uuid id PK
text filename
text original_path
}
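On the application side, a row of `DOCUMENT_EMBEDDINGS` could be mirrored by a small data class that checks the vector length before anything is sent to the database. This is only a sketch: `EmbeddingRow` is a hypothetical name, the defaults are assumptions, and the 1536 dimension comes from the table definition.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional
from uuid import UUID, uuid4

EMBEDDING_DIM = 1536  # matches the embedding column's vector size


@dataclass
class EmbeddingRow:
    workspace_id: UUID
    document_id: UUID
    chunk_index: int
    content: str
    embedding: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)
    title: str = ""
    source: str = ""
    doc_path: str = ""
    id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # Reject malformed rows before they reach the database.
        if len(self.embedding) != EMBEDDING_DIM:
            raise ValueError(
                f"embedding must have {EMBEDDING_DIM} values, got {len(self.embedding)}"
            )
        if self.chunk_index < 0:
            raise ValueError("chunk_index must be non-negative")
        # A new row starts with identical creation and update timestamps.
        if self.updated_at is None:
            self.updated_at = self.created_at
```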
sequenceDiagram
participant App as Application
participant Embed as Embedding Service
participant Astra as Astra DB
participant Index as Vector Index
Note over App,Index: Document Processing
App->>Embed: Generate embeddings
Embed-->>App: Vector embeddings
Note over App,Astra: Batch Insert
loop For each chunk
App->>Astra: INSERT embedding
Astra->>Index: Update vector index
Index-->>Astra: Index updated
Astra-->>App: Insert confirmed
end
Note over App,Index: Verification
App->>Astra: Query inserted vectors
Astra-->>App: Vector count
App->>Index: Verify index consistency
Index-->>App: Index status OK
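The insert-and-verify sequence above can be sketched with the database calls injected as plain callables, which keeps the control flow visible without tying it to a particular driver. All names here (`insert_chunks`, `embed`, `insert`, `count`) are hypothetical, and the verification is assumed to be a simple row-count comparison rather than a full index consistency check.

```python
from typing import Callable, List, Sequence


def insert_chunks(
    chunks: Sequence[str],
    embed: Callable[[Sequence[str]], List[List[float]]],
    insert: Callable[[int, str, List[float]], None],
    count: Callable[[], int],
) -> int:
    """Embed each chunk, insert it, then verify the stored row count.

    embed  -- returns one vector per input chunk
    insert -- writes one (chunk_index, content, embedding) row
    count  -- returns the number of rows currently stored
    """
    # Document processing: generate embeddings for all chunks up front.
    vectors = embed(chunks)
    if len(vectors) != len(chunks):
        raise ValueError("embedding service returned a different number of vectors")

    before = count()
    # Batch insert: one INSERT per chunk, as in the loop in the diagram.
    for index, (text, vector) in enumerate(zip(chunks, vectors)):
        insert(index, text, vector)

    # Verification: the stored count should grow by exactly len(chunks).
    inserted = count() - before
    if inserted != len(chunks):
        raise RuntimeError(f"expected {len(chunks)} new rows, found {inserted}")
    return inserted
```

In a real deployment, `insert` would wrap a prepared `INSERT` statement and `count` a scoped `SELECT COUNT(*)`; both are left abstract here.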