AWS Bedrock Knowledge Bases Guide
Table of Contents
- Overview
- What are Knowledge Bases?
- Key Concepts
- Architecture
- Vector Store Options for Knowledge Bases
- Prerequisites
- Getting Started
- Creating a Knowledge Base
- Data Sources
- Querying Knowledge Bases
- RAG (Retrieval-Augmented Generation)
- Advanced Features
- Best Practices
- AWS Code Samples and Resources
- Troubleshooting
Overview
AWS Bedrock Knowledge Bases enable Retrieval-Augmented Generation (RAG) by allowing foundation models to access and reference your proprietary data. This combines the power of large language models with your specific domain knowledge.
Key Benefits:
- Ground model responses in your data
- Reduce hallucinations
- Keep data up-to-date without retraining
- Cite sources for transparency
- Maintain data security and privacy
- Scale to millions of documents
Are Knowledge Bases Limited to RAG?
No! While RAG is the primary use case, Knowledge Bases support multiple patterns:
1. RAG (Retrieval-Augmented Generation) - Most Common
Query → Retrieve Context → Generate Answer with LLM
Use: Q&A, chatbots, content generation with your data
2. Retrieval Only (No Generation)
Query → Retrieve Relevant Documents → Return Documents
Use: Search, document discovery, research
Example:
# Just retrieve documents, no AI generation
results = retrieve(kb_id, "product specifications")
# Returns: List of relevant documents with scores
3. Semantic Search
Query → Find Similar Content → Rank by Relevance
Use: Finding related documents, recommendations
Example:
# Find documents similar to a given document
similar_docs = find_similar(kb_id, reference_document)
4. Hybrid Search (Semantic + Keyword)
Query → Vector Search + Keyword Search → Combined Results
Use: Best of both worlds - meaning and exact matches
Example:
# Combine semantic understanding with exact keyword matching
results = hybrid_search(kb_id, "API authentication", search_type="HYBRID")
5. Metadata Filtering
Query → Filter by Metadata → Retrieve Filtered Results
Use: Department-specific docs, date ranges, categories
Example:
# Retrieve only from specific department and year
results = retrieve_with_filter(
kb_id,
query="policies",
filters={"department": "HR", "year": 2024}
)
6. Multi-Step Reasoning (Agent Workflows)
Query → Retrieve → Analyze → Retrieve More → Generate
Use: Complex research, multi-document analysis
Example:
# Agent decides when to retrieve more information
agent_response = agent.run(
query="Compare our products",
tools=[knowledge_base_retrieval, calculator, web_search]
)
7. Context Injection (Without RAG)
Retrieve Documents → Use in Custom Workflow → Your Processing
Use: Custom pipelines, data extraction, analysis
Example:
# Retrieve docs and process them your way
docs = retrieve(kb_id, "contracts")
for doc in docs:
extracted_data = custom_extraction(doc)
store_in_database(extracted_data)
8. Citation and Source Tracking
Query → Retrieve → Generate → Track Sources
Use: Compliance, audit trails, transparency
Example:
# Get answer with full citation trail
response = retrieve_and_generate(kb_id, query)
print(f"Answer: {response['answer']}")
print(f"Sources: {response['citations']}")
Comparison: RAG vs Other Patterns
| Pattern | Retrieval | Generation | Use Case |
|---|---|---|---|
| RAG | ✅ Yes | ✅ Yes | Q&A, chatbots, content creation |
| Retrieval Only | ✅ Yes | ❌ No | Search, document discovery |
| Semantic Search | ✅ Yes | ❌ No | Finding similar content |
| Metadata Filtering | ✅ Yes | ⚠️ Optional | Filtered search/retrieval |
| Agent Workflows | ✅ Yes | ✅ Yes | Complex multi-step tasks |
| Custom Processing | ✅ Yes | ❌ No | Data extraction, analysis |
When to Use Each Pattern
Use RAG when:
- You need natural language answers
- Users ask questions about your documents
- You want conversational interfaces
- Citations are important

Use Retrieval Only when:
- Users need to see actual documents
- You're building a search interface
- You want to process documents yourself
- You need document metadata

Use Semantic Search when:
- Finding related content
- Building recommendation systems
- Discovering similar documents
- Research and exploration

Use Metadata Filtering when:
- Documents have categories/tags
- You need department-specific results
- You need time-based filtering (recent docs)
- Compliance requirements apply

Use Agent Workflows when:
- Multi-step reasoning is needed
- Combining multiple data sources
- Complex decision-making
- Dynamic retrieval strategies
Key Takeaway
Knowledge Bases are flexible retrieval systems. RAG is just one (very popular) way to use them.
Knowledge Base = Smart Document Storage + Retrieval Engine
You can:
✅ Use it for RAG (retrieve + generate)
✅ Use it for search only (retrieve)
✅ Use it for recommendations (similarity)
✅ Use it in custom workflows (your logic)
✅ Combine multiple patterns
The power is in having your documents indexed and searchable - what you do with the retrieved information is up to you!
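As a concrete sketch, the two core runtime calls behind all of these patterns look like this with boto3 (the knowledge base ID and model ARN are placeholders; retrieve and retrieve_and_generate are covered in detail later in this guide):
import boto3

runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')
KB_ID = 'YOUR_KB_ID'  # placeholder

# Retrieval only (patterns 2-5): returns scored document chunks
chunks = runtime.retrieve(
    knowledgeBaseId=KB_ID,
    retrievalQuery={'text': 'product specifications'}
)['retrievalResults']

# RAG (pattern 1): retrieve, then generate a cited answer
answer = runtime.retrieve_and_generate(
    input={'text': 'What are the product specifications?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': KB_ID,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)['output']['text']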
What are Knowledge Bases?
The Core Purpose: Providing Custom Context to LLMs
The primary purpose of Knowledge Bases is to provide custom, domain-specific context to Large Language Models (LLMs).
The Problem Knowledge Bases Solve
Without Knowledge Bases:
User: "What is our company's return policy?"
↓
LLM (trained on general internet data)
↓
Response: "I don't have information about your specific company's
return policy. Return policies typically vary by company..."
❌ Problem: The LLM doesn't know YOUR specific information
With Knowledge Bases:
User: "What is our company's return policy?"
↓
Knowledge Base retrieves relevant documents
↓
LLM receives: User question + Your company's actual policy document
↓
Response: "According to your company policy, customers can return
items within 30 days with receipt for full refund..."
✅ Solution: The LLM now has YOUR specific context
How Knowledge Bases Provide Context
Think of it like giving the LLM a textbook before asking it questions:
┌─────────────────────────────────────────────────────┐
│ YOUR DOCUMENTS (The Custom Context) │
│ • Company policies │
│ • Product manuals │
│ • Internal documentation │
│ • Customer data │
│ • Technical specifications │
└─────────────────────────────────────────────────────┘
↓
[Knowledge Base]
(Stores & Indexes Documents)
↓
┌─────────────────────────────────────────────────────┐
│ USER ASKS QUESTION │
│ "How do I configure the API?" │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ RETRIEVAL: Find Relevant Context │
│ Searches your documents for relevant information │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ AUGMENTATION: Add Context to Prompt │
│ "Based on this documentation: [retrieved docs] │
│ Answer: How do I configure the API?" │
└─────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────┐
│ LLM GENERATES ANSWER │
│ Uses YOUR context to provide accurate response │
└─────────────────────────────────────────────────────┘
Why Custom Context Matters
1. Accuracy
Without context:
question = "What's the warranty period?"
# LLM guesses: "Warranties typically range from 1-3 years..."
With context from Knowledge Base:
question = "What's the warranty period?"
context = "Our products come with a 5-year comprehensive warranty..."
# LLM answers accurately: "Your products have a 5-year warranty..."
2. Specificity
Without context:
question = "How do I reset my password?"
# LLM gives generic steps that may not match your system
With context from Knowledge Base:
question = "How do I reset my password?"
context = "To reset password in our system: 1. Click 'Forgot Password'
on login page 2. Enter email 3. Check for reset link..."
# LLM provides YOUR exact steps
3. Up-to-Date Information
Without context:
# LLM training data is from 2023
question = "What are the new features in version 5.0?"
# LLM: "I don't have information about version 5.0"
With context from Knowledge Base:
# Your docs are updated daily
question = "What are the new features in version 5.0?"
context = "Version 5.0 released yesterday includes: AI assistant,
dark mode, mobile app..."
# LLM provides current information
4. Proprietary Information
Without context:
question = "What's our pricing for enterprise customers?"
# LLM: "I don't have access to your pricing information"
With context from Knowledge Base:
question = "What's our pricing for enterprise customers?"
context = "Enterprise pricing: $500/month for up to 100 users..."
# LLM provides YOUR pricing
The RAG Pattern: Retrieval-Augmented Generation
Knowledge Bases implement the RAG pattern:
R - RETRIEVAL
↓ Find relevant documents from your knowledge base
A - AUGMENTATION
↓ Add retrieved documents as context to the prompt
G - GENERATION
↓ LLM generates answer based on YOUR context
Example in Action:
# 1. RETRIEVAL
user_question = "What's the refund process?"
retrieved_docs = knowledge_base.search(user_question)
# Returns: Your company's refund policy document
# 2. AUGMENTATION
augmented_prompt = f"""
Based on this company policy:
{retrieved_docs}
Answer the question: {user_question}
"""
# 3. GENERATION
llm_response = model.generate(augmented_prompt)
# Returns: Accurate answer based on YOUR policy
What Knowledge Bases Are (Technical View)
Knowledge Bases are managed repositories that:
- Store your documents: PDFs, text files, web pages, databases
- Create embeddings: vector representations of content for semantic search
- Enable semantic search: find relevant information based on meaning, not just keywords
- Integrate with models: automatically provide context to LLMs
- Track sources: show where information came from (citations)
- Update dynamically: add/remove documents without retraining models
Real-World Analogy
Think of a Knowledge Base like a smart filing cabinet for an AI assistant:
Traditional Approach:
Employee: "What's the vacation policy?"
Manager: "Let me find the employee handbook... [searches files]...
Here it is, page 47..."
Knowledge Base Approach:
Employee: "What's the vacation policy?"
AI (with Knowledge Base): [Instantly retrieves policy]
"According to the employee handbook, you get 15 days..."
The Knowledge Base gives the AI instant access to your specific documents, just like a manager who has memorized all company policies.
Use Cases for Custom Context
Customer Support:
Context: Product manuals, FAQs, troubleshooting guides
Result: AI answers customer questions accurately
Internal Knowledge Management:
Context: Company policies, procedures, onboarding docs
Result: Employees get instant answers to HR/IT questions
Product Information:
Context: Product catalogs, specifications, pricing
Result: Sales team gets accurate product details
Legal/Compliance:
Context: Contracts, regulations, legal documents
Result: Quick answers to compliance questions
Technical Documentation:
Context: API docs, code examples, architecture diagrams
Result: Developers get accurate technical guidance
Research & Analysis:
Context: Research papers, reports, data
Result: AI synthesizes information from your sources
Key Insight
Knowledge Bases don't change what the LLM knows fundamentally. Instead, they provide temporary, relevant context for each query.
LLM's Base Knowledge (from training):
"I know general information about the world"
+
Knowledge Base Context (your documents):
"Here's YOUR specific information"
=
Accurate, Customized Response:
"Answer based on YOUR data"
This is why Knowledge Bases are so powerful - they let you leverage the LLM's language understanding while grounding responses in YOUR truth.
Key Concepts
Knowledge Base
A managed repository that stores, indexes, and retrieves your documents.
Data Source
The origin of your documents (S3, web crawler, Confluence, SharePoint, Salesforce).
Embeddings
Vector representations of text that capture semantic meaning.
Embedding Model
The model used to convert text to vectors (e.g., Amazon Titan Embeddings).
Vector Store
Database that stores embeddings for fast similarity search (OpenSearch, Pinecone, Redis).
Chunking
Breaking documents into smaller pieces for better retrieval.
Retrieval
Finding relevant document chunks based on a query.
RAG (Retrieval-Augmented Generation)
Combining retrieved information with model generation.
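To make the embedding concept concrete, here is a minimal sketch of generating one with Amazon Titan Embeddings through the bedrock-runtime client (request/response shape for amazon.titan-embed-text-v1; Bedrock handles this for you automatically during ingestion):
import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock_runtime.invoke_model(
    modelId='amazon.titan-embed-text-v1',
    body=json.dumps({'inputText': 'How do I reset my password?'})
)
embedding = json.loads(response['body'].read())['embedding']
print(len(embedding))  # 1536 dimensions for Titan Embeddings G1 - Text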
Architecture
┌─────────────┐
│ Documents │
│ (S3, etc) │
└──────┬──────┘
│
▼
┌─────────────────┐
│ Chunking & │
│ Embedding │
└──────┬──────────┘
│
▼
┌─────────────────┐
│ Vector Store │
│ (OpenSearch) │
└──────┬──────────┘
│
▼
┌─────────────────┐ ┌──────────────┐
│ User Query │─────▶│ Retrieval │
└─────────────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ Foundation │
│ Model │
└──────┬───────┘
│
▼
┌──────────────┐
│ Response │
│ with Sources │
└──────────────┘
Vector Store Options for Knowledge Bases
Knowledge Bases require a vector store (also called vector database) to store document embeddings. AWS Bedrock supports multiple vector store options, each with different characteristics.
What is a Vector Store?
A vector store is a specialized database that:
- Stores embeddings (numerical representations of text)
- Performs similarity search (finds similar vectors quickly)
- Scales to millions of documents
- Returns results in milliseconds
Simple Analogy:
Traditional Database:
"Find documents where title = 'API Guide'"
→ Exact match only
Vector Store:
"Find documents similar to 'API authentication'"
→ Returns: API Guide, Security Docs, OAuth Tutorial
→ Based on semantic meaning, not exact words
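Under the hood, "similar" typically means high cosine similarity between embedding vectors. A toy illustration with made-up 3-dimensional vectors (real embeddings have around 1536 dimensions):
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

query = [0.9, 0.1, 0.3]      # pretend embedding of "API authentication"
api_guide = [0.8, 0.2, 0.4]  # pretend embedding of an API guide chunk
recipe = [0.1, 0.9, 0.2]     # pretend embedding of an unrelated chunk

print(cosine_similarity(query, api_guide))  # high score -> retrieved
print(cosine_similarity(query, recipe))     # low score -> skipped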
Supported Vector Stores
AWS Bedrock Knowledge Bases support these vector store options:
| Vector Store | Type | Best For | Pricing Model |
|---|---|---|---|
| Amazon OpenSearch Serverless | Managed | Production, recommended | Pay per OCU |
| Amazon OpenSearch Service | Managed | Existing OpenSearch users | Instance-based |
| Amazon Aurora PostgreSQL | Relational DB | Existing Aurora users | Instance-based |
| Pinecone | Third-party SaaS | Multi-cloud, specialized | Subscription |
| Redis Enterprise Cloud | Third-party SaaS | High performance | Subscription |
| MongoDB Atlas | Third-party SaaS | Document-oriented | Subscription |
1. Amazon OpenSearch Serverless (Recommended)
Overview:
- Fully managed, serverless vector search
- No infrastructure management
- Auto-scaling
- Built-in security

When to Use:
- ✅ New projects
- ✅ Don't want to manage infrastructure
- ✅ Variable workloads
- ✅ Quick setup
Configuration:
import boto3
# Create OpenSearch Serverless collection
aoss_client = boto3.client('opensearchserverless')
collection = aoss_client.create_collection(
name='bedrock-kb-collection',
type='VECTORSEARCH',
description='Vector store for Knowledge Base'
)
# Use in Knowledge Base
storage_config = {
'type': 'OPENSEARCH_SERVERLESS',
'opensearchServerlessConfiguration': {
'collectionArn': collection['createCollectionDetail']['arn'],
'vectorIndexName': 'bedrock-kb-index',
'fieldMapping': {
'vectorField': 'embedding',
'textField': 'text',
'metadataField': 'metadata'
}
}
}
Pros:
- ✅ No server management
- ✅ Auto-scaling
- ✅ AWS-native integration
- ✅ Built-in security (IAM, encryption)
- ✅ Pay only for what you use

Cons:
- ❌ Higher cost for consistent high loads
- ❌ Cold start latency possible
- ❌ Less control over configuration

Pricing:
- Based on OpenSearch Compute Units (OCUs)
- ~$0.24 per OCU-hour
- Minimum 2 OCUs for indexing, 2 for search
2. Amazon OpenSearch Service
Overview:
- Managed OpenSearch clusters
- Full control over configuration
- Predictable pricing

When to Use:
- ✅ Already using OpenSearch
- ✅ Need fine-grained control
- ✅ Consistent high workloads
- ✅ Custom plugins/configurations
Configuration:
storage_config = {
'type': 'OPENSEARCH',
'opensearchConfiguration': {
'endpoint': 'https://my-domain.us-east-1.es.amazonaws.com',
'vectorIndexName': 'bedrock-kb-index',
'fieldMapping': {
'vectorField': 'embedding',
'textField': 'text',
'metadataField': 'metadata'
}
}
}
Pros:
- ✅ Full control over the cluster
- ✅ Predictable costs
- ✅ Better for sustained high loads
- ✅ Advanced OpenSearch features

Cons:
- ❌ Must manage the cluster
- ❌ Manual scaling
- ❌ Pay for reserved capacity

Pricing:
- Instance-based (e.g., r6g.large.search)
- ~$0.162/hour per instance
- Storage: ~$0.135/GB-month
3. Amazon Aurora PostgreSQL (with pgvector)
Overview:
- Relational database with a vector extension
- Combines structured and vector data
- Familiar SQL interface

When to Use:
- ✅ Already using Aurora PostgreSQL
- ✅ Need relational + vector data together
- ✅ SQL-based workflows
- ✅ Transactional consistency
Configuration:
storage_config = {
'type': 'RDS',
'rdsConfiguration': {
'resourceArn': 'arn:aws:rds:us-east-1:ACCOUNT:cluster:my-aurora-cluster',
'credentialsSecretArn': 'arn:aws:secretsmanager:...',
'databaseName': 'knowledge_base',
'tableName': 'embeddings',
'fieldMapping': {
'primaryKeyField': 'id',
'vectorField': 'embedding',
'textField': 'text',
'metadataField': 'metadata'
}
}
}
Setup pgvector:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table for embeddings
CREATE TABLE embeddings (
id SERIAL PRIMARY KEY,
text TEXT,
embedding vector(1536), -- Dimension depends on embedding model
metadata JSONB,
created_at TIMESTAMP DEFAULT NOW()
);
-- Create index for fast similarity search
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
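For completeness, a hedged sketch of running a similarity query against that table from Python with psycopg2 (the connection string is a placeholder, and the query vector is faked here; in practice it comes from your embedding model):
import psycopg2

conn = psycopg2.connect('dbname=knowledge_base user=postgres')  # placeholder DSN
query_vec = '[' + ','.join(['0.1'] * 1536) + ']'  # stand-in for a real 1536-dim embedding

with conn, conn.cursor() as cur:
    # <=> is pgvector's cosine-distance operator; smaller distance = more similar
    cur.execute(
        'SELECT text, embedding <=> %s AS distance '
        'FROM embeddings ORDER BY distance LIMIT 5',
        (query_vec,)
    )
    for text, distance in cur.fetchall():
        print(f'{distance:.4f} {text[:80]}')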
Pros:
- ✅ Combine relational and vector data
- ✅ SQL queries
- ✅ ACID transactions
- ✅ Familiar PostgreSQL ecosystem

Cons:
- ❌ Not specialized for vectors
- ❌ Slower than dedicated vector DBs at scale
- ❌ More complex setup

Pricing:
- Aurora instance pricing
- ~$0.29/hour for db.r6g.large
- Storage: ~$0.10/GB-month
4. Pinecone
Overview:
- Purpose-built vector database (SaaS)
- Specialized for vector search
- Multi-cloud support

When to Use:
- ✅ Multi-cloud strategy
- ✅ Need specialized vector features
- ✅ Want a managed service outside AWS
- ✅ High-performance requirements
Configuration:
storage_config = {
'type': 'PINECONE',
'pineconeConfiguration': {
'connectionString': 'https://your-index.pinecone.io',
'credentialsSecretArn': 'arn:aws:secretsmanager:...',
'namespace': 'bedrock-kb',
'fieldMapping': {
'textField': 'text',
'metadataField': 'metadata'
}
}
}
Pros:
- ✅ Purpose-built for vectors
- ✅ Excellent performance
- ✅ Simple API
- ✅ Good documentation

Cons:
- ❌ Third-party dependency
- ❌ Data leaves AWS
- ❌ Additional vendor relationship
- ❌ Subscription costs

Pricing:
- Starter: Free (1M vectors)
- Standard: ~$70/month (5M vectors)
- Enterprise: Custom pricing
5. Redis Enterprise Cloud
Overview:
- In-memory database with vector search
- Extremely fast retrieval
- Real-time performance

When to Use:
- ✅ Need ultra-low latency
- ✅ Already using Redis
- ✅ Real-time applications
- ✅ Caching + vector search
Configuration:
storage_config = {
'type': 'REDIS_ENTERPRISE_CLOUD',
'redisEnterpriseCloudConfiguration': {
'endpoint': 'redis-12345.c1.us-east-1-1.ec2.cloud.redislabs.com:12345',
'credentialsSecretArn': 'arn:aws:secretsmanager:...',
'vectorIndexName': 'bedrock-kb-idx',
'fieldMapping': {
'vectorField': 'embedding',
'textField': 'text',
'metadataField': 'metadata'
}
}
}
Pros:
- ✅ Extremely fast (in-memory)
- ✅ Sub-millisecond latency
- ✅ Redis ecosystem
- ✅ Caching + vectors

Cons:
- ❌ More expensive (in-memory)
- ❌ Third-party service
- ❌ Data size limited by memory

Pricing:
- Based on memory and throughput
- ~$0.119/GB-hour
- Minimum ~$100/month
6. MongoDB Atlas
Overview:
- Document database with vector search
- Combines documents and vectors
- Flexible schema

When to Use:
- ✅ Already using MongoDB
- ✅ Document-oriented data
- ✅ Flexible schema needs
- ✅ JSON-native workflows
Configuration:
storage_config = {
'type': 'MONGO_DB_ATLAS',
'mongoDbAtlasConfiguration': {
'endpoint': 'mongodb+srv://cluster.mongodb.net',
'credentialsSecretArn': 'arn:aws:secretsmanager:...',
'databaseName': 'knowledge_base',
'collectionName': 'embeddings',
'vectorIndexName': 'vector_index',
'fieldMapping': {
'vectorField': 'embedding',
'textField': 'text',
'metadataField': 'metadata'
}
}
}
Pros:
- ✅ Documents + vectors in one DB
- ✅ Flexible schema
- ✅ MongoDB ecosystem
- ✅ Good for unstructured data

Cons:
- ❌ Not specialized for vectors
- ❌ Third-party service
- ❌ Can be expensive at scale

Pricing:
- Serverless: Pay per operation
- Dedicated: ~$0.08/hour (M10)
- Storage: ~$0.25/GB-month
Comparison Matrix
| Feature | OpenSearch Serverless | OpenSearch Service | Aurora PostgreSQL | Pinecone | Redis | MongoDB |
|---|---|---|---|---|---|---|
| Setup Complexity | ⭐ Easy | ⭐⭐ Medium | ⭐⭐⭐ Complex | ⭐ Easy | ⭐⭐ Medium | ⭐⭐ Medium |
| Performance | ⭐⭐⭐⭐ Good | ⭐⭐⭐⭐ Good | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐ Medium |
| Scalability | ⭐⭐⭐⭐⭐ Auto | ⭐⭐⭐⭐ Manual | ⭐⭐⭐ Limited | ⭐⭐⭐⭐⭐ Auto | ⭐⭐⭐⭐ Good | ⭐⭐⭐⭐ Good |
| Cost (Small) | ⭐⭐⭐ Medium | ⭐⭐ High | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Low | ⭐⭐ High | ⭐⭐⭐ Medium |
| Cost (Large) | ⭐⭐ High | ⭐⭐⭐⭐ Low | ⭐⭐⭐ Medium | ⭐⭐⭐ Medium | ⭐ Very High | ⭐⭐⭐ Medium |
| AWS Integration | ⭐⭐⭐⭐⭐ Native | ⭐⭐⭐⭐⭐ Native | ⭐⭐⭐⭐⭐ Native | ⭐⭐⭐ Good | ⭐⭐⭐ Good | ⭐⭐⭐ Good |
| Management | ⭐⭐⭐⭐⭐ Fully Managed | ⭐⭐⭐ Managed | ⭐⭐⭐ Managed | ⭐⭐⭐⭐⭐ Fully Managed | ⭐⭐⭐⭐ Managed | ⭐⭐⭐⭐ Managed |
Decision Guide
Choose OpenSearch Serverless if:
- Starting a new project
- Want simplicity and AWS-native integration
- Variable workloads
- Don't want to manage infrastructure

Choose OpenSearch Service if:
- Already using OpenSearch
- Need full control
- Consistent high workloads
- Want predictable costs

Choose Aurora PostgreSQL if:
- Already using Aurora/PostgreSQL
- Need relational + vector data
- Want a SQL interface
- Need ACID transactions

Choose Pinecone if:
- Need best-in-class vector performance
- Multi-cloud strategy
- Want specialized vector features
- Willing to use a third-party service

Choose Redis if:
- Need ultra-low latency
- Real-time requirements
- Already using Redis
- Budget for in-memory costs

Choose MongoDB if:
- Already using MongoDB
- Document-oriented data model
- Need a flexible schema
- JSON-native workflows
Cost Comparison Example
Scenario: 1 million documents, 10,000 queries/day
| Vector Store | Monthly Cost (Estimate) |
|---|---|
| OpenSearch Serverless | ~$350-500 |
| OpenSearch Service (3 nodes) | ~$350-400 |
| Aurora PostgreSQL | ~$200-300 |
| Pinecone (Standard) | ~$70-100 |
| Redis Enterprise | ~$200-400 |
| MongoDB Atlas | ~$150-250 |
Note: Costs vary based on actual usage, region, and configuration
Recommendation for Most Users
🏆 Start with Amazon OpenSearch Serverless
Why?
- ✅ Easiest setup
- ✅ AWS-native (best Bedrock integration)
- ✅ No infrastructure management
- ✅ Auto-scaling
- ✅ Good performance
- ✅ Secure by default
You can always migrate to another option later if your needs change.
Prerequisites
AWS Account Requirements:
- AWS account with Bedrock access
- S3 bucket for documents
- Vector store (OpenSearch Serverless recommended)
- IAM permissions
Required IAM Permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:CreateKnowledgeBase",
"bedrock:GetKnowledgeBase",
"bedrock:UpdateKnowledgeBase",
"bedrock:DeleteKnowledgeBase",
"bedrock:ListKnowledgeBases",
"bedrock:CreateDataSource",
"bedrock:StartIngestionJob",
"bedrock:Retrieve",
"bedrock:RetrieveAndGenerate"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::your-bucket-name",
"arn:aws:s3:::your-bucket-name/*"
]
},
{
"Effect": "Allow",
"Action": [
"aoss:APIAccessAll"
],
"Resource": "arn:aws:aoss:*:*:collection/*"
}
]
}
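If you'd rather create the service role programmatically, here is a minimal sketch (the role and policy names are placeholders, and it assumes you saved the JSON policy above as kb-policy.json):
import json
import boto3

iam = boto3.client('iam')

# Trust policy letting the Bedrock service assume this role
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'bedrock.amazonaws.com'},
        'Action': 'sts:AssumeRole'
    }]
}

role = iam.create_role(
    RoleName='BedrockKBRole',  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Attach the permissions policy shown above
iam.put_role_policy(
    RoleName='BedrockKBRole',
    PolicyName='BedrockKBAccess',
    PolicyDocument=open('kb-policy.json').read()
)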
SDK Installation:
# Python
pip install boto3
# Node.js
npm install @aws-sdk/client-bedrock-agent @aws-sdk/client-bedrock-agent-runtime
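A quick sanity check that your credentials and region can reach Bedrock (lists the foundation models available to your account):
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')
models = bedrock.list_foundation_models()['modelSummaries']
print(f'{len(models)} foundation models available')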
Getting Started
Step 1: Prepare Your Documents
# Create S3 bucket
aws s3 mb s3://my-knowledge-base-docs
# Upload documents
aws s3 cp ./documents/ s3://my-knowledge-base-docs/ --recursive
# Supported formats:
# - PDF
# - TXT
# - MD (Markdown)
# - HTML
# - DOC/DOCX
# - CSV
Step 2: Create Vector Store (OpenSearch Serverless)
import boto3
aoss_client = boto3.client('opensearchserverless')
# Create collection
response = aoss_client.create_collection(
name='bedrock-knowledge-base',
type='VECTORSEARCH',
description='Vector store for Bedrock Knowledge Base'
)
collection_id = response['createCollectionDetail']['id']
collection_arn = response['createCollectionDetail']['arn']
print(f"Collection created: {collection_id}")
Creating a Knowledge Base
Method 1: Using AWS Console
1. Navigate to the Bedrock Console
   - Go to AWS Console → Amazon Bedrock → Knowledge bases
   - Click "Create knowledge base"
2. Configure the Knowledge Base
   - Name: company-docs-kb
   - Description: "Company documentation knowledge base"
   - IAM role: Create or select existing
3. Configure the Data Source
   - Source: Amazon S3
   - S3 URI: s3://my-knowledge-base-docs/
   - Chunking strategy: Default or custom
4. Select the Embeddings Model
   - Model: Amazon Titan Embeddings G1 - Text
   - Dimensions: 1536
5. Configure the Vector Store
   - Type: OpenSearch Serverless
   - Collection: Select your collection
   - Index name: bedrock-kb-index
6. Create and Sync
   - Review settings
   - Create the knowledge base
   - Start the ingestion job
Method 2: Using Python SDK
import boto3
import json
bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')
# Create knowledge base
response = bedrock_agent.create_knowledge_base(
name='company-docs-kb',
description='Company documentation knowledge base',
roleArn='arn:aws:iam::ACCOUNT_ID:role/BedrockKBRole',
knowledgeBaseConfiguration={
'type': 'VECTOR',
'vectorKnowledgeBaseConfiguration': {
'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1'
}
},
storageConfiguration={
'type': 'OPENSEARCH_SERVERLESS',
'opensearchServerlessConfiguration': {
'collectionArn': 'arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/COLLECTION_ID',
'vectorIndexName': 'bedrock-kb-index',
'fieldMapping': {
'vectorField': 'vector',
'textField': 'text',
'metadataField': 'metadata'
}
}
}
)
knowledge_base_id = response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base created: {knowledge_base_id}")
Method 3: Complete Setup Script
import boto3
import time
from typing import Dict, Tuple
class KnowledgeBaseManager:
"""
Helper class for managing Bedrock Knowledge Bases
"""
    def __init__(self, region_name='us-east-1'):
        self.region_name = region_name
        self.bedrock_agent = boto3.client('bedrock-agent', region_name=region_name)
        self.bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name=region_name)
        self.s3 = boto3.client('s3', region_name=region_name)
def create_knowledge_base(
self,
name: str,
description: str,
role_arn: str,
s3_bucket: str,
collection_arn: str,
embedding_model: str = 'amazon.titan-embed-text-v1'
    ) -> Tuple[str, str]:
        """
        Create a knowledge base with an S3 data source
        Returns:
            Tuple of (knowledge base ID, data source ID)
        """
# Create knowledge base
kb_response = self.bedrock_agent.create_knowledge_base(
name=name,
description=description,
roleArn=role_arn,
knowledgeBaseConfiguration={
'type': 'VECTOR',
'vectorKnowledgeBaseConfiguration': {
                    'embeddingModelArn': f'arn:aws:bedrock:{self.region_name}::foundation-model/{embedding_model}'
}
},
storageConfiguration={
'type': 'OPENSEARCH_SERVERLESS',
'opensearchServerlessConfiguration': {
'collectionArn': collection_arn,
'vectorIndexName': f'{name}-index',
'fieldMapping': {
'vectorField': 'vector',
'textField': 'text',
'metadataField': 'metadata'
}
}
}
)
kb_id = kb_response['knowledgeBase']['knowledgeBaseId']
print(f"✓ Knowledge Base created: {kb_id}")
# Create data source
ds_response = self.bedrock_agent.create_data_source(
knowledgeBaseId=kb_id,
name=f'{name}-s3-source',
description=f'S3 data source for {name}',
dataSourceConfiguration={
'type': 'S3',
's3Configuration': {
'bucketArn': f'arn:aws:s3:::{s3_bucket}'
}
},
vectorIngestionConfiguration={
'chunkingConfiguration': {
'chunkingStrategy': 'FIXED_SIZE',
'fixedSizeChunkingConfiguration': {
'maxTokens': 300,
'overlapPercentage': 20
}
}
}
)
data_source_id = ds_response['dataSource']['dataSourceId']
print(f"✓ Data Source created: {data_source_id}")
return kb_id, data_source_id
def start_ingestion(self, kb_id: str, data_source_id: str) -> str:
"""
Start ingestion job to process documents
"""
response = self.bedrock_agent.start_ingestion_job(
knowledgeBaseId=kb_id,
dataSourceId=data_source_id
)
job_id = response['ingestionJob']['ingestionJobId']
print(f"✓ Ingestion job started: {job_id}")
return job_id
def wait_for_ingestion(self, kb_id: str, data_source_id: str, job_id: str):
"""
Wait for ingestion job to complete
"""
print("Waiting for ingestion to complete...")
while True:
response = self.bedrock_agent.get_ingestion_job(
knowledgeBaseId=kb_id,
dataSourceId=data_source_id,
ingestionJobId=job_id
)
status = response['ingestionJob']['status']
print(f"Status: {status}")
if status == 'COMPLETE':
print("✓ Ingestion completed successfully")
break
elif status == 'FAILED':
print("✗ Ingestion failed")
print(f"Failure reasons: {response['ingestionJob'].get('failureReasons', [])}")
break
time.sleep(10)
    def query(self, kb_id: str, query: str, num_results: int = 5) -> list:
"""
Query the knowledge base (retrieval only)
"""
response = self.bedrock_agent_runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={
'text': query
},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': num_results
}
}
)
return response['retrievalResults']
def query_and_generate(
self,
kb_id: str,
query: str,
model_id: str = 'anthropic.claude-3-sonnet-20240229-v1:0'
) -> Dict:
"""
Query knowledge base and generate response (RAG)
"""
response = self.bedrock_agent_runtime.retrieve_and_generate(
input={
'text': query
},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
                    'modelArn': f'arn:aws:bedrock:{self.region_name}::foundation-model/{model_id}'
}
}
)
return {
'output': response['output']['text'],
'citations': response.get('citations', [])
}
# Usage Example
manager = KnowledgeBaseManager()
# Create knowledge base
kb_id, ds_id = manager.create_knowledge_base(
name='company-docs',
description='Company documentation',
role_arn='arn:aws:iam::ACCOUNT:role/BedrockKBRole',
s3_bucket='my-docs-bucket',
collection_arn='arn:aws:aoss:us-east-1:ACCOUNT:collection/COLLECTION_ID'
)
# Start ingestion
job_id = manager.start_ingestion(kb_id, ds_id)
# Wait for completion
manager.wait_for_ingestion(kb_id, ds_id, job_id)
# Query the knowledge base
result = manager.query_and_generate(
kb_id=kb_id,
query='What is our return policy?'
)
print(f"Answer: {result['output']}")
print(f"Sources: {len(result['citations'])} citations")
Data Sources
S3 Data Source
Most common data source for documents:
# Configure S3 data source
s3_data_source = {
'type': 'S3',
's3Configuration': {
'bucketArn': 'arn:aws:s3:::my-docs-bucket',
'inclusionPrefixes': ['docs/', 'manuals/'], # Optional: specific folders
'exclusionPrefixes': ['archive/'] # Optional: exclude folders
}
}
# Create data source
response = bedrock_agent.create_data_source(
knowledgeBaseId=kb_id,
name='s3-documents',
dataSourceConfiguration=s3_data_source,
vectorIngestionConfiguration={
'chunkingConfiguration': {
'chunkingStrategy': 'FIXED_SIZE',
'fixedSizeChunkingConfiguration': {
'maxTokens': 300,
'overlapPercentage': 20
}
}
}
)
Web Crawler Data Source
Crawl websites for documentation:
# Configure web crawler
web_crawler_config = {
'type': 'WEB',
'webConfiguration': {
'crawlerConfiguration': {
'crawlerLimits': {
'rateLimit': 300 # Pages per minute
},
'inclusionFilters': [
'https://docs.example.com/*'
],
'exclusionFilters': [
'*/archive/*',
'*/old/*'
],
'scope': 'HOST_ONLY' # or 'SUBDOMAINS'
},
'sourceConfiguration': {
'urlConfiguration': {
'seedUrls': [
{'url': 'https://docs.example.com'}
]
}
}
}
}
Confluence Data Source
# Configure Confluence
confluence_config = {
'type': 'CONFLUENCE',
'confluenceConfiguration': {
'sourceConfiguration': {
'hostUrl': 'https://your-domain.atlassian.net',
'hostType': 'SAAS', # or 'SERVER'
'authType': 'BASIC',
'credentialsSecretArn': 'arn:aws:secretsmanager:...'
},
'crawlerConfiguration': {
'filterConfiguration': {
'type': 'PATTERN',
'patternObjectFilter': {
'filters': [
{
'objectType': 'Space',
'inclusionFilters': ['DOCS', 'TECH'],
'exclusionFilters': ['ARCHIVE']
}
]
}
}
}
}
}
SharePoint Data Source
# Configure SharePoint
sharepoint_config = {
'type': 'SHAREPOINT',
'sharePointConfiguration': {
'sourceConfiguration': {
'hostType': 'ONLINE', # or 'SERVER'
'domain': 'your-domain',
'siteUrls': [
'https://your-domain.sharepoint.com/sites/docs'
],
'tenantId': 'your-tenant-id',
'credentialsSecretArn': 'arn:aws:secretsmanager:...'
},
'crawlerConfiguration': {
'filterConfiguration': {
'type': 'PATTERN',
'patternObjectFilter': {
'filters': [
{
'objectType': 'Document',
'inclusionFilters': ['*.pdf', '*.docx']
}
]
}
}
}
}
}
Salesforce Data Source
# Configure Salesforce
salesforce_config = {
'type': 'SALESFORCE',
'salesforceConfiguration': {
'sourceConfiguration': {
'hostUrl': 'https://your-domain.salesforce.com',
'authType': 'OAUTH2',
'credentialsSecretArn': 'arn:aws:secretsmanager:...'
},
'crawlerConfiguration': {
'filterConfiguration': {
'type': 'PATTERN',
'patternObjectFilter': {
'filters': [
{
'objectType': 'Knowledge',
'inclusionFilters': ['Published']
},
{
'objectType': 'Case',
'inclusionFilters': ['Closed']
}
]
}
}
}
}
}
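Whichever source type you use, the configuration dict plugs into the same create_data_source call. A sketch reusing the kb_id and bedrock_agent client from earlier:
response = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id,
    name='confluence-docs',
    dataSourceConfiguration=confluence_config  # or web_crawler_config, sharepoint_config, salesforce_config
)
data_source_id = response['dataSource']['dataSourceId']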
Querying Knowledge Bases
Method 1: Retrieve Only (No Generation)
Get relevant documents without generating a response:
import boto3
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
def retrieve_documents(kb_id: str, query: str, num_results: int = 5):
"""
Retrieve relevant documents from knowledge base
"""
response = bedrock_agent_runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': num_results,
                'overrideSearchType': 'HYBRID'  # 'HYBRID' or 'SEMANTIC'; omit to use the store's default
}
}
)
results = []
for item in response['retrievalResults']:
results.append({
'content': item['content']['text'],
'score': item['score'],
'location': item['location'],
'metadata': item.get('metadata', {})
})
return results
# Usage
documents = retrieve_documents(
kb_id='YOUR_KB_ID',
query='What is the refund policy?',
num_results=3
)
for i, doc in enumerate(documents, 1):
print(f"\n--- Document {i} (Score: {doc['score']:.4f}) ---")
print(doc['content'])
print(f"Source: {doc['location']}")
Method 2: Retrieve and Generate (RAG)
Get an AI-generated answer based on retrieved documents:
def query_with_rag(kb_id: str, query: str, model_id: str = 'anthropic.claude-3-sonnet-20240229-v1:0'):
"""
Query knowledge base and generate answer using RAG
"""
response = bedrock_agent_runtime.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{model_id}',
'retrievalConfiguration': {
'vectorSearchConfiguration': {
'numberOfResults': 5,
'overrideSearchType': 'HYBRID'
}
},
'generationConfiguration': {
'promptTemplate': {
'textPromptTemplate': '''You are a helpful assistant. Answer the question based only on the provided context.
Context:
$search_results$
Question: $query$
If the context doesn't contain the answer, say "I don't have enough information to answer that question."
Answer:'''
},
'inferenceConfig': {
'textInferenceConfig': {
'temperature': 0.5,
'topP': 0.9,
'maxTokens': 1000
}
}
}
}
}
)
return {
'answer': response['output']['text'],
'citations': response.get('citations', []),
'session_id': response.get('sessionId')
}
# Usage
result = query_with_rag(
kb_id='YOUR_KB_ID',
query='How do I reset my password?'
)
print(f"Answer: {result['answer']}\n")
# Print citations
print("Sources:")
for i, citation in enumerate(result['citations'], 1):
for ref in citation.get('retrievedReferences', []):
print(f"{i}. {ref['location']['s3Location']['uri']}")
print(f" Excerpt: {ref['content']['text'][:100]}...")
Method 3: Conversational RAG (Multi-Turn)
Maintain context across multiple queries:
from typing import Dict

import boto3

class ConversationalKB:
"""
Conversational interface to Knowledge Base
"""
def __init__(self, kb_id: str, model_id: str = 'anthropic.claude-3-sonnet-20240229-v1:0'):
self.kb_id = kb_id
self.model_id = model_id
self.session_id = None
self.runtime = boto3.client('bedrock-agent-runtime')
def ask(self, question: str) -> Dict:
"""
Ask a question with conversation history
"""
config = {
'input': {'text': question},
'retrieveAndGenerateConfiguration': {
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': self.kb_id,
'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{self.model_id}'
}
}
}
# Include session ID for follow-up questions
if self.session_id:
config['sessionId'] = self.session_id
response = self.runtime.retrieve_and_generate(**config)
# Store session ID for next question
self.session_id = response.get('sessionId')
return {
'answer': response['output']['text'],
'citations': response.get('citations', [])
}
def reset(self):
"""Reset conversation"""
self.session_id = None
# Usage
chat = ConversationalKB(kb_id='YOUR_KB_ID')
# First question
result1 = chat.ask("What products do you offer?")
print(f"Q1: {result1['answer']}\n")
# Follow-up question (maintains context)
result2 = chat.ask("What's the price of the first one?")
print(f"Q2: {result2['answer']}\n")
# Another follow-up
result3 = chat.ask("Is it available in blue?")
print(f"Q3: {result3['answer']}\n")
# Reset conversation
chat.reset()
RAG (Retrieval-Augmented Generation)
Understanding RAG
RAG combines retrieval and generation:
- Retrieve: Find relevant documents from knowledge base
- Augment: Add retrieved context to the prompt
- Generate: Model generates answer based on context
Custom RAG Implementation
def custom_rag(kb_id: str, query: str, model_id: str):
"""
Custom RAG implementation with full control
"""
runtime = boto3.client('bedrock-agent-runtime')
bedrock_runtime = boto3.client('bedrock-runtime')
# Step 1: Retrieve relevant documents
retrieve_response = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 5
}
}
)
# Step 2: Extract and format context
contexts = []
sources = []
for result in retrieve_response['retrievalResults']:
contexts.append(result['content']['text'])
sources.append({
'uri': result['location']['s3Location']['uri'],
'score': result['score']
})
context_text = "\n\n---\n\n".join(contexts)
# Step 3: Create prompt with context
prompt = f"""Answer the question based on the following context. If the answer is not in the context, say so.
Context:
{context_text}
Question: {query}
Answer:"""
# Step 4: Generate response
response = bedrock_runtime.converse(
modelId=model_id,
messages=[
{
'role': 'user',
'content': [{'text': prompt}]
}
],
inferenceConfig={
'temperature': 0.5,
'maxTokens': 1000
}
)
answer = response['output']['message']['content'][0]['text']
return {
'answer': answer,
'sources': sources,
'context_used': context_text
}
# Usage
result = custom_rag(
kb_id='YOUR_KB_ID',
query='What is the warranty period?',
model_id='anthropic.claude-3-sonnet-20240229-v1:0'
)
print(f"Answer: {result['answer']}\n")
print("Sources:")
for source in result['sources']:
print(f" - {source['uri']} (score: {source['score']:.4f})")
Advanced RAG with Reranking
def rag_with_reranking(kb_id: str, query: str, model_id: str):
"""
RAG with custom reranking logic
"""
runtime = boto3.client('bedrock-agent-runtime')
bedrock_runtime = boto3.client('bedrock-runtime')
# Retrieve more documents than needed
retrieve_response = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 10 # Get more for reranking
}
}
)
# Rerank documents using model
reranked_docs = []
for result in retrieve_response['retrievalResults']:
# Ask model to score relevance
relevance_prompt = f"""On a scale of 1-10, how relevant is this document to the question?
Question: {query}
Document: {result['content']['text'][:500]}
Respond with only a number 1-10."""
score_response = bedrock_runtime.converse(
modelId=model_id,
messages=[{'role': 'user', 'content': [{'text': relevance_prompt}]}],
inferenceConfig={'temperature': 0.1, 'maxTokens': 10}
)
        try:
            relevance_score = float(score_response['output']['message']['content'][0]['text'].strip())
        except ValueError:
            # Fall back to the vector search score if the model's reply isn't a number
            relevance_score = result['score']
reranked_docs.append({
'content': result['content']['text'],
'original_score': result['score'],
'relevance_score': relevance_score,
'location': result['location']
})
# Sort by relevance score
reranked_docs.sort(key=lambda x: x['relevance_score'], reverse=True)
# Use top 3 documents
top_docs = reranked_docs[:3]
context_text = "\n\n---\n\n".join([doc['content'] for doc in top_docs])
# Generate answer
prompt = f"""Answer based on this context:
{context_text}
Question: {query}
Answer:"""
response = bedrock_runtime.converse(
modelId=model_id,
messages=[{'role': 'user', 'content': [{'text': prompt}]}],
inferenceConfig={'temperature': 0.5, 'maxTokens': 1000}
)
return {
'answer': response['output']['message']['content'][0]['text'],
'top_documents': top_docs
}
RAG with Citation Tracking
def rag_with_citations(kb_id: str, query: str):
"""
RAG that tracks and formats citations
"""
runtime = boto3.client('bedrock-agent-runtime')
response = runtime.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
}
}
)
answer = response['output']['text']
citations = response.get('citations', [])
# Format answer with inline citations
formatted_answer = answer
citation_list = []
for i, citation in enumerate(citations, 1):
for ref in citation.get('retrievedReferences', []):
# Extract source info
location = ref['location']['s3Location']['uri']
excerpt = ref['content']['text']
citation_list.append({
'number': i,
'source': location,
'excerpt': excerpt
})
# Add citations at the end
if citation_list:
formatted_answer += "\n\n**Sources:**\n"
for cite in citation_list:
formatted_answer += f"\n[{cite['number']}] {cite['source']}"
formatted_answer += f"\n \"{cite['excerpt'][:100]}...\"\n"
return formatted_answer
# Usage
answer_with_citations = rag_with_citations(
kb_id='YOUR_KB_ID',
query='What are the shipping options?'
)
print(answer_with_citations)
Advanced Features
Custom Chunking Strategies
# Fixed size chunking
fixed_chunking = {
'chunkingStrategy': 'FIXED_SIZE',
'fixedSizeChunkingConfiguration': {
'maxTokens': 300,
'overlapPercentage': 20
}
}
# Hierarchical chunking (for structured documents)
hierarchical_chunking = {
'chunkingStrategy': 'HIERARCHICAL',
'hierarchicalChunkingConfiguration': {
'levelConfigurations': [
{
'maxTokens': 1500
},
{
'maxTokens': 300
}
],
'overlapTokens': 60
}
}
# Semantic chunking (groups related content)
semantic_chunking = {
'chunkingStrategy': 'SEMANTIC',
'semanticChunkingConfiguration': {
'maxTokens': 300,
'bufferSize': 0,
'breakpointPercentileThreshold': 95
}
}
# No chunking (use entire document)
no_chunking = {
'chunkingStrategy': 'NONE'
}
Metadata Filtering
def query_with_metadata_filter(kb_id: str, query: str, filters: dict):
"""
Query with metadata filters
"""
runtime = boto3.client('bedrock-agent-runtime')
response = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 5,
'filter': {
'andAll': [
{
'equals': {
'key': 'department',
'value': filters.get('department')
}
},
{
'greaterThan': {
'key': 'year',
'value': filters.get('min_year')
}
}
]
}
}
}
)
return response['retrievalResults']
# Usage
results = query_with_metadata_filter(
kb_id='YOUR_KB_ID',
query='What are the policies?',
filters={
'department': 'HR',
'min_year': 2023
}
)
Hybrid Search
Combine semantic and keyword search:
def hybrid_search(kb_id: str, query: str):
"""
Use hybrid search (semantic + keyword)
"""
runtime = boto3.client('bedrock-agent-runtime')
response = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 10,
'overrideSearchType': 'HYBRID' # Combines vector and keyword search
}
}
)
return response['retrievalResults']
Custom Prompt Templates
def query_with_custom_prompt(kb_id: str, query: str, prompt_template: str):
"""
Use custom prompt template for generation
"""
runtime = boto3.client('bedrock-agent-runtime')
response = runtime.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
'generationConfiguration': {
'promptTemplate': {
'textPromptTemplate': prompt_template
}
}
}
}
)
return response['output']['text']
# Custom template for technical documentation
technical_template = '''You are a technical documentation assistant.
Context from documentation:
$search_results$
User Question: $query$
Instructions:
1. Provide accurate technical information
2. Include code examples if relevant
3. Cite specific sections
4. If unsure, say so
Response:'''
answer = query_with_custom_prompt(
kb_id='YOUR_KB_ID',
query='How do I configure the API?',
prompt_template=technical_template
)
Best Practices
1. Document Preparation
# ✅ Good document structure
"""
# Product Documentation
## Overview
Clear introduction...
## Features
- Feature 1: Description
- Feature 2: Description
## Installation
Step-by-step instructions...
## API Reference
Detailed API docs...
"""
# ❌ Poor document structure
"""
lots of text without structure or headings everything runs together
making it hard to chunk effectively...
"""
2. Optimal Chunking
# Choose chunking based on document type
chunking_strategies = {
'technical_docs': {
'strategy': 'HIERARCHICAL',
'max_tokens': 300,
'overlap': 20
},
'articles': {
'strategy': 'SEMANTIC',
'max_tokens': 400,
'overlap': 15
},
'structured_data': {
'strategy': 'FIXED_SIZE',
'max_tokens': 200,
'overlap': 10
}
}
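The dict above is informal shorthand, not the API shape. A small helper mapping the FIXED_SIZE case onto the vectorIngestionConfiguration that create_data_source expects (a sketch; the other strategies take their own configuration keys, shown under Advanced Features):
def to_ingestion_config(spec: dict) -> dict:
    # Maps the shorthand above onto Bedrock's API shape (FIXED_SIZE only)
    assert spec['strategy'] == 'FIXED_SIZE'
    return {
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': spec['max_tokens'],
                'overlapPercentage': spec['overlap']
            }
        }
    }

ingestion_config = to_ingestion_config(chunking_strategies['structured_data'])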
3. Metadata Best Practices
Add rich metadata to documents:
# Bedrock Knowledge Bases read filterable metadata from a sidecar file
# named '<document>.metadata.json' uploaded alongside each document
# (plain S3 object metadata is not used for retrieval filters)
import json
import boto3

s3 = boto3.client('s3')

# Upload the document itself
s3.put_object(
    Bucket='my-kb-bucket',
    Key='docs/product-guide.pdf',
    Body=file_content
)

# Upload the companion metadata file
s3.put_object(
    Bucket='my-kb-bucket',
    Key='docs/product-guide.pdf.metadata.json',
    Body=json.dumps({
        'metadataAttributes': {
            'document-type': 'product-guide',
            'department': 'engineering',
            'version': '2.0',
            'last-updated': '2024-01-15',
            'author': 'tech-team',
            'year': 2024
        }
    })
)
4. Query Optimization
def optimize_query(user_query: str) -> str:
"""
Optimize user query for better retrieval
"""
# Expand abbreviations
expansions = {
'API': 'Application Programming Interface',
'FAQ': 'Frequently Asked Questions',
'KB': 'Knowledge Base'
}
optimized = user_query
for abbr, full in expansions.items():
optimized = optimized.replace(abbr, f"{abbr} {full}")
# Add context
if '?' not in optimized:
optimized += '?'
return optimized
# Usage
original = "How to use API"
optimized = optimize_query(original)
# Result: "How to use API Application Programming Interface?"
5. Monitoring and Maintenance
from datetime import datetime

import boto3

class KBMonitor:
"""
Monitor knowledge base performance
"""
def __init__(self, kb_id: str):
self.kb_id = kb_id
self.agent = boto3.client('bedrock-agent')
self.metrics = []
def check_ingestion_status(self):
"""
Check status of data sources
"""
response = self.agent.list_data_sources(
knowledgeBaseId=self.kb_id
)
for ds in response['dataSourceSummaries']:
print(f"Data Source: {ds['name']}")
print(f" Status: {ds['status']}")
print(f" Last Updated: {ds.get('updatedAt', 'N/A')}")
def track_query_performance(self, query: str, response_time: float, relevant: bool):
"""
Track query performance metrics
"""
self.metrics.append({
'query': query,
'response_time': response_time,
'relevant': relevant,
'timestamp': datetime.now()
})
def get_statistics(self):
"""
Get performance statistics
"""
if not self.metrics:
return {}
return {
'total_queries': len(self.metrics),
'avg_response_time': sum(m['response_time'] for m in self.metrics) / len(self.metrics),
'relevance_rate': sum(1 for m in self.metrics if m['relevant']) / len(self.metrics)
}
def trigger_reingestion(self, data_source_id: str):
"""
Trigger reingestion to update knowledge base
"""
response = self.agent.start_ingestion_job(
knowledgeBaseId=self.kb_id,
dataSourceId=data_source_id
)
return response['ingestionJob']['ingestionJobId']
# Usage
monitor = KBMonitor(kb_id='YOUR_KB_ID')
monitor.check_ingestion_status()
# Track queries
import time
start = time.time()
result = query_kb(kb_id, "What is the return policy?")  # query_kb: your retrieval helper, e.g. retrieve_documents above
elapsed = time.time() - start
monitor.track_query_performance(
query="What is the return policy?",
response_time=elapsed,
relevant=True # Based on user feedback
)
# Get stats
stats = monitor.get_statistics()
print(f"Average response time: {stats['avg_response_time']:.2f}s")
print(f"Relevance rate: {stats['relevance_rate']:.1%}")
6. Cost Optimization
# Optimize number of results retrieved
def cost_optimized_query(kb_id: str, query: str):
"""
Retrieve fewer documents to reduce costs
"""
runtime = boto3.client('bedrock-agent-runtime')
# Start with fewer results
response = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 3 # Start small
}
}
)
    # Check if results are good enough (guard against an empty result list)
    top_score = response['retrievalResults'][0]['score'] if response['retrievalResults'] else 0.0
    if top_score < 0.7:
# If confidence is low, retrieve more
response = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 5
}
}
)
return response['retrievalResults']
7. Security Best Practices
# Use IAM policies for access control
kb_access_policy = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::ACCOUNT:role/AppRole"
},
"Action": [
"bedrock:Retrieve",
"bedrock:RetrieveAndGenerate"
],
"Resource": "arn:aws:bedrock:us-east-1:ACCOUNT:knowledge-base/KB_ID"
}
]
}
# Data at rest: OpenSearch Serverless collections are encrypted by default;
# use an encryption security policy to bring a customer-managed KMS key
storage_config = {
'type': 'OPENSEARCH_SERVERLESS',
'opensearchServerlessConfiguration': {
'collectionArn': 'arn:aws:aoss:...',
'vectorIndexName': 'index',
'fieldMapping': {...}
}
}
# Use VPC endpoints for private access
# Configure OpenSearch Serverless with VPC access
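For the VPC note above, a hedged sketch of creating a private endpoint for the collection with the OpenSearch Serverless API (all IDs are placeholders):
import boto3

aoss = boto3.client('opensearchserverless')
aoss.create_vpc_endpoint(
    name='bedrock-kb-endpoint',
    vpcId='vpc-0123456789abcdef0',             # placeholder
    subnetIds=['subnet-0123456789abcdef0'],    # placeholder
    securityGroupIds=['sg-0123456789abcdef0']  # placeholder
)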
AWS Code Samples and Resources
Official Documentation
Bedrock Knowledge Bases User Guide
- URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html
- Complete guide to creating and using knowledge bases
API Reference
- URL: https://docs.aws.amazon.com/bedrock/latest/APIReference/APIOperationsAgentsforAmazon_Bedrock.html
- Detailed API documentation
RAG Best Practices
- URL: https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test.html
- Guidelines for optimal RAG implementation
AWS Samples Repository
# Clone AWS samples
git clone https://github.com/aws-samples/amazon-bedrock-samples.git
cd amazon-bedrock-samples/knowledge-bases
# Key examples:
# - knowledge-base-with-s3/
# - rag-examples/
# - custom-chunking/
# - metadata-filtering/
AWS Workshops
Bedrock Workshop - Knowledge Bases Module
- URL: https://catalog.workshops.aws/amazon-bedrock/en-US/knowledge-bases
RAG with Bedrock Workshop
- Hands-on labs for building RAG applications
Community Resources
AWS re:Post
- URL: https://repost.aws/tags/TA4kkYBfVxQ_2R5Xt8jXZDdQ/amazon-bedrock
- Community Q&A

AWS Blog Posts
- Search "Amazon Bedrock Knowledge Bases" on aws.amazon.com/blogs/
AWS CLI Examples
# Create knowledge base
aws bedrock-agent create-knowledge-base \
--name "my-kb" \
--role-arn "arn:aws:iam::ACCOUNT:role/BedrockKBRole" \
--knowledge-base-configuration file://kb-config.json \
--storage-configuration file://storage-config.json \
--region us-east-1
# Create data source
aws bedrock-agent create-data-source \
--knowledge-base-id KB_ID \
--name "s3-source" \
--data-source-configuration file://ds-config.json \
--region us-east-1
# Start ingestion
aws bedrock-agent start-ingestion-job \
--knowledge-base-id KB_ID \
--data-source-id DS_ID \
--region us-east-1
# Query knowledge base
aws bedrock-agent-runtime retrieve \
--knowledge-base-id KB_ID \
--retrieval-query text="What is the return policy?" \
--region us-east-1
# RAG query
aws bedrock-agent-runtime retrieve-and-generate \
--input text="What is the return policy?" \
--retrieve-and-generate-configuration file://rag-config.json \
--region us-east-1
CloudFormation Template
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Bedrock Knowledge Base Infrastructure'
Resources:
DocumentsBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: !Sub '${AWS::StackName}-kb-docs'
VersioningConfiguration:
Status: Enabled
PublicAccessBlockConfiguration:
BlockPublicAcls: true
BlockPublicPolicy: true
IgnorePublicAcls: true
RestrictPublicBuckets: true
  # OpenSearch Serverless rejects collection creation until a matching
  # encryption policy exists, so create one first (AWS-owned key here)
  CollectionEncryptionPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: !Sub '${AWS::StackName}-kb-enc'
      Type: encryption
      Policy: !Sub '{"Rules":[{"ResourceType":"collection","Resource":["collection/${AWS::StackName}-kb-collection"]}],"AWSOwnedKey":true}'
  OpenSearchCollection:
    Type: AWS::OpenSearchServerless::Collection
    DependsOn: CollectionEncryptionPolicy
    Properties:
      Name: !Sub '${AWS::StackName}-kb-collection'
      Type: VECTORSEARCH
      Description: Vector store for Knowledge Base
KnowledgeBaseRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: bedrock.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/AmazonBedrockFullAccess
Policies:
- PolicyName: S3Access
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- s3:GetObject
- s3:ListBucket
Resource:
- !GetAtt DocumentsBucket.Arn
- !Sub '${DocumentsBucket.Arn}/*'
- PolicyName: OpenSearchAccess
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- aoss:APIAccessAll
Resource: !GetAtt OpenSearchCollection.Arn
Outputs:
BucketName:
Value: !Ref DocumentsBucket
Description: S3 bucket for documents
CollectionArn:
Value: !GetAtt OpenSearchCollection.Arn
Description: OpenSearch collection ARN
RoleArn:
Value: !GetAtt KnowledgeBaseRole.Arn
Description: IAM role for Knowledge Base
Troubleshooting
Issue 1: Ingestion Fails
Problem: Documents not being ingested
Solution:
def diagnose_ingestion_failure(kb_id: str, ds_id: str, job_id: str):
"""
Diagnose ingestion failures
"""
agent = boto3.client('bedrock-agent')
# Get job details
response = agent.get_ingestion_job(
knowledgeBaseId=kb_id,
dataSourceId=ds_id,
ingestionJobId=job_id
)
job = response['ingestionJob']
print(f"Status: {job['status']}")
if job['status'] == 'FAILED':
print("\nFailure Reasons:")
for reason in job.get('failureReasons', []):
print(f" - {reason}")
# Common fixes
print("\nCommon Solutions:")
print("1. Check IAM role permissions")
print("2. Verify S3 bucket access")
print("3. Check document formats (PDF, TXT, MD, HTML, DOC/DOCX)")
print("4. Ensure documents are not corrupted")
print("5. Check OpenSearch collection status")
# Check statistics
stats = job.get('statistics', {})
print(f"\nStatistics:")
print(f" Documents scanned: {stats.get('numberOfDocumentsScanned', 0)}")
print(f" Documents indexed: {stats.get('numberOfNewDocumentsIndexed', 0)}")
print(f" Documents failed: {stats.get('numberOfDocumentsFailed', 0)}")
Issue 2: Poor Retrieval Quality
Problem: Retrieved documents not relevant
Solution:
def improve_retrieval_quality(kb_id: str, query: str):
"""
Strategies to improve retrieval quality
"""
runtime = boto3.client('bedrock-agent-runtime')
# Strategy 1: Use hybrid search
response1 = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 5,
'overrideSearchType': 'HYBRID'
}
}
)
# Strategy 2: Increase number of results
response2 = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 10 # Get more, filter later
}
}
)
# Strategy 3: Query expansion
expanded_query = f"{query} related information context details"
response3 = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': expanded_query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 5
}
}
)
print("Try these strategies:")
print("1. Use hybrid search (semantic + keyword)")
print("2. Retrieve more documents and rerank")
print("3. Expand query with related terms")
print("4. Adjust chunking strategy")
print("5. Add more relevant documents to KB")
Issue 3: Slow Query Performance
Problem: Queries taking too long
Solution:
def optimize_query_performance(kb_id: str, query: str):
"""
Optimize query performance
"""
import time
# Measure baseline
start = time.time()
runtime = boto3.client('bedrock-agent-runtime')
response = runtime.retrieve(
knowledgeBaseId=kb_id,
retrievalQuery={'text': query},
retrievalConfiguration={
'vectorSearchConfiguration': {
'numberOfResults': 3 # Reduce number of results
}
}
)
elapsed = time.time() - start
print(f"Query time: {elapsed:.2f}s")
if elapsed > 2.0:
print("\nOptimization suggestions:")
print("1. Reduce numberOfResults (currently using 3)")
print("2. Use metadata filters to narrow search")
print("3. Check OpenSearch collection performance")
print("4. Consider caching frequent queries")
print("5. Optimize document chunking (smaller chunks = faster)")
Issue 4: Citations Not Showing
Problem: No citations in RAG responses
Solution:
# Ensure you're using retrieve_and_generate (not just retrieve)
response = runtime.retrieve_and_generate(
input={'text': query},
retrieveAndGenerateConfiguration={
'type': 'KNOWLEDGE_BASE',
'knowledgeBaseConfiguration': {
'knowledgeBaseId': kb_id,
'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
}
}
)
# Citations are in the response
citations = response.get('citations', [])
if not citations:
print("No citations found. Check:")
print("1. Using retrieve_and_generate (not retrieve)")
print("2. Model supports citations")
print("3. Retrieved documents have location metadata")
Conclusion
AWS Bedrock Knowledge Bases provide a powerful RAG solution for grounding AI responses in your proprietary data. By combining semantic search with foundation models, you can build applications that provide accurate, cited, and up-to-date information.
Key Takeaways:
- Knowledge Bases enable RAG without managing infrastructure
- Support multiple data sources (S3, web, Confluence, SharePoint, Salesforce)
- Flexible chunking strategies for different document types
- Built-in citation tracking for transparency
- Scales to millions of documents

Next Steps:
1. Create your first knowledge base with S3 documents
2. Experiment with different chunking strategies
3. Implement RAG in your application
4. Monitor and optimize retrieval quality
5. Explore advanced features like metadata filtering
For the latest features and updates, refer to the official AWS Bedrock documentation.