AWS Bedrock Knowledge Bases Guide

Overview

AWS Bedrock Knowledge Bases enable Retrieval-Augmented Generation (RAG) by allowing foundation models to access and reference your proprietary data. This combines the power of large language models with your specific domain knowledge.

Key Benefits:
- Ground model responses in your data
- Reduce hallucinations
- Keep data up-to-date without retraining
- Cite sources for transparency
- Maintain data security and privacy
- Scale to millions of documents

Is Knowledge Base Limited to RAG?

No! While RAG is the primary use case, Knowledge Bases support multiple patterns:

1. RAG (Retrieval-Augmented Generation) - Most Common

Query → Retrieve Context → Generate Answer with LLM
Use: Q&A, chatbots, content generation with your data
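
Example (a minimal sketch using boto3's bedrock-agent-runtime client; the knowledge base ID and model ARN are placeholders, and the full API is covered later in this guide):

import boto3

runtime = boto3.client('bedrock-agent-runtime')

# One call performs both retrieval and generation
response = runtime.retrieve_and_generate(
    input={'text': 'What is our return policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'YOUR_KB_ID',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)
print(response['output']['text'])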

2. Retrieval Only (No Generation)

Query → Retrieve Relevant Documents → Return Documents
Use: Search, document discovery, research

Example:

# Just retrieve documents, no AI generation
results = retrieve(kb_id, "product specifications")
# Returns: List of relevant documents with scores
3. Semantic Search (Similarity)

Query → Find Similar Content → Rank by Relevance
Use: Finding related documents, recommendations

Example:

# Find documents similar to a given document
similar_docs = find_similar(kb_id, reference_document)

4. Hybrid Search (Semantic + Keyword)

Query → Vector Search + Keyword Search → Combined Results
Use: Best of both worlds - meaning and exact matches

Example:

# Combine semantic understanding with exact keyword matching
results = hybrid_search(kb_id, "API authentication", search_type="HYBRID")

5. Metadata Filtering

Query → Filter by Metadata → Retrieve Filtered Results
Use: Department-specific docs, date ranges, categories

Example:

# Retrieve only from specific department and year
results = retrieve_with_filter(
    kb_id, 
    query="policies",
    filters={"department": "HR", "year": 2024}
)

6. Multi-Step Reasoning (Agent Workflows)

Query → Retrieve → Analyze → Retrieve More → Generate
Use: Complex research, multi-document analysis

Example:

# Agent decides when to retrieve more information
agent_response = agent.run(
    query="Compare our products",
    tools=[knowledge_base_retrieval, calculator, web_search]
)

7. Context Injection (Without RAG)

Retrieve Documents → Use in Custom Workflow → Your Processing
Use: Custom pipelines, data extraction, analysis

Example:

# Retrieve docs and process them your way
docs = retrieve(kb_id, "contracts")
for doc in docs:
    extracted_data = custom_extraction(doc)
    store_in_database(extracted_data)

8. Citation and Source Tracking

Query → Retrieve → Generate → Track Sources
Use: Compliance, audit trails, transparency

Example:

# Get answer with full citation trail
response = retrieve_and_generate(kb_id, query)
print(f"Answer: {response['answer']}")
print(f"Sources: {response['citations']}")

Comparison: RAG vs Other Patterns

| Pattern | Retrieval | Generation | Use Case |
|---|---|---|---|
| RAG | ✅ Yes | ✅ Yes | Q&A, chatbots, content creation |
| Retrieval Only | ✅ Yes | ❌ No | Search, document discovery |
| Semantic Search | ✅ Yes | ❌ No | Finding similar content |
| Metadata Filtering | ✅ Yes | ⚠️ Optional | Filtered search/retrieval |
| Agent Workflows | ✅ Yes | ✅ Yes | Complex multi-step tasks |
| Custom Processing | ✅ Yes | ❌ No | Data extraction, analysis |

When to Use Each Pattern

Use RAG when:
- You need natural language answers
- Users ask questions about your documents
- You want conversational interfaces
- Citations are important

Use Retrieval Only when:
- Users need to see actual documents
- You're building a search interface
- You want to process documents yourself
- You need document metadata

Use Semantic Search when:
- Finding related content
- Building recommendation systems
- Discovering similar documents
- Research and exploration

Use Metadata Filtering when:
- Documents have categories/tags
- Need department-specific results
- Time-based filtering (recent docs)
- Compliance requirements

Use Agent Workflows when:
- Multi-step reasoning needed
- Combining multiple data sources
- Complex decision-making
- Dynamic retrieval strategies

Key Takeaway

Knowledge Bases are flexible retrieval systems. RAG is just one (very popular) way to use them.

Knowledge Base = Smart Document Storage + Retrieval Engine

You can:
✅ Use it for RAG (retrieve + generate)
✅ Use it for search only (retrieve)
✅ Use it for recommendations (similarity)
✅ Use it in custom workflows (your logic)
✅ Combine multiple patterns

The power is in having your documents indexed and searchable - what you do with the retrieved information is up to you!

What are Knowledge Bases?

The Core Purpose: Providing Custom Context to LLMs

The primary purpose of Knowledge Bases is to provide custom, domain-specific context to Large Language Models (LLMs).

The Problem Knowledge Bases Solve

Without Knowledge Bases:

User: "What is our company's return policy?"
   ↓
LLM (trained on general internet data)
   ↓
Response: "I don't have information about your specific company's 
          return policy. Return policies typically vary by company..."

Problem: The LLM doesn't know YOUR specific information

With Knowledge Bases:

User: "What is our company's return policy?"
   ↓
Knowledge Base retrieves relevant documents
   ↓
LLM receives: User question + Your company's actual policy document
   ↓
Response: "According to your company policy, customers can return 
          items within 30 days with receipt for full refund..."

Solution: The LLM now has YOUR specific context

How Knowledge Bases Provide Context

Think of it like giving the LLM a textbook before asking it questions:

┌─────────────────────────────────────────────────────┐
│  YOUR DOCUMENTS (The Custom Context)                │
│  • Company policies                                 │
│  • Product manuals                                  │
│  • Internal documentation                           │
│  • Customer data                                    │
│  • Technical specifications                         │
└─────────────────────────────────────────────────────┘
                    ↓
            [Knowledge Base]
         (Stores & Indexes Documents)
                    ↓
┌─────────────────────────────────────────────────────┐
│  USER ASKS QUESTION                                 │
│  "How do I configure the API?"                      │
└─────────────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────────────┐
│  RETRIEVAL: Find Relevant Context                   │
│  Searches your documents for relevant information   │
└─────────────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────────────┐
│  AUGMENTATION: Add Context to Prompt                │
│  "Based on this documentation: [retrieved docs]     │
│   Answer: How do I configure the API?"              │
└─────────────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────────────┐
│  LLM GENERATES ANSWER                               │
│  Uses YOUR context to provide accurate response     │
└─────────────────────────────────────────────────────┘

Why Custom Context Matters

1. Accuracy

Without context:

question = "What's the warranty period?"
# LLM guesses: "Warranties typically range from 1-3 years..."

With context from Knowledge Base:

question = "What's the warranty period?"
context = "Our products come with a 5-year comprehensive warranty..."
# LLM answers accurately: "Your products have a 5-year warranty..."

2. Specificity

Without context:

question = "How do I reset my password?"
# LLM gives generic steps that may not match your system

With context from Knowledge Base:

question = "How do I reset my password?"
context = "To reset password in our system: 1. Click 'Forgot Password' 
          on login page 2. Enter email 3. Check for reset link..."
# LLM provides YOUR exact steps

3. Up-to-Date Information

Without context:

# LLM training data is from 2023
question = "What are the new features in version 5.0?"
# LLM: "I don't have information about version 5.0"

With context from Knowledge Base:

# Your docs are updated daily
question = "What are the new features in version 5.0?"
context = "Version 5.0 released yesterday includes: AI assistant, 
          dark mode, mobile app..."
# LLM provides current information

4. Proprietary Information

Without context:

question = "What's our pricing for enterprise customers?"
# LLM: "I don't have access to your pricing information"

With context from Knowledge Base:

question = "What's our pricing for enterprise customers?"
context = "Enterprise pricing: $500/month for up to 100 users..."
# LLM provides YOUR pricing

The RAG Pattern: Retrieval-Augmented Generation

Knowledge Bases implement the RAG pattern:

R - RETRIEVAL
    ↓ Find relevant documents from your knowledge base

A - AUGMENTATION  
    ↓ Add retrieved documents as context to the prompt

G - GENERATION
    ↓ LLM generates answer based on YOUR context

Example in Action:

# 1. RETRIEVAL
user_question = "What's the refund process?"
retrieved_docs = knowledge_base.search(user_question)
# Returns: Your company's refund policy document

# 2. AUGMENTATION
augmented_prompt = f"""
Based on this company policy:
{retrieved_docs}

Answer the question: {user_question}
"""

# 3. GENERATION
llm_response = model.generate(augmented_prompt)
# Returns: Accurate answer based on YOUR policy

What Knowledge Bases Are (Technical View)

Knowledge Bases are managed repositories that:
- Store your documents: PDFs, text files, web pages, databases
- Create embeddings: vector representations of content for semantic search
- Enable semantic search: find relevant information based on meaning, not just keywords
- Integrate with models: automatically provide context to LLMs
- Track sources: show where information came from (citations)
- Update dynamically: add/remove documents without retraining models

Real-World Analogy

Think of a Knowledge Base like a smart filing cabinet for an AI assistant:

Traditional Approach:
Employee: "What's the vacation policy?"
Manager: "Let me find the employee handbook... [searches files]... 
         Here it is, page 47..."

Knowledge Base Approach:
Employee: "What's the vacation policy?"
AI (with Knowledge Base): [Instantly retrieves policy] 
         "According to the employee handbook, you get 15 days..."

The Knowledge Base gives the AI instant access to your specific documents, just like a manager who has memorized all company policies.

Use Cases for Custom Context

Customer Support:

Context: Product manuals, FAQs, troubleshooting guides
Result: AI answers customer questions accurately

Internal Knowledge Management:

Context: Company policies, procedures, onboarding docs
Result: Employees get instant answers to HR/IT questions

Product Information:

Context: Product catalogs, specifications, pricing
Result: Sales team gets accurate product details

Legal/Compliance:

Context: Contracts, regulations, legal documents
Result: Quick answers to compliance questions

Technical Documentation:

Context: API docs, code examples, architecture diagrams
Result: Developers get accurate technical guidance

Research & Analysis:

Context: Research papers, reports, data
Result: AI synthesizes information from your sources

Key Insight

Knowledge Bases don't change what the LLM knows fundamentally. Instead, they provide temporary, relevant context for each query.

LLM's Base Knowledge (from training):
"I know general information about the world"
        +
Knowledge Base Context (your documents):
"Here's YOUR specific information"
        =
Accurate, Customized Response:
"Answer based on YOUR data"

This is why Knowledge Bases are so powerful - they let you leverage the LLM's language understanding while grounding responses in YOUR truth.

Key Concepts

Knowledge Base

A managed repository that stores, indexes, and retrieves your documents.

Data Source

The origin of your documents (S3, web crawler, Confluence, SharePoint, Salesforce).

Embeddings

Vector representations of text that capture semantic meaning.

Embedding Model

The model used to convert text to vectors (e.g., Amazon Titan Embeddings).
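
For illustration, here is a minimal sketch of generating an embedding yourself with the bedrock-runtime InvokeModel API (the request and response shapes shown are those of Titan Text Embeddings; other models differ):

import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

# Titan Text Embeddings takes {"inputText": ...} and returns {"embedding": [...]}
response = bedrock_runtime.invoke_model(
    modelId='amazon.titan-embed-text-v1',
    body=json.dumps({'inputText': 'How do I configure the API?'})
)

embedding = json.loads(response['body'].read())['embedding']
print(f"Vector dimension: {len(embedding)}")  # 1536 for Titan G1 - Text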

Vector Store

Database that stores embeddings for fast similarity search (OpenSearch, Pinecone, Redis).

Chunking

Breaking documents into smaller pieces for better retrieval.
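
A toy sketch of the idea (Bedrock performs chunking for you at ingestion time; words stand in for tokens here purely for illustration):

def chunk_text(text: str, chunk_size: int = 300, overlap: int = 60) -> list:
    """Split text into overlapping fixed-size chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [' '.join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

sample = "word " * 1000
print(len(chunk_text(sample)))  # 5 chunks: 300 words each, overlapping by 60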

Retrieval

Finding relevant document chunks based on a query.

RAG (Retrieval-Augmented Generation)

Combining retrieved information with model generation.

Architecture

┌─────────────┐
│  Documents  │
│  (S3, etc)  │
└──────┬──────┘
       │
       ▼
┌─────────────────┐
│  Chunking &     │
│  Embedding      │
└──────┬──────────┘
       │
       ▼
┌─────────────────┐
│  Vector Store   │
│  (OpenSearch)   │
└──────┬──────────┘
       │
       ▼
┌─────────────────┐      ┌──────────────┐
│  User Query     │─────▶│  Retrieval   │
└─────────────────┘      └──────┬───────┘
                                │
                                ▼
                         ┌──────────────┐
                         │  Foundation  │
                         │    Model     │
                         └──────┬───────┘
                                │
                                ▼
                         ┌──────────────┐
                         │   Response   │
                         │ with Sources │
                         └──────────────┘

Vector Store Options for Knowledge Bases

Knowledge Bases require a vector store (also called vector database) to store document embeddings. AWS Bedrock supports multiple vector store options, each with different characteristics.

What is a Vector Store?

A vector store is a specialized database that:
- Stores embeddings (numerical representations of text)
- Performs similarity search (finds similar vectors quickly)
- Scales to millions of documents
- Returns results in milliseconds

Simple Analogy:

Traditional Database:
"Find documents where title = 'API Guide'"
→ Exact match only

Vector Store:
"Find documents similar to 'API authentication'"
→ Returns: API Guide, Security Docs, OAuth Tutorial
→ Based on semantic meaning, not exact words
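
Under the hood, similarity is computed between embedding vectors. A toy sketch with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means same direction (very similar); 0.0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.3])
doc_vecs = {
    'API Guide': np.array([0.8, 0.2, 0.4]),
    'HR Policy': np.array([0.1, 0.9, 0.2])
}

# Rank documents by similarity to the query vector
ranked = sorted(doc_vecs.items(),
                key=lambda kv: cosine_similarity(query_vec, kv[1]),
                reverse=True)
print(ranked[0][0])  # 'API Guide'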

Supported Vector Stores

AWS Bedrock Knowledge Bases support these vector store options:

| Vector Store | Type | Best For | Pricing Model |
|---|---|---|---|
| Amazon OpenSearch Serverless | Managed | Production, recommended | Pay per OCU |
| Amazon OpenSearch Service | Managed | Existing OpenSearch users | Instance-based |
| Amazon Aurora PostgreSQL | Relational DB | Existing Aurora users | Instance-based |
| Pinecone | Third-party SaaS | Multi-cloud, specialized | Subscription |
| Redis Enterprise Cloud | Third-party SaaS | High performance | Subscription |
| MongoDB Atlas | Third-party SaaS | Document-oriented | Subscription |

1. Amazon OpenSearch Serverless

Overview:
- Fully managed, serverless vector search
- No infrastructure management
- Auto-scaling
- Built-in security

When to Use:
- ✅ New projects
- ✅ Don't want to manage infrastructure
- ✅ Variable workloads
- ✅ Quick setup

Configuration:

import boto3

# Create OpenSearch Serverless collection
aoss_client = boto3.client('opensearchserverless')

collection = aoss_client.create_collection(
    name='bedrock-kb-collection',
    type='VECTORSEARCH',
    description='Vector store for Knowledge Base'
)

# Use in Knowledge Base
storage_config = {
    'type': 'OPENSEARCH_SERVERLESS',
    'opensearchServerlessConfiguration': {
        'collectionArn': collection['createCollectionDetail']['arn'],
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Pros:
- ✅ No server management
- ✅ Auto-scaling
- ✅ AWS-native integration
- ✅ Built-in security (IAM, encryption)
- ✅ Pay only for what you use

Cons:
- ❌ Higher cost for consistent high loads
- ❌ Cold start latency possible
- ❌ Less control over configuration

Pricing:
- Based on OpenSearch Compute Units (OCUs)
- ~$0.24 per OCU-hour
- Minimum 2 OCUs for indexing, 2 for search

2. Amazon OpenSearch Service

Overview:
- Managed OpenSearch clusters
- Full control over configuration
- Predictable pricing

When to Use:
- ✅ Already using OpenSearch
- ✅ Need fine-grained control
- ✅ Consistent high workloads
- ✅ Custom plugins/configurations

Configuration:

storage_config = {
    'type': 'OPENSEARCH',
    'opensearchConfiguration': {
        'endpoint': 'https://my-domain.us-east-1.es.amazonaws.com',
        'vectorIndexName': 'bedrock-kb-index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Pros:
- ✅ Full control over cluster
- ✅ Predictable costs
- ✅ Better for sustained high loads
- ✅ Advanced OpenSearch features

Cons:
- ❌ Must manage cluster
- ❌ Manual scaling
- ❌ Pay for reserved capacity

Pricing:
- Instance-based (e.g., r6g.large.search)
- ~$0.162/hour per instance
- Storage: ~$0.135/GB-month

3. Amazon Aurora PostgreSQL (with pgvector)

Overview:
- Relational database with vector extension
- Combines structured and vector data
- Familiar SQL interface

When to Use:
- ✅ Already using Aurora PostgreSQL
- ✅ Need relational + vector data together
- ✅ SQL-based workflows
- ✅ Transactional consistency

Configuration:

storage_config = {
    'type': 'RDS',
    'rdsConfiguration': {
        'resourceArn': 'arn:aws:rds:us-east-1:ACCOUNT:cluster:my-aurora-cluster',
        'credentialsSecretArn': 'arn:aws:secretsmanager:...',
        'databaseName': 'knowledge_base',
        'tableName': 'embeddings',
        'fieldMapping': {
            'primaryKeyField': 'id',
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Setup pgvector:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table for embeddings
CREATE TABLE embeddings (
    id SERIAL PRIMARY KEY,
    text TEXT,
    embedding vector(1536),  -- Dimension depends on embedding model
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create index for fast similarity search
CREATE INDEX ON embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
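
A sketch of querying that table for nearest neighbors from Python (using psycopg2; the connection details and query vector are placeholders, and <=> is pgvector's cosine distance operator):

import psycopg2

conn = psycopg2.connect(host='my-aurora-endpoint', dbname='knowledge_base',
                        user='admin', password='REPLACE_ME')

query_embedding = [0.1] * 1536  # in practice, produced by the embedding model
vector_literal = '[' + ','.join(str(x) for x in query_embedding) + ']'

with conn.cursor() as cur:
    # Smaller cosine distance means more similar
    cur.execute(
        "SELECT text, embedding <=> %s::vector AS distance "
        "FROM embeddings ORDER BY distance LIMIT 5",
        (vector_literal,)
    )
    for text, distance in cur.fetchall():
        print(f"{distance:.4f}  {text[:80]}")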

Pros:
- ✅ Combine relational and vector data
- ✅ SQL queries
- ✅ ACID transactions
- ✅ Familiar PostgreSQL ecosystem

Cons:
- ❌ Not specialized for vectors
- ❌ Slower than dedicated vector DBs at scale
- ❌ More complex setup

Pricing:
- Aurora instance pricing
- ~$0.29/hour for db.r6g.large
- Storage: ~$0.10/GB-month

4. Pinecone

Overview:
- Purpose-built vector database (SaaS)
- Specialized for vector search
- Multi-cloud support

When to Use:
- ✅ Multi-cloud strategy
- ✅ Need specialized vector features
- ✅ Want managed service outside AWS
- ✅ High-performance requirements

Configuration:

storage_config = {
    'type': 'PINECONE',
    'pineconeConfiguration': {
        'connectionString': 'https://your-index.pinecone.io',
        'credentialsSecretArn': 'arn:aws:secretsmanager:...',
        'namespace': 'bedrock-kb',
        'fieldMapping': {
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Pros:
- ✅ Purpose-built for vectors
- ✅ Excellent performance
- ✅ Simple API
- ✅ Good documentation

Cons:
- ❌ Third-party dependency
- ❌ Data leaves AWS
- ❌ Additional vendor relationship
- ❌ Subscription costs

Pricing:
- Starter: Free (1M vectors)
- Standard: ~$70/month (5M vectors)
- Enterprise: Custom pricing

5. Redis Enterprise Cloud

Overview:
- In-memory database with vector search
- Extremely fast retrieval
- Real-time performance

When to Use:
- ✅ Need ultra-low latency
- ✅ Already using Redis
- ✅ Real-time applications
- ✅ Caching + vector search

Configuration:

storage_config = {
    'type': 'REDIS_ENTERPRISE_CLOUD',
    'redisEnterpriseCloudConfiguration': {
        'endpoint': 'redis-12345.c1.us-east-1-1.ec2.cloud.redislabs.com:12345',
        'credentialsSecretArn': 'arn:aws:secretsmanager:...',
        'vectorIndexName': 'bedrock-kb-idx',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Pros:
- ✅ Extremely fast (in-memory)
- ✅ Sub-millisecond latency
- ✅ Redis ecosystem
- ✅ Caching + vectors

Cons:
- ❌ More expensive (in-memory)
- ❌ Third-party service
- ❌ Data size limited by memory

Pricing:
- Based on memory and throughput
- ~$0.119/GB-hour
- Minimum ~$100/month

6. MongoDB Atlas

Overview:
- Document database with vector search
- Combines documents and vectors
- Flexible schema

When to Use:
- ✅ Already using MongoDB
- ✅ Document-oriented data
- ✅ Flexible schema needs
- ✅ JSON-native workflows

Configuration:

storage_config = {
    'type': 'MONGO_DB_ATLAS',
    'mongoDbAtlasConfiguration': {
        'endpoint': 'mongodb+srv://cluster.mongodb.net',
        'credentialsSecretArn': 'arn:aws:secretsmanager:...',
        'databaseName': 'knowledge_base',
        'collectionName': 'embeddings',
        'vectorIndexName': 'vector_index',
        'fieldMapping': {
            'vectorField': 'embedding',
            'textField': 'text',
            'metadataField': 'metadata'
        }
    }
}

Pros:
- ✅ Document + vector in one DB
- ✅ Flexible schema
- ✅ MongoDB ecosystem
- ✅ Good for unstructured data

Cons:
- ❌ Not specialized for vectors
- ❌ Third-party service
- ❌ Can be expensive at scale

Pricing:
- Serverless: Pay per operation
- Dedicated: ~$0.08/hour (M10)
- Storage: ~$0.25/GB-month

Comparison Matrix

| Feature | OpenSearch Serverless | OpenSearch Service | Aurora PostgreSQL | Pinecone | Redis | MongoDB |
|---|---|---|---|---|---|---|
| Setup Complexity | ⭐ Easy | ⭐⭐ Medium | ⭐⭐⭐ Complex | ⭐ Easy | ⭐⭐ Medium | ⭐⭐ Medium |
| Performance | ⭐⭐⭐⭐ Good | ⭐⭐⭐⭐ Good | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐ Medium |
| Scalability | ⭐⭐⭐⭐⭐ Auto | ⭐⭐⭐⭐ Manual | ⭐⭐⭐ Limited | ⭐⭐⭐⭐⭐ Auto | ⭐⭐⭐⭐ Good | ⭐⭐⭐⭐ Good |
| Cost (Small) | ⭐⭐⭐ Medium | ⭐⭐ High | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Low | ⭐⭐ High | ⭐⭐⭐ Medium |
| Cost (Large) | ⭐⭐ High | ⭐⭐⭐⭐ Low | ⭐⭐⭐ Medium | ⭐⭐⭐ Medium | ⭐ Very High | ⭐⭐⭐ Medium |
| AWS Integration | ⭐⭐⭐⭐⭐ Native | ⭐⭐⭐⭐⭐ Native | ⭐⭐⭐⭐⭐ Native | ⭐⭐⭐ Good | ⭐⭐⭐ Good | ⭐⭐⭐ Good |
| Management | ⭐⭐⭐⭐⭐ Fully Managed | ⭐⭐⭐ Managed | ⭐⭐⭐ Managed | ⭐⭐⭐⭐⭐ Fully Managed | ⭐⭐⭐⭐ Managed | ⭐⭐⭐⭐ Managed |

Decision Guide

Choose OpenSearch Serverless if:
- Starting a new project
- Want simplicity and AWS-native integration
- Variable workloads
- Don't want to manage infrastructure

Choose OpenSearch Service if:
- Already using OpenSearch
- Need full control
- Consistent high workloads
- Want predictable costs

Choose Aurora PostgreSQL if:
- Already using Aurora/PostgreSQL
- Need relational + vector data
- Want SQL interface
- Need ACID transactions

Choose Pinecone if:
- Need best-in-class vector performance
- Multi-cloud strategy
- Want specialized vector features
- Willing to use a third-party service

Choose Redis if:
- Need ultra-low latency
- Real-time requirements
- Already using Redis
- Budget for in-memory costs

Choose MongoDB if:
- Already using MongoDB
- Document-oriented data model
- Need flexible schema
- JSON-native workflows

Cost Comparison Example

Scenario: 1 million documents, 10,000 queries/day

| Vector Store | Monthly Cost (Estimate) |
|---|---|
| OpenSearch Serverless | ~$350-500 |
| OpenSearch Service (3 nodes) | ~$350-400 |
| Aurora PostgreSQL | ~$200-300 |
| Pinecone (Standard) | ~$70-100 |
| Redis Enterprise | ~$200-400 |
| MongoDB Atlas | ~$150-250 |

Note: Costs vary based on actual usage, region, and configuration

Recommendation for Most Users

🏆 Start with Amazon OpenSearch Serverless

Why?
- ✅ Easiest setup
- ✅ AWS-native (best Bedrock integration)
- ✅ No infrastructure management
- ✅ Auto-scaling
- ✅ Good performance
- ✅ Secure by default

You can always migrate to another option later if your needs change.

Prerequisites

AWS Account Requirements:
- AWS account with Bedrock access
- S3 bucket for documents
- Vector store (OpenSearch Serverless recommended)
- IAM permissions

Required IAM Permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:CreateKnowledgeBase",
        "bedrock:GetKnowledgeBase",
        "bedrock:UpdateKnowledgeBase",
        "bedrock:DeleteKnowledgeBase",
        "bedrock:ListKnowledgeBases",
        "bedrock:CreateDataSource",
        "bedrock:StartIngestionJob",
        "bedrock:Retrieve",
        "bedrock:RetrieveAndGenerate"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "aoss:APIAccessAll"
      ],
      "Resource": "arn:aws:aoss:*:*:collection/*"
    }
  ]
}
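
For reference, a minimal sketch of creating the service role that Bedrock assumes (the role name is a placeholder; Knowledge Bases use the bedrock.amazonaws.com service principal):

import json
import boto3

iam = boto3.client('iam')

# Trust policy letting Bedrock assume the role on your behalf
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'bedrock.amazonaws.com'},
        'Action': 'sts:AssumeRole'
    }]
}

role = iam.create_role(
    RoleName='BedrockKBRole',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)
print(role['Role']['Arn'])
# Then attach the permissions policy above, e.g. with iam.put_role_policy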

SDK Installation:

# Python
pip install boto3

# Node.js
npm install @aws-sdk/client-bedrock-agent @aws-sdk/client-bedrock-agent-runtime

Getting Started

Step 1: Prepare Your Documents

# Create S3 bucket
aws s3 mb s3://my-knowledge-base-docs

# Upload documents
aws s3 cp ./documents/ s3://my-knowledge-base-docs/ --recursive

# Supported formats:
# - PDF
# - TXT
# - MD (Markdown)
# - HTML
# - DOC/DOCX
# - CSV

Step 2: Create Vector Store (OpenSearch Serverless)

import boto3

aoss_client = boto3.client('opensearchserverless')

# Create collection
# Note: an encryption security policy covering the collection name must
# exist before creation succeeds; network and data access policies are
# needed before the collection can be used
response = aoss_client.create_collection(
    name='bedrock-knowledge-base',
    type='VECTORSEARCH',
    description='Vector store for Bedrock Knowledge Base'
)

collection_id = response['createCollectionDetail']['id']
collection_arn = response['createCollectionDetail']['arn']

print(f"Collection created: {collection_id}")

Creating a Knowledge Base

Method 1: Using AWS Console

  1. Navigate to Bedrock Console

    • Go to AWS Console → Amazon Bedrock → Knowledge bases
    • Click "Create knowledge base"
  2. Configure Knowledge Base

    • Name: company-docs-kb
    • Description: "Company documentation knowledge base"
    • IAM role: Create or select existing
  3. Configure Data Source

    • Source: Amazon S3
    • S3 URI: s3://my-knowledge-base-docs/
    • Chunking strategy: Default or custom
  4. Select Embeddings Model

    • Model: Amazon Titan Embeddings G1 - Text
    • Dimensions: 1536
  5. Configure Vector Store

    • Type: OpenSearch Serverless
    • Collection: Select your collection
    • Index name: bedrock-kb-index
  6. Create and Sync

    • Review settings
    • Create knowledge base
    • Start ingestion job
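
Once the wizard finishes, you can confirm the knowledge base from code (the ID is a placeholder; copy it from the console):

import boto3

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Verify the knowledge base exists and check its status
kb = bedrock_agent.get_knowledge_base(knowledgeBaseId='YOUR_KB_ID')
print(kb['knowledgeBase']['status'])  # e.g. ACTIVE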

Method 2: Using Python SDK

import boto3
import json

bedrock_agent = boto3.client('bedrock-agent', region_name='us-east-1')

# Create knowledge base
response = bedrock_agent.create_knowledge_base(
    name='company-docs-kb',
    description='Company documentation knowledge base',
    roleArn='arn:aws:iam::ACCOUNT_ID:role/BedrockKBRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1'
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:ACCOUNT_ID:collection/COLLECTION_ID',
            'vectorIndexName': 'bedrock-kb-index',
            'fieldMapping': {
                'vectorField': 'vector',
                'textField': 'text',
                'metadataField': 'metadata'
            }
        }
    }
)

knowledge_base_id = response['knowledgeBase']['knowledgeBaseId']
print(f"Knowledge Base created: {knowledge_base_id}")

Method 3: Complete Setup Script

import boto3
import time
from typing import Dict, Any

class KnowledgeBaseManager:
    """
    Helper class for managing Bedrock Knowledge Bases
    """

    def __init__(self, region_name='us-east-1'):
        self.bedrock_agent = boto3.client('bedrock-agent', region_name=region_name)
        self.bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name=region_name)
        self.s3 = boto3.client('s3', region_name=region_name)

    def create_knowledge_base(
        self,
        name: str,
        description: str,
        role_arn: str,
        s3_bucket: str,
        collection_arn: str,
        embedding_model: str = 'amazon.titan-embed-text-v1'
    ) -> tuple:
        """
        Create a knowledge base with an S3 data source

        Returns:
            Tuple of (knowledge base ID, data source ID)
        """
        # Create knowledge base
        kb_response = self.bedrock_agent.create_knowledge_base(
            name=name,
            description=description,
            roleArn=role_arn,
            knowledgeBaseConfiguration={
                'type': 'VECTOR',
                'vectorKnowledgeBaseConfiguration': {
                    'embeddingModelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{embedding_model}'
                }
            },
            storageConfiguration={
                'type': 'OPENSEARCH_SERVERLESS',
                'opensearchServerlessConfiguration': {
                    'collectionArn': collection_arn,
                    'vectorIndexName': f'{name}-index',
                    'fieldMapping': {
                        'vectorField': 'vector',
                        'textField': 'text',
                        'metadataField': 'metadata'
                    }
                }
            }
        )

        kb_id = kb_response['knowledgeBase']['knowledgeBaseId']
        print(f"✓ Knowledge Base created: {kb_id}")

        # Create data source
        ds_response = self.bedrock_agent.create_data_source(
            knowledgeBaseId=kb_id,
            name=f'{name}-s3-source',
            description=f'S3 data source for {name}',
            dataSourceConfiguration={
                'type': 'S3',
                's3Configuration': {
                    'bucketArn': f'arn:aws:s3:::{s3_bucket}'
                }
            },
            vectorIngestionConfiguration={
                'chunkingConfiguration': {
                    'chunkingStrategy': 'FIXED_SIZE',
                    'fixedSizeChunkingConfiguration': {
                        'maxTokens': 300,
                        'overlapPercentage': 20
                    }
                }
            }
        )

        data_source_id = ds_response['dataSource']['dataSourceId']
        print(f"✓ Data Source created: {data_source_id}")

        return kb_id, data_source_id

    def start_ingestion(self, kb_id: str, data_source_id: str) -> str:
        """
        Start ingestion job to process documents
        """
        response = self.bedrock_agent.start_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id
        )

        job_id = response['ingestionJob']['ingestionJobId']
        print(f"✓ Ingestion job started: {job_id}")

        return job_id

    def wait_for_ingestion(self, kb_id: str, data_source_id: str, job_id: str):
        """
        Wait for ingestion job to complete
        """
        print("Waiting for ingestion to complete...")

        while True:
            response = self.bedrock_agent.get_ingestion_job(
                knowledgeBaseId=kb_id,
                dataSourceId=data_source_id,
                ingestionJobId=job_id
            )

            status = response['ingestionJob']['status']
            print(f"Status: {status}")

            if status == 'COMPLETE':
                print("✓ Ingestion completed successfully")
                break
            elif status == 'FAILED':
                print("✗ Ingestion failed")
                print(f"Failure reasons: {response['ingestionJob'].get('failureReasons', [])}")
                break

            time.sleep(10)

    def query(self, kb_id: str, query: str, num_results: int = 5) -> list:
        """
        Query the knowledge base (retrieval only)
        """
        response = self.bedrock_agent_runtime.retrieve(
            knowledgeBaseId=kb_id,
            retrievalQuery={
                'text': query
            },
            retrievalConfiguration={
                'vectorSearchConfiguration': {
                    'numberOfResults': num_results
                }
            }
        )

        return response['retrievalResults']

    def query_and_generate(
        self,
        kb_id: str,
        query: str,
        model_id: str = 'anthropic.claude-3-sonnet-20240229-v1:0'
    ) -> Dict:
        """
        Query knowledge base and generate response (RAG)
        """
        response = self.bedrock_agent_runtime.retrieve_and_generate(
            input={
                'text': query
            },
            retrieveAndGenerateConfiguration={
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': kb_id,
                    'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{model_id}'
                }
            }
        )

        return {
            'output': response['output']['text'],
            'citations': response.get('citations', [])
        }

# Usage Example
manager = KnowledgeBaseManager()

# Create knowledge base
kb_id, ds_id = manager.create_knowledge_base(
    name='company-docs',
    description='Company documentation',
    role_arn='arn:aws:iam::ACCOUNT:role/BedrockKBRole',
    s3_bucket='my-docs-bucket',
    collection_arn='arn:aws:aoss:us-east-1:ACCOUNT:collection/COLLECTION_ID'
)

# Start ingestion
job_id = manager.start_ingestion(kb_id, ds_id)

# Wait for completion
manager.wait_for_ingestion(kb_id, ds_id, job_id)

# Query the knowledge base
result = manager.query_and_generate(
    kb_id=kb_id,
    query='What is our return policy?'
)

print(f"Answer: {result['output']}")
print(f"Sources: {len(result['citations'])} citations")

Data Sources

S3 Data Source

Most common data source for documents:

# Configure S3 data source
s3_data_source = {
    'type': 'S3',
    's3Configuration': {
        'bucketArn': 'arn:aws:s3:::my-docs-bucket',
        'inclusionPrefixes': ['docs/', 'manuals/'],  # Optional: specific folders
        'exclusionPrefixes': ['archive/']  # Optional: exclude folders
    }
}

# Create data source
response = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id,
    name='s3-documents',
    dataSourceConfiguration=s3_data_source,
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 300,
                'overlapPercentage': 20
            }
        }
    }
)

Web Crawler Data Source

Crawl websites for documentation:

# Configure web crawler
web_crawler_config = {
    'type': 'WEB',
    'webConfiguration': {
        'crawlerConfiguration': {
            'crawlerLimits': {
                'rateLimit': 300  # Pages per minute
            },
            'inclusionFilters': [
                'https://docs.example.com/*'
            ],
            'exclusionFilters': [
                '*/archive/*',
                '*/old/*'
            ],
            'scope': 'HOST_ONLY'  # or 'SUBDOMAINS'
        },
        'sourceConfiguration': {
            'urlConfiguration': {
                'seedUrls': [
                    {'url': 'https://docs.example.com'}
                ]
            }
        }
    }
}
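
As with S3, pass the configuration to create_data_source to attach it to a knowledge base (a sketch; the name and knowledge base ID are placeholders):

import boto3

bedrock_agent = boto3.client('bedrock-agent')

response = bedrock_agent.create_data_source(
    knowledgeBaseId='YOUR_KB_ID',
    name='web-docs-source',
    dataSourceConfiguration=web_crawler_config
)
print(response['dataSource']['dataSourceId'])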

Confluence Data Source

# Configure Confluence
confluence_config = {
    'type': 'CONFLUENCE',
    'confluenceConfiguration': {
        'sourceConfiguration': {
            'hostUrl': 'https://your-domain.atlassian.net',
            'hostType': 'SAAS',  # or 'SERVER'
            'authType': 'BASIC',
            'credentialsSecretArn': 'arn:aws:secretsmanager:...'
        },
        'crawlerConfiguration': {
            'filterConfiguration': {
                'type': 'PATTERN',
                'patternObjectFilter': {
                    'filters': [
                        {
                            'objectType': 'Space',
                            'inclusionFilters': ['DOCS', 'TECH'],
                            'exclusionFilters': ['ARCHIVE']
                        }
                    ]
                }
            }
        }
    }
}

SharePoint Data Source

# Configure SharePoint
sharepoint_config = {
    'type': 'SHAREPOINT',
    'sharePointConfiguration': {
        'sourceConfiguration': {
            'hostType': 'ONLINE',  # or 'SERVER'
            'domain': 'your-domain',
            'siteUrls': [
                'https://your-domain.sharepoint.com/sites/docs'
            ],
            'tenantId': 'your-tenant-id',
            'credentialsSecretArn': 'arn:aws:secretsmanager:...'
        },
        'crawlerConfiguration': {
            'filterConfiguration': {
                'type': 'PATTERN',
                'patternObjectFilter': {
                    'filters': [
                        {
                            'objectType': 'Document',
                            'inclusionFilters': ['*.pdf', '*.docx']
                        }
                    ]
                }
            }
        }
    }
}

Salesforce Data Source

# Configure Salesforce
salesforce_config = {
    'type': 'SALESFORCE',
    'salesforceConfiguration': {
        'sourceConfiguration': {
            'hostUrl': 'https://your-domain.salesforce.com',
            'authType': 'OAUTH2',
            'credentialsSecretArn': 'arn:aws:secretsmanager:...'
        },
        'crawlerConfiguration': {
            'filterConfiguration': {
                'type': 'PATTERN',
                'patternObjectFilter': {
                    'filters': [
                        {
                            'objectType': 'Knowledge',
                            'inclusionFilters': ['Published']
                        },
                        {
                            'objectType': 'Case',
                            'inclusionFilters': ['Closed']
                        }
                    ]
                }
            }
        }
    }
}

Querying Knowledge Bases

Method 1: Retrieve Only (No Generation)

Get relevant documents without generating a response:

import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

def retrieve_documents(kb_id: str, query: str, num_results: int = 5):
    """
    Retrieve relevant documents from knowledge base
    """
    response = bedrock_agent_runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': num_results,
                'overrideSearchType': 'HYBRID'  # HYBRID or SEMANTIC; omit to use the default
            }
        }
    )

    results = []
    for item in response['retrievalResults']:
        results.append({
            'content': item['content']['text'],
            'score': item['score'],
            'location': item['location'],
            'metadata': item.get('metadata', {})
        })

    return results

# Usage
documents = retrieve_documents(
    kb_id='YOUR_KB_ID',
    query='What is the refund policy?',
    num_results=3
)

for i, doc in enumerate(documents, 1):
    print(f"\n--- Document {i} (Score: {doc['score']:.4f}) ---")
    print(doc['content'])
    print(f"Source: {doc['location']}")

Method 2: Retrieve and Generate (RAG)

Get an AI-generated answer based on retrieved documents:

def query_with_rag(kb_id: str, query: str, model_id: str = 'anthropic.claude-3-sonnet-20240229-v1:0'):
    """
    Query knowledge base and generate answer using RAG
    """
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{model_id}',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': 5,
                        'overrideSearchType': 'HYBRID'
                    }
                },
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': '''You are a helpful assistant. Answer the question based only on the provided context.

Context:
$search_results$

Question: $query$

If the context doesn't contain the answer, say "I don't have enough information to answer that question."

Answer:'''
                    },
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'temperature': 0.5,
                            'topP': 0.9,
                            'maxTokens': 1000
                        }
                    }
                }
            }
        }
    )

    return {
        'answer': response['output']['text'],
        'citations': response.get('citations', []),
        'session_id': response.get('sessionId')
    }

# Usage
result = query_with_rag(
    kb_id='YOUR_KB_ID',
    query='How do I reset my password?'
)

print(f"Answer: {result['answer']}\n")

# Print citations
print("Sources:")
for i, citation in enumerate(result['citations'], 1):
    for ref in citation.get('retrievedReferences', []):
        print(f"{i}. {ref['location']['s3Location']['uri']}")
        print(f"   Excerpt: {ref['content']['text'][:100]}...")

Method 3: Conversational RAG (Multi-Turn)

Maintain context across multiple queries:

class ConversationalKB:
    """
    Conversational interface to Knowledge Base
    """

    def __init__(self, kb_id: str, model_id: str = 'anthropic.claude-3-sonnet-20240229-v1:0'):
        self.kb_id = kb_id
        self.model_id = model_id
        self.session_id = None
        self.runtime = boto3.client('bedrock-agent-runtime')

    def ask(self, question: str) -> Dict:
        """
        Ask a question with conversation history
        """
        config = {
            'input': {'text': question},
            'retrieveAndGenerateConfiguration': {
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': self.kb_id,
                    'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{self.model_id}'
                }
            }
        }

        # Include session ID for follow-up questions
        if self.session_id:
            config['sessionId'] = self.session_id

        response = self.runtime.retrieve_and_generate(**config)

        # Store session ID for next question
        self.session_id = response.get('sessionId')

        return {
            'answer': response['output']['text'],
            'citations': response.get('citations', [])
        }

    def reset(self):
        """Reset conversation"""
        self.session_id = None

# Usage
chat = ConversationalKB(kb_id='YOUR_KB_ID')

# First question
result1 = chat.ask("What products do you offer?")
print(f"Q1: {result1['answer']}\n")

# Follow-up question (maintains context)
result2 = chat.ask("What's the price of the first one?")
print(f"Q2: {result2['answer']}\n")

# Another follow-up
result3 = chat.ask("Is it available in blue?")
print(f"Q3: {result3['answer']}\n")

# Reset conversation
chat.reset()

RAG (Retrieval-Augmented Generation)

Understanding RAG

RAG combines retrieval and generation:

  1. Retrieve: Find relevant documents from knowledge base
  2. Augment: Add retrieved context to the prompt
  3. Generate: Model generates answer based on context

Custom RAG Implementation

def custom_rag(kb_id: str, query: str, model_id: str):
    """
    Custom RAG implementation with full control
    """
    runtime = boto3.client('bedrock-agent-runtime')
    bedrock_runtime = boto3.client('bedrock-runtime')

    # Step 1: Retrieve relevant documents
    retrieve_response = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 5
            }
        }
    )

    # Step 2: Extract and format context
    contexts = []
    sources = []

    for result in retrieve_response['retrievalResults']:
        contexts.append(result['content']['text'])
        sources.append({
            'uri': result['location']['s3Location']['uri'],
            'score': result['score']
        })

    context_text = "\n\n---\n\n".join(contexts)

    # Step 3: Create prompt with context
    prompt = f"""Answer the question based on the following context. If the answer is not in the context, say so.

Context:
{context_text}

Question: {query}

Answer:"""

    # Step 4: Generate response
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[
            {
                'role': 'user',
                'content': [{'text': prompt}]
            }
        ],
        inferenceConfig={
            'temperature': 0.5,
            'maxTokens': 1000
        }
    )

    answer = response['output']['message']['content'][0]['text']

    return {
        'answer': answer,
        'sources': sources,
        'context_used': context_text
    }

# Usage
result = custom_rag(
    kb_id='YOUR_KB_ID',
    query='What is the warranty period?',
    model_id='anthropic.claude-3-sonnet-20240229-v1:0'
)

print(f"Answer: {result['answer']}\n")
print("Sources:")
for source in result['sources']:
    print(f"  - {source['uri']} (score: {source['score']:.4f})")

Advanced RAG with Reranking

def rag_with_reranking(kb_id: str, query: str, model_id: str):
    """
    RAG with custom reranking logic
    """
    runtime = boto3.client('bedrock-agent-runtime')
    bedrock_runtime = boto3.client('bedrock-runtime')

    # Retrieve more documents than needed
    retrieve_response = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 10  # Get more for reranking
            }
        }
    )

    # Rerank documents using model
    reranked_docs = []

    for result in retrieve_response['retrievalResults']:
        # Ask model to score relevance
        relevance_prompt = f"""On a scale of 1-10, how relevant is this document to the question?

Question: {query}

Document: {result['content']['text'][:500]}

Respond with only a number 1-10."""

        score_response = bedrock_runtime.converse(
            modelId=model_id,
            messages=[{'role': 'user', 'content': [{'text': relevance_prompt}]}],
            inferenceConfig={'temperature': 0.1, 'maxTokens': 10}
        )

        try:
            relevance_score = float(score_response['output']['message']['content'][0]['text'].strip())
        except ValueError:
            # Fall back to the vector store's similarity score
            relevance_score = result['score']

        reranked_docs.append({
            'content': result['content']['text'],
            'original_score': result['score'],
            'relevance_score': relevance_score,
            'location': result['location']
        })

    # Sort by relevance score
    reranked_docs.sort(key=lambda x: x['relevance_score'], reverse=True)

    # Use top 3 documents
    top_docs = reranked_docs[:3]
    context_text = "\n\n---\n\n".join([doc['content'] for doc in top_docs])

    # Generate answer
    prompt = f"""Answer based on this context:

{context_text}

Question: {query}

Answer:"""

    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{'role': 'user', 'content': [{'text': prompt}]}],
        inferenceConfig={'temperature': 0.5, 'maxTokens': 1000}
    )

    return {
        'answer': response['output']['message']['content'][0]['text'],
        'top_documents': top_docs
    }

RAG with Citation Tracking

def rag_with_citations(kb_id: str, query: str):
    """
    RAG that tracks and formats citations
    """
    runtime = boto3.client('bedrock-agent-runtime')

    response = runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
            }
        }
    )

    answer = response['output']['text']
    citations = response.get('citations', [])

    # Format answer with inline citations
    formatted_answer = answer
    citation_list = []

    for i, citation in enumerate(citations, 1):
        for ref in citation.get('retrievedReferences', []):
            # Extract source info
            location = ref['location']['s3Location']['uri']
            excerpt = ref['content']['text']

            citation_list.append({
                'number': i,
                'source': location,
                'excerpt': excerpt
            })

    # Add citations at the end
    if citation_list:
        formatted_answer += "\n\n**Sources:**\n"
        for cite in citation_list:
            formatted_answer += f"\n[{cite['number']}] {cite['source']}"
            formatted_answer += f"\n    \"{cite['excerpt'][:100]}...\"\n"

    return formatted_answer

# Usage
answer_with_citations = rag_with_citations(
    kb_id='YOUR_KB_ID',
    query='What are the shipping options?'
)

print(answer_with_citations)

Advanced Features

Custom Chunking Strategies

# Fixed size chunking
fixed_chunking = {
    'chunkingStrategy': 'FIXED_SIZE',
    'fixedSizeChunkingConfiguration': {
        'maxTokens': 300,
        'overlapPercentage': 20
    }
}

# Hierarchical chunking (for structured documents)
hierarchical_chunking = {
    'chunkingStrategy': 'HIERARCHICAL',
    'hierarchicalChunkingConfiguration': {
        'levelConfigurations': [
            {
                'maxTokens': 1500
            },
            {
                'maxTokens': 300
            }
        ],
        'overlapTokens': 60
    }
}

# Semantic chunking (groups related content)
semantic_chunking = {
    'chunkingStrategy': 'SEMANTIC',
    'semanticChunkingConfiguration': {
        'maxTokens': 300,
        'bufferSize': 0,
        'breakpointPercentileThreshold': 95
    }
}

# No chunking (use entire document)
no_chunking = {
    'chunkingStrategy': 'NONE'
}

Metadata Filtering

def query_with_metadata_filter(kb_id: str, query: str, filters: dict):
    """
    Query with metadata filters
    """
    runtime = boto3.client('bedrock-agent-runtime')

    response = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 5,
                'filter': {
                    'andAll': [
                        {
                            'equals': {
                                'key': 'department',
                                'value': filters.get('department')
                            }
                        },
                        {
                            'greaterThan': {
                                'key': 'year',
                                'value': filters.get('min_year')
                            }
                        }
                    ]
                }
            }
        }
    )

    return response['retrievalResults']

# Usage
results = query_with_metadata_filter(
    kb_id='YOUR_KB_ID',
    query='What are the policies?',
    filters={
        'department': 'HR',
        'min_year': 2023
    }
)

Hybrid Search

Combine semantic and keyword search:

def hybrid_search(kb_id: str, query: str):
    """
    Use hybrid search (semantic + keyword)
    """
    runtime = boto3.client('bedrock-agent-runtime')

    response = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 10,
                'overrideSearchType': 'HYBRID'  # Combines vector and keyword search
            }
        }
    )

    return response['retrievalResults']
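
Usage, mirroring the other helpers (the knowledge base ID is a placeholder):

results = hybrid_search(kb_id='YOUR_KB_ID', query='API authentication')
for r in results[:3]:
    print(f"{r['score']:.4f}  {r['content']['text'][:80]}")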

Custom Prompt Templates

def query_with_custom_prompt(kb_id: str, query: str, prompt_template: str):
    """
    Use custom prompt template for generation
    """
    runtime = boto3.client('bedrock-agent-runtime')

    response = runtime.retrieve_and_generate(
        input={'text': query},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0',
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': prompt_template
                    }
                }
            }
        }
    )

    return response['output']['text']

# Custom template for technical documentation
technical_template = '''You are a technical documentation assistant.

Context from documentation:
$search_results$

User Question: $query$

Instructions:
1. Provide accurate technical information
2. Include code examples if relevant
3. Cite specific sections
4. If unsure, say so

Response:'''

answer = query_with_custom_prompt(
    kb_id='YOUR_KB_ID',
    query='How do I configure the API?',
    prompt_template=technical_template
)

Best Practices

1. Document Preparation

# ✅ Good document structure
"""
# Product Documentation

## Overview
Clear introduction...

## Features
- Feature 1: Description
- Feature 2: Description

## Installation
Step-by-step instructions...

## API Reference
Detailed API docs...
"""

# ❌ Poor document structure
"""
lots of text without structure or headings everything runs together
making it hard to chunk effectively...
"""

2. Optimal Chunking

# Choose chunking based on document type
chunking_strategies = {
    'technical_docs': {
        'strategy': 'HIERARCHICAL',
        'max_tokens': 300,
        'overlap': 20
    },
    'articles': {
        'strategy': 'SEMANTIC',
        'max_tokens': 400,
        'overlap': 15
    },
    'structured_data': {
        'strategy': 'FIXED_SIZE',
        'max_tokens': 200,
        'overlap': 10
    }
}

3. Metadata Best Practices

Add rich metadata to documents:

# Bedrock Knowledge Bases read filterable attributes from a companion
# <document-name>.metadata.json file stored next to the document in S3
import json

s3 = boto3.client('s3')

# Upload the document itself
s3.put_object(
    Bucket='my-kb-bucket',
    Key='docs/product-guide.pdf',
    Body=file_content
)

# Upload the sidecar metadata file used for filtering
metadata = {
    'metadataAttributes': {
        'document-type': 'product-guide',
        'department': 'engineering',
        'version': '2.0',
        'last-updated': '2024-01-15',
        'author': 'tech-team',
        'tags': 'api,integration,setup'
    }
}

s3.put_object(
    Bucket='my-kb-bucket',
    Key='docs/product-guide.pdf.metadata.json',
    Body=json.dumps(metadata)
)

4. Query Optimization

def optimize_query(user_query: str) -> str:
    """
    Optimize user query for better retrieval
    """
    # Expand abbreviations
    expansions = {
        'API': 'Application Programming Interface',
        'FAQ': 'Frequently Asked Questions',
        'KB': 'Knowledge Base'
    }

    optimized = user_query
    for abbr, full in expansions.items():
        optimized = optimized.replace(abbr, f"{abbr} {full}")

    # Add context
    if '?' not in optimized:
        optimized += '?'

    return optimized

# Usage
original = "How to use API"
optimized = optimize_query(original)
# Result: "How to use API Application Programming Interface?"

5. Monitoring and Maintenance

from datetime import datetime

class KBMonitor:
    """
    Monitor knowledge base performance
    """

    def __init__(self, kb_id: str):
        self.kb_id = kb_id
        self.agent = boto3.client('bedrock-agent')
        self.metrics = []

    def check_ingestion_status(self):
        """
        Check status of data sources
        """
        response = self.agent.list_data_sources(
            knowledgeBaseId=self.kb_id
        )

        for ds in response['dataSourceSummaries']:
            print(f"Data Source: {ds['name']}")
            print(f"  Status: {ds['status']}")
            print(f"  Last Updated: {ds.get('updatedAt', 'N/A')}")

    def track_query_performance(self, query: str, response_time: float, relevant: bool):
        """
        Track query performance metrics
        """
        self.metrics.append({
            'query': query,
            'response_time': response_time,
            'relevant': relevant,
            'timestamp': datetime.now()
        })

    def get_statistics(self):
        """
        Get performance statistics
        """
        if not self.metrics:
            return {}

        return {
            'total_queries': len(self.metrics),
            'avg_response_time': sum(m['response_time'] for m in self.metrics) / len(self.metrics),
            'relevance_rate': sum(1 for m in self.metrics if m['relevant']) / len(self.metrics)
        }

    def trigger_reingestion(self, data_source_id: str):
        """
        Trigger reingestion to update knowledge base
        """
        response = self.agent.start_ingestion_job(
            knowledgeBaseId=self.kb_id,
            dataSourceId=data_source_id
        )

        return response['ingestionJob']['ingestionJobId']

# Usage
monitor = KBMonitor(kb_id='YOUR_KB_ID')
monitor.check_ingestion_status()

# Track queries
import time

start = time.time()
result = query_kb(monitor.kb_id, "What is the return policy?")
elapsed = time.time() - start

monitor.track_query_performance(
    query="What is the return policy?",
    response_time=elapsed,
    relevant=True  # Based on user feedback
)

# Get stats
stats = monitor.get_statistics()
print(f"Average response time: {stats['avg_response_time']:.2f}s")
print(f"Relevance rate: {stats['relevance_rate']:.1%}")

6. Cost Optimization

# Optimize number of results retrieved
def cost_optimized_query(kb_id: str, query: str):
    """
    Retrieve fewer documents to reduce costs
    """
    runtime = boto3.client('bedrock-agent-runtime')

    # Start with fewer results
    response = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 3  # Start small
            }
        }
    )

    # Check whether the top result is confident enough; guard against empty results
    results = response['retrievalResults']
    top_score = results[0]['score'] if results else 0.0

    if top_score < 0.7:
        # If confidence is low, retrieve more
        response = runtime.retrieve(
            knowledgeBaseId=kb_id,
            retrievalQuery={'text': query},
            retrievalConfiguration={
                'vectorSearchConfiguration': {
                    'numberOfResults': 5
                }
            }
        )

    return response['retrievalResults']

7. Security Best Practices

# Attach an identity-based IAM policy to the application role,
# scoped to the specific knowledge base ARN
kb_access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:Retrieve",
                "bedrock:RetrieveAndGenerate"
            ],
            "Resource": "arn:aws:bedrock:us-east-1:ACCOUNT:knowledge-base/KB_ID"
        }
    ]
}

# Data at rest: OpenSearch Serverless encrypts collections by default;
# attach an encryption security policy with a customer-managed KMS key if required
storage_config = {
    'type': 'OPENSEARCH_SERVERLESS',
    'opensearchServerlessConfiguration': {
        'collectionArn': 'arn:aws:aoss:...',
        'vectorIndexName': 'index',
        'fieldMapping': {...}
    }
}

# Use VPC endpoints for private access
# Configure OpenSearch Serverless with VPC access
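
A minimal boto3 sketch of creating such an interface endpoint (the VPC, subnet, and security group IDs are placeholders; verify the exact Bedrock endpoint service name available in your region):

ec2 = boto3.client('ec2')

# Keep retrieval traffic inside the VPC via an interface endpoint
ec2.create_vpc_endpoint(
    VpcId='vpc-0123456789abcdef0',
    VpcEndpointType='Interface',
    ServiceName='com.amazonaws.us-east-1.bedrock-agent-runtime',
    SubnetIds=['subnet-0123456789abcdef0'],
    SecurityGroupIds=['sg-0123456789abcdef0'],
    PrivateDnsEnabled=True
)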

AWS Code Samples and Resources

Official Documentation

1. Bedrock Knowledge Bases User Guide
   - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html
   - Complete guide to creating and using knowledge bases

2. API Reference
   - URL: https://docs.aws.amazon.com/bedrock/latest/APIReference/APIOperationsAgentsforAmazon_Bedrock.html
   - Detailed API documentation

3. RAG Best Practices
   - URL: https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test.html
   - Guidelines for optimal RAG implementation

AWS Samples Repository

# Clone AWS samples
git clone https://github.com/aws-samples/amazon-bedrock-samples.git
cd amazon-bedrock-samples/knowledge-bases

# Key examples:
# - knowledge-base-with-s3/
# - rag-examples/
# - custom-chunking/
# - metadata-filtering/

AWS Workshops

1. Bedrock Workshop - Knowledge Bases Module
   - URL: https://catalog.workshops.aws/amazon-bedrock/en-US/knowledge-bases

2. RAG with Bedrock Workshop
   - Hands-on labs for building RAG applications

Community Resources

- AWS re:Post: https://repost.aws/tags/TA4kkYBfVxQ_2R5Xt8jXZDdQ/amazon-bedrock - Community Q&A
- AWS Blog Posts: search "Amazon Bedrock Knowledge Bases" on aws.amazon.com/blogs/

AWS CLI Examples

# Create knowledge base
aws bedrock-agent create-knowledge-base \
    --name "my-kb" \
    --role-arn "arn:aws:iam::ACCOUNT:role/BedrockKBRole" \
    --knowledge-base-configuration file://kb-config.json \
    --storage-configuration file://storage-config.json \
    --region us-east-1

# Create data source
aws bedrock-agent create-data-source \
    --knowledge-base-id KB_ID \
    --name "s3-source" \
    --data-source-configuration file://ds-config.json \
    --region us-east-1

# Start ingestion
aws bedrock-agent start-ingestion-job \
    --knowledge-base-id KB_ID \
    --data-source-id DS_ID \
    --region us-east-1

# Query knowledge base
aws bedrock-agent-runtime retrieve \
    --knowledge-base-id KB_ID \
    --retrieval-query text="What is the return policy?" \
    --region us-east-1

# RAG query
aws bedrock-agent-runtime retrieve-and-generate \
    --input text="What is the return policy?" \
    --retrieve-and-generate-configuration file://rag-config.json \
    --region us-east-1
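
Ingestion jobs run asynchronously, so it is often useful to poll until a job reaches a terminal state. A minimal Python sketch around get_ingestion_job (the poll interval is arbitrary):

import time

def wait_for_ingestion(kb_id: str, ds_id: str, job_id: str, poll_seconds: int = 10):
    """
    Poll an ingestion job until it completes or fails
    """
    agent = boto3.client('bedrock-agent')
    while True:
        job = agent.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=ds_id,
            ingestionJobId=job_id
        )['ingestionJob']
        if job['status'] in ('COMPLETE', 'FAILED'):
            return job
        time.sleep(poll_seconds)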

CloudFormation Template

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Bedrock Knowledge Base Infrastructure'

Resources:
  DocumentsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub '${AWS::StackName}-kb-docs'
      VersioningConfiguration:
        Status: Enabled
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true

  # OpenSearch Serverless requires an encryption policy whose rules match the
  # collection name before the collection itself can be created
  CollectionEncryptionPolicy:
    Type: AWS::OpenSearchServerless::SecurityPolicy
    Properties:
      Name: !Sub '${AWS::StackName}-kb-enc'
      Type: encryption
      Policy: !Sub '{"Rules":[{"ResourceType":"collection","Resource":["collection/${AWS::StackName}-kb-collection"]}],"AWSOwnedKey":true}'

  OpenSearchCollection:
    Type: AWS::OpenSearchServerless::Collection
    DependsOn: CollectionEncryptionPolicy
    Properties:
      Name: !Sub '${AWS::StackName}-kb-collection'
      Type: VECTORSEARCH
      Description: Vector store for Knowledge Base

  KnowledgeBaseRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: bedrock.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        # Broad managed policy for demo purposes; scope down for production
        - arn:aws:iam::aws:policy/AmazonBedrockFullAccess
      Policies:
        - PolicyName: S3Access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:ListBucket
                Resource:
                  - !GetAtt DocumentsBucket.Arn
                  - !Sub '${DocumentsBucket.Arn}/*'
        - PolicyName: OpenSearchAccess
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - aoss:APIAccessAll
                Resource: !GetAtt OpenSearchCollection.Arn

Outputs:
  BucketName:
    Value: !Ref DocumentsBucket
    Description: S3 bucket for documents

  CollectionArn:
    Value: !GetAtt OpenSearchCollection.Arn
    Description: OpenSearch collection ARN

  RoleArn:
    Value: !GetAtt KnowledgeBaseRole.Arn
    Description: IAM role for Knowledge Base
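
A sketch of deploying the template with boto3 (the stack name and file path are placeholders; CAPABILITY_IAM is required because the template creates an IAM role):

cfn = boto3.client('cloudformation')

with open('kb-infra.yaml') as f:
    template_body = f.read()

cfn.create_stack(
    StackName='bedrock-kb-stack',
    TemplateBody=template_body,
    Capabilities=['CAPABILITY_IAM']
)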

Troubleshooting

Issue 1: Ingestion Fails

Problem: Documents not being ingested

Solution:

def diagnose_ingestion_failure(kb_id: str, ds_id: str, job_id: str):
    """
    Diagnose ingestion failures
    """
    agent = boto3.client('bedrock-agent')

    # Get job details
    response = agent.get_ingestion_job(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id,
        ingestionJobId=job_id
    )

    job = response['ingestionJob']

    print(f"Status: {job['status']}")

    if job['status'] == 'FAILED':
        print("\nFailure Reasons:")
        for reason in job.get('failureReasons', []):
            print(f"  - {reason}")

        # Common fixes
        print("\nCommon Solutions:")
        print("1. Check IAM role permissions")
        print("2. Verify S3 bucket access")
        print("3. Check document formats (PDF, TXT, MD, HTML, DOC/DOCX)")
        print("4. Ensure documents are not corrupted")
        print("5. Check OpenSearch collection status")

    # Check statistics
    stats = job.get('statistics', {})
    print(f"\nStatistics:")
    print(f"  Documents scanned: {stats.get('numberOfDocumentsScanned', 0)}")
    print(f"  Documents indexed: {stats.get('numberOfNewDocumentsIndexed', 0)}")
    print(f"  Documents failed: {stats.get('numberOfDocumentsFailed', 0)}")

Issue 2: Poor Retrieval Quality

Problem: Retrieved documents not relevant

Solution:

def improve_retrieval_quality(kb_id: str, query: str):
    """
    Strategies to improve retrieval quality
    """
    runtime = boto3.client('bedrock-agent-runtime')

    # Strategy 1: Use hybrid search
    response1 = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 5,
                'overrideSearchType': 'HYBRID'
            }
        }
    )

    # Strategy 2: Increase number of results
    response2 = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 10  # Get more, filter later
            }
        }
    )

    # Strategy 3: Query expansion
    expanded_query = f"{query} related information context details"
    response3 = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': expanded_query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 5
            }
        }
    )

    print("Try these strategies:")
    print("1. Use hybrid search (semantic + keyword)")
    print("2. Retrieve more documents and rerank")
    print("3. Expand query with related terms")
    print("4. Adjust chunking strategy")
    print("5. Add more relevant documents to KB")

Issue 3: Slow Query Performance

Problem: Queries taking too long

Solution:

def optimize_query_performance(kb_id: str, query: str):
    """
    Optimize query performance
    """
    import time

    # Measure baseline
    start = time.time()
    runtime = boto3.client('bedrock-agent-runtime')

    response = runtime.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={'text': query},
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': 3  # Reduce number of results
            }
        }
    )

    elapsed = time.time() - start
    print(f"Query time: {elapsed:.2f}s")
    print(f"Results returned: {len(response['retrievalResults'])}")

    if elapsed > 2.0:
        print("\nOptimization suggestions:")
        print("1. Keep numberOfResults low (this call uses 3)")
        print("2. Use metadata filters to narrow the search")
        print("3. Check OpenSearch collection performance")
        print("4. Consider caching frequent queries")
        print("5. Optimize document chunking (smaller chunks retrieve faster)")

Issue 4: Citations Not Showing

Problem: No citations in RAG responses

Solution:

# Ensure you're using retrieve_and_generate (not just retrieve)
response = runtime.retrieve_and_generate(
    input={'text': query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0'
        }
    }
)

# Citations are in the response
citations = response.get('citations', [])
if not citations:
    print("No citations found. Check:")
    print("1. Using retrieve_and_generate (not retrieve)")
    print("2. Model supports citations")
    print("3. Retrieved documents have location metadata")

Conclusion

AWS Bedrock Knowledge Bases provide a powerful RAG solution for grounding AI responses in your proprietary data. By combining semantic search with foundation models, you can build applications that provide accurate, cited, and up-to-date information.

Key Takeaways:

- Knowledge Bases enable RAG without managing infrastructure
- Support multiple data sources (S3, web, Confluence, SharePoint, Salesforce)
- Flexible chunking strategies for different document types
- Built-in citation tracking for transparency
- Scales to millions of documents

Next Steps:

1. Create your first knowledge base with S3 documents
2. Experiment with different chunking strategies
3. Implement RAG in your application
4. Monitor and optimize retrieval quality
5. Explore advanced features like metadata filtering

For the latest features and updates, refer to the official AWS Bedrock documentation.