🧠 Lesson 1-4: Memory & Basic RAG
Learning Objectives
By the end of this lesson, you will be able to:
- Distinguish between short-term and long-term agent memory
- Explain how embeddings numerically represent text and capture semantic similarity
- Configure and use a vector store (FAISS or Chroma) for embedding storage and retrieval
- Describe the core workflow of Retrieval-Augmented Generation (RAG)
- Build a simple document-question-answering agent grounded by RAG
💾 1. Memory in AI Agents
1.1 Short-Term Memory
Purpose
Keep context and intermediate results within one session.
Implementation
An in-process buffer of recent messages and intermediate results, discarded when the session ends.
Use Cases
Maintain conversation turns; avoid repeated tool calls.
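As a concrete sketch, short-term memory can be as simple as a bounded buffer of conversation turns. The `ShortTermMemory` class below is illustrative, not a specific library API:

```python
from collections import deque


class ShortTermMemory:
    """Keeps only the last `max_turns` conversation turns in process memory."""

    def __init__(self, max_turns: int = 10):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def as_messages(self) -> list:
        """Return the retained turns, oldest first, ready to prepend to a prompt."""
        return list(self.turns)


memory = ShortTermMemory(max_turns=3)
memory.add("user", "What is RAG?")
memory.add("assistant", "Retrieval-Augmented Generation.")
memory.add("user", "How does it reduce hallucinations?")
memory.add("assistant", "By grounding answers in retrieved documents.")
print(len(memory.as_messages()))  # 3 -- the oldest turn was evicted
```

Because the buffer lives only in process memory, everything here is lost when the session ends, which is exactly the limitation long-term memory addresses next.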
1.2 Long-Term Memory
Purpose
Persist knowledge across sessions: documents, past interactions.
Implementation
Vector database storing embeddings with metadata.
Schema Example
| id | embedding | source | chunk_text |
|---|---|---|---|
| doc1_0 | [0.12, -0.03, …] | policy.pdf | "Employees may request…" |
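In code, one row of this schema is just a vector plus metadata. The dict and helper below are a hypothetical sketch; real embedding vectors have hundreds to thousands of dimensions:

```python
# One long-term memory record matching the schema above (illustrative values).
record = {
    "id": "doc1_0",                    # document id + chunk index
    "embedding": [0.12, -0.03, 0.47],  # truncated for readability
    "source": "policy.pdf",            # metadata used for filtering and citation
    "chunk_text": "Employees may request leave per the policy.",
}


def matches_source(rec: dict, source: str) -> bool:
    """Metadata filter: keep only chunks that came from a given source file."""
    return rec["source"] == source


print(matches_source(record, "policy.pdf"))  # True
```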
🔢 2. Numeric Representations of Text (Embeddings)
2.1 What Are Embeddings?
Definition
Vectors produced by an embedding model (OpenAI, Cohere, Hugging Face) that map text to numeric space.
Key Feature
Capture semantic similarity: similar meanings yield nearby vectors.
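"Nearby" is usually measured with cosine similarity. The toy 3-dimensional vectors below are made up for illustration; real embeddings have far more dimensions:

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embedding output:
vacation = [0.9, 0.1, 0.0]
holiday = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(vacation, holiday))  # high: similar meanings
print(cosine_similarity(vacation, invoice))  # low: unrelated meanings
```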
2.2 Embedding Providers Comparison
Provider Comparison
| Provider | Model Variant | Dimensionality | Speed | Cost |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | 1,536 | Moderate | Moderate |
| Cohere | embed-multilingual | 1,024 | Fast | Low-Medium |
| Hugging Face | sentence-transformers | 768 | Variable | Free/Custom |
2.3 Generating Embeddings
Embedding Generation
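A minimal sketch of generating embeddings via LangChain's OpenAI wrapper, assuming the `langchain-openai` package is installed and `OPENAI_API_KEY` is set:

```python
def embed_texts(texts: list) -> list:
    """Return one embedding vector per input text (requires network access)."""
    # Third-party dependency: pip install langchain-openai
    from langchain_openai import OpenAIEmbeddings

    embedder = OpenAIEmbeddings(model="text-embedding-3-small")  # 1,536 dimensions
    return embedder.embed_documents(texts)


# Usage (requires an API key and network access):
# vectors = embed_texts(["Employees may request leave.", "See the vacation policy."])
# len(vectors[0])  # 1536 for text-embedding-3-small
```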
🗂️ 3. Vector Stores: FAISS vs. Chroma
3.1 FAISS (Facebook AI Similarity Search)
FAISS Characteristics
- Type: In-memory; fast for prototyping, but not persisted between runs
- Use: `FAISS.from_documents(docs, embedder)`
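Expanding that one-liner into a runnable sketch (assumes `langchain-community` and `faiss-cpu` are installed; `embedder` is any LangChain embeddings object, such as the one from section 2.3):

```python
def build_faiss_store(texts: list, embedder):
    """Build an in-memory FAISS index directly from raw text chunks."""
    # Third-party dependencies: pip install langchain-community faiss-cpu
    from langchain_community.vectorstores import FAISS

    return FAISS.from_texts(texts, embedder)


# Usage (requires an embeddings object):
# store = build_faiss_store(["chunk one", "chunk two"], embedder)
# hits = store.similarity_search("my query", k=3)  # top-3 nearest chunks
```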
3.2 Chroma
Chroma Characteristics
- Type: Persistent on-disk, metadata support
Chroma Usage
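A sketch of a persistent Chroma collection with per-chunk metadata, assuming the `langchain-community` and `chromadb` packages are installed:

```python
def build_chroma_store(texts: list, metadatas: list, embedder, persist_dir: str = "./chroma_db"):
    """Build an on-disk Chroma collection that survives process restarts."""
    # Third-party dependencies: pip install langchain-community chromadb
    from langchain_community.vectorstores import Chroma

    return Chroma.from_texts(
        texts,
        embedder,
        metadatas=metadatas,           # e.g., [{"source": "policy.pdf"}, ...]
        persist_directory=persist_dir,  # unlike in-memory FAISS, persisted to disk
    )
```

The metadata list lines up one-to-one with the text chunks, which is what enables source-based filtering and citation later.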
Configuration Tips
- Choose a similarity metric (cosine vs. Euclidean)
- Select `k` (top-k results) based on chunk granularity
🔄 4. Retrieval-Augmented Generation (RAG) Workflow
RAG Steps
- Embed Query
- Retrieve Documents
- Compose Prompt
- LLM Completion
RAG Benefits
Grounded answers, reduced hallucinations, up-to-date knowledge without retraining.
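The four steps above can be sketched as a single function with pluggable components; `fake_embed`, `fake_search`, and `fake_complete` below are stand-ins for a real embedding model, vector store, and LLM:

```python
def rag_answer(query: str, embed, search, complete, k: int = 3) -> str:
    query_vec = embed(query)        # 1. Embed Query
    chunks = search(query_vec, k)   # 2. Retrieve Documents
    context = "\n\n".join(chunks)   # 3. Compose Prompt from retrieved chunks
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return complete(prompt)         # 4. LLM Completion


# Stand-in components to show the data flow without any external services:
def fake_embed(text):
    return [float(len(text))]


def fake_search(vector, k):
    return ["Employees get 20 vacation days per year."]


def fake_complete(prompt):
    return "20 days per year (per the retrieved policy)."


print(rag_answer("How many vacation days do we get?", fake_embed, fake_search, fake_complete))
```

Because the answer is composed only from retrieved context, swapping in fresh documents updates the agent's knowledge without retraining the model.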
🏗️ 5. Building a Basic RAG Agent
5.1 Ingest & Split Documents
Document Processing
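A minimal character-level splitter with overlap, sketched from scratch (in practice you would likely use a library splitter such as LangChain's `RecursiveCharacterTextSplitter`):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split `text` into fixed-size chunks whose edges overlap by `overlap` chars."""
    chunks = []
    start = 0
    step = chunk_size - overlap  # advance less than a full chunk...
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step            # ...so neighboring chunks share `overlap` characters
    return chunks


doc = "".join(str(i % 10) for i in range(1200))  # stand-in for loaded document text
chunks = split_text(doc)
print([len(c) for c in chunks])           # [500, 500, 300]
print(chunks[0][-50:] == chunks[1][:50])  # True: overlap preserves boundary context
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from at least one chunk in full.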
5.2 Embed & Index
Vector Store Setup
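Tying 5.1 and 5.2 together, a hedged sketch that embeds the chunks and indexes them (assumes `langchain-openai`, `langchain-community`, and `faiss-cpu` are installed, with `OPENAI_API_KEY` set):

```python
def build_vector_store(chunks: list):
    """Embed the text chunks from step 5.1 and index them for similarity search."""
    # Third-party deps: pip install langchain-openai langchain-community faiss-cpu
    from langchain_openai import OpenAIEmbeddings
    from langchain_community.vectorstores import FAISS

    embedder = OpenAIEmbeddings(model="text-embedding-3-small")
    return FAISS.from_texts(chunks, embedder)  # exposes .as_retriever() for step 5.3


# Usage (requires an API key):
# vector_store = build_vector_store(chunks)
```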
5.3 Construct RetrievalQA Chain
QA Chain Implementation
```python
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI  # gpt-4o is a chat model, so use the chat wrapper

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    chain_type="stuff",  # "stuff" packs all retrieved chunks into a single prompt
    retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
)

result = qa_chain.invoke({"query": "What is our leave-of-absence policy?"})
print(result["result"])
```
💻 6. Mini-Project: Document Q&A Agent
Document Q&A Challenge
Build a Document-QA agent:
- Ingest and chunk a PDF or text file.
- Embed and index with FAISS or Chroma.
- Instantiate `RetrievalQA` with `k=3`.
- Ask three distinct questions; evaluate answer relevance and accuracy.
- Adjust `k`, chunk size, or prompt formatting to improve grounding.
✅ 7. Self-Check Questions
Knowledge Check
- How do FAISS and Chroma differ in persistence and metadata support?
- Why does chunk overlap matter when splitting documents?
- What happens if you set `k` too low or too high in similarity search?
- Explain how embeddings capture semantic similarity across different providers.
🧭 Navigation
Next Up
Lesson 1-5 will evaluate Model APIs, weighing GPT-4o, Claude 4, and open-source models on cost, latency, and capability to choose the right "brain" for your agents.