Like many students, I do not enjoy scrolling through endless PDFs. I’d love it if I could skip the reading and just ask my textbook questions. So, naturally, I did what any lazy-but-resourceful person would do: I dumped the entire PDF into an LLM and started asking questions, praying to God that the answers were accurate.
Spoiler alert: they weren’t.
The answers were either vague, wrong, or just plain confusing. That’s when I realized — large language models aren’t magic (shocking, I know). They have context limits, and stuffing a whole book into one prompt is like trying to fit a watermelon into a ziplock bag.
So I started digging, and that’s when I found the real MVP: RAG (Retrieval-Augmented Generation). With RAG, instead of force-feeding the model everything, you teach it where to look, and suddenly the answers start making sense.
Why Large Context Windows Don’t Really Help (Much)
You might think, “Wait… but newer models have massive context windows, right? Shouldn’t that fix the problem?”
In theory? Yes.
In practice? Meh.
Even with context windows stretching to 100k tokens and beyond (which sounds huge), you’re still working with trade-offs:
- They’re expensive to use.
- They often truncate or compress information.
- And unless your prompt is perfectly structured (which is rarely the case), the model still ends up hallucinating or giving generic responses.
It’s like asking your friend to remember every word of a 300-page book and hoping they don’t mess up the details. Not ideal.
RAG to the Rescue
RAG — Retrieval-Augmented Generation — is like giving your LLM a cheat sheet… but a really smart, targeted one.
Here’s the flow:
- You split your book into smaller chunks.
- You store these chunks in a vector DB.
- When a user asks a question, you don’t give the model the entire book, just the most relevant parts.
- Then the LLM crafts a solid, informed answer using only those parts.
Less noise. More signal. Way better answers.
What Does the RAG Pipeline Look Like?
Imagine you’re the middleman between your textbook and your model.
Your job is to:
- Split the content → Break the book into readable chunks
- Convert them into vectors → Using an embedding model (Cohere)
- Save those vectors → In a vector database (Pinecone)
- When a question is asked:
  - Convert the question into a vector
  - Search the database for the most similar chunks (using cosine similarity)
  - Send the best matches + the question to a language model (I used Gemini)
  - Boom: you get a clear, helpful answer
And that’s the heart of it. You’re not replacing the model’s brain — just giving it better memory.
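If it helps to see the shape of that idea in code before touching any real tools, here’s a toy, dependency-free sketch. The word-count “embeddings”, the sample chunks, and the question are all made up for illustration; the real embedding model and vector database come in the steps below.

from collections import Counter
import math

def toy_embed(text):
    # Stand-in for a real embedding model: just count words
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "Photosynthesis turns sunlight into chemical energy.",
    "Mitochondria are the powerhouse of the cell.",
]
question = "What is the powerhouse of the cell?"

# "Retrieval": pick the chunk most similar to the question...
best_chunk = max(chunks, key=lambda c: cosine_similarity(toy_embed(c), toy_embed(question)))
# ...and only that chunk (plus the question) would be sent to the LLM
print("Most relevant chunk:", best_chunk)

Swap the word counts for real embeddings and the list for a vector database, and that’s essentially the whole pipeline.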
My Stack: Simple, Powerful, Beginner-Friendly
Here’s what I used:
- 🧠 Cohere – To turn both book content and questions into vectors (aka embeddings)
- 📦 Pinecone – To store and search those vectors super efficiently
- 💬 Gemini – To generate the final, natural-language response
You don’t have to use these, but this combo is beginner-friendly, well-documented, and plays nicely together.
Step-by-Step: Build Your Own AskMyBook Bot
Okay, let’s actually build the thing now. I used Google Colab (because free GPU and easy sharing), but this should work in any Python environment.
Step 1: Load and Chunk Your Book
I used the PyMuPDF library to extract text.
!pip install pymupdf
Now, let’s extract the text:
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    doc = fitz.open(pdf_path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text

book_path = 'enter the path here'
book_text = extract_text_from_pdf(book_path)
Now, we’ll split the book into chunks, making it more digestible.
import re

def chunk_text(text, chunk_size=300, overlap=50):
    words = re.findall(r'\S+', text)
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

chunks = chunk_text(book_text)
print(f"Total Chunks: {len(chunks)}")
print("Sample chunk:\n", chunks[0])
Here, each chunk has 300 words, with a 50-word overlap for context continuity. Think of it as giving the model a smooth flow between paragraphs.
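A quick way to convince yourself the overlap is working, assuming the book is at least a few hundred words long: the last 50 words of one chunk should be the first 50 words of the next.

# Sanity check: consecutive chunks share their 50-word overlap
if len(chunks) > 1:
    print(chunks[0].split()[-50:] == chunks[1].split()[:50])  # expect True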
Step 2: Create Embeddings with Cohere
Embeddings = turning text into numbers that reflect meaning. We’ll use Cohere's embed-english-v3.0 model for this.
!pip install cohere
import cohere

co = cohere.Client("YOUR-API-KEY")  # Replace with your actual key

def get_embeddings(texts):
    response = co.embed(
        texts=texts,
        model="embed-english-v3.0",
        input_type="search_document"
    )
    return response.embeddings
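Before embedding the whole book, it’s worth a quick sanity check: embed a couple of chunks and confirm each vector has 1024 dimensions, which is the output size of embed-english-v3.0 and the dimension we’ll give Pinecone in the next step.

# Embed a small sample and inspect the shape
sample_embeddings = get_embeddings(chunks[:2])
print(len(sample_embeddings), len(sample_embeddings[0]))  # expect: 2 1024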
Step 3: Store Chunks in Pinecone
Now we store the embeddings in Pinecone — a vector database that helps us search similar chunks later.
!pip install pinecone
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR-API-KEY")

index_name = "ask-my-book"

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,  # matches Cohere's embed-english-v3.0 vectors
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index(index_name)
Now, batch-upload the chunks:
import uuid
import time

batch_size = 96  # Cohere's embed endpoint accepts up to 96 texts per call

for i in range(0, len(chunks), batch_size):
    batch_chunks = chunks[i:i+batch_size]
    batch_embeds = get_embeddings(batch_chunks)
    ids = [str(uuid.uuid4()) for _ in batch_chunks]
    vectors = list(zip(ids, batch_embeds, [{"text": t} for t in batch_chunks]))
    index.upsert(vectors=vectors)
    time.sleep(60)  # avoid hitting rate limits
Boom! Your book is now smartly stored in vector format.
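If you want to double-check, ask the index for its stats; the total vector count should match the number of chunks you embedded (freshly upserted vectors can take a few seconds to show up).

# Verify the upload: total_vector_count should equal len(chunks)
print(index.describe_index_stats())
print("Chunks we embedded:", len(chunks))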
Step 4: Ask Questions + Get Answers with Gemini
We’ll search for relevant chunks using your query, and then pass those to Gemini for generating an answer.
First, get the query embedding:
def get_query_embedding(query):
    response = co.embed(
        texts=[query],
        model="embed-english-v3.0",
        input_type="search_query"
    )
    return response.embeddings[0]
Now, search Pinecone:
def search_similar_chunks(query, top_k=5):
    query_embedding = get_query_embedding(query)
    result = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    return [match['metadata']['text'] for match in result['matches']]
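It’s worth testing retrieval on its own before involving Gemini; if the chunks coming back look irrelevant, no amount of prompting will fix the answer. The question below is just a placeholder, so use anything your book actually covers.

# Peek at what retrieval returns for a sample question
for i, chunk in enumerate(search_similar_chunks("What topics does Module 1 cover?"), start=1):
    print(f"--- Match {i} ---")
    print(chunk[:200], "...\n")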
Then plug the top chunks into Gemini:
import google.generativeai as genai

genai.configure(api_key="YOUR-GEMINI-API-KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def generate_answer(query):
    context_chunks = search_similar_chunks(query)
    context = "\n\n".join(context_chunks)

    prompt = f"""
You are an assistant bot trained on the following book content. Use only the info provided to answer the user's question.

Book Context:
{context}

Question:
{query}

If the question is not relevant to the context, respond with:
'I am a bot trained to answer questions based on the book content. This question is out of scope.'
"""

    response = model.generate_content(prompt)
    return response.text
Try it out!
question = "What does the author say in Module 1 of the book?"
print(generate_answer(question))
That’s It — You Now Have an Ask-My-Book Bot!
You built a bot that:
- Understands your textbook
- Finds the right part when asked
- Gives meaningful answers using that part only
No more endless skimming. Just type and ask.
What Next? Level Up Your Book-Bot
What we’ve built is a basic but powerful Question-and-Answer system. Think of it as the MVP (Minimum Viable Product) of your personal study assistant.
But once you’re comfortable, there’s so much more you can add:
- Citations – Show which chunk or page the answer came from, so you can verify the source (there’s a quick sketch of this after the list).
- Multi-turn Conversations – Let the bot remember previous questions and give more intelligent answers over time.
- Multi-step Reasoning – Chain thoughts together to answer complex questions.
- Custom Memory – Let your bot hold on to important facts you highlight for future queries.
- UI Upgrade – Hook this into a Streamlit or React frontend for a polished, user-friendly experience.
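As a taste of the first upgrade, here’s a rough sketch of citations. It assumes you chunk the book page by page and store the page number in each vector’s metadata at upsert time, which the code above doesn’t do yet; the metadata keys are just an example.

# Sketch: page-aware extraction (reuses fitz from Step 1)
def extract_pages(pdf_path):
    doc = fitz.open(pdf_path)
    return [(page_number, page.get_text()) for page_number, page in enumerate(doc, start=1)]

# At upsert time, store metadata like {"text": chunk, "page": page_number} for each chunk.
# At query time, return the page alongside the text (reuses index and get_query_embedding from above):
def search_with_citations(query, top_k=5):
    result = index.query(vector=get_query_embedding(query), top_k=top_k, include_metadata=True)
    return [(m['metadata']['text'], m['metadata'].get('page', 'unknown'))
            for m in result['matches']]

Each result is now a (text, page) pair you can surface under the answer, so readers can flip to the source and check it themselves.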
With these, your bot goes from “smart textbook” to “AI study buddy.”
If you’ve ever stared at a textbook, praying it would just talk back and tell you what matters — well, now it can.
This was my little experiment in turning boring PDFs into interactive conversations. Hope it inspires you to build your own and maybe even customize it for friends or classes.
Got stuck somewhere? Want help with adding a UI or citations next? Drop a comment or ping me — always happy to chat.