What is RAG and an AI agent? How a company turns documents into answers
Author: Michael Jan Rogocki (AI Engineer & Data Scientist)
Every company holds an enormous amount of knowledge — locked away in documents. Regulations, procedures, contracts, technical manuals, price lists. The problem is that finding a specific answer often means manually searching through hundreds of pages, or asking a colleague who happens to know.
RAG and AI agents are technologies that change this. Instead of searching by hand, you ask a question — and the system finds the answer in your own documents, citing its source. In this article we explain what RAG is, how an AI agent differs from a chatbot, and how to start deploying these solutions in a company — with data security as the foundation, not an add-on.
1. What is RAG and how does it work?
⚡ In one sentence
RAG (Retrieval-Augmented Generation) is an AI architecture that searches your company documents for relevant passages before giving an answer, so it answers based on sources rather than on the model's general "knowledge".
💡 In plain terms
Imagine you hire a new employee. They have a general education and communicate well, but they don't yet know your internal procedures, price lists or industry regulations. If you ask them a question, they'll answer based on what they learned at university. They might be right — but they might also say something that sounds credible yet has no basis in fact.
Now imagine you give them access to the company library and say: "before you answer, check the documents". That's exactly what RAG is.
What does this look like with a concrete example? A new employee in the company — their first week. They want to find out the holiday policy, how the equipment-ordering procedure works, or who to ask for access to the warehouse system. The information exists in the company — in the handbook, the procedures, the manuals — but it's scattered across different documents and folders.
Without a RAG system, the new employee asks colleagues, digs through folders, sends emails. Every question pulls someone on the team away from their current work. Some answers arrive late, some in conflicting versions.
With a RAG system, they type the question into a chat window — just as they'd write a message to a colleague. The system:
- Searches the document base and finds the relevant passage of the handbook or procedure. It doesn't search for keywords like a search engine — it searches for meaning, because the documents have been converted into a mathematical representation of their content (so-called embeddings — more in the 🔧 section). The question "how do I order a laptop?" reaches the equipment-ordering procedure, even if the document is titled "IT purchasing policy".
- Passes the context — the passage it found goes to the AI model together with the question. The model gets the instruction: answer based on this document.
- Generates the answer — the model phrases the answer in natural language and indicates the source. The employee can click and check for themselves whether the system interpreted the document correctly.
The effect: the new employee gets an answer in a dozen or so seconds, without involving anyone on the team. And when another person joins a month later — the answers are ready from day one.
🔧 Deep dive
The term RAG (Retrieval-Augmented Generation) was introduced in 2020 by Patrick Lewis and co-authors from Meta AI (then Facebook AI Research), University College London and New York University, in a paper presented at the NeurIPS conference. The idea is to combine two kinds of memory: parametric (what the model learned during training) and non-parametric (an external document base the model draws on at the moment of generating an answer). Under the simple chat interface there's a multi-stage pipeline:
- Document preprocessing — before the system can answer, the documents have to be prepared. Each document is split into passages (chunks) of a set length, preserving context (e.g. section headings attached to each chunk). If the documents are scans or images, the text first has to be extracted — that's where OCR comes in (cf. What is OCR, NLP and how does AI read documents?).
- Embeddings — each passage is converted into an embedding — a numerical vector representing its meaning, not its literal content. Thanks to this, the question "how do I order a new monitor?" reaches the purchasing procedure that says "requests for IT equipment are submitted via the intranet form" — even though the words don't match.
- Vector database — embeddings are stored in a specialized database or index (e.g. Pinecone, Weaviate or Qdrant, or an index built with the FAISS library), optimized for nearest-neighbor search. This is where the question, itself converted into an embedding, is matched to the documents.
- Reranking — an optional but important step. A separate model "scores" the relevance of the passages found in the context of the specific question and reorders them. This improves answer quality, especially when the initial search returned partially relevant passages.
- Augmented prompt — the user's question is combined with the passages found into a single prompt, which goes to the LLM (a large language model, e.g. GPT-4, Claude, Llama). The prompt includes the instruction: "answer based on the supplied passages; if you can't find the answer — say so".
- Generation with attribution — the LLM generates the answer and indicates which passage (and therefore which document) the information comes from.
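The search and augmented-prompt steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: `toy_embed` is a deliberately naive stand-in (word counts) for a real embedding model, so it only matches shared words, whereas a real model would also match paraphrases. The function names, the sample passages and the exact prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Placeholder embedding (assumption): lowercase word counts.
    A real embedding model returns a dense vector that captures meaning."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks most similar to the question (nearest neighbors)."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[dict]) -> str:
    """Combine the question with numbered, source-labelled passages."""
    listing = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer based on the supplied passages; if you can't find the answer, say so. "
        "Cite the passage numbers you used.\n\n"
        f"Passages:\n{listing}\n\nQuestion: {question}"
    )


# Hypothetical sample base, for illustration only.
chunks = [
    {"source": "IT purchasing policy",
     "text": "Requests for IT equipment are submitted via the intranet form."},
    {"source": "Employee handbook",
     "text": "Holiday requests are approved by the line manager."},
    {"source": "Warehouse manual",
     "text": "Access to the warehouse system is granted by the shift lead."},
]

question = "How do I order IT equipment?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

The resulting `prompt` is what would be sent to the LLM. Numbering the passages and labelling each with its source is one simple way to let the model point back to the document it used.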
The quality of a RAG system depends on several factors: how the documents are split into passages (too small loses context, too large dilutes relevance), the quality of the embedding model, the search configuration, and how well the LLM sticks to the supplied sources instead of reaching for its training knowledge.
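The chunk-size trade-off is easier to see in code. Below is a minimal sketch of one common approach, assuming fixed-size windows measured in words, a small overlap between neighboring chunks, and the section heading attached to every chunk. The function name and the default sizes are hypothetical; real systems often split on sentences or document structure and count tokens rather than words.

```python
def chunk_section(heading: str, text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split one section into overlapping chunks, each prefixed with its heading."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        # The heading travels with every chunk so context is not lost.
        chunks.append(f"{heading}\n{' '.join(window)}")
        if start + max_words >= len(words):
            break  # this window already reached the end of the section
    return chunks


# Synthetic 120-word section, for illustration only.
sample_text = " ".join(f"w{i}" for i in range(120))
sample_chunks = chunk_section("Equipment ordering", sample_text)
```

With these defaults, the last ten words of one chunk are repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.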
It's also worth knowing the limitations. An LLM has a context limit (context window) — if the system finds too many relevant passages, not all of them will fit into a single request to the model. Documents with tables, charts or complex formatting (e.g. multi-page contract annexes) are harder to process than plain text. And with large document bases, the cost of maintaining the embeddings and the search infrastructure grows.
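One simple way to handle the context limit is to add passages in order of relevance until a budget is used up. The sketch below assumes the passages are already ranked and approximates token counts by word counts. That approximation is an assumption for illustration only; a real system would use the tokenizer of the chosen model. The function name is hypothetical.

```python
from typing import Callable


def fit_to_budget(
    ranked_passages: list[str],
    max_tokens: int,
    estimate: Callable[[str], int] = lambda text: len(text.split()),
) -> list[str]:
    """Keep the highest-ranked passages whose combined size fits the budget."""
    selected: list[str] = []
    used = 0
    for passage in ranked_passages:
        cost = estimate(passage)
        if used + cost > max_tokens:
            continue  # too large for the remaining space; try smaller ones
        selected.append(passage)
        used += cost
    return selected


kept = fit_to_budget(["a b c", "d e f g", "h i"], max_tokens=5)
```

Here the second passage is skipped because it would exceed the budget, while the shorter third passage still fits. Stopping at the first passage that doesn't fit would be an equally valid policy if strict rank order matters more than filling the window.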
RAG doesn't eliminate AI hallucinations — but it significantly reduces them, because it forces the model to rely on specific documents rather than on general training knowledge. The accuracy of the answers depends on the quality of the documents in the base and on how well the search system matches passages to the question (cf. What is Artificial Intelligence? — the section on AI hallucinations).
2. What is an AI agent and how does it differ from a chatbot?
⚡ In one sentence
An AI agent is a system built from four elements — an LLM, tools, memory and autonomy — that independently carries out multi-step tasks, rather than just answering questions.
💡 In plain terms
The simplest way to understand an AI agent is to compare three levels of independence:
A chatbot — answers questions from what it learned during training. It has no access to your documents or systems. You ask a question, you get an answer that sounds credible — but you don't know where it comes from or whether it's up to date.
A RAG system — a chatbot with access to a company document base. Before answering, it checks your documents. The answer is based on specific sources you can verify. A significant leap in quality — but a RAG system still waits for a question and then answers. It doesn't take any action on its own.
An AI agent — this is more than a chatbot with tools. It consists of four elements:
- An LLM as the "brain" — a large language model that "understands" the question and the context, and on that basis formulates the next steps. It's the LLM that "steers" the whole process.
- Access to tools — the agent can reach document bases (RAG), call APIs (interfaces through which systems communicate with one another), search company systems, generate files. In more advanced applications the agent can analyze an image from a Computer Vision system or pull data from a BI dashboard. Each tool is a separate "skill" the agent uses when it "decides" it's needed.
- Memory — the agent "remembers" earlier steps within a task and uses their results in the next ones. It doesn't start from scratch at each step.
- Autonomy — the agent chooses which step to take next based on the result of the previous one. It doesn't wait for a user instruction after every stage.
What does this look like in practice? When you give the agent the task "check whether the new offer from the supplier complies with our framework terms", the agent:
- Searches for the framework terms in the documents (using RAG as a tool).
- Compares them point by point with the content of the offer.
- Identifies the discrepancies.
- Generates a report with the findings.
A person verifies the result and makes the final decision — but doesn't have to go through every step themselves. Where the AI agent sits in the spectrum of automation — from a macro to an agent — is something we write about in the article What is automation?
🔧 Deep dive
There's an ongoing debate in the AI industry about what deserves to be called an "agent". Some solutions sold as "AI agents" are in fact RAG systems with an elaborate interface. A credible definition of an agent requires the presence of all four elements: an LLM, tools, memory and autonomy. If even one is missing — it's more of an advanced chatbot than an agent.
The ReAct approach — how an agent "thinks" and acts
One of the most widely used agent architectures is the ReAct approach (Reason + Act), proposed by Shunyu Yao and co-authors from Princeton University and Google Research in 2022. The name says it plainly: the agent alternates between reasoning (Reason) and acting (Act). In practice it looks like this — the agent works in a repeating loop of three steps:
- Thought — the agent "analyzes" the context so far and articulates what it should do next. This isn't thinking in the human sense — it's generating text in which the model spells out its plan.
- Action — the agent performs a specific action: it searches the document base, calls an API, runs a query against a company system.
- Observation — the agent receives the result of the action and, on that basis, moves to the next Thought step.
The Thought → Action → Observation loop repeats until the agent "decides" the task is complete. It's precisely this loop that distinguishes an agent from a RAG system: RAG performs one cycle (search → answer), whereas an agent performs as many as it needs.
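The loop itself is short. The sketch below shows its skeleton under clear assumptions: `scripted_decide` is a hard-coded stand-in for the LLM (a real agent would send the task and the trace to a model and parse its reply), the two tools return made-up values, and all names are hypothetical. A step limit is added so the loop cannot run forever.

```python
from typing import Callable

Decision = tuple[str, str, str]  # (thought, action name, action input)


def run_agent(
    task: str,
    decide: Callable[[str, list[dict]], Decision],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[str, list[dict]]:
    """Repeat Thought -> Action -> Observation until the policy finishes."""
    trace: list[dict] = []  # the agent's memory of earlier steps
    for _ in range(max_steps):
        thought, action, action_input = decide(task, trace)      # Thought
        if action == "finish":
            trace.append({"thought": thought, "action": action, "observation": action_input})
            return action_input, trace
        observation = tools[action](action_input)                # Action
        trace.append({"thought": thought, "action": action, "observation": observation})  # Observation
    raise RuntimeError("Step limit reached without a final answer")


def scripted_decide(task: str, trace: list[dict]) -> Decision:
    """Stand-in for the LLM (assumption): picks the next step from the trace."""
    if len(trace) == 0:
        return ("I need the framework terms first.", "search_documents", "payment deadline")
    if len(trace) == 1:
        return ("Now I need the same point from the offer.", "read_offer", "payment deadline")
    terms, offer = trace[0]["observation"], trace[1]["observation"]
    verdict = "compliant" if terms == offer else f"discrepancy: terms say {terms}, offer says {offer}"
    return ("I can compare both values.", "finish", verdict)


# Hypothetical tools returning made-up values, for illustration only.
tools = {
    "search_documents": lambda query: "30 days",
    "read_offer": lambda query: "45 days",
}

answer, trace = run_agent("Check the offer against the framework terms", scripted_decide, tools)
```

The `trace` list plays two roles at once: it is the memory the next decision is based on, and it is the record a person can read afterwards to see how the result was reached.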
Why does it matter? The ReAct approach gives the agent two things that simpler systems lack. First — the ability to correct course mid-task. If the result of one step is insufficient, the agent can change strategy instead of returning a wrong answer. Second — transparency. Each Thought step is recorded, so a person can trace how the agent arrived at the result.
An important caveat: when we write that the agent "thinks", "analyzes" or "plans", that's a simplification. An AI agent doesn't think — it processes instructions, patterns and rules. But the effect can come close to an employee acting independently on routine tasks (cf. What is Artificial Intelligence? — the section on how AI "thinks").
3. Case study: a legal-document assistant in the German construction sector
⚡ In one sentence
A company in the German construction sector replaced the manual searching of hundreds of pages of regulations with a RAG assistant — response time dropped from minutes to a dozen or so seconds.
💡 In plain terms
The construction sector in Germany rests on a dense web of regulations and ordinances — HOAI (the fee structure for architects and engineers), VOB (procedures for construction contracts) and many others. Every offer, every calculation, every settlement requires a check against the regulations.
The problem: The company owner was losing a significant part of their working time on establishing costs, practices and legal requirements. The documents existed within the company — but as hundreds of pages of PDFs, scattered across various folders. Finding the answer to a specific question meant manually searching the documents or phoning an advisor.
The solution: An intelligent assistant (chat) based on the RAG architecture, operating solely on documents from the company knowledge base. The owner asks a question in Polish or German — the system searches the base of regulations, finds the relevant passage and phrases an answer, citing the source.
What changed in practice:
- The time needed to find an answer dropped from minutes (and sometimes longer) to a dozen or so seconds.
- Answers based on specific passages of the documents — the owner can verify each one.
- The system operates in a closed environment — company documents don't leave the secure infrastructure (more on security in section 4).
- New employees find their way through the thicket of regulations faster — the system works like an experienced colleague you can turn to with a question.
This isn't a replacement for a lawyer or an advisor. It's the elimination of mechanical searching, so that time and attention can go where they're truly needed — to interpretation, negotiation, decisions.
"Before we started building the system, we first sat down with the owner and wrote down which questions they ask most often and which documents they look in for answers. It turned out that the vast majority of questions concerned a few specific areas — rates, procedures, deadlines. That let us start with a small, well-defined base instead of trying to 'dump everything in' at once."
— Karol Jurewicz, Business Process Architect, cm-opti
🔧 Deep dive
From a technical perspective, the key architectural decisions concerned:
- The scope of the document base — instead of uploading all company documents, the base covers only verified industry regulations and ordinances. This limits the risk of the system returning outdated or irrelevant information.
- Multilingualism — the system handles questions and answers in Polish and German, even though the source documents are in German. The model generates answers in the language of the question, preserving the terminology from the source document.
- Security — documents processed and stored in a closed cloud environment. The data is not sent to public APIs nor used to train external models. Details in section 4.
"The biggest mistake I see with RAG implementations is focusing on the AI model and skipping the document preparation. If the base is a mess — inconsistent, duplicated, outdated documents — then even the best model will give poor answers. That's why, before we launch the system, we invest time in organizing the base: what goes into it, in what format, how it's split up."
— Michael Jan Rogocki, AI Engineer & Data Scientist, cm-opti
4. Data security — why it's a foundation, not an add-on
⚡ In one sentence
Data security in a RAG system is an architectural requirement, not an option — it determines the choice of infrastructure, suppliers and method of deployment.
💡 In plain terms
When a company deploys a RAG system, it gives it access to its most valuable documents: contracts, regulations, internal procedures, customer data. That's a different situation from using a public chatbot, where you type in a question and don't worry about what happens to the data. In a RAG system for a company, several things matter:
- Where the data is. Documents must be stored in a controlled environment — a private cloud, a dedicated server, infrastructure with a clearly defined data location. Not on the AI supplier's public server, not in an environment shared with other companies.
- Who has access. The permissions system must reflect who in the company should see which documents. If the RAG base contains customer contracts, not every employee should be able to ask about them.
- What happens to the questions and answers. The question "what's the margin on project X?" itself contains confidential information — even if the answer is never generated. The system must guarantee that the content of questions and answers doesn't leave the company and doesn't reach an external AI model supplier.
- Whether the data trains the model. An important distinction: in a well-designed RAG system, company documents are NOT used to train the AI model. They are searched at the moment of generating an answer, but they don't become part of the model. Once a document and its embeddings are deleted from the base, the system can no longer retrieve it.
This isn't a checklist from a brochure — these are architectural decisions that have to be made at the very start of the project, because they affect the choice of infrastructure, suppliers and method of deployment.
🔧 Deep dive
Data security in the context of RAG and AI agents spans several layers:
- Infrastructure — a private cloud solution (e.g. AWS, Azure, GCP with a dedicated configuration) or on-premise infrastructure. Data stored in a region compliant with regulatory requirements (e.g. the EU for European companies).
- Model isolation — the AI model can run as an external service (API), but the input and output data is not used by the supplier to train the model. Alternatively: a model hosted locally (e.g. an open-source LLM on company infrastructure), which gives full control but requires greater resources.
- Access control — a permissions system integrated with the company's existing IT infrastructure (e.g. Active Directory, LDAP). Different user groups have access to different subsets of documents.
- Audit and logging — who asked what and when. Logs stored in a controlled environment, accessible only internally — not shared with the model supplier.
- Regulatory compliance — depending on the industry: GDPR, industry standards, sector requirements (e.g. the supervisory requirements of BaFin for the financial sector in Germany). The EU AI Act introduces additional requirements for high-risk AI systems (cf. What is Artificial Intelligence? — the section on the EU AI Act).
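The access-control and audit layers above can be illustrated with a short sketch. It assumes each passage carries the set of user groups allowed to see it and that the user's groups have already been resolved (for example from the company directory). The class, field and function names are hypothetical, and a real deployment would usually apply such a filter inside the vector database query rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Chunk:
    text: str
    source: str
    allowed_groups: frozenset[str]  # groups permitted to see this passage


def visible_chunks(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only passages the user may see, before any search or prompt is built."""
    return [c for c in chunks if c.allowed_groups & user_groups]


def audit_entry(user: str, question: str, returned: list[Chunk]) -> dict:
    """Record who asked what, when, and which sources were used."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "question": question,
        "sources": [c.source for c in returned],
    }


# Hypothetical sample base, for illustration only.
base = [
    Chunk("Standard price list for 2024.", "Price list", frozenset({"sales", "management"})),
    Chunk("Terms agreed with customer X.", "Customer contract", frozenset({"legal", "management"})),
]

for_sales = visible_chunks(base, {"sales"})
entry = audit_entry("anna", "What is the standard price?", for_sales)
```

Filtering before retrieval, rather than after generation, means restricted text never reaches the prompt, so the model cannot leak what it was never shown.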
If a company uses an external LLM via an API and the documents or questions contain personal data, it must conclude a Data Processing Agreement (DPA) with the model supplier. GDPR (Article 28) requires one whenever a supplier processes personal data on the company's behalf.
For us, data security isn't an "add-on" to the implementation — it's the starting point. We begin every project with the question: what data will be in the system, who should have access to it, what regulations apply. Only with that map do we then choose the infrastructure and architecture. For companies in the European Union — and especially in Poland and Germany, where we work most often — this is particularly important: the EU regulatory framework (GDPR, the EU AI Act) is demanding, and a client's trust is built over years and lost in minutes.
— The cm-opti perspective
5. Where to start with implementing RAG in a company
⚡ In one sentence
It's worth starting a RAG implementation with a single set of documents and a single type of question — where the team loses the most time searching for answers.
💡 In plain terms
You don't have to build a system that answers every question about everything right away. The best RAG implementations start with a narrow, well-defined scope:
- Identify the knowledge locked in documents. Where in the company do people lose time searching for information? Handbooks? Industry regulations? Technical documentation? Internal procedures? Contract terms? The more often someone has to search — the higher the return on investment.
- Check the quality of the documents. Are the documents up to date? Are there conflicting versions of the same document? Are they in a format that can be processed (PDF, Word, HTML)? A RAG base is only as good as the documents it rests on, and organizing the documents is, in practice, the first step of optimization (cf. What is process optimization?).
- Define who will ask and about what. Not "everyone about everything" — specifically: salespeople about pricing terms, technicians about standards, managers about procedures. The more precisely you define the user group and their questions, the faster the system will start giving accurate answers.
- Start with a human in the loop. The first stage is a system that proposes answers — and a human verifies them. Over time, trust grows and the proportion of independent answers increases.
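Part of the document-quality check in the list above can be automated before anything is loaded into the base. The sketch below is a rough first pass under stated assumptions: it flags files in formats outside an assumed supported list and groups byte-identical copies by content hash. It does not detect conflicting versions with different content, which still need a human decision. The function name, the extension list and the sample files are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePath

# Assumed set of processable formats (PDF, Word, HTML).
SUPPORTED = {".pdf", ".doc", ".docx", ".html", ".htm"}


def audit_documents(files: dict[str, bytes]) -> dict[str, list]:
    """Flag unsupported formats and group exact duplicate files."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    unsupported: list[str] = []
    for name, content in files.items():
        if PurePath(name).suffix.lower() not in SUPPORTED:
            unsupported.append(name)
        by_hash[hashlib.sha256(content).hexdigest()].append(name)
    duplicates = [sorted(names) for names in by_hash.values() if len(names) > 1]
    return {"unsupported": unsupported, "duplicates": duplicates}


# Hypothetical in-memory files, for illustration only.
report = audit_documents({
    "handbook.pdf": b"holiday policy v1",
    "handbook_copy.pdf": b"holiday policy v1",
    "prices.xlsx": b"price table",
})
```

A report like this gives the team a concrete list to work through: which copies to drop and which files need converting before the first prototype is built.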
A separate question is the method of delivery — a ready-made platform, a cloud tool or a solution built from scratch. That depends on the scale, the security requirements and how non-standard the documents are. We develop this topic in the article on systems integration.
Companies across the European Union face the same challenge: the growing complexity of regulations, ever more documents, and knowledge locked in the heads of a few people. RAG is a tool that solves this problem — but only when the implementation starts with understanding what questions the team asks and which documents it looks in for answers.
Our first step is always a diagnosis: not "which model to choose", but "what knowledge do you want to make available, and to whom". We choose the technology last.
— The cm-opti perspective
Do you have knowledge locked away in documents that your team spends hours searching through? Let's talk — together we'll work out where a RAG assistant will deliver the fastest return.
Frequently asked questions (FAQ)
How does RAG differ from an ordinary chatbot?
A chatbot answers only from what it learned during training. A RAG system searches a designated document base before answering — so the answer is based on specific passages, and the user can check the source.
Does RAG eliminate AI hallucinations?
It reduces them, but doesn't eliminate them. The model can still misinterpret a passage or generate an answer that goes beyond the supplied documents. That's why verifying answers and a well-prepared base matter.
Can an AI agent operate without human supervision?
In a business context, an AI agent operates within established rules and with a human verifying the result. It isn't an autonomous program making strategic decisions — it's a system that automates sequences of operational steps.
Will my company documents end up in an external AI model?
It depends on the architecture. In a well-designed system, documents stay in a controlled environment and are not used to train the model. This is a decision that has to be made at the start of the project — not after deployment.
How long does it take to implement RAG in a company?
From a few weeks to a few months — depending on the number of documents, the security requirements and whether the company has a well-organized knowledge base. A first working prototype on a narrow set of documents can be launched quickly; scaling to the whole organization is a separate stage.
Related articles in the cm-opti Knowledge base
- What is Artificial Intelligence?
- What is OCR, NLP and how does AI read documents?
- What is automation?
- What is process optimization?
- What is Computer Vision?
- What is systems integration?
- What is data analysis and BI?
Concepts explained in this article → Glossary
RAG (Retrieval-Augmented Generation), AI agent, chatbot, embedding, vector database, LLM (Large Language Model), AI hallucinations, chunking, reranking, prompt, human-in-the-loop, ReAct (Reason + Act), context window, DPA (Data Processing Agreement)
Sources and references
- The term RAG — Patrick Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", Meta AI / UCL / NYU, NeurIPS 2020 — arxiv.org
- The ReAct approach — Shunyu Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models", Princeton / Google Research, ICLR 2023 — arxiv.org
- Case study — a cm-opti project in the German market (construction sector, a RAG system on HOAI/VOB documents).