{"id":2259,"date":"2026-06-19T13:28:10","date_gmt":"2026-06-19T11:28:10","guid":{"rendered":"https:\/\/teszarypeter.hu\/?post_type=publikacio&#038;p=2259"},"modified":"2026-06-19T13:28:11","modified_gmt":"2026-06-19T11:28:11","slug":"mi-az-a-rag-a-retrieval-augmented-generation-teljes-utmutatoja-elmelet-es-gyakorlat","status":"publish","type":"publikacio","link":"https:\/\/teszarypeter.hu\/en\/publikacio\/mi-az-a-rag-a-retrieval-augmented-generation-teljes-utmutatoja-elmelet-es-gyakorlat\/","title":{"rendered":"What is RAG? The complete guide to Retrieval-Augmented Generation: Theory and practice"},"content":{"rendered":"<h2 class=\"wp-block-heading\">What is RAG? The complete guide to Retrieval-Augmented Generation: Theory and practice<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">1. Introduction: Limitations of LLMs and the RAG Revolution<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The transformational power of large language models (LLMs) is unquestionable, but their architecture-level limitations \u2013 the \u2018knowledge cutoff\u2019 and hallucinations \u2013 pose serious risks in a business environment. Fundamentally, the AI does not think; it recognizes complex patterns and predicts the next unit of text on a probabilistic basis. For example, if we ask what follows the opening \u2018Apple\u2026\u2019, the model chooses the most likely continuation based on statistical data (for example, it has seen the word \u2018red\u2019 seven times and the word \u2018green\u2019 twice).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The problem arises when the answer requires specific or recent data that was not included in the training set. The model then generates a \u2018hallucinated\u2019 response, i.e. one that is statistically plausible but factually false. 
<strong>Retrieval-Augmented Generation (RAG)<\/strong>&nbsp;is the solution: instead of relying on the internal, static knowledge of the model, the system is complemented by an external, dynamic knowledge base that serves as a \u2018cheat sheet\u2019 during inference.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Theoretical foundations of RAG: How does AI \u201clearn\u201d from data?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The mathematical basis of machine learning is the search for hypotheses. In supervised learning, the goal is to find a function&nbsp;<em>h<\/em>&nbsp;(the hypothesis) that comes closest to the real but unknown relationship&nbsp;<em>y<\/em>=<em>f<\/em>(<em>x<\/em>). Linear classifiers and neural networks seek a consistent hypothesis by adjusting their weight vectors&nbsp;<em>w<\/em>.&nbsp;<strong>Ockham's Razor<\/strong>&nbsp;applies here: of several possible solutions, the simplest (e.g. a lower-degree polynomial) should be chosen for better generalisation. In the mathematical formalism, an extra attribute&nbsp;<em>x<\/em><sub>0<\/sub>=1 is introduced to shift the hyperplane (the bias term), so that the decision boundary can be fitted precisely.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">RAG breaks with this weight adjustment constraint. 
While fine-tuning changes the internal parameters of the model, RAG leaves them untouched and supplies the context for&nbsp;<em>y<\/em>=<em>f<\/em>(<em>x<\/em>) directly as part of the prompt.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><th><strong>Traditional LLM Training<\/strong><\/th><th><strong>RAG approach<\/strong><\/th><\/tr><tr><td><strong>Weight adjustment:<\/strong>&nbsp;During training, the weight vectors&nbsp;<em>w<\/em>&nbsp;are set and then fixed.<\/td><td><strong>Contextual augmentation:<\/strong>&nbsp;The weights remain intact.<\/td><\/tr><tr><td><strong>Consistent hypothesis:<\/strong>&nbsp;The model generalizes from learned patterns.<\/td><td><strong>Direct data transfer:<\/strong>&nbsp;The model works from the source document it receives.<\/td><\/tr><tr><td><strong>Static knowledge:<\/strong>&nbsp;New information requires costly retraining.<\/td><td><strong>Dynamic freshness:<\/strong>&nbsp;The knowledge base is extended immediately by uploading documents.<\/td><\/tr><tr><td><strong>Closed system:<\/strong>&nbsp;It only has access to pre-cutoff data.<\/td><td><strong>Open system:<\/strong>&nbsp;Access to corporate and live data.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Operating mechanism of RAG: Step by step<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Data Source and Embeddings:<\/strong>\u00a0Raw text is broken down into semantic units (chunks), which a dedicated embedding model then maps into a multidimensional\u00a0<strong>vector space<\/strong>.<\/li>\n\n\n\n<li><strong>Vector databases:<\/strong>\u00a0Documents are stored not as text but as mathematical coordinates, which allows for meaning-based (semantic) search.<\/li>\n\n\n\n<li><strong>Search and query:<\/strong>\u00a0The user's question is also turned into a vector, and the closest (most relevant) data points to it are retrieved.<\/li>\n\n\n\n<li><strong>Augmentation:<\/strong>\u00a0The retrieved sources and the original question are combined into a structured prompt.<\/li>\n\n\n\n<li><strong>Generation:<\/strong>\u00a0The LLM runs inference but restricts its response to the sources it received.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">4. Advantages of the technology: Why is it worth implementing?<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Drastic reduction of hallucinations:<\/strong>\u00a0The model responds not on the basis of its probability weights, but on the basis of the attached facts.<\/li>\n\n\n\n<li><strong>Data security:<\/strong>\u00a0Sensitive company documents can be used without being included in the training set of public models.<\/li>\n\n\n\n<li><strong>Auditability:<\/strong>\u00a0Each answer ends with a precise source indication (citation), so the information can be traced back.<\/li>\n\n\n\n<li><strong>Cost-effectiveness:<\/strong>\u00a0There is no need for continuous retraining, which requires expensive GPU capacity.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">5. 
Practical implementation: OpenWebUI and Ollama architecture<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The core of modern implementations is&nbsp;<strong>Ollama<\/strong>&nbsp;(model management) and&nbsp;<strong>OpenWebUI<\/strong>&nbsp;(interface and RAG engine). From an architect's perspective, it is important to understand how the system scales: OpenWebUI can handle multiple Ollama instances using a&nbsp;<strong>random selection strategy<\/strong>&nbsp;(Random Selection Load Balancing). For this, it is essential that Model IDs (e.g.&nbsp;<code>deepseek-r1:latest<\/code>) are exactly the same on all nodes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Configuration cornerstones:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Model-specific settings:<\/strong>\u00a0For reasoning models such as DeepSeek-R1 or Qwen3, the\u00a0<code>--reasoning-parser<\/code>\u00a0flag should be enabled for Ollama so that OpenWebUI correctly separates the thought process from the final response.<\/li>\n\n\n\n<li><strong>RBAC (Role-based Access):<\/strong>\u00a0OpenWebUI handles three basic roles:\u00a0<strong>Admin<\/strong>,\u00a0<strong>User<\/strong>, and\u00a0<strong>Pending<\/strong>, which holds newly registered accounts awaiting approval. Permission management is\u00a0<strong>additive<\/strong>\u00a0(additive permissions), which means that a user's effective permissions are the union of the permissions of their role and of the groups they belong to.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">6. Troubleshooting and technical challenges<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The most common error when installing RAG systems is misunderstanding the network layers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2018Localhost Trap\u2019:<\/strong>&nbsp;<code>localhost<\/code>&nbsp;means something different to the browser and to the backend. 
While in the browser it means the user's own machine, for the OpenWebUI backend running inside the Docker container it means the container itself.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Solution:<\/strong>\u00a0For backend connections (Ollama, APIs), use a fixed internal IP or the\u00a0<code>host.docker.internal<\/code>\u00a0address.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Common Symptoms and Solutions:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>\u2018Unexpected token \u2018d\u2019\u2019 or JSON error:<\/strong>\u00a0Typically a CORS or WebSocket configuration error. Check the\u00a0<code>CORS_ALLOW_ORIGIN<\/code>\u00a0variable.<\/li>\n\n\n\n<li><strong>Broken Markdown rendering (e.g. ## and ** remain visible):<\/strong>\u00a0Here\u00a0<strong>Nginx proxy buffering<\/strong>\u00a0breaks up the SSE (Server-Sent Events) stream. Solution:\u00a0<code>proxy_buffering off;<\/code>.<\/li>\n\n\n\n<li><strong>Endless loading of the model list:<\/strong>\u00a0If a configured endpoint is unavailable, the system's default 10-second timeout adds up. Set\u00a0<code>AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=2<\/code>\u00a0to improve response time.<\/li>\n\n\n\n<li><strong>SSL verification errors:<\/strong>\u00a0For internal services (e.g. Tika), use:\n<ul class=\"wp-block-list\">\n<li><code>HTTPS_CHECK=False<\/code><\/li>\n\n\n\n<li><code>OLLAMA_SSL_VERIFICATION=False<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">7. Ethics and responsible use of AI<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Although RAG is a technical solution to hallucinations, it does not eliminate human bias. If the data entering the knowledge base is biased or erroneous, the system will uncritically present it as fact. In a corporate environment, therefore, controlled data entry and RBAC-based access control are not only technical but also ethical requirements. 
Data confidentiality and the responsible use of algorithms are ensured by transparent source attribution.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">8. Summary and Future Perspectives<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Technological advances point towards Artificial General Intelligence (AGI), where machines will no longer merely follow patterns, but will be able to solve complex, multi-stage problems. Today, RAG is the most accessible and safest way to use AI not only for entertainment, but as a specialist assistant that generates real business value. For the readers of teszarypeter.hu, the implementation of RAG is the bridge between static language models and real, data-driven intelligence.<\/p>","protected":false},"excerpt":{"rendered":"<p>The transformational power of large language models (LLMs) is unquestionable, but their architecture-level limitations \u2013 the \u2018knowledge cutoff\u2019 and hallucinations \u2013 pose serious risks in a business environment. <\/p>","protected":false},"author":2,"featured_media":1557,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","meta":{"slim_seo":{"title":"Mi az a RAG? 
A Retrieval-Augmented Generation teljes \u00fatmutat\u00f3ja: Elm\u00e9let \u00e9s gyakorlat - Tesz\u00e1ry P\u00e9ter","description":"A nagy nyelvi modellek (LLM) transzform\u00e1ci\u00f3s ereje megk\u00e9rd\u0151jelezhetetlen, azonban architekt\u00fara-szint\u0171 korl\u00e1taik \u2013 a \"tud\u00e1s-v\u00e1g\u00e1si id\u0151pont\" (knowledge cutoff) \u00e9s"},"footnotes":""},"categories":[93],"tags":[94,96,95],"class_list":["post-2259","publikacio","type-publikacio","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-mesterseges-intelligencia","tag-mi"],"desktop_mode_lock":null,"desktop_mode_contributors":[],"desktop_mode_attached_media":[1557],"meta_box":[],"_links":{"self":[{"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/publikacio\/2259","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/publikacio"}],"about":[{"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/types\/publikacio"}],"author":[{"embeddable":true,"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/comments?post=2259"}],"version-history":[{"count":1,"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/publikacio\/2259\/revisions"}],"predecessor-version":[{"id":2260,"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/publikacio\/2259\/revisions\/2260"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/media\/1557"}],"wp:attachment":[{"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/media?parent=2259"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/categories?post=2259"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/teszarypeter.hu\/en\/wp-json\/wp\/v2\/tags?post=2259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}