What is an AI model? How does ChatGPT or an image generator work? Get to know the main types of AI models in use today and how they work, explained in an easy-to-understand way through the eyes of a developer.
Have you ever wondered what magic lies behind ChatGPT's texts, the breathtaking images from Midjourney, or the code suggestions from GitHub Copilot? There is no tiny elf sitting in the machine; there is an incredibly complex AI model trained on data.
As a developer and academic, it's part of my job to dig deep into technology. This topic is also at the heart of my doctoral research, so I decided to start a series of articles to help you navigate the world of artificial intelligence models.
This is the first part, where we lay the foundations: what are these models, what are their main types, and how do they ‘think’?

What is an AI model, simply?
Think of an AI model as a brain trained in specialized knowledge. It is not a traditional program that follows pre-written if... then... rules. Instead, by analyzing vast amounts of data (text, images, code), it learns to recognize patterns and to create new, original content based on them.
The AI model is therefore not the application itself (such as the ChatGPT interface), but the engine under the bonnet that does the real ‘thinking’.
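To make the difference between hand-written rules and learned patterns concrete, here is a minimal sketch. The spam-filter scenario, the function names and the tiny training set are all my own assumptions for illustration; real models learn far richer patterns from far more data.

```python
from collections import Counter

def rule_based_is_spam(message):
    """Traditional program: the developer writes the rule by hand."""
    return "free" in message.lower().split()

def learn_word_scores(labelled_messages):
    """Learned approach: derive word scores from labelled examples."""
    scores = Counter()
    for message, is_spam in labelled_messages:
        for word in set(message.lower().split()):
            scores[word] += 1 if is_spam else -1
    return scores

def learned_is_spam(scores, message):
    """Sum the learned word scores; a positive total means 'looks like spam'."""
    return sum(scores[word] for word in message.lower().split()) > 0

# Hypothetical, made-up training examples: (message, is_spam)
training_data = [
    ("win a prize now", True),
    ("claim your prize today", True),
    ("meeting moved to monday", False),
    ("lunch today at noon", False),
]
scores = learn_word_scores(training_data)

message = "you won a prize"
print(rule_based_is_spam(message))       # the fixed rule only knows the word 'free'
print(learned_is_spam(scores, message))  # the learned scores react to 'prize'
```

Nobody told the second version that ‘prize’ is suspicious; it picked that up from the examples. That is the core idea, just at toy scale.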
The most important types of AI models by function
Although there are many models, most of them fall into a few main categories based on the type of data they work with.
1. Large Language Models (LLMs)
These are the best known. They handle understanding, generating, summarizing and translating text.
- How do they work? They are essentially ‘word prediction machines’ operating on a statistical basis. From the huge amount of text they were trained on, they learn which word is most likely to come next after a given word or sentence fragment. By repeating this process, they build up complete sentences and paragraphs that read as meaningful.
- Famous examples:
- GPT series (OpenAI): The engine behind ChatGPT; GPT-4 is currently the best-known version.
- Gemini (Google): Google's response, which is deeply integrated into the search engine and other Google products.
- Claude (Anthropic): An advanced model with a focus on safety and ‘honesty’.
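The ‘word prediction machine’ idea can be sketched in a few lines. This is a deliberately tiny toy, assuming a model that only looks at the single previous word (a so-called bigram model) and a made-up three-sentence corpus; real LLMs use neural networks and much longer context, but the loop of ‘predict the next word, append it, repeat’ is the same in spirit.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = model.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

def generate(model, start_word, max_words=5):
    """Repeat the prediction step to build up a longer sequence."""
    output = [start_word]
    for _ in range(max_words - 1):
        next_word = predict_next(model, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)

# Hypothetical toy corpus; a real model sees vastly more text than this.
corpus = "the dog chased the cat . the dog barked . the dog chased the ball ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))           # dog
print(generate(model, "dog", max_words=4))  # dog chased the dog
```

Note that the toy always picks the single most likely word, so it quickly runs in circles. Real systems usually sample from the probabilities instead, which is one reason they give varied answers.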
2. Image Generation Models
These models are able to create completely new, photorealistic or artistic images based on textual descriptions (prompts).
- How do they work? Most modern models use so-called ‘diffusion’ techniques. Imagine starting from a completely noisy, random image and ‘cleaning’ it step by step, guided by the text prompt, until the desired image emerges.
- Well-known examples:
- Midjourney: Known for its highly artistic output; it is used via Discord.
- DALL-E 3 (OpenAI): Integrated into ChatGPT, it is a highly creative image generator that interprets prompts well.
- Stable Diffusion: A favorite of the open source community that can even be run on a home computer.
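The step-by-step ‘cleaning’ can be illustrated with a toy loop. To be clear, this is not a diffusion model: the `TARGETS` table below is a hypothetical stand-in for what a trained neural network would have learned, and the four-number ‘image’ is my own simplification. It only shows the shape of the process: start from noise, then repeatedly nudge the image in the direction the prompt suggests.

```python
import random

# Hypothetical stand-in for learned knowledge: one tiny 'image'
# (a list of pixel brightness values) per supported prompt.
TARGETS = {
    "bright": [1.0, 1.0, 1.0, 1.0],
    "dark": [0.0, 0.0, 0.0, 0.0],
}

def denoise_step(image, prompt, strength=0.3):
    """Move each pixel a fraction of the way toward the prompt's target."""
    target = TARGETS[prompt]
    return [pixel + strength * (goal - pixel) for pixel, goal in zip(image, target)]

def generate_image(prompt, steps=30, seed=0):
    """Start from random noise and clean it up one small step at a time."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(4)]  # pure noise
    for _ in range(steps):
        image = denoise_step(image, prompt)
    return image

result = generate_image("bright")
print([round(pixel, 3) for pixel in result])
```

In a real model there is no lookup table. A network trained on many image and caption pairs estimates, at every step, which part of the current image is noise, and the prompt steers that estimate.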
3. Code Generation Models
These days, these are developers’ best friends. They are specially trained to write, complete, debug, and translate code.
- How do they work? They are essentially specialized language models trained not on books but on billions of lines of code from GitHub and other source code repositories.
- Examples include:
- GitHub Copilot: The most widely used ‘pair programmer’, integrated into code editors.
- AlphaCode 2 (DeepMind): Google's model, which can already solve competitive programming problems.
4. Other exciting directions
The world doesn't stop at text, images, and code. New models are constantly appearing:
- Sound (audio): Models that convert text to speech (ElevenLabs) or generate music (Suno AI).
- Video: Highly resource-intensive text-to-video models such as OpenAI's Sora.
How does an AI model ‘learn’?
Simply put, the process is similar to how a child learns.
- The structure (neural network): The AI model is built on a mathematical structure loosely modeled on the network of neurons in the human brain.
- The ‘textbook’ (training dataset): The model is ‘let loose’ on a gigantic dataset: a large share of the text on the Internet, millions of books, billions of images.
- ‘Learning’ (training): The model processes the data and tries to find patterns (e.g. which pixels or other words are often associated with the word ‘dog’). After each attempt it gets feedback and adjusts its internal ‘weights’ so that its next guess is more accurate. This process requires enormous computing capacity.
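The ‘guess, get feedback, adjust the weights’ cycle can be sketched with a model that has exactly one weight. Everything here is an assumption chosen for illustration: the hidden rule (output = 2 × input), the learning rate and the three training examples. Real models adjust billions of weights using the same basic idea, known as gradient descent.

```python
def train(examples, learning_rate=0.05, epochs=200):
    """Learn a single weight so that weight * x matches the targets."""
    weight = 0.0  # the model's only internal 'weight', initially a bad guess
    for _ in range(epochs):
        for x, target in examples:
            guess = weight * x                   # the model's attempt
            error = guess - target               # feedback: how far off was it?
            weight -= learning_rate * error * x  # nudge the weight to shrink the error
    return weight

# Hypothetical 'textbook': inputs paired with the answers we want (here, 2 * x).
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned = train(examples)
print(round(learned, 3))  # 2.0
```

The model is never told the rule. It only sees examples and corrections, and the weight drifts toward the value that makes its guesses right.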
This article only scratched the surface. My goal is to dive deeper into specific areas in the coming weeks and months. We will look at how to write effective ‘prompts’ for image generators, how to use ChatGPT to speed up our daily work, and even how to run simpler models on our own home server.
