The Ultimate Generative AI Glossary: Understanding the Jargon

Software engineers love their jargon. Every field has specific terms to describe its concepts, and understanding this specialised vocabulary is a rite of passage. It's also essential for comprehension—without knowing the terminology, reading about a topic can feel like deciphering spells from Harry Potter. This can be especially overwhelming in newer fields where everyone seems fluent in concepts you're still trying to grasp.
Generative AI is currently the hottest topic in tech, so we've created a comprehensive glossary with easy-to-understand explanations. Use this to learn fundamental concepts or as a reference the next time an industry leader tweets something that leaves you wondering if they're discussing a new model or casting a spell.
Core Concepts
Generative AI (Gen AI)
AI systems designed to create new content (text, images, music, code, etc.) by learning patterns from training data.
Example: When you ask ChatGPT to write a story or DALL-E to create an image, you're using generative AI.
Further reading: "Generative AI: A Creative New World" - Stanford HAI
Prompt
The input you provide to an AI model to generate a response.
Example: "Write a poem about autumn leaves" or "Create an image of a cat wearing a space helmet."
Further reading: "Prompt Engineering Guide" - DAIR.AI
Completion
The AI-generated response to your prompt. It's called a "completion" because early models like GPT were designed to complete text that you started.
Example: If your prompt is "The best way to learn programming is," the completion might be "to practice regularly by building small projects that interest you."
Token
The basic unit that AI models process—usually parts of words, whole words, or individual characters. Models have token limits that determine how much text they can process at once.
Example: The phrase "I love machine learning" might be split into tokens like ["I", "love", "machine", "learn", "ing"].
Why tokens instead of words? Tokens allow models to handle vocabulary more efficiently. Common words are single tokens, while rare words may be split into multiple tokens.
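One concrete way to see tokenization in action is OpenAI's tiktoken library; here is a minimal sketch (the exact splits and IDs depend on which tokenizer a given model uses):

```python
# A minimal sketch using OpenAI's tiktoken library (pip install tiktoken).
# Different models use different tokenizers, so the exact splits and IDs will vary.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
token_ids = encoding.encode("I love machine learning")
print(token_ids)                                  # a list of integer token IDs
print([encoding.decode([t]) for t in token_ids])  # the piece of text each token covers
print(len(token_ids), "tokens")
```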
Temperature
A setting that controls the randomness of AI responses. Values typically range from 0 to 1.
Low temperature (0-0.3): More focused, consistent, and predictable responses. High temperature (0.7-1.0): More creative, diverse, and sometimes surprising results.
Example: When asking for a business email, you might use a temperature of 0.1 for consistency. When asking for a creative story, you might use 0.8 for more imaginative results.
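Under the hood, temperature simply rescales the model's raw scores (logits) before they are turned into probabilities. A minimal sketch with NumPy, using made-up logits for four candidate words:

```python
import numpy as np

def sample_with_temperature(logits, temperature=0.8):
    """Divide logits by the temperature, then sample from the softmax distribution."""
    scaled = np.array(logits) / max(temperature, 1e-8)  # low T sharpens, high T flattens
    probs = np.exp(scaled - scaled.max())               # subtract max for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Hypothetical logits for four candidate next words.
logits = [4.0, 3.5, 1.0, 0.2]
print(sample_with_temperature(logits, temperature=0.1))  # almost always picks index 0
print(sample_with_temperature(logits, temperature=1.0))  # more varied choices
```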
Top-K & Top-P Sampling
Methods to control how the model selects the next word in a sequence.
Top-K sampling: The model only considers the K most likely next words. Top-P (nucleus) sampling: The model considers the smallest set of words whose combined probability exceeds P.
Example: With Top-K = 10, if the model predicts the next word after "The weather is," it will only choose from the 10 most likely words (e.g., "sunny," "rainy," "cold," etc.) rather than considering all possible words in its vocabulary.
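A rough sketch of how the two filters differ, using made-up next-word probabilities (real vocabularies have tens of thousands of entries):

```python
import numpy as np

def top_k_filter(probs, k=10):
    """Keep only the k most likely tokens, then renormalise."""
    probs = np.array(probs, dtype=float)
    cutoff = np.sort(probs)[-k] if k < len(probs) else 0.0
    filtered = np.where(probs >= cutoff, probs, 0.0)
    return filtered / filtered.sum()

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    probs = np.array(probs, dtype=float)
    order = np.argsort(probs)[::-1]                  # most likely first
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, p) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# Hypothetical next-word probabilities after "The weather is".
probs = [0.45, 0.30, 0.15, 0.06, 0.04]   # e.g. "sunny", "rainy", "cold", "nice", "odd"
print(top_k_filter(probs, k=2))    # only "sunny" and "rainy" survive
print(top_p_filter(probs, p=0.9))  # "sunny", "rainy", "cold" (cumulative 0.90)
```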
Model Types & Architectures
Transformer
The revolutionary neural network architecture that powers most modern AI systems like GPT, BERT, and LLaMA. It uses a mechanism called "attention" to process text efficiently.
Example: When you use ChatGPT, you're interacting with a transformer-based model.
Further reading: "Attention Is All You Need" - The original transformer paper by Vaswani et al.
GPT (Generative Pre-trained Transformer)
A type of large language model designed to generate human-like text based on the input it receives.
Example: ChatGPT is based on the GPT architecture. When you ask it to write an essay or explain a concept, it uses its pre-trained knowledge to generate relevant text.
Further reading: "Language Models are Few-Shot Learners" - The GPT-3 paper by Brown et al.
BERT (Bidirectional Encoder Representations from Transformers)
A model designed primarily for understanding text rather than generating it. It reads text in both directions (left-to-right and right-to-left) to better grasp context.
Example: When Google Search better understands the meaning behind your query (rather than just matching keywords), it's likely using BERT or a similar model.
Further reading: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" - Devlin et al.
Diffusion Model
A type of model that generates images by gradually removing noise from random patterns. Popular for image generation.
Example: DALL-E, Midjourney, and Stable Diffusion all use diffusion models to create images from text descriptions.
Further reading: "High-Resolution Image Synthesis with Latent Diffusion Models" - The Stable Diffusion paper by Rombach et al.
GAN (Generative Adversarial Network)
A system where two AI models compete: one creates content (the generator), and the other tries to identify if the content is real or AI-generated (the discriminator). This competition improves the quality of the generated content.
Example: Earlier AI image generators like StyleGAN used GANs to create realistic human faces that don't actually exist.
Further reading: "Generative Adversarial Networks" - The original GAN paper by Goodfellow et al.
Training & Fine-Tuning
Pre-training
The initial phase where a model learns language patterns from massive datasets (often hundreds of gigabytes or even terabytes of text) without specific tasks in mind.
Example: Before GPT-4 could answer your specific questions, it was pre-trained on a diverse range of internet text, books, and other sources to learn grammar, facts, reasoning, and more.
Fine-tuning
The process of adapting a pre-trained model for specific tasks using a smaller, more targeted dataset.
Example: A company might take a general-purpose model like GPT-3.5 and fine-tune it on customer service conversations to create a specialized customer support chatbot.
Further reading: "Parameter-Efficient Transfer Learning for NLP" - Houlsby et al.
Reinforcement Learning from Human Feedback (RLHF)
A technique where human ratings and preferences are used to improve AI responses, making them more helpful, accurate, and aligned with human values.
Example: Human evaluators rate different AI responses to the same question (e.g., "Which response is more helpful and accurate?"). These ratings help the model learn what humans prefer.
Further reading: "Training language models to follow instructions with human feedback" - Ouyang et al. (InstructGPT paper)
Zero-shot Learning
When an AI performs a task it wasn't specifically trained for, using only its general knowledge.
Example: Asking ChatGPT to "Write a poem about climate change in the style of Dr. Seuss" without ever showing it examples of such poems.
Few-shot Learning
When an AI improves performance by seeing just a few examples of a task.
Example: Telling ChatGPT, "Here are two examples of professional emails: [Example 1] [Example 2]. Now write me a similar professional email to schedule a meeting with a client."
Further reading: "Language Models are Few-Shot Learners" - Brown et al.
Image & Video Generation Terms
Text-to-Image
AI systems that generate images based on text descriptions.
Example: Typing "sunset over mountains with a lake reflection" into DALL-E or Midjourney and receiving a generated image matching that description.
Further reading: "DALL·E 2: Creating Images from Text" - OpenAI blog post
Latent Space
An abstract mathematical representation where images (or other data) are encoded as points. Similar images are close together in this space, enabling the generation of new, related content.
Example: In latent space, all images of dogs might be clustered together. By navigating this space, AI can generate new dog images or gradually transform a dog image into something else.
Style Transfer
The technique of applying the artistic style of one image to the content of another.
Example: Taking a photo of your house and making it look like it was painted by Vincent van Gogh or drawn as a cartoon.
Further reading: "Image Style Transfer Using Convolutional Neural Networks" - Gatys et al.
Audio & Speech Generation
Text-to-Speech (TTS)
Technology that converts written text into natural-sounding spoken words.
Example: When your navigation app reads directions aloud or when Amazon's Alexa responds to your questions.
Further reading: "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions" - The Tacotron 2 paper by Shen et al.
Speech-to-Text (STT)
Technology that transcribes spoken language into written text.
Example: When you use voice dictation on your phone to write a text message or when Zoom automatically generates meeting transcripts.
Further reading: "Whisper: Robust Speech Recognition via Large-Scale Weak Supervision" - Radford et al.
Voice Cloning
Technology that can replicate a specific person's voice based on samples of their speech.
Example: After listening to a few minutes of someone speaking, AI can generate new speech that sounds like that person saying things they never actually said.
Ethics & Risks
Hallucination
When AI confidently generates incorrect or made-up information as if it were factual.
Example: An AI might state that "The Golden Gate Bridge was completed in 1792" (it was actually completed in 1937) or refer to non-existent research papers or events.
Further reading: "Hallucination in Large Language Models: A Survey" - Ji et al.
Bias
When AI systems reflect or amplify unfair prejudices present in their training data or design.
Example: An AI hiring system might favor certain demographic groups if trained on historical hiring data that contains these biases.
Further reading: "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" - Bender et al.
Deepfake
Highly realistic but fabricated media (usually images or videos) created using AI to make it appear that someone did or said something they didn't.
Example: Videos appearing to show a politician or celebrity saying something controversial that they never actually said.
Further reading: "Deepfakes and Disinformation: Exploring the Impact of Synthetic Media" - Wilson Center report
Guardrails
Safety measures implemented to prevent AI systems from generating harmful, biased, or inappropriate content.
Example: ChatGPT refusing to generate instructions for illegal activities or implementing content filters on image generation systems to prevent creation of explicit material.
Neural Networks & Components
Feedforward Neural Network (FNN)
The simplest type of neural network where information flows in one direction only—from input to output through one or more layers of neurons.
Example: A basic image classifier that takes pixel values as input and outputs the probability of the image containing a cat, dog, or other object.
Recurrent Neural Network (RNN)
A type of neural network designed for sequential data, where information from previous steps influences current processing—making it useful for tasks like text generation.
Example: Early language models used RNNs to predict the next word in a sentence based on all previous words.
Further reading: "Sequence to Sequence Learning with Neural Networks" - Sutskever et al.
Long Short-Term Memory (LSTM)
A specialized type of RNN that can remember information over longer sequences, solving the "vanishing gradient" problem that plagued earlier RNNs.
Example: Before transformers became dominant, LSTMs were commonly used for translation, text summarization, and speech recognition.
Further reading: "Long Short-Term Memory" - The original LSTM paper by Hochreiter & Schmidhuber
Attention Mechanism
A technique that allows neural networks to focus on relevant parts of the input when generating each part of the output.
Example: When translating "The man with the black hat walked quickly," an attention mechanism helps the model focus on "man" when generating "el hombre" and on "black hat" when generating "con el sombrero negro."
Further reading: "Neural Machine Translation by Jointly Learning to Align and Translate" - Bahdanau et al.
Self-Attention
A key component of transformer models that allows each word in a sentence to "attend to" (or focus on) other words in the same sentence to better understand context.
Example: In the sentence "The animal didn't cross the street because it was too tired," self-attention helps the model work out that "it" refers to "the animal" rather than "the street."
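For the curious, the core computation is "scaled dot-product attention": each word's query is compared against every word's key, and the resulting weights blend the values. A minimal NumPy sketch with toy dimensions and random weights (real models learn Wq, Wk, and Wv during training):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # how strongly each word attends to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                           # each output is a weighted mix of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                      # 5 "words", each an 8-dimensional vector
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (5, 8): one context-aware vector per word
```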
Multi-Head Attention
A technique where multiple attention mechanisms work in parallel, allowing the model to focus on different aspects of the input simultaneously.
Example: When analyzing "I saw her duck," one attention head might focus on the relationship between "her" and "duck" (possession), while another might focus on "saw" and "duck" (the action).
Positional Encoding
A method used in transformers to give the model information about word order, since the basic attention mechanism doesn't inherently understand sequence.
Example: Without positional encoding, "Dog bites man" and "Man bites dog" would look the same to a transformer, despite having very different meanings.
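A small sketch of the sinusoidal encoding used in the original transformer paper, which gives every position a unique pattern of sine and cosine values that gets added to the word embeddings:

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions, dim):
    """The fixed sine/cosine position encoding from the original transformer paper."""
    positions = np.arange(num_positions)[:, None]         # (num_positions, 1)
    div_terms = 10000 ** (np.arange(0, dim, 2) / dim)     # one frequency per pair of dimensions
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(positions / div_terms)           # even dimensions get sine
    pe[:, 1::2] = np.cos(positions / div_terms)           # odd dimensions get cosine
    return pe

# Each row is added to the matching word embedding, so "Dog bites man"
# and "Man bites dog" no longer look identical to the model.
print(sinusoidal_positional_encoding(num_positions=4, dim=8).round(2))
```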
Training & Optimization
Backpropagation
The primary algorithm for training neural networks, where errors in predictions are used to adjust the model's parameters, working backward from output to input layers.
Example: If a model predicts an image is 95% likely to be a cat when it's actually a dog, backpropagation adjusts the model's weights to reduce this error.
Further reading: "Learning representations by back-propagating errors" - Rumelhart, Hinton & Williams
Gradient Descent
An optimization algorithm that iteratively adjusts model parameters to minimize error, similar to finding the bottom of a valley by always walking downhill.
Example: Imagine trying to find the lowest point in a hilly landscape while blindfolded—you feel the slope under your feet and keep walking downhill until you can't go any lower.
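A tiny sketch of the idea on a one-parameter problem, minimising f(x) = (x - 3)²:

```python
def gradient_descent(start=0.0, learning_rate=0.1, steps=50):
    """Minimise f(x) = (x - 3)^2 by repeatedly stepping opposite to the gradient."""
    x = start
    for _ in range(steps):
        gradient = 2 * (x - 3)          # derivative of (x - 3)^2
        x -= learning_rate * gradient   # walk downhill
    return x

print(gradient_descent())  # converges toward 3.0, the bottom of the "valley"
```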
Adam Optimizer
A popular optimization algorithm that adapts the learning rate for each parameter, making training more efficient than basic gradient descent.
Example: While standard gradient descent uses the same step size for all parameters, Adam might use smaller steps for frequently updated parameters and larger steps for rarely updated ones.
Further reading: "Adam: A Method for Stochastic Optimization" - Kingma & Ba
Batch Normalization
A technique that normalizes the input to each layer during training, making neural networks train faster and more stably.
Example: Just as scaling raw input features with very different ranges (like age [0-100] and income [thousands or millions]) stops the larger one from dominating, batch normalization applies the same kind of rescaling to the activations inside the network, layer by layer, as it trains.
Further reading: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift" - Ioffe & Szegedy
Dropout
A regularization technique where random neurons are temporarily ignored during training, which helps prevent overfitting (when a model performs well on training data but poorly on new data).
Example: Imagine studying for a test with your notes sometimes randomly hidden—this forces you to understand the material more deeply rather than just memorizing your notes.
Further reading: "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" - Srivastava et al.
Weight Initialization
Strategies for setting the initial values of a neural network's parameters before training begins, which can significantly impact training success.
Example: Starting with weights that are too large or too small can cause signals to explode or vanish as they pass through the network, making learning impossible.
Further reading: "Understanding the difficulty of training deep feedforward neural networks" - Glorot & Bengio
Loss Functions
Mathematical functions that measure how far a model's predictions are from the actual values, guiding the learning process.
Example: For a classification task like spam detection, a loss function would assign a higher penalty when the model confidently predicts "not spam" for an actual spam email.
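As an illustration, here is cross-entropy, one of the most common classification losses, sketched with made-up probabilities for the spam example above:

```python
import numpy as np

def cross_entropy(predicted_prob_of_true_class):
    """The penalty grows sharply as the model becomes more confidently wrong."""
    return -np.log(predicted_prob_of_true_class)

# The email really is spam; compare a confident right answer with a confident wrong one.
print(cross_entropy(0.95))  # ~0.05: small penalty for a confident correct prediction
print(cross_entropy(0.05))  # ~3.0:  large penalty for confidently predicting "not spam"
```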
Generative AI Techniques & Algorithms
Text Generation
Beam Search
A decoding algorithm that explores multiple possible next words at each step and selects the sequence with the highest overall probability.
Example: When generating "The capital of France is," beam search might consider both "Paris" and "located" as the next word, then evaluate further words after each to find the most probable complete sentence.
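A compact sketch of the algorithm over a toy vocabulary with made-up conditional probabilities (a real decoder scores continuations with the language model itself):

```python
import math

# Toy conditional probabilities: P(next word | previous word), purely illustrative.
NEXT = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the":     {"capital": 0.5, "weather": 0.5},
    "a":       {"capital": 0.3, "city": 0.7},
    "capital": {"city": 0.9, "<end>": 0.1},
    "weather": {"<end>": 1.0},
    "city":    {"<end>": 1.0},
}

def beam_search(beam_width=2, max_len=4):
    """Keep the `beam_width` highest-scoring partial sequences at every step."""
    beams = [(["<start>"], 0.0)]                       # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "<end>":
                candidates.append((seq, score))        # finished sequences carry over unchanged
                continue
            for word, prob in NEXT[seq[-1]].items():
                candidates.append((seq + [word], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

for seq, score in beam_search():
    print(" ".join(seq), round(math.exp(score), 3))
```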
Nucleus Sampling (Top-p)
A technique that dynamically selects from the most probable words whose cumulative probability exceeds a threshold p, allowing for more diversity in generation.
Example: If p=0.9, the model only considers the most likely words that together account for 90% of the probability mass, ignoring the long tail of very unlikely words.
Further reading: "The Curious Case of Neural Text Degeneration" - Holtzman et al.
Tokenization
The process of breaking text into smaller units (tokens) that the model can process.
Example: The word "unreasonable" might be tokenized as ["un", "reason", "able"], allowing the model to understand parts of words it hasn't seen before.
Further reading: "Neural Machine Translation of Rare Words with Subword Units" - Sennrich et al.
Image & Video Generation
Variational Autoencoder (VAE)
A type of generative model that learns to encode data into a compressed representation and then decode it back, with added randomness that enables generation of new samples.
Example: A VAE trained on face images can generate new, fictional faces by sampling points in its latent space and decoding them.
Further reading: "Auto-Encoding Variational Bayes" - Kingma & Welling
Upscaling (Super-Resolution)
Techniques that increase the resolution of images beyond their original size, filling in missing details.
Example: Taking a blurry 480p image and enhancing it to appear as if it were captured in 4K resolution.
Further reading: "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks" - Wang et al.
ControlNet
A technique that allows more precise control over image generation by conditioning it on additional inputs like sketches, poses, or depth maps.
Example: Providing a simple outline of a cat and having the AI generate a photorealistic cat that matches that exact pose and outline.
Further reading: "Adding Conditional Control to Text-to-Image Diffusion Models" - Zhang et al.
Model Deployment
Quantization
Reducing model size and computational requirements by representing weights with lower precision numbers.
Example: Converting a model's weights from 32-bit floating-point numbers to 8-bit integers can reduce its size by 75% with minimal loss in accuracy.
Further reading: "Quantizing deep convolutional networks for efficient inference: A whitepaper" - Krishnamoorthi
Pruning
Removing unnecessary connections or neurons from a neural network to make it smaller and faster.
Example: After training, analysis might reveal that 30% of a model's connections have very small weights that contribute little to the output, so these can be removed.
Further reading: "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks" - Frankle & Carbin
Inference
The process of using a trained model to make predictions on new data.
Example: When you ask a question to ChatGPT, it's performing inference—using its trained parameters to generate a response to your specific input.
Metrics for Model Performance
Perplexity (PPL)
A measure of how well a language model predicts text. Lower perplexity means better prediction.
Example: If a model assigns a high probability to the actual next word in a text, its perplexity will be low, indicating it "understands" the language well.
Further reading: "A Neural Probabilistic Language Model" - Bengio et al.
BLEU Score
A metric that measures the similarity between machine-generated text and human-written reference text, commonly used for translation evaluation.
Example: When evaluating a translation from English to Spanish, BLEU compares how many words and phrases in the machine translation match those in reference translations by human experts.
Further reading: "BLEU: a Method for Automatic Evaluation of Machine Translation" - Papineni et al.
FID (Fréchet Inception Distance)
A metric for evaluating the quality of AI-generated images by comparing their statistical properties to those of real images.
Example: A low FID score indicates that AI-generated cat images have similar characteristics (colors, textures, shapes) to real cat photos.
Further reading: "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium" - Heusel et al.
Fine-Tuning & Adaptation
LoRA (Low-Rank Adaptation)
A memory-efficient fine-tuning method that adds small, trainable matrices to a frozen pre-trained model instead of modifying all parameters.
Example: Fine-tuning GPT-3 traditionally would require updating billions of parameters, but LoRA might add just a few million trainable parameters while achieving similar results.
Further reading: "LoRA: Low-Rank Adaptation of Large Language Models" - Hu et al.
Adapters
Small modules inserted between layers of a pre-trained model to adapt it to new tasks without modifying the original weights.
Example: A company with limited computing resources could add small adapter modules to a large language model to customize it for legal document analysis without retraining the entire model.
Further reading: "Parameter-Efficient Transfer Learning for NLP" - Houlsby et al.
That's all for now, though given how quickly the field moves, we'll no doubt need to update this soon. In the meantime, this glossary should help you navigate the world of Gen AI without feeling like you're lost in a wizard's spellbook!