Famous Untrained Models
There are several well-known neural network architectures / untrained models (the “blank brains”) that have official names. Each exists as a combination of algorithms, theoretical papers describing the design details, and program libraries later developed to implement those models, such as TensorFlow or PyTorch.
They’re famous in AI research because many custom untrained models are built from them.
Some examples:
- Transformer – The architecture introduced by Google in 2017 whose variants and implementations power most modern LLMs, including GPT, Claude, and Gemini.
- BERT (Bidirectional Encoder Representations from Transformers) – Originally released by Google, famous for natural language understanding tasks.
- Vision Transformer (ViT) – A transformer architecture adapted for images instead of text.
- ResNet (Residual Network) – Very influential architecture for image recognition.
- U-Net – Widely used for image segmentation and also a core part of image generators like Stable Diffusion.
When companies like OpenAI, Anthropic, or Google build a new LLM, they usually start with their own customized untrained architecture based on these concepts / untrained models, give it an internal name, and then train it into the final product (e.g., GPT-4, Claude 3, Gemini 1.5).
Anyone can download an open-source Transformer or ResNet implementation and either train it from scratch or load someone else’s pre-trained weights.
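As a small sketch of what that looks like in practice, PyTorch ships stock Transformer building blocks that anyone can instantiate as a freshly initialized, untrained model (this assumes the `torch` library is installed; the sizes here are arbitrary toy values):

```python
import torch
import torch.nn as nn

# A "blank brain": a small untrained Transformer encoder assembled from
# PyTorch's stock building blocks. Its weights start out random.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)

x = torch.randn(1, 10, 64)   # 1 sequence of 10 token embeddings, 64-dim each
out = model(x)
print(out.shape)  # torch.Size([1, 10, 64])
```

At this point the model already has its full structure, but because the weights are random, its outputs are meaningless until training.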
Training and Following the Patterns
Training is the process where an AI model is given enormous amounts of data and learns to find patterns in it. The more data it sees, the more patterns it will discover. Patterns – especially repeating ones – are everywhere, both in nature and in information, and they are the foundation of how the AI learns.
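The core loop of training – make a prediction, measure the error, nudge the model’s parameters to reduce it – can be caricatured with a one-weight model in plain Python (a toy illustration, not a real neural network):

```python
# Toy "training": discover the pattern y = 3x hidden in the data by
# repeatedly nudging a single weight to shrink the prediction error.
data = [(1, 3), (2, 6), (3, 9), (4, 12)]

w = 0.0        # the model starts out "blank"
lr = 0.01      # learning rate: how big each nudge is
for _ in range(1000):
    for x, y in data:
        error = w * x - y     # how wrong the current prediction is
        w -= lr * error * x   # gradient step on the squared error

print(round(w, 2))  # converges to 3.0
```

A real model does essentially this, but with billions of weights and patterns far richer than a straight line.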
Once training is complete, the AI has built a huge internal collection of patterns and the “rules” for how those patterns tend to behave or evolve. When you ask the AI model a question, it takes your question as new data, searches for the underlying pattern, and then finds where that pattern fits within the vast pattern library it built during training. Using its knowledge of how patterns evolve and connect, the AI then predicts the next pieces of that pattern – whether that means the next words in a sentence, the next pixels in an image, or the next notes in a melody.
In this way, your question is like a single puzzle piece, and the AI’s job is to figure out where that piece belongs in the bigger picture it has learned to assemble.
A Concrete Pattern Example
Let’s say that during training, the future LLM sees thousands of examples like these in its training data:
- “The capital of France is Paris”
- “Paris is the capital of France”
- “France’s capital city is Paris”
- “The French capital, Paris, is known for…”
Through training, the model learns the pattern that connects “France” + “capital” → “Paris”. It doesn’t just memorize these exact sentences – it builds an internal understanding of the relationship.
Later, when you ask “What is the capital of France?”, the model:
- Recognizes the pattern – It identifies this as a “capital city” question about France
- Searches its pattern library – It finds all the learned connections between France and capital cities
- Predicts the continuation – Based on the strong pattern it learned, it predicts “Paris” as the most likely completion
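The three steps above can be caricatured with a toy next-word predictor in plain Python. Counting which word tends to follow which is nothing like a real LLM’s learned representations, but it shows the “pattern library” idea:

```python
from collections import Counter, defaultdict

# Toy "training data" echoing the examples above.
training = [
    "the capital of france is paris",
    "paris is the capital of france",
    "the french capital is paris",
]

# "Training": count which word follows which.
follows = defaultdict(Counter)
for sentence in training:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def predict_next(word):
    """Predict the most frequently observed continuation of a word."""
    return follows[word].most_common(1)[0][0]

print(predict_next("is"))  # "paris" follows "is" twice, "the" only once
```

A real LLM replaces these raw counts with learned numerical representations that generalize across wording, but the predict-the-continuation mechanic is the same.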
The same process works for more complex patterns. If the model learned that “The Eiffel Tower is in Paris” and “Paris is in France”, it can combine these patterns to answer, “What country is the Eiffel Tower in?” even if it never saw that exact question during training.
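That kind of two-step combination can be sketched as chaining two lookups (again a toy illustration, not how an LLM actually stores facts):

```python
# Toy sketch: two learned "patterns" (facts) stored as simple lookups.
located_in = {"eiffel tower": "paris", "paris": "france"}

def country_of(landmark):
    city = located_in[landmark]   # pattern 1: landmark -> city
    return located_in[city]       # pattern 2: city -> country

print(country_of("eiffel tower"))  # "france"
```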
This is why training on diverse, high-quality data matters – the more varied examples the model sees, the richer and more flexible its pattern library becomes.
Of course, this is a very simplified example. In truth, the patterns learned are far deeper and more abstract – and that’s partly where the magic happens. But the idea is the same.
In the next few sections, we will dive a bit deeper to gain a greater appreciation for the complexity of it all. Do note, however, that all of this is an oversimplification – reality is far more involved – but it’s a starting point.
When we talk about neural networks and their learning process, we often mention terms like “numbers”, “tokens”, and “weights”. What do they mean?
By: Vahe Demirkhanyan, Sr. Security Analyst, Illumant