What is an LLM?

Question

What is an LLM?

Accepted Answer

## What is an LLM?

A **Large Language Model (LLM)** is a deep learning model trained on massive text datasets to process and generate human language. It learns statistical patterns from billions to trillions of tokens of text and uses them to predict what should come next in a sequence.

### Core Concept

> LLMs are trained with one simple objective: **predict the next token** given a sequence of tokens.

At sufficient scale, this simple objective yields abilities that were never explicitly programmed: reasoning, coding, translation, summarization, and more.
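To make the objective concrete, here is a minimal sketch of next-token prediction using a bigram count model. This is a deliberately tiny stand-in, not how an LLM works internally: the corpus, the whitespace tokenization, and the function names `train_bigram` and `next_token_probs` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def next_token_probs(counts, token):
    """Turn the raw follow-up counts for `token` into probabilities."""
    followers = counts[token]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

# Toy corpus split on whitespace (real LLMs use subword tokenizers).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

probs = next_token_probs(model, "the")
prediction = max(probs, key=probs.get)  # most likely next token
print(probs)
print(prediction)
```

An LLM replaces the count table with a neural network that conditions on the whole preceding context, but the question it answers at each step is the same: which token is most likely to come next?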

### Architecture

Nearly all modern LLMs are based on the **Transformer** architecture:

```
Input Text → Tokenizer → Token Embeddings
                              ↓
                    [Transformer Blocks × N]
                    - Multi-head self-attention
                    - Feed-forward network
                    - Layer normalization
                              ↓
                    Output logits → Next token
```
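The self-attention step in the diagram can be sketched in plain Python. This is a simplified illustration under several assumptions: it uses a single head rather than multiple heads, it applies a causal mask (as decoder-style models do), and it passes the input vectors directly as queries, keys, and values, whereas a real Transformer derives them through learned linear projections. The function names are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of vectors, one per token position.
    Position i may only attend to positions 0..i.
    """
    dim = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Similarity of this query with every visible key.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        mixed = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# Three toy token embeddings of dimension 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
print(out)
```

The first position can only see itself, so its output equals its own value vector; later positions produce a blend of everything before them.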

### Training Process

| Phase | Description |
|-------|-------------|
| **Pre-training** | Self-supervised learning on massive text corpora (predict next token) |
| **Supervised Fine-tuning (SFT)** | Fine-tune on high-quality instruction-following examples |
| **RLHF** | Reinforcement Learning from Human Feedback to align with human preferences |
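The pre-training objective is usually expressed as a cross-entropy loss: the model is penalized by the negative log of the probability it assigned to the token that actually came next. A minimal sketch, where the predicted distributions are made-up numbers and `next_token_loss` is a hypothetical helper:

```python
import math

def next_token_loss(predicted_probs, target_tokens):
    """Average negative log-likelihood of the true next tokens."""
    losses = [
        -math.log(probs[target])
        for probs, target in zip(predicted_probs, target_tokens)
    ]
    return sum(losses) / len(losses)

# Made-up model outputs for two positions in a sequence.
predicted = [
    {"cat": 0.7, "dog": 0.2, "mat": 0.1},
    {"sat": 0.5, "ran": 0.5},
]
targets = ["cat", "sat"]  # the tokens that actually came next

loss = next_token_loss(predicted, targets)
perplexity = math.exp(loss)  # exponentiated loss, a common reporting metric
print(f"loss={loss:.4f} perplexity={perplexity:.4f}")
```

Training adjusts the model's weights to push this loss down across the whole corpus. Fine-tuning commonly reuses the same loss on curated instruction data.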

### Key Metrics That Define "Large"

| Model | Parameters | Training Data |
|-------|-----------|----------------|
| GPT-2 | 1.5B | ~40 GB of text (token count not published) |
| GPT-3 | 175B | ~300B tokens |
| GPT-4 | ~1.8T (unofficial estimate) | ~13T tokens (unofficial estimate) |
| Llama 3.1 405B | 405B | 15T+ tokens |
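One way to read the table is as a ratio of training tokens to parameters. The short calculation below uses only the GPT-3 and Llama 3.1 405B rows, since those figures were published by the model developers; treating 15T+ as exactly 15T is a simplifying assumption.

```python
# (parameters, training tokens) taken from the table above.
models = {
    "GPT-3": (175e9, 300e9),
    "Llama 3.1 405B": (405e9, 15e12),  # "15T+" rounded down to 15T
}

def tokens_per_parameter(params, tokens):
    """How many training tokens the model saw per parameter."""
    return tokens / params

ratios = {
    name: tokens_per_parameter(params, tokens)
    for name, (params, tokens) in models.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} training tokens per parameter")
```

The jump from under 2 tokens per parameter to roughly 37 shows that "large" now refers to the amount of training data as much as to parameter count.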

### What LLMs Can Do

* Generate human-quality text and code
* Answer questions and explain concepts
* Translate between languages
* Summarize documents
* Reason through multi-step problems
* Follow complex instructions

### Using an LLM via API

```python
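from anthropic import Anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",  # model ID; check the provider's docs for current names
    max_tokens=1024,          # upper bound on the length of the generated reply
    messages=[
        {"role": "user", "content": "Explain attention mechanisms in 3 sentences."}
    ],
)
# The reply is a list of content blocks; the first one holds the text.
print(response.content[0].text)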
```

### LLM Limitations

| Limitation | Description |
|-----------|-------------|
| **Knowledge cutoff** | Doesn't know events after training date |
| **Hallucination** | Can generate plausible but false information |
| **Context window** | Can only process limited text at once |
| **No real-time access** | Can't browse the web (without tools) |
| **Cost** | API calls cost money per token |
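The context-window and cost limitations are the two you usually have to handle in application code. The sketch below trims a conversation to a token budget and estimates the price of a call. Everything in it is an assumption for illustration: the 4-characters-per-token heuristic is a rough rule of thumb, the prices are hypothetical placeholders, and the helper names are made up. For real work, use the provider's tokenizer and published pricing.

```python
def estimate_tokens(text):
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)

def trim_to_budget(messages, max_tokens):
    """Keep the most recent messages whose estimated tokens fit the budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

def estimate_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost in dollars, given hypothetical prices per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

history = [
    {"role": "user", "content": "a" * 400},       # about 100 tokens
    {"role": "assistant", "content": "b" * 200},  # about 50 tokens
    {"role": "user", "content": "c" * 80},        # about 20 tokens
]
trimmed = trim_to_budget(history, max_tokens=80)
cost = estimate_cost(1000, 500, price_in=3.0, price_out=15.0)  # placeholder prices
print(len(trimmed), cost)
```

Dropping the oldest messages first is the simplest policy; summarizing older turns is a common alternative when early context still matters.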

### Popular LLMs (2025)

| Provider | Model | Strength |
|---------|-------|---------|
| Anthropic | Claude 3.5 Sonnet | Coding, reasoning |
| OpenAI | GPT-4o | Multimodal, instruction following |
| Google | Gemini 1.5 Pro | Long context (1M tokens) |
| Meta | Llama 3.1 | Open weights |
| DeepSeek | DeepSeek-V3 | Cost-efficient |
