What are billions of parameter trained models (e.g., 7B, 120B)?
#gen-ai #llm
Answer
Billions of Parameters in AI Models (7B, 70B, 120B, etc.)
Parameters are the learned numerical values in a neural network, specifically the weights and biases that determine how the model transforms inputs to outputs. "7B" means 7 billion such numbers.
What Are Parameters?
In a simple linear layer:
```python
import torch.nn as nn

# A linear layer with 1000 input features → 500 output features
layer = nn.Linear(1000, 500)

# Parameters = weights (1000×500) + biases (500)
total = 1000 * 500 + 500  # = 500,500 parameters
print(f"Parameters: {sum(p.numel() for p in layer.parameters()):,}")
```
A model with 7 billion parameters has 7,000,000,000 such numbers, spread across a few dozen transformer layers (32 in Llama 2 7B, for example), each containing several large weight matrices.
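To see how a count in the billions arises, here is a rough sketch that estimates the parameters of a GPT-style decoder. It assumes a standard layout (four d×d attention projections and an MLP with 4× expansion, so about 12·d² weights per layer) and ignores biases, layer norms, and positional embeddings. The function name and the configuration values are illustrative, not taken from a specific model card.

```python
def estimate_transformer_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a GPT-style decoder (assumed layout, not exact)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    mlp = 2 * d_model * (4 * d_model)   # up- and down-projection with 4x expansion
    embeddings = vocab_size * d_model   # token embedding table (assumed shared with the output layer)
    return n_layers * (attention + mlp) + embeddings


# Illustrative configuration: 32 layers, hidden size 4096, ~50k-token vocabulary
total = estimate_transformer_params(n_layers=32, d_model=4096, vocab_size=50_257)
print(f"Estimated parameters: {total:,} (~{total / 1e9:.2f}B)")
```

With these values the estimate lands at about 6.6 billion, which is why models of this shape are usually labelled "7B".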
Why Parameter Count Matters
| More parameters → | Effect |
|---|---|
| More knowledge | Can store more facts and patterns |
| Better reasoning | More capacity for complex computation |
| Higher compute | More GPU memory and processing needed |
| Slower inference | More calculations per token |
| More expensive | Higher hardware and API costs |
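The "slower inference" row can be made concrete with two common rules of thumb, both approximations rather than measurements: a forward pass costs about 2 FLOPs per parameter per token, and single-stream generation is often limited by how fast the weights can be read from memory. The sketch below applies them. The 1000 GB/s bandwidth figure is a placeholder assumption, not the spec of any particular GPU, and the function names are hypothetical.

```python
def forward_flops_per_token(params: float) -> float:
    """Approximate forward-pass FLOPs per token (~2 per parameter: one multiply, one add)."""
    return 2 * params


def max_tokens_per_second(params: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every weight is read from memory once per token."""
    model_bytes = params * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes


BANDWIDTH_GB_S = 1000  # hypothetical memory bandwidth, for illustration only

for name, params in [("7B", 7e9), ("70B", 70e9)]:
    gflops = forward_flops_per_token(params) / 1e9
    tps = max_tokens_per_second(params, bytes_per_param=2, bandwidth_gb_s=BANDWIDTH_GB_S)
    print(f"{name}: ~{gflops:.0f} GFLOPs/token, at most ~{tps:.0f} tokens/s at FP16")
```

Under these assumptions a 10× larger model is roughly 10× slower per token, which matches the intuition in the table.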
Common Model Sizes
| Size | Approx VRAM needed | Example Models |
|---|---|---|
| 1-4B | 2-8 GB | Phi-3 Mini (3.8B), Llama 3.2 3B |
| 7-8B | 8-16 GB | Llama 3.1 8B, Mistral 7B, Gemma 7B |
| 13-14B | 16-28 GB | Llama 2 13B, Phi-3 Medium |
| 34-35B | 40-70 GB | CodeLlama 34B |
| 70-72B | 80-140 GB | Llama 3.1 70B, Qwen 72B |
| 120-180B | 2-5× A100 (80 GB) | Mistral Large 2 (123B), Falcon 180B |
| 400B+ | Multi-node GPU cluster | Llama 3.1 405B, GPT-4 (est.) |
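The multi-GPU rows follow from simple division. As a rough sketch, the helper below estimates how many GPUs are needed just to hold the weights, assuming 80 GB per device (the larger A100 variant) and ignoring activations, the KV cache, and framework overhead, so real deployments need more headroom. The function name is hypothetical.

```python
import math


def gpus_needed(params_billions: float, bytes_per_param: float = 2, gpu_memory_gb: float = 80) -> int:
    """Minimum number of GPUs to hold the weights alone (decimal GB, no runtime overhead)."""
    weights_gb = params_billions * bytes_per_param  # billions of params * bytes each = GB
    return math.ceil(weights_gb / gpu_memory_gb)


for size in (8, 70, 180, 405):
    print(f"{size}B: FP16 -> {gpus_needed(size)} GPU(s), INT8 -> {gpus_needed(size, 1)} GPU(s)")
```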
Parameter Count vs Actual Storage
Each parameter is typically stored as:
- FP32: 4 bytes → 7B model = ~28 GB
- FP16/BF16: 2 bytes → 7B model = ~14 GB
- INT8 quantized: 1 byte → 7B model = ~7 GB
- INT4 quantized: 0.5 bytes → 7B model = ~3.5 GB
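```python
# Estimate the storage needed for the weights alone (no activations or KV cache)
def estimate_model_size_gib(params_billions: float, bytes_per_param: float = 2) -> float:
    """Return the weight size in GiB (1 GiB = 1024**3 bytes)."""
    return params_billions * 1e9 * bytes_per_param / (1024**3)

# GiB values are slightly smaller than the decimal-GB figures listed above
print(f"7B at FP16: {estimate_model_size_gib(7):.1f} GiB")       # 13.0 GiB (~14 GB)
print(f"70B at FP16: {estimate_model_size_gib(70):.1f} GiB")     # 130.4 GiB (~140 GB)
print(f"7B at INT4: {estimate_model_size_gib(7, 0.5):.1f} GiB")  # 3.3 GiB (~3.5 GB)
```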
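To illustrate where the byte savings come from, here is a minimal sketch of symmetric absmax quantization from 32-bit floats to 8-bit integers, using only the standard library. It is a simplified illustration with a single scale for the whole tensor; production schemes typically use per-channel or per-group scales. The function names and the sample weights are hypothetical.

```python
from array import array


def quantize_int8(weights: array) -> tuple[array, float]:
    """Map float weights to int8 codes using one shared scale (symmetric absmax)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = array("b", (round(w / scale) for w in weights))
    return quantized, scale


def dequantize(quantized: array, scale: float) -> list[float]:
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


weights = array("f", [0.12, -0.5, 0.33, 0.9, -0.07])  # 4 bytes per value
q, scale = quantize_int8(weights)                      # 1 byte per value

print(f"FP32 storage: {len(weights) * weights.itemsize} bytes")
print(f"INT8 storage: {len(q) * q.itemsize} bytes")
print("Restored:", [round(v, 3) for v in dequantize(q, scale)])
```

The restored values differ slightly from the originals, which is the accuracy cost paid for the 4× smaller footprint.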
Scaling Laws
Research shows predictable relationships:
- More parameters → better performance (up to a point)
- More training data → better performance
- Compute-optimal training balances parameters and training tokens (Chinchilla rule of thumb: ~20 tokens per parameter)
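As a worked example of the Chinchilla rule of thumb, the sketch below computes a compute-optimal token budget at ~20 tokens per parameter, along with the training cost using the common approximation of about 6 FLOPs per parameter per training token. Both constants are approximations, and the function names are illustrative.

```python
TOKENS_PER_PARAM = 20  # Chinchilla rule of thumb (approximate)


def optimal_tokens(params: float) -> float:
    """Compute-optimal number of training tokens for a given parameter count."""
    return TOKENS_PER_PARAM * params


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens


for name, params in [("7B", 7e9), ("70B", 70e9)]:
    tokens = optimal_tokens(params)
    flops = training_flops(params, tokens)
    print(f"{name}: ~{tokens / 1e9:.0f}B tokens, ~{flops:.2e} training FLOPs")
```

Because both factors grow with model size, a 10× larger model needs roughly 100× the training compute under this rule.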
Quality vs Size Tradeoff
```text
Size    │ Quality   │ Speed     │ Cost
────────┼───────────┼───────────┼──────────
1-3B    │ Basic     │ Fast      │ Low
7-8B    │ Good      │ Fast      │ Low
13-14B  │ Better    │ Medium    │ Medium
70B     │ Excellent │ Slow      │ High
400B+   │ Best      │ Very slow │ Very high
```
Running Models Locally
```bash
# Ollama - easily run models by size
ollama pull llama3.1:8b    # 8B: runs on 8GB RAM
ollama pull llama3.1:70b   # 70B: needs 64GB+ RAM
ollama pull phi3:mini      # 3.8B: great for laptops
```
The 7B-8B range is a popular sweet spot for local inference: good quality with practical hardware requirements.
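A quick way to sanity-check which of the models above fits on a given machine is sketched below. It assumes roughly 4-bit weights (0.5 bytes per parameter) plus a 20% allowance for the KV cache and runtime. Both numbers are assumptions for illustration, not figures reported by Ollama, and the helper names are hypothetical.

```python
# Model tags from the commands above, with sizes in billions of parameters
MODELS = {"phi3:mini": 3.8, "llama3.1:8b": 8, "llama3.1:70b": 70}


def estimated_ram_gb(params_billions: float, bytes_per_param: float = 0.5, overhead: float = 1.2) -> float:
    """Weights at the given precision, multiplied by an assumed overhead factor."""
    return params_billions * bytes_per_param * overhead


def models_that_fit(ram_gb: float) -> list[str]:
    """Return the tags whose estimated footprint fits in the given RAM."""
    return [name for name, size in MODELS.items() if estimated_ram_gb(size) <= ram_gb]


for ram in (8, 16, 64):
    print(f"{ram} GB RAM: {models_that_fit(ram)}")
```

In practice the operating system and other applications also need memory, so treat the result as an optimistic bound.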