Concept #109Mediumextended-ai-concepts

What are billions of parameter trained models (e.g., 7B, 120B)?

#gen-ai#llm

Answer

Billions of Parameters in AI Models (7B, 70B, 120B, etc.)

Parameters are the learned numerical values in a neural network β€” specifically the weights and biases that determine how the model transforms inputs to outputs. "7B" means 7 billion such numbers.

What Are Parameters?

In a simple linear layer:

python
import torch.nn as nn

# A linear layer with 1000 input features β†’ 500 output features
layer = nn.Linear(1000, 500)

# Parameters = weights (1000Γ—500) + biases (500)
total = 1000 * 500 + 500  # = 500,500 parameters
print(f"Parameters: {sum(p.numel() for p in layer.parameters()):,}")

A model with 7 billion parameters has 7,000,000,000 such numbers, arranged across thousands of layers.

Why Parameter Count Matters

More Parameters β†’Effect
More knowledgeCan store more facts and patterns
Better reasoningMore capacity for complex computation
Higher computeMore GPU memory and processing needed
Slower inferenceMore calculations per token
More expensiveHigher hardware and API costs

Common Model Sizes

SizeApprox VRAM neededExample Models
1-3B2-6 GBPhi-3 Mini, Llama 3.2 3B
7-8B8-16 GBLlama 3.1 8B, Mistral 7B, Gemma 7B
13-14B16-28 GBLlama 2 13B, Phi-3 Medium
34-35B40-70 GBCodeLlama 34B
70-72B80-140 GBLlama 3.1 70B, Qwen 72B
120-180B2-4Γ— A100sDeepSeek-V2
400B+Multi-node GPU clusterLlama 3.1 405B, GPT-4 (est.)

Parameter Count vs Actual Storage

Each parameter is typically stored as:

  • FP32: 4 bytes β†’ 7B model = ~28 GB
  • FP16/BF16: 2 bytes β†’ 7B model = ~14 GB
  • INT8 quantized: 1 byte β†’ 7B model = ~7 GB
  • INT4 quantized: 0.5 bytes β†’ 7B model = ~3.5 GB
python
# Calculate model size
def estimate_model_size_gb(params_billions: float, bytes_per_param: float = 2) -> float:
    return params_billions * 1e9 * bytes_per_param / (1024**3)

print(f"7B at FP16:  {estimate_model_size_gb(7):.1f} GB")    # 13.0 GB
print(f"70B at FP16: {estimate_model_size_gb(70):.1f} GB")   # 130.2 GB
print(f"7B at INT4:  {estimate_model_size_gb(7, 0.5):.1f} GB")  # 3.3 GB

Scaling Laws

Research shows predictable relationships:

  • More parameters β†’ better performance (up to a point)
  • More training data β†’ better performance
  • Optimal: balance parameters and tokens (Chinchilla law: ~20 tokens per parameter)

Quality vs Size Tradeoff

text
Size         β”‚ Quality  β”‚ Speed   β”‚ Cost
─────────────┼──────────┼─────────┼──────
1-3B         β”‚ Basic    β”‚ Fast    β”‚ Low
7-8B         β”‚ Good     β”‚ Fast    β”‚ Low
13-14B       β”‚ Better   β”‚ Medium  β”‚ Medium
70B          β”‚ Excellentβ”‚ Slow    β”‚ High
400B+        β”‚ Best     β”‚ Very slowβ”‚ Very high

Running Models Locally

bash
# Ollama - easily run models by size
ollama pull llama3.1:8b    # 8B β€” runs on 8GB RAM
ollama pull llama3.1:70b   # 70B β€” needs 64GB+ RAM
ollama pull phi3:mini       # 3.8B β€” great for laptops

The 7B-8B size sweet spot is popular for local inference β€” good quality with practical hardware requirements.