Answer
How to Fine-Tune an LLM
Fine-tuning adapts a pre-trained LLM to a specific task or domain by continuing training on a curated dataset. It lets you customize model behavior without training from scratch.
Fine-Tuning Approaches
| Approach | Description | Approx. VRAM | Best For |
|---|---|---|---|
| Full Fine-tuning | Update all weights | 80GB+ | Max accuracy, large budget |
| LoRA | Train low-rank adapter matrices | 16–24GB | Most production use cases |
| QLoRA | LoRA on quantized (4-bit) model | 8–12GB | Consumer GPUs |
| Prefix Tuning | Prepend trainable prefix vectors at every layer | 8GB | Minimal parameter change |
| Prompt Tuning | Tune soft prompt embeddings only | 4GB | Lightest approach |
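The VRAM figures above shift with model size, so a back-of-the-envelope estimate can help when choosing an approach. The sketch below counts only weights, gradients, and optimizer state. The byte costs are assumptions (16 bytes per trainable parameter for mixed-precision Adam, 2 bytes per frozen bf16 weight, roughly 0.5 bytes per frozen 4-bit weight), the 7B model and 4M adapter parameters are hypothetical, and activations and framework overhead are ignored, so real usage will be higher.

```python
def estimate_gib(n_params, n_trainable, frozen_bytes_per_param):
    """Rough memory estimate in GiB for weights plus training state.

    Assumption: each trainable parameter costs 16 bytes under mixed-precision
    Adam (2 weight + 2 gradient + 4 fp32 master copy + 8 for two Adam moments).
    Frozen parameters cost only their storage size.
    """
    frozen = (n_params - n_trainable) * frozen_bytes_per_param
    training_state = n_trainable * 16
    return (frozen + training_state) / 1024**3


N = 7_000_000_000        # hypothetical 7B-parameter model
ADAPTER = 4_000_000      # hypothetical LoRA adapter size

full = estimate_gib(N, N, frozen_bytes_per_param=2)          # everything trainable
lora = estimate_gib(N, ADAPTER, frozen_bytes_per_param=2)    # bf16 frozen base
qlora = estimate_gib(N, ADAPTER, frozen_bytes_per_param=0.5) # 4-bit frozen base

for name, gib in [("full", full), ("lora", lora), ("qlora", qlora)]:
    print(f"{name:>5}: ~{gib:.1f} GiB before activations")
```

The ordering matters more than the absolute numbers: the frozen base dominates for LoRA and QLoRA, which is why quantizing it gives the largest saving.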
Step-by-Step Fine-Tuning with QLoRA
Step 1 — Install Dependencies
```bash
pip install transformers peft datasets trl bitsandbytes accelerate
```
Step 2 — Prepare Dataset
```python
from datasets import Dataset

# Toy examples in instruction/input/output form
data = [
    {"instruction": "Summarize the text.", "input": "AI is transforming...", "output": "AI is changing industries."},
    {"instruction": "Translate to French.", "input": "Hello world", "output": "Bonjour le monde"},
]

def format_prompt(example):
    # Collapse each record into the single "text" field the trainer reads
    return {"text": f"### Instruction: {example['instruction']} ### Input: {example['input']} ### Response: {example['output']}"}

dataset = Dataset.from_list(data).map(format_prompt)
```
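Before formatting, it can be worth filtering out records that would teach the model nothing. The following is a minimal sketch, assuming the same `instruction`/`input`/`output` keys as above; the helper name `clean_records` and the sample rows are hypothetical.

```python
REQUIRED_KEYS = ("instruction", "input", "output")


def clean_records(records):
    """Drop malformed, empty-output, and exact-duplicate records (order preserved)."""
    seen = set()
    cleaned = []
    for rec in records:
        # Every required field must be present and be a string
        if not all(isinstance(rec.get(k), str) for k in REQUIRED_KEYS):
            continue
        # An empty instruction or output gives the model nothing to learn
        if not rec["instruction"].strip() or not rec["output"].strip():
            continue
        key = tuple(rec[k].strip() for k in REQUIRED_KEYS)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"instruction": "Summarize the text.", "input": "AI is transforming...", "output": "AI is changing industries."},
    {"instruction": "Summarize the text.", "input": "AI is transforming...", "output": "AI is changing industries."},
    {"instruction": "Translate to French.", "input": "Hello world", "output": ""},
    {"instruction": "Translate to French.", "input": "Hello world"},
]
cleaned = clean_records(raw)
print(f"kept {len(cleaned)} of {len(raw)} records")
```

The cleaned list can then be passed to `Dataset.from_list` in place of `data`.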
Step 3 — Load Model in 4-bit (QLoRA)
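```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with double quantization; matmuls run in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Gated repository: access must be granted on Hugging Face before download
model_id = "meta-llama/Llama-3.2-3B"
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# If the tokenizer has no pad token, reuse EOS so batches can be padded
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```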
Step 4 — Configure LoRA Adapters
```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Freeze the base weights and prepare the quantized model for training
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                 # rank: higher = more capacity
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)

# Prints trainable vs. total parameter counts. Exact numbers depend on the
# model and on quantization; the trainable share is typically well under 1%.
model.print_trainable_parameters()
```
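To sanity-check what `print_trainable_parameters()` reports, the adapter size can be computed by hand: each adapted weight of shape `(d_out, d_in)` gains `r * (d_in + d_out)` parameters. The sketch below uses assumed dimensions (hidden size 3072, 28 layers, `v_proj` output width 1024); these are illustrative values, so check the model's `config.json` before relying on them.

```python
def lora_param_count(r, shapes, n_layers):
    """Trainable parameters added by LoRA.

    Each adapted weight of shape (d_out, d_in) gains two matrices:
    A with shape (r, d_in) and B with shape (d_out, r).
    """
    per_layer = sum(r * (d_in + d_out) for d_out, d_in in shapes)
    return per_layer * n_layers


# ASSUMED dimensions for illustration only
hidden = 3072     # hidden size
kv_dim = 1024     # v_proj output width under grouped-query attention
n_layers = 28

shapes = [
    (hidden, hidden),  # q_proj
    (kv_dim, hidden),  # v_proj
]

trainable = lora_param_count(r=16, shapes=shapes, n_layers=n_layers)
scaling = 32 / 16  # lora_alpha / r: multiplier applied to the adapter output

print(f"trainable adapter parameters: {trainable:,}")
print(f"adapter scaling factor: {scaling}")
```

Doubling `r` doubles the adapter size, and because the scaling is `lora_alpha / r`, changing `r` without adjusting `lora_alpha` also changes how strongly the adapter contributes.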
Step 5 — Train with SFTTrainer
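```python
from trl import SFTTrainer
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./fine-tuned-model",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size: 4 x 4 = 16 per device
    learning_rate=2e-4,
    bf16=True,                      # matches bnb_4bit_compute_dtype=torch.bfloat16;
                                    # without bfloat16 support, use fp16 in both places
    logging_steps=10,
    save_strategy="epoch",
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
)

# These keyword arguments match older TRL releases. Newer releases move
# dataset_text_field and the sequence-length limit into SFTConfig and rename
# tokenizer to processing_class, so check the docs for the installed version.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=args,
    max_seq_length=512,
)

trainer.train()
```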
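The batch size, accumulation, warmup, and scheduler settings interact, so it can help to work out the step count and learning-rate curve before launching a run. This is a sketch under assumptions: a hypothetical 10,000-example dataset on one GPU, and a plain linear-warmup plus cosine-decay formula. The trainer's own step count may differ slightly because of rounding.

```python
import math


def cosine_lr(step, total_steps, peak_lr, warmup_ratio):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


n_examples = 10_000            # hypothetical dataset size
effective_batch = 4 * 4 * 1    # per-device batch x accumulation steps x 1 GPU
steps_per_epoch = math.ceil(n_examples / effective_batch)
total_steps = steps_per_epoch * 3   # num_train_epochs=3
warmup_steps = math.ceil(total_steps * 0.03)

print(f"optimizer steps: {total_steps} ({warmup_steps} warmup)")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>5}: lr = {cosine_lr(step, total_steps, 2e-4, 0.03):.2e}")
```

With only a few hundred examples the total step count becomes very small, and a 3% warmup may round to a single step.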
Step 6 — Save and Merge Adapters
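```python
from peft import PeftModel

# Save LoRA adapters only
trainer.model.save_pretrained("./lora-adapters")

# Optional: merge adapters back into the base model.
# Reload the base weights unquantized first: `model` already has the adapters
# attached, and merging into 4-bit weights loses precision.
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "./lora-adapters")
merged = merged.merge_and_unload()
merged.save_pretrained("./merged-model")
```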
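Conceptually, merging folds the low-rank update into the base weight, `W' = W + (lora_alpha / r) * B @ A`, so the merged model needs no adapter at inference time. The toy sketch below illustrates that arithmetic in plain Python with made-up 2x2 values; it is not how the library implements the merge, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy base weight with a rank-1 adapter (all values made up)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # shape (r, d_in) = (1, 2)
B = [[0.5], [0.0]]     # shape (d_out, r) = (2, 1)

merged_W = merge_lora(W, A, B, lora_alpha=2, r=1)

# The merged weight gives the same output as base path + scaled adapter path
x = [[1.0], [1.0]]
base_plus_adapter = [
    [wx[0] + 2.0 * bax[0]]
    for wx, bax in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
print(merged_W)
print(matmul(merged_W, x) == base_plus_adapter)
```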
Step 7 — Evaluate
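```python
from transformers import pipeline

pipe = pipeline("text-generation", model="./merged-model", tokenizer=tokenizer)

# Use the same prompt template as in training and leave the response empty
prompt = "### Instruction: Summarize. ### Input: AI is transforming... ### Response: "
output = pipe(prompt, max_new_tokens=64)  # cap the length of the completion
print(output[0]["generated_text"])
```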
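A single prompt is only a smoke test. For a repeatable check, the same call can be looped over held-out examples and scored. The sketch below uses exact match as a deliberately simple metric and a stand-in `fake_generate` function so it runs without a GPU; the helper names, prompts, and canned answers are all hypothetical. By default the text-generation pipeline returns the prompt followed by the completion, which is why the prompt is stripped first.

```python
def strip_prompt(prompt, generated_text):
    """Return only the completion when the output echoes the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


def exact_match_rate(examples, generate):
    """Share of examples whose normalized completion equals the reference."""
    hits = 0
    for ex in examples:
        completion = strip_prompt(ex["prompt"], generate(ex["prompt"]))
        if completion.strip().lower() == ex["reference"].strip().lower():
            hits += 1
    return hits / len(examples)


held_out = [
    {"prompt": "### Instruction: Translate to French. ### Input: Hello world ### Response: ",
     "reference": "Bonjour le monde"},
    {"prompt": "### Instruction: Translate to French. ### Input: Thank you ### Response: ",
     "reference": "Merci"},
]


def fake_generate(prompt):
    """Stand-in for the model: echoes the prompt, then a canned answer."""
    canned = {held_out[0]["prompt"]: "bonjour le monde", held_out[1]["prompt"]: "Gracias"}
    return prompt + canned[prompt]


score = exact_match_rate(held_out, fake_generate)
print(f"exact match: {score:.2f}")
```

With the real model, `generate` could be `lambda p: pipe(p)[0]["generated_text"]`. Exact match suits short, closed-form answers; summarization-style tasks usually need a softer metric or human review.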
Key Hyperparameters to Tune
| Parameter | Typical Range | Effect |
|---|---|---|
| `r` (LoRA rank) | 8–64 | Higher = more capacity, more memory |
| `learning_rate` | 1e-4 – 5e-4 | Too high = instability |
| `num_train_epochs` | 1–5 | More = overfitting risk |
| `max_seq_length` | 512–4096 | Longer = more VRAM |
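One simple way to explore these ranges is a small grid of candidate configurations, each trained and scored separately. The sketch below only enumerates the combinations; the chosen values and the `grid` helper are illustrative assumptions, and each resulting dictionary would still have to be fed into the LoRA and training configuration by hand.

```python
from itertools import product

# Candidate values drawn from the typical ranges (illustrative choices)
search_space = {
    "r": [8, 16, 32],
    "learning_rate": [1e-4, 2e-4],
    "num_train_epochs": [1, 3],
}


def grid(space):
    """Yield one dict per combination of the listed values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid(search_space))
print(f"{len(configs)} runs to schedule")
for cfg in configs[:3]:
    print(cfg)
```

The number of runs grows multiplicatively, so it is usually cheaper to vary one or two parameters at a time.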
Best Practices
- Start with QLoRA — best cost/quality tradeoff for most tasks
- Use instruction-tuning format (system/instruction/response) for chat tasks
- Keep datasets small but high quality (1K–50K examples is often enough)
- Always evaluate on a held-out set to detect overfitting
- Use `wandb` or TensorBoard to monitor training loss
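The held-out evaluation and loss monitoring points above can be sketched in plain Python: reserve a fraction of the data before training, then stop once evaluation loss stops improving. The helper names, the 10% split, and the patience of 2 are assumptions for illustration. `Dataset.train_test_split` in `datasets` and `EarlyStoppingCallback` in `transformers` cover similar ground.

```python
import random


def train_eval_split(records, eval_fraction=0.1, seed=42):
    """Shuffle deterministically, then reserve a held-out slice."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def should_stop(eval_losses, patience=2):
    """True once eval loss has failed to improve for `patience` evaluations."""
    if len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    return min(eval_losses[-patience:]) >= best_before


train_set, eval_set = train_eval_split(range(100))
print(len(train_set), len(eval_set))

# Eval loss bottoms out at 0.90 and then rises: a typical overfitting pattern
print(should_stop([1.20, 1.00, 0.90, 0.95, 1.01]))
```

Training loss that keeps falling while evaluation loss rises is the signal to stop or to reduce the number of epochs.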
Tip: For production, fine-tune on your task-specific data, then use RAG on top for dynamic knowledge — you get the best of both worlds.