
LLM Fine-tuning

Key idea:

Fine-tuning is an additional training step after pretraining in which the model is adapted to a specific task or domain on your own data. Full fine-tuning updates all weights and requires a lot of GPU memory. LoRA / QLoRA train only small low-rank adapters (roughly 0.1-1% of the parameters), which is much faster and often fits on a single GPU. Typical use cases: structured JSON output, domain tone, coding style, non-English languages. OpenAI, Together.ai, and Hugging Face AutoTrain provide easy fine-tuning APIs.

Below: details, example, related terms, FAQ.


Details

  • Full fine-tuning: all weights updated. 70B needs 8× H100
  • LoRA: low-rank decomposition, 0.1-1% params. Runs on 1× A100 for 7B model
  • QLoRA: LoRA + 4-bit quantisation → a 7B model fits in 24 GB of VRAM (consumer GPU)
  • Data format: JSONL with {"messages": [{"role": ..., "content": ...}]} (OpenAI chat format), Alpaca format, etc.
  • Training time: 1-10 hours for 1-10k examples with LoRA
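The "0.1-1% of parameters" claim for LoRA follows directly from the low-rank decomposition: a frozen d×k weight matrix W gets two small trainable matrices A (d×r) and B (r×k). A minimal sketch of the arithmetic (the function name and the 4096×4096 projection size are illustrative, typical of a 7B-class model):

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a LoRA adapter on a d x k weight:
    W stays frozen; only A (d x r) and B (r x k) are trained."""
    return d * r + r * k

# Example: one 4096 x 4096 attention projection, rank r = 8
d = k = 4096
full = d * k                        # 16,777,216 frozen weights
adapter = lora_params(d, k, r=8)    # 65,536 trainable weights
print(f"adapter fraction: {adapter / full:.4%}")  # ~0.39% of that matrix
```

Raising the rank r trades memory for capacity; r in the 8-64 range is common in practice.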

Example

# OpenAI fine-tuning (JSONL format)
# train.jsonl:
{"messages": [{"role":"user","content":"What is TCP?"},{"role":"assistant","content":"TCP — reliable stream protocol..."}]}

# Upload
openai api files.create -f train.jsonl -p fine-tune

# Create fine-tune
openai api fine_tuning.jobs.create -t file-abc -m gpt-4o-mini

# Monitor
openai api fine_tuning.jobs.retrieve ftjob-xyz
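The training file is usually generated rather than written by hand. A minimal sketch of building the JSONL in the OpenAI chat format (the helper names `build_example` and `write_jsonl` are my own, not part of any SDK):

```python
import json

def build_example(question: str, answer: str) -> dict:
    """One training example in the OpenAI chat fine-tuning format."""
    return {"messages": [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]}

def write_jsonl(path: str, examples: list[dict]) -> None:
    """Write one JSON object per line, as the fine-tuning API expects."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")

examples = [build_example("What is TCP?", "TCP is a reliable stream protocol...")]
write_jsonl("train.jsonl", examples)
```

Each line must be a complete, independently parseable JSON object; a trailing comma or a multi-line object will make the upload fail validation.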


Frequently Asked Questions

Do I need fine-tuning?

If prompt engineering + RAG do not reach the desired quality, style, or structured output, yes. Otherwise, optimise the prompt first.

Dataset size?

A minimum of ~100 examples for a noticeable effect; 1k-10k is the recommended range. Beyond that, returns diminish.

Cost?

OpenAI gpt-4o-mini FT: $3 per 1M training tokens. Together.ai Llama 70B: ~$10. Full FT of a 70B model in the cloud: $500+.
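At per-token pricing the bill is easy to estimate: OpenAI charges for the tokens in the training file multiplied by the number of epochs. A quick sketch (the example sizes are illustrative; the $3/1M rate is the gpt-4o-mini figure from above):

```python
def openai_ft_cost(n_examples: int, avg_tokens_per_example: int,
                   epochs: int, price_per_million: float = 3.0) -> float:
    """Rough fine-tuning cost: training tokens x epochs x per-token price."""
    tokens = n_examples * avg_tokens_per_example * epochs
    return tokens * price_per_million / 1_000_000

# 5,000 examples x 500 tokens x 3 epochs at $3 / 1M training tokens
print(f"${openai_ft_cost(5000, 500, 3):.2f}")  # $22.50
```

Even at the top of the recommended dataset range this lands in the tens of dollars, which is why adapter-based fine-tuning on hosted APIs is cheap compared with full FT of a 70B model.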