Fine-tuning — an additional training step after pretraining that adapts the model to a specific task or domain on your data. Full fine-tuning updates all weights (needs a lot of GPU memory). LoRA / QLoRA train only low-rank adapters (~1% of parameters → ~10x faster, fits on a single GPU). Use cases: structured JSON output, domain tone, coding style, non-English languages. OpenAI, Together.ai, Hugging Face AutoTrain — easy APIs.
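The LoRA parameter saving above can be sketched with a back-of-the-envelope calculation; the rank and hidden size below are illustrative assumptions, not values from any specific model config.

```python
# Rough parameter-count comparison: full fine-tuning vs. a LoRA adapter
# for a single weight matrix. Rank r and hidden size d are assumptions.

def full_ft_params(d_in: int, d_out: int) -> int:
    # Full fine-tuning updates the entire d_in x d_out matrix.
    return d_in * d_out

def lora_params(d_in: int, d_out: int, r: int = 8) -> int:
    # LoRA trains two small matrices A (d_in x r) and B (r x d_out);
    # the frozen base matrix is left untouched.
    return r * (d_in + d_out)

d = 4096  # hidden size typical of a 7B-class model
ratio = lora_params(d, d) / full_ft_params(d, d)
print(f"trainable fraction per matrix: {ratio:.4%}")  # ~0.39%, well under 1%
```

At rank 8 this gives roughly 0.4% of the original matrix's parameters, which is where the "~1% of parameters" figure for whole models comes from.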
Below: details, example, related terms, FAQ.
# OpenAI fine-tuning (JSONL format)
# train.jsonl:
{"messages": [{"role":"user","content":"What is TCP?"},{"role":"assistant","content":"TCP — reliable stream protocol..."}]}
# Upload
openai api files.create -f train.jsonl -p fine-tune
# Create fine-tune
openai api fine_tuning.jobs.create -t file-abc -m gpt-4o-mini
# Monitor
openai api fine_tuning.jobs.retrieve ft-xyz

Should you fine-tune at all? If prompt engineering + RAG do not reach the desired quality, style, or structured output — yes. Otherwise optimise the prompt.
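Before uploading, it can pay to sanity-check the training file. A minimal sketch of a validator for the chat JSONL format shown above (`validate_line` is a hypothetical helper, not part of any SDK):

```python
# Minimal sanity check for the train.jsonl chat format.
# validate_line is a hypothetical helper, not an OpenAI API.
import json

def validate_line(line: str) -> bool:
    rec = json.loads(line)
    msgs = rec.get("messages")
    if not isinstance(msgs, list) or not msgs:
        return False
    # Every message needs a known role and a string content.
    return all(
        isinstance(m, dict)
        and m.get("role") in {"system", "user", "assistant"}
        and isinstance(m.get("content"), str)
        for m in msgs
    )

sample = '{"messages": [{"role":"user","content":"What is TCP?"},{"role":"assistant","content":"TCP..."}]}'
print(validate_line(sample))  # True
```

Running this over every line of train.jsonl before `files.create` catches format errors early, instead of waiting for the fine-tune job to fail.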
Minimum ~100 examples for a noticeable effect. 1k–10k is the recommended range. Beyond that — diminishing returns.
OpenAI gpt-4o-mini fine-tuning: $3 per 1M training tokens. Together.ai Llama 70B: ~$10. Full fine-tuning of a 70B model in the cloud — $500+.
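The per-token pricing makes cost easy to estimate up front. A sketch using the $3 / 1M-token rate quoted above; the dataset size, tokens per example, and epoch count are assumptions for illustration.

```python
# Back-of-the-envelope training cost at the quoted gpt-4o-mini rate.
# Example counts (5,000 examples, ~300 tokens each, 3 epochs) are assumptions.
PRICE_PER_M = 3.00  # USD per 1M training tokens

def ft_cost(n_examples: int, avg_tokens: int, epochs: int = 3) -> float:
    # Billed tokens = every training token, once per epoch.
    total_tokens = n_examples * avg_tokens * epochs
    return total_tokens / 1_000_000 * PRICE_PER_M

print(f"${ft_cost(5_000, 300):.2f}")  # 4.5M tokens -> $13.50
```

Even at the high end of the recommended 1k–10k example range, API fine-tuning of a small model stays in the tens of dollars — far from the $500+ of full fine-tuning a 70B model.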