Fine-tuning in 2026: (1) prepare 100–10k examples in JSONL; (2) pick a platform — OpenAI (gpt-4o-mini FT, $3/1M training tokens), Together.ai (Llama 3 70B LoRA, $5–20 per run), or self-host via Axolotl/Unsloth; (3) upload the dataset and start the job (1–10 hours); (4) evaluate on a held-out test set; (5) deploy — OpenAI creates an endpoint automatically, Together exposes an API. When NOT to fine-tune: when RAG plus prompt engineering already solve the task.
Below: a step-by-step walkthrough, working examples, common mistakes, and an FAQ.
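Step (1) in practice can be sketched as follows — a minimal script (the example content is illustrative) that writes chat-format records to `train.jsonl` and sanity-checks every line before upload:

```python
import json

# Hypothetical examples; a real dataset should contain 100-10k of these.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a customer support bot."},
        {"role": "user", "content": "Where is my invoice?"},
        {"role": "assistant", "content": "Dashboard -> Billing -> History."},
    ]},
]

# JSONL: exactly one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Validate before uploading: valid JSON, known roles, assistant turn last.
with open("train.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        rec = json.loads(line)
        roles = [m["role"] for m in rec["messages"]]
        assert set(roles) <= {"system", "user", "assistant"}, f"line {i}: bad role"
        assert roles[-1] == "assistant", f"line {i}: must end with assistant"
print(f"OK: {len(examples)} examples validated")
```

Catching a malformed line locally is much cheaper than having the fine-tuning job reject the file after upload.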
| Scenario | Config |
|---|---|

**OpenAI CLI (start a job)**

```bash
openai fine_tuning.jobs.create -t file-X -m gpt-4o-mini
```

**OpenAI JSONL format** (pretty-printed here for readability; in the actual JSONL file each record must occupy a single line)

```json
{"messages": [
  {"role": "system", "content": "You are a customer support bot for Enterno."},
  {"role": "user", "content": "Where is my invoice?"},
  {"role": "assistant", "content": "You can find invoices at /dashboard → Billing → History."}
]}
```

**QLoRA locally (Unsloth)**

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained("unsloth/llama-3-8b-bnb-4bit")
model = FastLanguageModel.get_peft_model(
    model, r=16, target_modules=["q_proj", "k_proj", "v_proj"]
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,  # your datasets.Dataset with a "text" field
    max_seq_length=2048,
)
trainer.train()
```

**Together.ai CLI (LoRA on Llama 3.1 70B)**

```bash
together files upload train.jsonl
together fine-tuning create \
  --training-file FILE_ID \
  --model meta-llama/Meta-Llama-3.1-70B-Instruct-Reference \
  --lora --lora-r 16 --lora-alpha 32
```

**Inference after fine-tuning (OpenAI)**

```python
resp = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024:myorg::abc",
    messages=[...],
)
```

**Eval with Ragas**

```python
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

results = evaluate(dataset, metrics=[answer_relevancy, faithfulness])
```
RAG vs fine-tuning: RAG handles dynamic knowledge and is easy to update; fine-tuning locks in style, tone, and format consistency. They are often combined — FT for tone plus RAG for facts.
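The combination looks roughly like this in code — a sketch where the retriever is a toy dict and the fine-tuned model id is a placeholder; the FT model supplies the tone, while retrieved facts are injected into the prompt:

```python
# Toy "retriever": in production this would be a vector-store query.
KNOWLEDGE = {
    "invoice": "Invoices live at /dashboard -> Billing -> History.",
    "refund": "Refunds are processed within 5 business days.",
}

def retrieve(query: str) -> str:
    hits = [fact for key, fact in KNOWLEDGE.items() if key in query.lower()]
    return "\n".join(hits) or "No relevant facts found."

def build_messages(question: str) -> list:
    # The fine-tuned model handles style; the RAG context handles facts.
    return [
        {"role": "system",
         "content": "Answer using ONLY these facts:\n" + retrieve(question)},
        {"role": "user", "content": question},
    ]

msgs = build_messages("Where is my invoice?")
# Then: client.chat.completions.create(model="ft:gpt-4o-mini-...", messages=msgs)
```

Because facts travel in the prompt, updating the knowledge base never requires retraining.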
Costs: OpenAI gpt-4o-mini fine-tuning runs at $3 per 1M training tokens. A Together Llama 3 70B LoRA run costs roughly $5–20. Self-hosting is effectively $0 per run if you already have the GPU.
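To make the OpenAI figure concrete, a back-of-the-envelope estimate (the dataset size, average example length, and epoch count below are assumptions, not defaults from the source):

```python
PRICE_PER_1M = 3.00           # $ per 1M training tokens (gpt-4o-mini FT)

examples = 1_000              # assumed dataset size
tokens_per_example = 500      # assumed average example length
epochs = 3                    # assumed number of training epochs

# Training bills every token in every epoch.
billed_tokens = examples * tokens_per_example * epochs
cost = billed_tokens / 1_000_000 * PRICE_PER_1M
print(f"{billed_tokens:,} tokens -> ${cost:.2f}")  # 1,500,000 tokens -> $4.50
```

Even a few-thousand-example run on gpt-4o-mini lands in single-digit dollars, which is why the 70B LoRA figure above looks expensive only by comparison.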
Evaluation: hold out a test set (~20%). Metrics depend on the task: exact match, BLEU, or LLM-as-judge (e.g., GPT-4 rates the outputs).
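The held-out split and the simplest metric can be sketched as below (exact match shown; for open-ended outputs you would swap in BLEU or an LLM-as-judge call):

```python
import random

def split(examples: list, test_frac: float = 0.2, seed: int = 42):
    # Shuffle deterministically, then hold out test_frac of the data for eval.
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

def exact_match(preds: list, refs: list) -> float:
    # Fraction of predictions that equal the reference after trimming whitespace.
    assert len(preds) == len(refs)
    return sum(p.strip() == r.strip() for p, r in zip(preds, refs)) / len(refs)

train, test = split(list(range(100)))
print(len(train), len(test))                 # 80 20
print(exact_match(["a", "b"], ["a", "c"]))   # 0.5
```

The test set must be split off before training, never reused for training, or the metric silently overstates quality.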
LoRA vs full fine-tuning: LoRA updates 0.1–1% of parameters — fast and cheap. Full FT updates all parameters — best quality, but 10–100× the cost. For ~95% of use cases, LoRA is enough.
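To see where the "0.1–1%" comes from, count the adapter parameters for a config like the Unsloth example above (the Llama-3-8B shapes used here — hidden size 4096, k/v projecting to 1024 under grouped-query attention, 32 layers — should be treated as assumptions):

```python
hidden = 4096       # assumed Llama-3-8B hidden size
kv_dim = 1024       # assumed k_proj/v_proj output dim (grouped-query attention)
layers = 32         # assumed number of transformer layers
r = 16              # LoRA rank, as in the Unsloth config

# Each adapted Linear(d_in, d_out) gains two low-rank factors, A (r x d_in)
# and B (d_out x r): r * (d_in + d_out) extra trainable parameters.
per_layer = (
    r * (hidden + hidden)    # q_proj: 4096 -> 4096
    + r * (hidden + kv_dim)  # k_proj: 4096 -> 1024
    + r * (hidden + kv_dim)  # v_proj: 4096 -> 1024
)
lora_params = per_layer * layers
fraction = lora_params / 8e9
print(f"{lora_params:,} trainable params (~{fraction:.2%} of 8B)")
```

Under these assumptions the adapters come to roughly ten million parameters — near the bottom of the 0.1–1% range, which is exactly why a single consumer GPU can hold the optimizer state.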