Vector embedding — dense numeric representation (an array of floats) of any object: text, image, audio. Typically 512-3072 dimensions. Example: "dog" → [0.23, -0.15, 0.67, ...]. Similar objects → close vectors (cosine similarity > 0.8). Used in semantic search, clustering, RAG, image similarity. Models: OpenAI text-embedding-3 (3072 dim), Cohere embed-v3, jina-embeddings-v3 (open), bge-m3 (multilingual).
Below: details, example, related terms, FAQ.
# OpenAI Embedding API

```typescript
import { OpenAI } from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.embeddings.create({
  model: 'text-embedding-3-large',
  input: 'TCP vs UDP protocols',
});

console.log(response.data[0].embedding); // [0.01, -0.23, ..., 0.05] — 3072 floats
```

# Similarity metrics
Cosine (normalised vectors) — dominant for text/NLP. Euclidean — for images/raw features. Dot product — if vectors are pre-normalised.
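The three metrics above can be written as plain functions over number arrays — a minimal sketch; production systems use a vector database or an optimised math library:

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Euclidean (L2) norm of a vector.
function norm(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

// Cosine similarity: dot product of unit vectors; range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (norm(a) * norm(b));
}

// Euclidean distance: lower = more similar.
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}
```

On pre-normalised vectors (OpenAI embeddings are returned unit-length) cosine similarity and dot product give identical rankings, so dot product is the cheaper choice.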
3072-dim embeddings beat 512-dim on recall for complex queries, but cost ~6x the storage and compute. Balance against dataset size and accuracy requirements.
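The trade-off can be softened: the text-embedding-3 models accept a `dimensions` parameter, and a full-length vector can also be shortened after the fact by truncating and re-normalising (Matryoshka-style shortening). A minimal sketch:

```typescript
// Shorten an embedding by keeping the first k dimensions and
// re-normalising to unit length. Rankings degrade gracefully for
// models trained with Matryoshka-style objectives, such as the
// text-embedding-3 family.
function truncateEmbedding(v: number[], k: number): number[] {
  const head = v.slice(0, k);
  const len = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  return head.map((x) => x / len);
}
```

Alternatively, request the smaller size directly: `openai.embeddings.create({ model: 'text-embedding-3-large', input, dimensions: 512 })`.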
Embedding search — fast but approximate. Reranking (Cohere Rerank, Voyage rerank) — slower but markedly better on the top results. Typical pipeline: retrieve top 50 by embedding similarity → rerank → keep top 10.
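A minimal sketch of that two-stage pipeline: brute-force cosine retrieval over an in-memory corpus, then a second scoring pass over the shortlist. The `score` callback is a placeholder for a real reranker call (e.g. Cohere Rerank); the corpus and types are hypothetical.

```typescript
type Doc = { id: string; text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stage 1: fast retrieval — top-k candidates by embedding similarity.
function retrieve(query: number[], corpus: Doc[], k: number): Doc[] {
  return [...corpus]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}

// Stage 2: slower, higher-quality scoring of the shortlist.
// `score` stands in for a cross-encoder or rerank API call.
function rerank(docs: Doc[], score: (d: Doc) => number, topN: number): Doc[] {
  return [...docs].sort((a, b) => score(b) - score(a)).slice(0, topN);
}
```

In production, stage 1 would be an approximate-nearest-neighbour index rather than a full sort, and stage 2 a batched API call over the shortlist.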