
Vector Embedding

Key idea:

Vector embedding — dense numeric representation (an array of floats) of any object: text, image, audio. Typically 512-3072 dimensions. Example: "dog" → [0.23, -0.15, 0.67, ...]. Similar objects → close vectors (high cosine similarity; > 0.8 is a rough, model-dependent threshold). Used in semantic search, clustering, RAG, image similarity. Models: OpenAI text-embedding-3 (3072 dim), Cohere embed-v3, jina-embeddings-v3 (open), bge-m3 (multilingual).

Below: details, example, FAQ.


Details

  • Properties: dense (most dimensions non-zero, unlike sparse one-hot vectors), fixed length per model
  • Distance metrics: cosine (normalised), euclidean, dot product (see the sketch after this list)
  • Cost: $0.02-0.13 per 1M tokens for embedding models
  • Multilingual: bge-m3, multilingual-e5, jina-v3 — work for 100+ languages
  • Fine-tuning: possible for domain-specific search (medical, legal)
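
The three metrics above differ only in a few lines of arithmetic. A minimal TypeScript sketch over plain number[] vectors (no libraries; vectors assumed to be the same length):

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function norm(a: number[]): number {
  return Math.sqrt(dot(a, a));
}

// Cosine similarity: angle only, magnitude-invariant; the default for text. Range [-1, 1].
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (norm(a) * norm(b));
}

// Euclidean distance: straight-line distance; sensitive to vector magnitude.
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}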

Example

// OpenAI Embedding API
import { OpenAI } from 'openai';
const openai = new OpenAI();
const response = await openai.embeddings.create({
  model: 'text-embedding-3-large',
  input: 'TCP vs UDP protocols'
});
console.log(response.data[0].embedding); // [0.01, -0.23, ..., 0.05] — 3072 floats
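
A raw embedding is useful once compared against others. A sketch of semantic search over an illustrative three-document corpus, reusing the openai client above and the cosine() helper from Details (in-memory ranking like this only scales to small corpora; real systems use a vector DB):

// Illustrative corpus; in production these vectors live in a vector DB
const docs = [
  'TCP is connection-oriented and guarantees delivery',
  'UDP is connectionless and faster but unreliable',
  'DNS usually runs over UDP port 53'
];

// Batch-embed any list of strings with the same model as the query
const embedAll = async (input: string[]) =>
  (await openai.embeddings.create({ model: 'text-embedding-3-large', input }))
    .data.map(d => d.embedding);

const [queryVec] = await embedAll(['reliable transport protocol']);
const docVecs = await embedAll(docs);

// Rank documents by cosine similarity to the query
const ranked = docs
  .map((text, i) => ({ text, score: cosine(queryVec, docVecs[i]) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].text); // most semantically similar document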


Frequently Asked Questions

Cosine vs euclidean?

Cosine (normalised vectors) — dominant for text/NLP. Euclidean — for images/raw features. Dot product — equivalent to cosine (and cheaper) if vectors are pre-normalised.
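
Why the pre-normalisation caveat: dividing each vector by its norm once (e.g. at index time) makes every later dot product equal to cosine similarity. A standalone check with made-up 3-dim vectors:

const normalize = (v: number[]): number[] => {
  const n = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return v.map(x => x / n);
};

const u = normalize([0.23, -0.15, 0.67]);
const w = normalize([0.25, -0.10, 0.70]);
// For unit vectors, the plain dot product IS the cosine similarity
console.log(u.reduce((s, x, i) => s + x * w[i], 0));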

Does size matter?

3072 dim ≫ 512 dim in recall on complex queries, but 6x the storage + compute. Balance against dataset size + accuracy requirements.
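
text-embedding-3 models accept a dimensions parameter that truncates and renormalises the vector server-side (Matryoshka-style), so you can measure 512 vs 3072 recall on your own data before committing. A sketch reusing the openai client from the Example:

const small = await openai.embeddings.create({
  model: 'text-embedding-3-large',
  input: 'TCP vs UDP protocols',
  dimensions: 512 // truncated + renormalised by the API; ~6x less storage
});
console.log(small.data[0].embedding.length); // 512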

Do I need rerank?

Embedding search — fast but approximate. Rerank (Cohere Rerank, Voyage rerank) — slower but better on top-5. Pipeline: retrieve 50 → rerank top 10.
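
A sketch of that pipeline's second stage, assuming the cohere-ai Node SDK's rerank call (model name per Cohere's docs; the candidates array stands in for your top-50 embedding hits):

import { CohereClient } from 'cohere-ai';
const cohere = new CohereClient({ token: process.env.CO_API_KEY! });

// Stage 1 (not shown): embedding search returns ~50 candidate documents
const candidates = [
  'TCP is connection-oriented and guarantees delivery',
  'UDP is connectionless and faster but unreliable',
  'DNS usually runs over UDP port 53'
];

// Stage 2: a cross-encoder scores each (query, document) pair precisely
const response = await cohere.rerank({
  model: 'rerank-english-v3.0',
  query: 'TCP vs UDP protocols',
  documents: candidates,
  topN: 10
});
for (const r of response.results) {
  console.log(r.relevanceScore.toFixed(3), candidates[r.index]);
}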