Research · Text Embeddings · OpenAI

Do You Really Need
All Those Dimensions?

A deep dive into Matryoshka embeddings, dimension reduction economics, and why pizza statistically belongs closer to video games than a controller does.

Larissa Rodrigues de Oliveira
AI Engineer & Researcher
NLP · Vector Search · RAG
"Do we actually need all the dimensions we're using in our embeddings?"
🎯 Spoiler: we probably don't, and the answer reveals more about how embeddings work than you might expect.
The Two Models

A generation jump in
semantic representation

OpenAI's third-generation models introduce dynamic dimensionality via Matryoshka learning, a clear departure from the rigid fixed-size vectors of earlier generations.

Model                  | Generation | Max Dimensions | MTEB English | MIRACL Multilingual | Price / 1M tokens
text-embedding-3-small | 3rd gen    | 1,536          | 62.3%        | 44.0%               | $0.02
text-embedding-3-large | 3rd gen    | 3,072          | 64.6%        | 54.9%               | $0.13
🌸
text-embedding-3-small
The Efficient One · Low Cost
Max dimensions: 1,536
API cost: $0.02 / 1M tok
MIRACL multilingual: 44.0%
Best for: English · High volume
💙
text-embedding-3-large
The Precision One · Frontier
Max dimensions: 3,072
API cost: $0.13 / 1M tok
MIRACL multilingual: 54.9% 🚀
Best for: PT · ES · FR · Deep search
Matryoshka Architecture

The nesting doll
that changes everything

Matryoshka Representation Learning (MRL) makes dimension-flexible embeddings possible. Understanding it reshapes how you think about vector storage.

MRL forces information priority: critical concepts live in the earliest dimensions
64 · Core identity · Broadest category
→ 256 · + Context · Domain clarity
→ 512 · + Nuance · Sentiment, tone
→ 1024 · + Syntax · Complex dependencies
→ 3072 · + Micro-detail · Jargon, subtlety
✂️
Safe truncation: slice the vector from the end; the most important information is always front-loaded and stays valid.
🏋️
Multi-loss training: optimizes simultaneously at dimensions 64, 256, 512, 1024, and 3072, so each prefix is independently meaningful.
📏
Infrastructure flexibility: constrained to 1,024 dims? Use the frontier model and truncate to fit; no need to settle for a natively smaller model.
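The safe-truncation step can be sketched in a few lines. This is an illustrative helper, not OpenAI's implementation: it keeps the first `dim` components and re-normalizes to unit length, which you should do whenever you slice client-side so cosine similarity stays well-behaved. (The API can also return shortened vectors directly via the `dimensions` request parameter.)

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of a Matryoshka embedding and
    re-normalize to unit length so cosine similarity stays meaningful."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]
```

Requesting `dimensions=768` from the API performs the equivalent shortening server-side, so this helper is mainly useful when you store full vectors and truncate later.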
The Analogies

Two insights that will
change how you think

📸

Dimensions = Image Resolution

A 512×512 and a 1024×1024 photo both show you a leather boot; one just has more detail. But do you need 10,000px to recognize it's a boot? At some point, extra resolution stops adding meaningful information. The first few hundred dimensions capture the broad ontological category, which is enough for most retrieval tasks.

💡 Extra dimensions are high-frequency modifiers: useful for microscopic nuance, not for core classification.
📷

Model Quality = Camera Sensor

Two cameras both set to 512×512. One is a basic smartphone; the other is a professional DSLR. Same resolution, but the DSLR captures better color fidelity, texture, and depth. The Large model is that DSLR: more internal parameters and deeper attention layers pack the same number of dimensions with higher-quality representations than Small does.

💡 Output dimensionality is just a container. What's poured in depends entirely on architectural depth.
Model Showdown

Same query. Real rankings.
See the difference.

Query: "Video game" · results ordered by cosine similarity (most similar first), extracted directly from the experiment. Two stories worth telling.

โš ๏ธ Tested in a product-catalog / e-commerce context (short product names and categories). Results in other domains may differ โ€” always validate on your own data before migrating.
🤯 The surprising one

Large at 384 dim vs. Small at 1,536 dim

A quarter of Small's storage. Better results. The model's internal architecture matters more than dimension count.

🌸 Small · 1,536 dim · Full capacity
1. 🪑 Gaming Chair
2. Mouse
3. 🎮 PlayStation 5
4. TV
5. Switch
Gaming Chair tops the list: arguably not the most semantically central gaming concept.
VS
💙 Large · 384 dim · ¼ the storage
1. 🎮 PlayStation 5
2. 🪑 Gaming Chair
3. Switch
4. Mouse
5. 🍕 Pizza
PlayStation 5 lands at #1, at a quarter of Small's storage. The DSLR wins even at lower resolution.
💡 The practical one

Large at 768 dim vs. Large at 1,536 dim

Half the storage. Identical ranking. This is Matryoshka learning doing exactly what it promises.

💙 Large · 1,536 dim · Full cost
1. 🎮 PlayStation 5
2. 🪑 Gaming Chair
3. Mouse
4. Switch
5. 🍕 Pizza
💾 1,536 × 4 bytes × N vectors
=
💙 Large · 768 dim · ½ the storage ✦
1. 🎮 PlayStation 5
2. 🪑 Gaming Chair
3. Mouse
4. Switch
5. 🍕 Pizza
💾 768 × 4 bytes × N vectors
✓ Exact same top-5 order. Matryoshka front-loading means the first 768 dimensions already carry the full semantic weight needed for this type of task.
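The "identical ranking" claim is easy to check mechanically. A minimal sketch (toy vectors with invented values, not the experiment's real embeddings) that compares the ranking produced by full-length vectors against the ranking from a truncated prefix:

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def ranking_at(query, candidates, dim):
    """Rank candidate names by cosine similarity, using only the first
    `dim` components of every vector (prefix truncation)."""
    pairs = sorted(candidates.items(),
                   key=lambda kv: _cos(query[:dim], kv[1][:dim]),
                   reverse=True)
    return [name for name, _ in pairs]

def same_ranking(query, candidates, full_dim, small_dim):
    """True when the truncated ranking matches the full-length ranking."""
    return (ranking_at(query, candidates, full_dim)
            == ranking_at(query, candidates, small_dim))
```

With Matryoshka-style front-loaded vectors the two rankings agree; with vectors whose discriminative signal lives in the tail dimensions, they can diverge, which is exactly the failure mode MRL training is designed to avoid.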
Interactive

What changes as you
tune the Large model's dimensions?

📌 Applies to: text-embedding-3-large only
Dimensions to request: 768 (options: 384 · 768 · 1,536 · 3,072)
✨ 768 dim is the production sweet spot: identical semantic rankings to Large-1536 at half the storage cost. Recommended for most product-search and semantic-matching pipelines.
Key Findings

Three results that
actually surprised me

Query: "Video game" tested against 11 candidate terms, across both models and four dimension sizes, using Euclidean distance and cosine similarity.

1
Large at 768 dimensions = Large at 1,536 dimensions

Half the storage. Half the vector length. Identical semantic rankings in this context. For product-catalog matching tasks, you can cut your vector database footprint in half with zero measurable loss in retrieval quality.

โš ๏ธ Tested on product names in an e-commerce context. More nuanced tasks โ€” like matching paragraphs of dense legal or academic text โ€” may reveal differences at higher dimensions.

Storage Win
2
Large at 384 outperformed Small at 1,536 dimensions

One-eighth of the Large model's max storage, and it still produces better rankings. This supports the camera-sensor analogy: the model's internal architecture dominates raw dimension count. The Small model, fully unrolled, can't match Large's compressed precision in this domain.

โš ๏ธ This finding applies to short, categorical terms (product names, labels). For longer, denser inputs the gap may narrow or shift.

Architecture Wins
3
๐Ÿ• "Pizza" ranked closer to "Video game" than "Controller" did

A controller is literally a video game accessory. Pizza is food. Yet the model places pizza closer in the vector space, and once you understand how embeddings actually work, this makes complete sense.

Mind-bending
Real Data

How the model actually
ranked the results

Query: "Video game" · Model: text-embedding-3-large · 768 dimensions · Cosine distance (lower = more similar).

๐Ÿ“ Cosine distance Lower value = more contextually related to "Video game" ยท Values read directly from the experiment
1
PlayStation 5
0.477
2
Gaming Chair
0.516
3
Mouse
0.572
4
Switch
0.573
5
๐Ÿ• Pizza beats Controller!
0.580
6
Headphone
0.592
7
Controller ↓ behind pizza
0.596
8
Keyboard
0.611
9
Running
0.617
10
Monitor
0.617
11
TV
0.648
🇧🇷 These terms were originally tested in Portuguese
Gaming Chair → Cadeira Gamer
Controller → Controle
Keyboard → Teclado
Headphone → Fone de Ouvido
Running → Corrida
TV → Televisão
PlayStation 5, Switch, Mouse, Monitor, Pizza → same in Portuguese 🙂
The Pizza Paradox

Why contextual co-occurrence
beats taxonomy

๐Ÿ•

Embeddings don't measure synonymy; they measure context

Ontologically, pizza and video games share zero overlap. Yet in the statistical geography of human language, across billions of forum posts, stories, and conversations, they're neighbors.

Semantic Similarity
What it means to be alike

Direct synonyms or near-identical meaning. Words that can substitute for each other without changing the core meaning.

"automobile" ≈ "car"
Semantic Relatedness
What it means to co-exist

Linked by function, culture, or co-occurrence: not necessarily alike in meaning, but appearing in the same situations.

"pizza" + "video game" = leisure
🔀 Why "Controller" drifts: the polysemy problem

The word "controller" appears across vastly different contexts in billions of training examples. Its vector is pulled in many directions simultaneously, diluting its gravitational pull toward gaming:

🎮 Game controller · 💰 Financial controller · ⚙️ Micro-controller · 🧠 Emotional control · ✈️ Air traffic controller · 📦 Inventory control
The Economics

The real cost isn't
the API call

API inference costs are often negligible at enterprise scale. The compounding cost lives in storage, RAM, and query latency, and that's where dimension reduction pays off.

💸
6.5×
cost difference between Small ($0.02) and Large ($0.13) per 1M tokens
💾
87.5%
storage reduction when truncating Large from 3,072 → 384 dimensions
⚡
2×
query-speed gain from 3,072 → 1,536 dims (dot-product cost scales linearly with dimension)
🌍
+24.8%
relative multilingual MIRACL improvement of Large over Small (44.0% → 54.9%), a paradigm shift for non-English corpora
Storage overhead per 1M vectors (relative to max)
Large · 3,072 dim (maximum): 100% baseline
Large · 1,536 dim: 50%
Large · 768 dim ✦ sweet spot: 25%
Large · 384 dim: 12.5%
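The percentages above are plain arithmetic: float32 vectors cost 4 bytes per dimension. A tiny helper, assuming raw float32 storage and ignoring index and metadata overhead:

```python
def storage_bytes(n_vectors, dims, bytes_per_float=4):
    """Raw footprint of float32 embedding vectors (index overhead excluded)."""
    return n_vectors * dims * bytes_per_float

# One million full-size Large vectors: 3,072 dims * 4 bytes * 1e6,
# roughly 12.3 GB before any index structures are built on top.
full = storage_bytes(1_000_000, 3072)
sweet_spot = storage_bytes(1_000_000, 768)  # 25% of the baseline
```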
Engineering tip: "funnel retrieval" is emerging as the gold standard: store truncated vectors (768 or 384 dim) for a fast approximate-nearest-neighbor first pass, then re-rank the top-K candidates using full 3,072-dim vectors loaded from disk. Best of both worlds.
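A minimal sketch of the funnel idea, with toy data and an exhaustive scan standing in for a real ANN index (a production system would use something like HNSW for the coarse pass):

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def funnel_search(query, docs, coarse_dim=768, top_k=10, final_n=3):
    """Stage 1: cheap scan over truncated vectors keeps the top_k survivors.
    Stage 2: re-rank only those survivors with the full-length vectors."""
    coarse = sorted(
        docs,
        key=lambda d: _cos(query[:coarse_dim], d["vec"][:coarse_dim]),
        reverse=True,
    )[:top_k]
    final = sorted(coarse, key=lambda d: _cos(query, d["vec"]), reverse=True)
    return [d["id"] for d in final[:final_n]]
```

The design point is that the expensive full-dimension comparison only ever touches `top_k` documents, so its cost stays constant as the corpus grows; only the cheap truncated pass scales with corpus size.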
The Key Takeaway

The question isn't just
"which model?"

The real question is: which model, at what dimension count, for your specific use case?
🎯
For Portuguese, Spanish, French
Always use Large. The 24.8% relative multilingual MIRACL gap (44.0% → 54.9%) is a paradigm shift, not an incremental improvement. Small simply doesn't map non-English semantic spaces with enough fidelity.
⚡
For production at scale
Large at 768 dimensions is the sweet spot for product-search contexts: better multilingual performance, half the storage of Large-1536, and no measurable quality loss for standard retrieval tasks.
🔬
Before committing
Always A/B test on your actual domain. These findings were observed in a product-search context. Results for legal, academic, or conversational text may differ, so validate before migrating.