Tokens, Embeddings and Vectors: How AI Turns Language Into Numbers

A computer processes numbers, not letters. So how does a word, a sentence, even a meaning turn into a number? From tokenization to embeddings, from vector space to semantic similarity, we explain how AI grasps language with concrete analogies and no math required.

Why We Need a Translation in the First Place

When we say AI "understands language," we are telling a small white lie. A language model does not see letters, words or sentences the way we do. In its world, everything is a number. Just as a calculator works only with digits, the one and only thing flowing through a modern AI system is vast strings of numbers. So there is a basic problem to solve: how do we translate text written by a human into numbers a machine can process?

This translation may sound mechanical, but the real difficulty is not in the mechanics. Mapping a word to some arbitrary number is easy; you give every word in the dictionary an ID and you are done. But that throws meaning straight in the bin. If you assign 4172 to "king" and 9038 to "queen," nothing in those two numbers carries the faintest trace of how closely related the two concepts are. The real challenge is to do the translation so that meaning travels along with the text.
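
To make the problem concrete, here is a minimal sketch of the "just give every word an ID" approach. The IDs for "king" and "queen" are the ones used above; the entry for "banana" and the helper name id_distance are invented purely for illustration.

```python
# Arbitrary dictionary-style IDs. "banana" and its ID are made up for this sketch.
word_to_id = {"king": 4172, "queen": 9038, "banana": 4173}


def id_distance(a: str, b: str) -> int:
    """Absolute difference between two word IDs (a quantity with no meaning)."""
    return abs(word_to_id[a] - word_to_id[b])


print(id_distance("king", "queen"))   # 4866
print(id_distance("king", "banana"))  # 1
```

By this measure "king" sits right next to "banana" and far from "queen," which is exactly the kind of nonsense that arbitrary IDs produce.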

In this article we will build the three cornerstones of that translation, one at a time: first tokenization, which splits text into pieces; then embeddings, which load meaning onto each piece; and finally the vector space in which those meanings live. The goal is to make this quiet mechanism at the heart of modern AI understandable without requiring any background in mathematics.

Tokenization: Cutting Text Into Bite-Size Pieces

The first step of the translation is breaking text into manageable pieces. These pieces are called tokens. A token is often not a whole word; sometimes it is part of a word, sometimes a suffix, sometimes just a few letters. The tokenizer, the component that feeds the model, splits incoming text into these tokens and replaces each one with its index in the model's vocabulary. Perhaps the simplest analogy is food: you cannot swallow a meal whole; you first cut it into bites. Tokenization is the step where text is cut into bites the model can digest.

So why use these odd sub-pieces instead of whole words? Because fitting every word in the world into a single dictionary is impossible. New words, typos, names and technical terms can be invented endlessly. Instead, models use a fixed-size vocabulary made of frequently occurring pieces. A long word like "unbelievable" might be split into familiar parts such as "un," "believ" and "able." This way the model can represent even a word it has never seen using pieces it already knows, much as you can sound out an unfamiliar word by breaking it into syllables.
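
The splitting described above can be sketched with a toy tokenizer. This is only an illustration: the vocabulary is invented, and the strategy used here (always take the longest piece that matches) is one simple choice. Real tokenizers learn their vocabulary from data and often use more refined algorithms.

```python
# Hypothetical vocabulary: three frequent pieces plus single letters as a fallback.
VOCAB = ["un", "believ", "able"] + list("abcdefghijklmnopqrstuvwxyz")
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}


def tokenize(word: str) -> list[str]:
    """Split a word greedily, always taking the longest vocabulary piece."""
    tokens = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining candidate first, then shorter ones.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in TOKEN_TO_ID:
                tokens.append(piece)
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
    return tokens


def to_ids(tokens: list[str]) -> list[int]:
    """Replace each token with its index in the vocabulary."""
    return [TOKEN_TO_ID[token] for token in tokens]


print(tokenize("unbelievable"))          # ['un', 'believ', 'able']
print(to_ids(tokenize("unbelievable")))  # [0, 1, 2]
print(tokenize("unlivable"))             # ['un', 'l', 'i', 'v', 'able']
```

The last line shows the fallback at work: "unlivable" is not in the vocabulary as a whole, yet it can still be written out using known pieces.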

This splitting looks innocent, but its consequences run deep. How many tokens a text breaks into directly determines cost, speed and how much context the model can hold at once, because models measure text in tokens. And token boundaries do not fall with equal efficiency in every language. Many tokenizers built for English shred words in agglutinative languages like Turkish at points that have nothing to do with meaning, which means burning more tokens to say the same thing. Contrary to what most people assume, tokenization is not a technical footnote; it is a foundational design decision that quietly shapes product quality.

Embedding: Loading Meaning Onto a Token

We turned tokens into index numbers, but as we just saw, those numbers carry no meaning; "4172" tells you nothing about "king." This is exactly where embeddings come in. An embedding represents each token not as a single number but as a list of numbers, that is, a vector. The word "king" becomes not 4172 alone but a long list like [0.21, -0.83, 0.05, ...] made of hundreds, even thousands, of numbers. Each number in this list carries a tiny part of the word's meaning.
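
In code, this step is little more than a table lookup from token ID to vector. The sketch below assumes a tiny hand-written table with vectors cut down to three numbers; the values for "queen" are invented, and in a real model the table is learned and far larger.

```python
# Hypothetical embedding table: token ID -> vector (three numbers for readability).
EMBEDDING_TABLE = {
    4172: [0.21, -0.83, 0.05],  # "king"
    9038: [0.19, -0.80, 0.11],  # "queen" (made-up values)
}


def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up the vector for every token ID."""
    return [EMBEDDING_TABLE[token_id] for token_id in token_ids]


print(embed([4172]))  # [[0.21, -0.83, 0.05]]
```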

The key point is that these numbers are not assigned by hand. The model learns the vectors itself while reading billions of sentences. By tracking which words each word sits next to and which contexts it shows up in, the model adjusts the vectors so that words appearing in similar contexts end up with similar lists. There is a beloved saying in linguistics, due to J. R. Firth: "You shall know a word by the company it keeps." Embeddings turn exactly this intuition into numbers; a word's meaning is distilled from the words it appears alongside.

Think of it with a simple analogy. Suppose we score every word along a few axes: how "alive" it is, how "royal" it is, how "masculine versus feminine" it is. "King" scores high on royal and high on masculine; "queen" scores high on royal and high on feminine; "banana" scores low on all three. In real embeddings these axes number in the hundreds and none of them carry labels as tidy as ours; but the idea is the same. Each word becomes a list of coordinates describing the features that make up its meaning.
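
The three-axis analogy can be written down directly. All scores below are invented for illustration, and the third axis is assumed to run from -1 (feminine) to +1 (masculine), with 0 meaning neutral.

```python
# Axis names follow the analogy above; the scores themselves are made up.
AXES = ("alive", "royal", "masculine")
SCORES = {
    "king":   (0.9, 0.9,  0.9),
    "queen":  (0.9, 0.9, -0.9),
    "banana": (0.1, 0.0,  0.0),
}


def differing_axes(a: str, b: str, threshold: float = 0.5) -> list[str]:
    """Names of the axes on which two words differ by more than the threshold."""
    return [
        axis
        for axis, x, y in zip(AXES, SCORES[a], SCORES[b])
        if abs(x - y) > threshold
    ]


print(differing_axes("king", "queen"))   # ['masculine']
print(differing_axes("king", "banana"))  # ['alive', 'royal', 'masculine']
```

"King" and "queen" agree on everything except one axis, while "king" and "banana" disagree on all of them.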

Vector Space: A Map of Meaning

Since we have turned every word into a list of numbers, that is, into coordinates, we can now picture every word as a point, just like a city pinned on a map by latitude and longitude. With two numbers we can build a two-dimensional map, with three a three-dimensional space. Because embeddings contain hundreds of numbers, the place where words live is a space of hundreds of dimensions. We cannot visualize it, but the logic is exactly the same: each word is a point with its own position in this vast space. This space is called the vector space.

The beauty of this map is that the positions on it are not random. When embeddings are learned well, words close in meaning land close together. "King," "queen," "throne" and "palace" gather in one neighborhood, while "banana," "apple" and "strawberry" cluster in a completely different one. Meaning itself has turned into physical proximity. Just as cities near each other on a geographic map often share a similar climate, points near each other on this map of meaning carry similar meanings.
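
A small sketch of these neighborhoods, using a two-dimensional map so the coordinates stay readable. The positions are hand-picked assumptions rather than learned embeddings, and closeness is measured here with plain straight-line distance.

```python
import math

# Hypothetical 2D positions: one royal neighborhood, one fruit neighborhood.
POINTS = {
    "king": (0.9, 0.8),
    "queen": (0.85, 0.9),
    "throne": (0.8, 0.7),
    "palace": (0.75, 0.8),
    "banana": (-0.8, -0.7),
    "apple": (-0.9, -0.8),
    "strawberry": (-0.85, -0.9),
}


def nearest(word: str, k: int = 3) -> list[str]:
    """Return the k words whose points lie closest to the given word."""
    others = [w for w in POINTS if w != word]
    others.sort(key=lambda w: math.dist(POINTS[word], POINTS[w]))
    return others[:k]


print(nearest("king"))      # ['queen', 'throne', 'palace']
print(nearest("apple", 2))  # ['strawberry', 'banana']
```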

The most surprising part is that even directions in this space can carry meaning. The classic example: take the vector for "king," subtract the vector for "man" and add the vector for "woman," and the point you arrive at falls strikingly close to "queen." In other words, a particular movement through the space roughly corresponds to "turning masculine into feminine." This tidy example does not work perfectly in every case, but it points to something real: the model encodes meaning not only in positions but also in the relationships between them. Meaning lives both in the map's points and in its directions.
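
The arithmetic behind this example can be sketched with toy vectors. Everything here is an assumption made for illustration: the two axes (royal, masculine versus feminine), the coordinates, and the use of the vectors for "man" and "woman" to stand in for the male and female directions. As is customary for this kind of demonstration, the input words themselves are excluded from the candidates.

```python
import math

# Hypothetical 2D vectors: (royal, masculine-vs-feminine).
VECTORS = {
    "king": (0.9, 0.9),
    "queen": (0.9, -0.9),
    "man": (0.1, 0.9),
    "woman": (0.1, -0.9),
    "prince": (0.7, 0.8),
    "princess": (0.7, -0.8),
    "banana": (0.0, 0.0),
}


def analogy(a: str, b: str, c: str) -> str:
    """Word nearest to vector(a) - vector(b) + vector(c), excluding a, b and c."""
    target = tuple(
        x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(target, VECTORS[w]))


print(analogy("king", "man", "woman"))    # queen
print(analogy("prince", "man", "woman"))  # princess
```

With learned embeddings the result is rarely this clean, which is why the example is best read as an intuition rather than a guarantee.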

Semantic Similarity: The Angle Between Two Meanings

Now that words have become points on a map, we can ask a very powerful question: how similar are two words? To measure this, we look at how much the two points "face the same direction." The most common method is cosine similarity. The name sounds intimidating, but the idea is simple: picture each vector as an arrow reaching from the center of the space out to its point. If two arrows point the same way, the meanings they represent are similar. The smaller the angle between the arrows, the greater the similarity.

Cosine similarity turns exactly this angle into a number between -1 and 1. If two arrows point in precisely the same direction, the similarity is 1; if they stand at a right angle, which is how unrelated meanings tend to appear, it is 0; if they point in opposite directions, it is -1. Real pairs of words usually land somewhere in between. An important subtlety: cosine similarity considers only the direction of the arrows, not their length. So even if one concept has a "longer" vector than another, for example because it appears very often, the similarity verdict is not affected; only the direction of the meaning speaks.
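
For readers who want to see it, the calculation is short: multiply the two vectors position by position and add up the results (the dot product), then divide by the product of the two lengths. The sketch below uses plain Python lists and made-up two-number vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal size."""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    if length_a == 0 or length_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (length_a * length_b)


print(cosine_similarity([1, 0], [5, 0]))   # 1.0  (same direction, different length)
print(cosine_similarity([1, 0], [0, 3]))   # 0.0  (right angle)
print(cosine_similarity([1, 0], [-4, 0]))  # -1.0 (opposite direction)
```

The first line makes the subtlety about length visible: the second arrow is five times longer, yet the score is still 1.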

Let us see why this matters with an example. "Automobile" and "car" look completely different in spelling; apart from a lone "a," they share no run of letters at all. A classic keyword search, which compares spellings, cannot match them. But in vector space the two land almost on top of each other, because they keep showing up in the same contexts. Cosine similarity catches this closeness with a high score. This is the secret behind machines comparing meanings rather than words: similarity is now a measure of meanings, not of letters.

Sentences and Documents: From a Single Word to a Whole Text

So far we have spoken only of single words, but in the real world we deal with questions, sentences and pages of documents. The good news: the same idea scales. Modern models can compress not just words but an entire sentence or paragraph into a single vector, that is, a single point on the map of meaning. This vector is like a summary of the whole text's meaning. The question "How is an employment contract terminated?" and a multi-page piece of legislation both end up as points in the same space.

This compression is a subtler job than for words. The meaning of a sentence is not merely the sum of its words; their order, which word modifies which, and the surrounding context all come into play. "The dog bit the man" and "The man bit the dog" contain the same words but say entirely different things. So models that produce sentence embeddings account for the relationships between words and generate a single holistic representation. The result is again a vector, but now it carries the meaning of a sentence or a paragraph.
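
To see why word order forces a smarter approach, consider the most naive way to get a sentence vector: average the vectors of its words. This averaging shortcut is an assumption introduced here as a counterexample, not the method sentence embedding models use, and the word vectors are invented.

```python
import math

# Hypothetical 2D word vectors.
WORD_VECTORS = {
    "the": (0.0, 0.25),
    "dog": (1.0, 0.25),
    "bit": (0.25, 1.0),
    "man": (0.75, 0.5),
}


def mean_embedding(sentence: str) -> tuple[float, ...]:
    """Average the word vectors of a sentence, ignoring word order entirely."""
    vectors = [WORD_VECTORS[word] for word in sentence.lower().split()]
    dimensions = len(vectors[0])
    # math.fsum returns an exactly rounded sum, so the order of addition cannot matter.
    return tuple(
        math.fsum(v[d] for v in vectors) / len(vectors) for d in range(dimensions)
    )


a = mean_embedding("The dog bit the man")
b = mean_embedding("The man bit the dog")
print(a == b)  # True
```

Both sentences collapse onto the same point even though they describe opposite events, which is why models built for sentence embeddings have to look at how the words relate to one another.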

This ability opens the door to one of AI's most useful applications: semantic search. You turn a user's question into a vector, then, among the vectors of the thousands of documents you hold, you retrieve the ones facing the most similar direction, that is, those with the highest cosine similarity to the question. The user gets not the documents that happen to contain the exact words of the question, but the documents closest to its meaning. Documents that say the same thing in different words, which keyword matching misses entirely, are caught this way too.
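
The retrieval step can be sketched in a few lines. The document titles and all vectors below are invented; in a real system both the documents and the question would be turned into vectors by an embedding model, a step this sketch skips.

```python
import math


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hypothetical document vectors, as if produced by an embedding model.
DOCUMENTS = {
    "ruling on ending an employment contract": (0.9, 0.1, 0.2),
    "statute on lease termination": (0.2, 0.9, 0.1),
    "recipe for banana bread": (0.0, 0.1, 0.95),
}


def search(query_vector: tuple[float, ...], k: int = 2) -> list[str]:
    """Return the k documents whose vectors point most nearly the query's way."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda title: cosine(query_vector, DOCUMENTS[title]),
        reverse=True,
    )
    return ranked[:k]


# Made-up vector standing in for "How is an employment contract terminated?"
query = (0.85, 0.2, 0.1)
print(search(query, k=1))  # ['ruling on ending an employment contract']
```

Note that the top result contains the word "ending," not "terminated"; the match comes from the direction of the vectors, not from shared words. With thousands or millions of documents, production systems typically replace the full sort with an index built for fast nearest-neighbor lookup.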

What Does This Do in a Real Domain Like Law?

These concepts can seem abstract, but their value shows up when they touch ground in a concrete domain. Turkish law is exactly such a domain. What a lawyer is after is usually not a document that happens to contain a particular word, but rulings that deal with a particular legal situation. When a user searches for "termination of a lease for just cause," they want rulings on that same legal issue to surface even if those rulings never use that exact phrase. Semantic similarity makes this possible because it reaches past the words to grasp the essence of the matter.

İçtiHub, the legal-tech product we build at EcoFluxion, is founded on precisely this mechanism. Millions of rulings and thousands of pages of legislation are placed onto a map of meaning through embeddings. When a user asks a question, that question lands on the same map as a point, and the system retrieves the texts closest in meaning within seconds. Rulings that classic keyword search would miss, phrased in different terms but pointing to the same legal conclusion, become visible this way.

It is right here that Turkish's own peculiar challenges enter the picture. The system has to recognize that forms like "taşınmaz," "taşınmazın" and "taşınmazlardan" all belong to the same concept; tokenization has to suit Turkish structure; and embeddings have to work meaningfully with Turkish legal language. So the three cornerstones described in this article turn into concrete engineering decisions we wrestle with every day beneath İçtiHub. Without the right tokens, meaningful embeddings and a sound similarity measure, you simply cannot build reliable semantic search in law.
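
The tokenization side of this challenge can be illustrated with two toy vocabularies applied to the three word forms above. Both vocabularies are invented for this sketch and do not describe any particular tokenizer or İçtiHub's actual implementation; unknown characters simply fall back to single-character tokens.

```python
def split_into_pieces(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split; unknown characters become one-character tokens."""
    pieces = []
    pos = 0
    while pos < len(word):
        match = word[pos]  # fallback: a single character
        for end in range(len(word), pos, -1):
            if word[pos:end] in vocab:
                match = word[pos:end]
                break
        pieces.append(match)
        pos += len(match)
    return pieces


# Hypothetical vocabularies: one with pieces unrelated to Turkish word structure,
# one that knows the stem and the suffixes.
UNSUITED_VOCAB = {"ta", "ma", "lar", "da", "n"}
TURKISH_AWARE_VOCAB = {"taşınmaz", "ın", "lar", "dan"}

for form in ("taşınmaz", "taşınmazın", "taşınmazlardan"):
    print(form, split_into_pieces(form, TURKISH_AWARE_VOCAB))

print(len(split_into_pieces("taşınmazlardan", UNSUITED_VOCAB)))       # 9
print(len(split_into_pieces("taşınmazlardan", TURKISH_AWARE_VOCAB)))  # 3
```

With the second vocabulary every form starts with the same "taşınmaz" token, so the shared concept stays visible to the model, and the same word costs a third of the tokens.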

Wrapping Up: The Intuition Beneath the Numbers

Let us return to the start. AI does not understand language; it translates it into numbers and operates on those numbers. We have seen the three links of that translation. Tokenization splits text into digestible pieces. Embedding gives each piece a list of coordinates reflecting its meaning. And the vector space is a map of meaning in which those coordinates live; on this map, closeness means similarity, and cosine similarity lets us measure that closeness.

The power of this chain lies in turning meaning into geometry. Once meaning becomes a position on a map, a human question like "how similar are these two things?" turns into a machine-solvable one like "how close are these two points?" This is the quiet idea beneath modern AI's appearance of grasping language: placing meaning somewhere in space.

Once you hold this intuition, the workings of many AI tools around you light up. Why a recommender system can find things "similar to this," why an assistant catches what you mean even when you do not use the exact words, why a legal tool can surface rulings phrased differently but about the same issue; the same idea sits beneath all of them. Language is turned into numbers, meaning is placed on a map, and machines navigate that map to produce a sense of meaning that resembles our own.