Where Keyword Search Hits a Wall
We have relied on classic search engines for years, and most of the time they do the job. The mechanism is actually simple: you type a word, and the system returns the documents that contain it. This is keyword search. Behind the scenes there is usually an 'inverted index' — much like the index at the back of a book, it lists in advance which documents each word appears in. That is why it answers in the blink of an eye.
But this approach has a blind spot: it knows words, not meaning. Search for 'automobile' and you will miss a document that only ever says 'car.' Even though the two describe the same thing, to the system they are two entirely different strings of characters. Modern engines partly patch this with synonym lists, stemming, and similar tricks, but the core logic never changes: it matches the terms in the query against the terms in the documents.
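The inverted index and its blind spot can be sketched in a few lines. This is a minimal illustration, not how a production engine is built: the function names `build_inverted_index` and `keyword_search` and the two sample documents are made up for the example, and real engines add tokenization rules, stemming, and ranking on top.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?'\""

def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(PUNCTUATION)].add(doc_id)
    return index

def keyword_search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = [w.strip(PUNCTUATION) for w in query.lower().split()]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Two made-up documents that describe the same kind of thing in different words.
docs = {
    1: "The car was parked outside the court.",
    2: "An automobile accident led to the lawsuit.",
}
index = build_inverted_index(docs)

print(keyword_search(index, "automobile"))  # document 2 only; document 1 says "car"
print(keyword_search(index, "car"))         # document 1 only
```

The lookup is fast because the work of scanning the documents was done once, when the index was built. The miss is built in for the same reason: the index has an entry for 'car' and an entry for 'automobile' and nothing that links them.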
In law, that blind spot stings far more. A user might think in terms of 'the cap placed on rent increases,' while the relevant court ruling expresses the same idea as 'the maximum rate to be applied in determining the rental fee.' The same legal concept sits in two different sentences without sharing a single word. Keyword search can never bring these texts together, because there is no matching word — only a matching meaning.
Semantic search was born precisely to close this gap. Its goal is to turn the question 'which words do they share?' into the question 'are these two texts talking about the same thing?' To pull that off, we first have to give the computer some representation of meaning — even if it is made of numbers. The rest of this article is about how that representation is built and how it is searched.
Embeddings: Turning Meaning into Numbers
A computer does not understand text; it processes numbers. So the first step of semantic search is to convert a piece of text into a sequence of numbers that carries its meaning. We call this sequence an embedding. In practice an embedding is a long list of, say, 768 or 1,536 numbers. Each number is a coordinate along one dimension of an abstract 'meaning space'; a single coordinate rarely means much on its own, but together they pin down what the text is about.
The easiest way to picture this is as a map. You locate a city with two numbers, latitude and longitude. Embeddings do the same thing, only they use hundreds of 'dimensions' instead of two. In this vast space, every word, sentence, or document settles at a point. The magic is this: the neural network that produces these points has been trained to place semantically similar texts close together and unrelated texts far apart.
That is why 'dog' and 'cat' end up as near neighbors in this space, while 'dog' and 'accounting' sit kilometers apart. There is even a famous example: take the vector for 'king,' subtract the vector for 'man,' add the vector for 'woman,' and the point you arrive at lands surprisingly close to 'queen.' This is a simplified illustration and it does not always work out this cleanly, but it captures the idea beautifully: meaning takes on a geometric shape in this numerical space, where closeness means similarity and direction means relationship.
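The arithmetic behind the 'king' and 'queen' example can be tried on toy data. The vectors below are hand-made for illustration, with three invented dimensions (royalty, masculinity, femininity); they do not come from a trained model, where the dimensions are not this tidy and the result is only approximate.

```python
import math

# Hand-made toy vectors: (royalty, masculinity, femininity). Purely illustrative.
vectors = {
    "king":     [1.0, 1.0, 0.0],
    "queen":    [1.0, 0.0, 1.0],
    "man":      [0.0, 1.0, 0.0],
    "woman":    [0.0, 0.0, 1.0],
    "prince":   [0.8, 1.0, 0.0],
    "princess": [0.8, 0.0, 1.0],
}

def nearest_word(point, vectors, exclude=()):
    """Return the word whose vector lies closest (Euclidean distance) to the point."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return min(candidates, key=lambda w: math.dist(point, candidates[w]))

# king - man + woman, computed coordinate by coordinate
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

print(target)
print(nearest_word(target, vectors, exclude={"king", "man", "woman"}))
```

With these hand-picked numbers the result lands exactly on 'queen'. The input words are excluded from the lookup, which is how the analogy is usually evaluated.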
To measure how alike two embeddings are, we usually use 'cosine similarity.' The name sounds intimidating, but the idea is plain: imagine two arrows. If they point the same way they are very similar (a score of 1), if they form a right angle they are unrelated (0), and if they point opposite ways they are opposites (-1). Cosine similarity looks only at the direction of the arrows, not their length, so it cares less about how long a text is and more about what it is about. Semantic search takes the arrow of your query and surfaces the documents whose arrows point most nearly the same way.
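The arrow picture translates directly into code: cosine similarity is the dot product of the two vectors divided by the product of their lengths. The sketch below uses plain Python lists and two-dimensional arrows chosen by hand so the three cases are easy to check.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (ignores their lengths)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [3, 0]))   # 1.0: same direction, different length
print(cosine_similarity([1, 0], [0, 2]))   # 0.0: right angle
print(cosine_similarity([1, 0], [-2, 0]))  # -1.0: opposite direction
```

The first case is the important one for search: the second arrow is three times longer, yet the score is still 1.0, because only the direction counts.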
Searching Among Millions of Vectors: ANN and HNSW
Suppose we have millions of document vectors, each representing a meaning, and we want to find the ones nearest to a user's query vector. The most honest method is to compare the query against every document one by one; this is called 'brute-force' or exact search. For a few thousand documents it is no problem. But scanning millions of documents from scratch on every query quickly becomes unaffordable in terms of speed and cost.
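Exact search is simple enough to write down in full, which also makes its cost visible: one similarity computation per stored document, for every query. The sketch below is a minimal version; the document ids and three-dimensional vectors are invented for the example.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_search(query, doc_vectors, k=3):
    """Score every document against the query and return the k best (id, score) pairs."""
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in doc_vectors.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]

# Made-up vectors; a real system would get these from an embedding model.
doc_vectors = {
    "rent_cap": [0.9, 0.1, 0.0],
    "eviction": [0.6, 0.6, 0.1],
    "tax":      [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

for doc_id, score in brute_force_search(query, doc_vectors, k=2):
    print(doc_id, round(score, 3))
```

The loop inside `brute_force_search` touches every entry in `doc_vectors`. With three documents that is nothing; with millions of vectors of several hundred dimensions each, it is the bottleneck the rest of this section is about.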
The solution is to settle for a 'good enough' answer instead of a perfect one. ANN (Approximate Nearest Neighbor) algorithms do exactly that: they do not guarantee the true nearest neighbors, but they find them with very high probability and far faster than an exact scan. In practice the small loss in accuracy often goes unnoticed, while the search can be hundreds of times faster. This trade-off is what makes semantic search practical in the real world.
One of the most widely used ANN methods today is HNSW (Hierarchical Navigable Small World). The name is long, but the intuition is very familiar: think of air travel. To reach a remote village, you first fly from a major airport to another continent, then transfer to a regional airport, and finally take local roads to the village. HNSW organizes the vectors into exactly this kind of layered network.
At the top layer there are only a few 'long-distance' links; from here the algorithm leaps quickly toward the general region of the target. Then, descending to lower layers, it shrinks its steps and tightens its circle around the destination. Instead of inspecting all of the millions of points, it reaches the nearest neighbors in just a few hundred smart steps. Thanks to this elegant idea, a query can be answered in milliseconds even over an enormous dataset.
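The layered descent can be illustrated with a toy. This is not an HNSW implementation: real HNSW assigns points to layers at random, builds its links incrementally, and keeps a list of candidates during the search instead of a single current point. The sketch keeps only the core idea, a greedy walk on a sparse top layer followed by a greedy walk on the dense bottom layer, and uses a hand-built 20 x 20 grid of 2-D points so the result is easy to verify. All names (`knn_graph`, `greedy_walk`, `layered_search`) are invented for the example.

```python
import math

def knn_graph(points, k):
    """Link every point to its k nearest other points (built here by brute force)."""
    graph = {}
    for p in points:
        others = sorted((q for q in points if q != p), key=lambda q: math.dist(p, q))
        graph[p] = others[:k]
    return graph

def greedy_walk(graph, start, query):
    """Hop to the neighbor closest to the query until no neighbor is closer."""
    current, hops = start, 0
    while True:
        best = min(graph[current], key=lambda q: math.dist(q, query))
        if math.dist(best, query) >= math.dist(current, query):
            return current, hops
        current, hops = best, hops + 1

def layered_search(layers, entry, query):
    """Walk each layer from sparse to dense, starting where the previous walk stopped."""
    current, total_hops = entry, 0
    for graph in layers:
        current, hops = greedy_walk(graph, current, query)
        total_hops += hops
    return current, total_hops

# Toy data: every grid point lives in the bottom layer, every fifth one also in the top layer.
bottom = [(x, y) for x in range(20) for y in range(20)]
top = [(x, y) for x in range(0, 20, 5) for y in range(0, 20, 5)]
layers = [knn_graph(top, 8), knn_graph(bottom, 8)]

query = (17.3, 8.6)
nearest, hops = layered_search(layers, entry=(0, 0), query=query)
print(nearest, "reached in", hops, "hops out of", len(bottom), "points")
```

On the top layer each hop covers five grid cells, on the bottom layer one, which is the airport-then-local-roads pattern from the analogy. On this regular grid the greedy walk happens to find the true nearest point; on real, irregular data it can stop at a near miss, which is exactly the 'approximate' in ANN.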
So What Does a Vector Database Actually Do?
So far we have discussed embeddings and how to search them quickly. A vector database is the system that brings all of these pieces together, packaged to run in production. In the shortest possible terms: it is a specialized database that stores and indexes millions of embeddings and reliably answers the query 'bring me the N documents nearest to this vector.'
You tell a traditional database 'give me products priced under 100 lira,' and the answer is exact and crisp. You tell a vector database 'give me the records most similar in meaning to this text.' This fundamental difference demands a different foundation: managing ANN indexes like HNSW, laying out vectors so they fit in memory, and speeding up similarity computations without cutting corners are the real areas of expertise for these systems.
A good vector database does not only search by similarity; it also does 'metadata filtering.' You can attach labels to each vector: document type, date, source, in-force status. This makes a query like 'the records closest in meaning to this question, but only decisions from the last five years that are still in force' possible. The ability to combine semantic closeness with hard rules in a single query is what makes these systems useful in real applications.
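The combination of a hard rule with semantic ranking can be sketched as filter first, then rank by similarity. This is a simplified illustration: the `Record` fields, the cutoff date, and the sample data are assumptions made for the example, and real vector databases differ in how they combine filters with their ANN index.

```python
import math
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    doc_id: str
    vector: list
    decided_on: date   # metadata: decision date
    in_force: bool     # metadata: still in force?

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query, records, k, decided_after, require_in_force=True):
    """Apply the hard metadata rules first, then rank the survivors by similarity."""
    eligible = [
        r for r in records
        if r.decided_on >= decided_after and (r.in_force or not require_in_force)
    ]
    eligible.sort(key=lambda r: cosine_similarity(query, r.vector), reverse=True)
    return [r.doc_id for r in eligible[:k]]

# Made-up records: B is the closest match but too old, C is close but no longer in force.
records = [
    Record("A", [0.9, 0.1], date(2023, 5, 1), True),
    Record("B", [1.0, 0.0], date(2010, 3, 1), True),
    Record("C", [0.95, 0.05], date(2022, 1, 1), False),
    Record("D", [0.2, 0.9], date(2024, 2, 1), True),
]

print(filtered_search([1.0, 0.0], records, k=2, decided_after=date(2020, 1, 1)))
```

Without the filter, B and C would top the list on similarity alone. With it, they never reach the ranking step, no matter how close their vectors are.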
In this space you will find dedicated products like Pinecone, Weaviate, Milvus, and Qdrant, alongside extensions such as pgvector added to PostgreSQL, and managed services from the major cloud providers. They all make the same promise: to store meaning and make it searchable in a scalable, low-latency way. Which one you pick depends on data size, latency targets, and your existing infrastructure — there is no single magic right answer.
Re-ranking: The Final Polish on Fast Search
ANN search is fast, but that speed comes at a price. Embedding-based search turns the query and the documents into vectors separately and then compares them; in other words it has 'pre-summarized' each document with no knowledge of the query. This lets it scan a broad pool of candidates very quickly, but it sometimes misses fine distinctions. The first retrieval pass is good at finding the 20-50 most relevant candidates, yet not always good at putting them in the right order.
This is exactly where re-ranking comes in. A re-ranker is usually a 'cross-encoder' model: instead of weighing query and document separately, it places them side by side and reads them together, judging their true relevance far more deeply. Seeing the query and the document at the same time lets the model answer 'does this sentence actually address this question?' with much greater accuracy. But that deeper look is expensive to compute.
The solution is a two-stage architecture: first the fast-but-coarse vector search distills a small candidate list (say, 50 documents) out of millions; then the expensive-but-precise re-ranker weighs only those 50 and picks the best 5. Speed over the broad pool, precision over the narrow one — a practical balance that combines the best of both worlds.
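The two-stage flow can be sketched as two sorts, a cheap one over everything and an expensive one over the shortlist. An important caveat: `toy_rerank_score` below is only a placeholder that counts shared words so the example runs without a model. A real re-ranker would be a cross-encoder reading the query and document together. The corpus, vectors, and function names are likewise invented for the illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def toy_rerank_score(query_text, doc_text):
    """Placeholder for a cross-encoder: fraction of query words found in the document."""
    query_words = set(query_text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(doc_text.lower().split())) / len(query_words)

def retrieve_then_rerank(query_text, query_vec, corpus, candidate_k, final_k,
                         rerank_score=toy_rerank_score):
    # Stage 1: cheap vector similarity over the whole corpus, keep a shortlist.
    candidates = sorted(
        corpus, key=lambda d: cosine_similarity(query_vec, d["vector"]), reverse=True
    )[:candidate_k]
    # Stage 2: the expensive scorer looks only at the shortlist.
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query_text, d["text"]), reverse=True
    )
    return [d["id"] for d in reranked[:final_k]]

# Made-up corpus: d2 has the closest vector, d1 is the better answer.
corpus = [
    {"id": "d1", "text": "rent increase cap in residential leases", "vector": [0.8, 0.6]},
    {"id": "d2", "text": "rent payment deadlines", "vector": [0.9, 0.1]},
    {"id": "d3", "text": "corporate tax filing", "vector": [0.0, 1.0]},
]

print(retrieve_then_rerank("cap on rent increase", [1.0, 0.0], corpus,
                           candidate_k=2, final_k=1))
```

Stage 1 ranks d2 ahead of d1 and drops d3; stage 2 looks only at those two and moves d1 to the top. The expensive scorer runs `candidate_k` times per query, not once per document in the corpus.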
The real value of re-ranking is that it weeds out candidates that are irrelevant yet superficially similar. In AI-powered systems especially, keeping the context handed to the model clean is worth its weight in gold, because the model takes everything placed in front of it seriously, and one wrong document can become one wrong answer. Getting the first few results in the right order is often far more valuable than retrieving more results.
Where Does All This Pay Off? Real Use Cases
Semantic search and vector databases are not an abstract engineering curiosity; they are the quiet engine behind many products you use today. The most visible example is recommendation systems: when a music service says 'if you liked this, you'll like that,' it is often relying on how close the songs sit in embedding space. Similar products, similar articles, similar profiles — all draw on the same idea of 'nearness.'
The second large area is enterprise knowledge search. Someone can ask 'how does a customer refund work?' across thousands of pages of company documentation, support tickets, or internal wiki articles and reach the right document without knowing the exact words it uses. The same logic works for image and audio search: you can turn a photo or a melody into a vector and find similar ones, because the idea of an embedding is not limited to text.
Perhaps the fastest-growing use is RAG. RAG (Retrieval-Augmented Generation) is the method of 'fetching' relevant documents and placing them in front of a language model before it answers. At the heart of that fetch step lies vector search: semantic search tells the model what to read. This way the model draws not only on what it memorized during training but on current, real sources placed in front of it, and the vector database effectively acts as the long-term memory of modern AI assistants.
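The 'fetch, then place in front of the model' step is mostly plumbing, and a sketch makes that concrete. Everything here is a stand-in: `fake_retrieve` plays the role of the vector search, `fake_generate` plays the role of the language model, and the prompt wording and sample passages are made up for the example.

```python
def build_rag_prompt(question, passages):
    """Place the retrieved passages in front of the question as numbered sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

def answer_with_rag(question, retrieve, generate, k=3):
    """retrieve(question, k) returns passages; generate(prompt) returns the answer text."""
    passages = retrieve(question, k)
    return generate(build_rag_prompt(question, passages))

# Stand-ins so the sketch runs without a vector database or a language model.
def fake_retrieve(question, k):
    passages = [
        "Refunds are issued within 14 days of the return.",
        "Returns require the original receipt.",
    ]
    return passages[:k]

def fake_generate(prompt):
    return f"(a model would answer here, given a prompt of {len(prompt)} characters)"

question = "How does a customer refund work?"
print(build_rag_prompt(question, fake_retrieve(question, 2)))
print(answer_with_rag(question, fake_retrieve, fake_generate))
```

The model only ever sees what `retrieve` returns, which is why the quality of the search step sets the ceiling for the quality of the answer.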
All these examples share one common denominator: the user can express their intent without having to know exactly which word to search for. Semantic search shifts the burden of 'finding the right keyword' from the human to the machine. This seemingly small change makes an enormous difference in user experience.
The İçtiHub Example: Finding Case Law by Meaning, Not by Words
Turkish legal research is a domain that shows very clearly why semantic search matters. A lawyer usually arrives with a concrete situation: 'whether the renovation cited as a just cause for evicting a tenant was actually carried out.' The exact words in that sentence very likely do not appear in the precedent the lawyer is looking for, because every decision narrates its facts in its own language and its own patterns. Keyword search frequently comes back empty here.
The approach at the core of İçtiHub targets exactly this problem. Millions of decisions and legislative texts are first split into meaningful chunks, then each chunk is converted into an embedding and stored in a vector index. The user's query is carried into the same space, and the system now asks not 'which words do they share?' but 'which decisions are closest in meaning to this situation?' In this way a perfectly on-point precedent that shares not a single word can surface.
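The splitting step can be illustrated with the simplest possible strategy: fixed-size windows of words that overlap slightly, so a sentence cut at a boundary still appears whole in one chunk. This is an assumption made for illustration, not a description of how İçtiHub splits its texts; 'meaningful' chunking would more likely follow the structure of a decision, such as its paragraphs or sections. The function name and parameters are hypothetical.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words; consecutive chunks share overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Twelve placeholder words, windows of five words with an overlap of two.
sample = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

Each chunk would then be passed to an embedding model and stored with a pointer back to its source document, so a hit on one chunk can lead the user to the full decision.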
But semantic closeness alone is not enough in law. When you need an exact article number or a precise term verbatim, the precision of keyword search is still valuable. So a hybrid approach that uses semantic search and classic search together comes into play, followed by a re-ranking layer that orders the retrieved candidates as accurately as possible. Speed comes from the broad pool, and precision from this fine filtering.
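One common way to merge a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF): each document earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The text above does not say which fusion method İçtiHub uses, so this is only an example of the general idea; the document ids are placeholders and k = 60 is a conventional default, not a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids; higher fused score means better."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break exact ties by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Placeholder results from the two searches, best match first.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_c", "doc_b", "doc_d"]

print(reciprocal_rank_fusion([keyword_ranking, semantic_ranking]))
```

Documents that both searches agree on (here doc_b and doc_c) rise above documents that only one search found. RRF works on ranks alone, so it does not need the keyword scores and the similarity scores to be on the same scale. The fused list is what a re-ranking layer would then reorder.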
Finally, metadata filtering is almost mandatory in law: presenting a repealed article or an overruled precedent as if it were the right answer is unacceptable. The vector database's ability to combine semantic closeness with hard filters like date of entry into force or court type within the same query is what turns semantic search, for law, from merely impressive into genuinely trustworthy.
To Sum Up: The New Grammar of Search
Semantic search is quietly redefining the relationship between a computer and information. For decades, search was a game of 'whoever knows the right word wins.' Embeddings and vector databases broke that rule: now it is enough to express your intent, and finding the right word is the machine's job. You can sum up the mental model with this chain: text becomes an embedding, embeddings are stored in a vector database, ANN indexes (such as HNSW) search them quickly, and re-ranking adds the final touch.
It is worth remembering that this technology is not a magic wand. An embedding model is only as good as the data it was trained on; agglutinative languages like Turkish and specialized domains like law carry nuances that general-purpose models miss. Semantic search can also occasionally be fooled by superficial similarity. That is precisely why layers like hybrid search, re-ranking, and metadata filtering are not decoration but the very things that make the system trustworthy.
The good news for a curious reader is that none of these concepts is out of reach. If you think of an embedding as a map of meaning, the vector database as a fast atlas of that map, ANN as a clever shortcut, and re-ranking as one final review, you have seen the skeleton of even the most complex AI search systems. At EcoFluxion we build İçtiHub with exactly these building blocks, because what a lawyer is searching for is rarely a word and almost always a meaning.