Why a Raw LLM Gets the Law Wrong
At their core, large language models generate text one token at a time, each time predicting a probability distribution over what comes next. Because they learn patterns from their training data, they produce text that is fluent, plausible and often correct. But fluency guarantees nothing about knowledge: the model has learned what a statutory provision should look like, not what it actually says. That distinction is harmless in casual conversation and decisive in law.
This is exactly where hallucination begins. A model can fabricate a non-existent article number, a wrong date, or a Court of Cassation ruling it was never given, and it will do so with total confidence. With Turkish legal texts the problem runs deeper still: legislation changes constantly, repealed provisions fall out of force, and several regulations on the same subject can contradict one another. The model's training data, meanwhile, is frozen at a fixed point and blind to every change made since.
A second fundamental limit is the inability to cite. A raw model cannot tell you which statute, article or decision a sentence came from, because no such link exists in its internal representation. For a lawyer that is a fatal gap: every claim in a brief has to rest on a basis. An answer that cannot be verified is more dangerous in law than no answer at all.
What Is Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is an architecture that, instead of leaving a language model alone with its memory, hands it a set of current, verified sources before it answers. The idea is simple but powerful: rather than telling the model 'write what you know,' you tell it 'here are the relevant documents, answer using only these.' The model's creativity stays in language; the facts come from outside.
The flow has two broad stages. First, in retrieval, the user's question goes to a search system that surfaces the document chunks most relevant to it. Then, in generation, those chunks are passed to the model as context alongside the question, and the model builds its answer leaning only on that context. Its job is not to recall facts but to synthesize, correctly, the evidence placed in front of it.
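The hand-off between the two stages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not MevzuatBot's implementation: the `Chunk` type, the `build_grounded_prompt` helper and the prompt wording are all hypothetical, and the retrieval stage is assumed to have already returned the chunks. What it shows is the shape of the generation input: numbered evidence first, then the question, with an instruction to stay inside that evidence.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # human-readable source label (hypothetical format)
    text: str    # the retrieved passage itself


def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Generation input: the retrieved chunks, numbered, followed by the question."""
    context = "\n\n".join(
        f"[{i}] {c.source}\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the numbered sources below and cite them as [n]. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder content; in a real pipeline `chunks` comes from the retrieval stage.
prompt = build_grounded_prompt(
    "What is the notice period?",
    [Chunk("Example Act, Art. 1", "Placeholder text of article 1.")],
)
print(prompt)
```

Numbering the sources is a design choice that pays off later: it gives the model a cheap, checkable way to say which chunk each claim came from.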
For law this is an almost perfect fit. When legislation changes you do not retrain the model; you simply update the documents in the retrieval layer. Any answer can be traced back to a concrete source, because it was produced from that source to begin with. MevzuatBot, at the core of İçtiHub, runs on exactly this principle: the model supplies the fluency, while the retrieved Turkish legislation and case law supply the guarantee of accuracy.
In short, RAG turns a model that 'knows everything but can cite nothing' into an assistant that 'finds the right source and speaks from it.' What counts in law is not memorization but grounding, and RAG restores precisely that grounding to the system.
Chunking Legal Documents: The Quiet Heart of the System
The quality of a RAG system is largely set by the quality of the chunks you feed into the retrieval layer. The size at which you split a legal text, and the boundaries you split it on, directly decide whether the system can later find the right answer. So for us chunking is not an incidental pre-processing step but one of the most consequential design decisions in the whole architecture.
Cutting blindly on raw character count is a disaster in law; slicing an article down the middle severs the context that carries its meaning. Instead we follow the document's own structure: the hierarchy of statute, article, paragraph and clause gives us natural boundaries. In a court decision, the summary of facts, the reasoning and the ruling each serve a different function. We shape our chunks to respect these semantic units, so that every chunk stays meaningful on its own.
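A structure-aware splitter for statutes might look roughly like the sketch below. It assumes, as a simplification, that every article begins on its own line with "MADDE <n>" and every numbered paragraph with "(1)", "(2)" and so on; real legislation texts have more variants (provisional articles, lettered clauses), and the names `chunk_statute`, `split_paragraphs` and `max_chars` are hypothetical. The point it illustrates is the rule from the paragraph above: an article stays whole, and when it is too long it is split only at paragraph boundaries, never at an arbitrary character offset.

```python
import re

# Assumption: each article starts on its own line as "MADDE <n>".
ARTICLE_RE = re.compile(r"^MADDE\s+(\d+)", re.MULTILINE | re.IGNORECASE)
# Assumption: numbered paragraphs (fıkra) start on their own line as "(1)", "(2)", ...
PARAGRAPH_RE = re.compile(r"^\(\d+\)", re.MULTILINE)


def split_paragraphs(article_text: str) -> list[str]:
    """Split one article at its paragraph markers; the article heading stays with paragraph (1)."""
    starts = [m.start() for m in PARAGRAPH_RE.finditer(article_text)]
    if len(starts) < 2:
        return [article_text.strip()]
    bounds = [0] + starts[1:] + [len(article_text)]
    return [article_text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def chunk_statute(text: str, statute: str, max_chars: int = 1200) -> list[dict]:
    """One chunk per article; only over-long articles are split, and only at paragraph boundaries.

    Text before the first article heading (e.g. a preamble) is ignored in this sketch,
    and a single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    matches = list(ARTICLE_RE.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.start():end].strip()
        parts = [body] if len(body) <= max_chars else split_paragraphs(body)
        for j, part in enumerate(parts, start=1):
            chunks.append({"statute": statute, "article": int(m.group(1)), "part": j, "text": part})
    return chunks
```

Court decisions would need a different splitter keyed on their own sections (facts, reasoning, ruling), but the principle is the same: let the document's structure choose the boundaries.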
To each chunk we attach rich metadata: the source name, the article number, the date of entry into force, its repeal status, and the document type. This metadata lets us both filter during retrieval and emit a clean source citation at the end of an answer. The agglutinative morphology of Turkish and the long, deeply nested sentences of legal language make this step far too domain-specific for a general-purpose chunker to handle well.
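As a rough sketch, the metadata could live on each chunk as plain fields, and retrieval-time filtering then becomes a simple predicate. The `LegalChunk` fields mirror the list above, but their names, the document-type labels and the `filter_chunks` helper are hypothetical; storing repeal as a single boolean is also a simplification, since a real record would need the repeal date to answer questions about past periods.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LegalChunk:
    text: str
    source: str              # statute, regulation or court name
    article: Optional[str]   # article number; None for court decisions
    in_force_from: date      # date of entry into force
    repealed: bool           # simplification: a real record would also store the repeal date
    doc_type: str            # e.g. "statute", "regulation", "decision" (hypothetical labels)


def filter_chunks(chunks, as_of: date, doc_types=None):
    """Keep only chunks that are not repealed, already in force on `as_of`, and of the requested types."""
    return [
        c for c in chunks
        if not c.repealed
        and c.in_force_from <= as_of
        and (doc_types is None or c.doc_type in doc_types)
    ]
```

Applying such a filter before (or inside) the vector search keeps repealed provisions from ever reaching the model, rather than relying on the model to notice them.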
Embeddings and Vector Search: Searching by Meaning
Once the chunks are ready, we have to make them searchable. This is where embeddings come in: we turn each text chunk into a high-dimensional vector that represents its meaning numerically. Two semantically similar texts land at nearby points in that space. So a user asking about a 'rent increase cap' can reach the relevant article even if those exact words never appear in it.
This is a fundamental leap beyond traditional keyword search. Classic search finds only literally matching words, yet in legal language a single concept can be phrased in dozens of ways. Vector search matches concepts, not words. The query passes through the same embedding model, and the system retrieves the document vectors nearest to the query vector, returning the most semantically relevant chunks.
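The "nearest vectors" idea can be shown with a brute-force cosine-similarity search. This is an illustration only: the three-dimensional vectors are invented stand-ins for real embeddings (which come from an embedding model and have far more dimensions), and a production system would use an approximate nearest-neighbour index rather than scoring every chunk.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Brute-force nearest neighbours: score every chunk vector against the query vector."""
    return sorted(index, key=lambda cid: cosine(query_vec, index[cid]), reverse=True)[:k]


# Toy 3-dimensional "embeddings" (invented values, hypothetical chunk ids).
index = {
    "rent_increase_article": [0.9, 0.1, 0.0],
    "inheritance_article": [0.0, 0.2, 0.9],
    "eviction_article": [0.6, 0.6, 0.1],
}
query = [1.0, 0.0, 0.0]  # pretend embedding of "rent increase cap"
print(top_k(query, index, k=2))  # → ['rent_increase_article', 'eviction_article']
```

Note that the query never has to share words with the chunk: only the geometry of the vectors matters, which is exactly what lets a paraphrased question reach the right article.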
In practice, semantic search on its own is not always enough. In law you sometimes need an exact article number or a specific term verbatim. So we take a hybrid route: we combine semantic vector search with classic keyword search, capturing conceptual proximity and exact matches in the same pass.
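The source does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an assumed, swapped-in technique rather than İçtiHub's actual method. The keyword leg here is deliberately naive (verbatim substring hits), and the function names and chunk ids are hypothetical; k=60 is a commonly used default for RRF.

```python
from collections import defaultdict


def keyword_rank(query_terms: list[str], docs: dict[str, str]) -> list[str]:
    """Exact-match leg: rank chunks by how many query terms occur verbatim.

    Caveat: str.lower() is not Turkish-aware (I/ı and İ/i), so a real system
    needs Turkish-specific case folding here.
    """
    hits = {
        doc_id: sum(term.lower() in text.lower() for term in query_terms)
        for doc_id, text in docs.items()
    }
    return [d for d in sorted(hits, key=hits.get, reverse=True) if hits[d] > 0]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each list adds 1 / (k + rank) to every id it contains."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


vector_ranking = ["art_12", "art_7", "art_30"]    # hypothetical semantic results
keyword_ranking = ["art_30", "art_12", "art_41"]  # hypothetical exact-match results
print(reciprocal_rank_fusion([vector_ranking, keyword_ranking]))
# → ['art_12', 'art_30', 'art_7', 'art_41']
```

RRF only looks at ranks, not raw scores, which sidesteps the problem that cosine similarities and keyword hit counts live on incomparable scales.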
All of this infrastructure sits on the Vertex AI and Gemini ecosystem that powers İçtiHub. Embedding generation, vector storage and scalable search are tuned to serve a high volume of production queries at low latency, because for the user, how quickly a correct answer arrives matters as much as its correctness.
Re-ranking: Refining the First Retrieval
Vector search is fast and pulls a broad candidate set in seconds, but its precision is not always perfect. The first pass is good at finding, say, the twenty most relevant candidates out of tens of thousands of chunks, yet it can be weak at putting those twenty in the right order. Order matters in law: handing the model the most relevant article first directly shapes the quality of the answer.
So we use a two-stage strategy. In the first stage, vector search produces a broad but coarse candidate list. In the second, a re-ranking model weighs each candidate against the question more deeply and re-orders them by true relevance. This model is more expensive, but because it runs only on a small candidate set, its cost stays well in hand.
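The two-stage shape can be sketched with a pluggable scoring function. In production the scorer would be a dedicated re-ranking model (for example a cross-encoder) whose interface the source does not describe; here `overlap_score` is a deliberately crude, hypothetical stand-in so the example runs on its own, and `rerank` and `top_n` are illustrative names.

```python
from typing import Callable


def rerank(question: str, candidates: list[dict],
           score_fn: Callable[[str, str], float], top_n: int = 5) -> list[dict]:
    """Stage 2: score each (question, candidate) pair with the expensive model, keep the best top_n.

    score_fn runs only on the stage-1 candidates, so its cost grows with len(candidates),
    not with the size of the whole corpus.
    """
    return sorted(candidates, key=lambda c: score_fn(question, c["text"]), reverse=True)[:top_n]


def overlap_score(question: str, text: str) -> float:
    """Toy stand-in for a real re-ranking model: share of question words that also occur in the text."""
    q_words = set(question.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


# Stage-1 output (hypothetical): broad but coarsely ordered.
candidates = [
    {"id": 1, "text": "eviction notice periods"},
    {"id": 2, "text": "rent increase cap for residential leases"},
    {"id": 3, "text": "rent deposit rules"},
]
best = rerank("rent increase cap", candidates, overlap_score, top_n=2)
print([c["id"] for c in best])  # → [2, 3]
```

Keeping the scorer as a parameter means the expensive model can be swapped or upgraded without touching the retrieval stage.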
The real value of re-ranking in law is that it filters noise out before it reaches the model. The smaller and more on-target the context you hand the model, the less it drifts and the less it hallucinates. Dropping an irrelevant article from the context is often as valuable as adding a correct one, because the model tends to take everything in front of it seriously.
Citations and Grounding: Tying the Answer to Reality
For a lawyer, the value of a RAG system lies not just in giving the right answer but in making that answer verifiable. So grounding is, for us, not an optional feature but the system's basic contract. The model must tie every significant claim it makes to the document chunks it was given, and show plainly which source it relied on.
We enforce this with strict instructions that confine the model to the retrieved context, and with the metadata each chunk carries. When an answer is generated, the system appends a citation for the article or decision it was derived from: the statute name, the article number, and where possible a direct link. The user can then open the source in a single click and check it. The assistant never has the last word; it offers a starting point the lawyer can audit.
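One way to turn chunk metadata into citations is sketched below. It assumes, hypothetically, that the model was instructed to mark each claim with a bracketed source number such as [1], and that `sources` lists the chunk metadata in the same 1-based order shown to the model; the metadata keys (`source`, `article`, `url`) and helper names are illustrative, not a documented schema.

```python
import re

CITE_RE = re.compile(r"\[(\d+)\]")  # assumes the model cites sources as [n]


def format_citation(meta: dict) -> str:
    """Build a citation line from chunk metadata: source, then article and link when present."""
    label = meta["source"]
    if meta.get("article"):
        label += f", Art. {meta['article']}"
    if meta.get("url"):
        label += f" <{meta['url']}>"
    return label


def append_citations(answer: str, sources: list[dict]) -> str:
    """Append one citation per [n] marker the answer actually uses.

    Markers that point outside `sources` are silently dropped here; a stricter system
    might treat them as a grounding failure instead.
    """
    used = sorted({int(n) for n in CITE_RE.findall(answer)})
    lines = [f"[{n}] {format_citation(sources[n - 1])}" for n in used if 1 <= n <= len(sources)]
    return (answer + "\n\nSources:\n" + "\n".join(lines)) if lines else answer
```

Because the citation text is built from stored metadata rather than written by the model, the statute name and article number in the citation cannot themselves be hallucinated.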
Just as important, the system has to know when to stay quiet. If the retrieval layer cannot find a source relevant enough to the question, the right move is not to invent one but to say 'I have no basis for this.' A legal assistant that can admit what it doesn't know is far more trustworthy than one that answers everything confidently and occasionally makes things up. Grounding is how we build that honesty into the architecture itself.
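The "stay quiet" rule can be expressed as a guard in front of the generator. This sketch assumes the re-ranker returns comparable relevance scores; the threshold value, the message text and the `answer_or_abstain` / `generate` names are hypothetical placeholders, and in practice the threshold would be tuned on an evaluation set rather than fixed by hand.

```python
from typing import Callable, Sequence

ABSTAIN_MESSAGE = "I have no basis for this in the available sources."


def answer_or_abstain(scored_chunks: Sequence[tuple[float, str]],
                      generate: Callable[[list[str]], str],
                      min_score: float = 0.5) -> str:
    """Call the generator only if at least one chunk clears the relevance threshold.

    min_score = 0.5 is an arbitrary placeholder, not a recommended value.
    """
    relevant = [chunk for score, chunk in scored_chunks if score >= min_score]
    if not relevant:
        return ABSTAIN_MESSAGE  # refuse instead of letting the model improvise
    return generate(relevant)


# Only weak matches were retrieved, so the model is never called.
print(answer_or_abstain([(0.21, "weak match"), (0.12, "weaker match")], lambda cs: "answer"))
```

Putting the check outside the model makes abstention a property of the pipeline, not something the model has to remember to do.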
Evaluation: Measurement, Not Intuition
Shipping a RAG system to production because 'it looks good' would be irresponsible in law. So we measure the system continuously and in a decomposed way. We evaluate the two layers separately: is the retrieval layer finding the right documents, and is the generation layer using them faithfully? When an answer is bad, we need to know whether the fault was retrieving the wrong document or misreading the right one.
On the retrieval side, using question sets whose correct answers are known in advance, we track whether the right source shows up among the retrieved chunks and how high it ranks. On the generation side we measure faithfulness: is everything the model says actually supported by the provided context, or does it step beyond it and invent? Faithfulness is perhaps the single most critical metric for law.
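The two retrieval questions in the paragraph above, "does the right source show up?" and "how high does it rank?", map naturally onto recall@k and mean reciprocal rank (MRR). The sketch below computes both over a labelled question set; the metric choice and the function names are assumptions, since the source does not name specific metrics. Faithfulness is harder to reduce to a few lines and is left out here.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the known-correct sources that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first correct source, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(test_set: list[tuple[list[str], set[str]]], k: int = 5) -> dict[str, float]:
    """test_set: one (retrieved_ids, relevant_ids) pair per question with a known answer."""
    n = len(test_set)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in test_set) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in test_set) / n,
    }


# Two hypothetical questions: one answered at rank 2, one missed entirely.
print(evaluate([(["a", "b", "c"], {"b"}), (["x", "y", "z"], {"q"})], k=2))
# → {'recall@2': 0.5, 'mrr': 0.25}
```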
This evaluation is not a one-off exam but a process that runs continuously. Feedback from lawyers, examples of faulty answers and changing legislation all feed our test sets. When we swap a chunking strategy, an embedding model or a prompt, we confirm with numbers, not intuition, that the change genuinely improves quality. The reliability of İçtiHub rests squarely on this disciplined measurement loop.
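"Confirm with numbers" can be made concrete as an acceptance gate that compares a candidate configuration's metrics with the current baseline on the same test set. The rule below (no tracked metric may drop by more than a tolerance, and at least one must improve), the `is_improvement` name and the metric values in the example are all illustrative assumptions, not İçtiHub's actual policy.

```python
def is_improvement(baseline: dict[str, float], candidate: dict[str, float],
                   tolerance: float = 0.0) -> bool:
    """Accept a change only if no tracked metric regresses beyond `tolerance` and at least one improves."""
    no_regression = all(candidate[m] >= baseline[m] - tolerance for m in baseline)
    some_gain = any(candidate[m] > baseline[m] for m in baseline)
    return no_regression and some_gain


baseline = {"recall@5": 0.80, "mrr": 0.60, "faithfulness": 0.90}  # invented numbers
new_chunker = {"recall@5": 0.85, "mrr": 0.60, "faithfulness": 0.90}
new_prompt = {"recall@5": 0.80, "mrr": 0.60, "faithfulness": 0.85}
print(is_improvement(baseline, new_chunker))  # → True
print(is_improvement(baseline, new_prompt))   # → False
```

Requiring every metric to hold its ground, rather than averaging them, prevents a gain in retrieval from quietly masking a loss in faithfulness.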
Conclusion: In Law, Grounding Is Not a Luxury but a Requirement
In many domains, an occasional slip by a language model is tolerable. In law, a wrong article, a repealed regulation or a fabricated decision has real consequences. So for us RAG is not an ornament bolted onto the model's intelligence but the very framework that makes it trustworthy. The difference between a smart model and a reliable legal assistant lives exactly in this retrieval and grounding layer.
Chunking, embeddings, vector search, re-ranking, citations and evaluation: none of these is a magic bullet on its own. But assembled together, with domain-specific rigor, they turn a raw language model into a system that can rest every sentence on a real source. This is precisely the philosophy behind MevzuatBot, the engine at the core of İçtiHub.
At EcoFluxion our conviction is simple: AI for Turkish law should not be an oracle that users trust blindly, but a tool that can show its basis for every claim and say 'I don't know' when it must. Putting grounding at the center of the architecture is the only honest way to turn that tool into a colleague a lawyer can genuinely trust.