The Prototype Fund and the German Federal Ministry of Education and Research recently funded my human rights violation prediction tool. This is the only project I have come across that aims to help rights-holders identify whether they have been subjected to a human rights violation without needing to go to an external party or pay for assistance. Have a quick look at the attached video to see it in action. Also, have a play and let me know what you think of it!
In short, a user based in the jurisdiction of the European Court of Human Rights can input what happened to them in simple language, even with spelling or grammatical mistakes, and leave with an understanding of:
What their legal issue is (Issue Identifier);
The percentage chance of any human rights violation (Prediction Tool); and
What cases are semantically similar to their legal issue (Semantic Search).
I aimed to take tools and science that had been confined to the private sector or academia and implement them so that communities and researchers without financial resources, language skills, time, or personal connections can obtain the same benefits as well-resourced individuals and organizations.
The remainder of the article will shed light on each of the components of the prediction tool mentioned above.
Issue Identifier
The Issue Identifier converts simple language to a legal representation of the issue in the language utilized by the Court. To conduct this exercise, I carried out the following tasks:
Isolated the "fact" section of every case;
Ran an LLM as a summarizer on the facts and ensured that the key issues and language were retained;
With another LLM, created a synthetic database of simple language conversions of the summarized facts, using a prompt that asked for the language of an 11-year-old and introduced typos and grammatical mistakes. For each legal summary, I created five simple language conversions;
Lastly, fine-tuned an LLM on pairs of the simple language conversions and the legal summaries.
The above process created a large enough synthetic dataset to enable an effective conversion of simple language to legal language.
This replicates the first job of a junior lawyer: identifying the client's legal problem.
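As a rough illustration of the pairing step, the sketch below turns each legal summary and its five simple language conversions into one training record per pair. The function name, the `input` and `target` field names, and the example texts are all invented for illustration; this is not the project's actual data format or code.

```python
import json

def build_training_pairs(summary_to_simple):
    """Flatten {legal_summary: [simple versions]} into one record per pair."""
    pairs = []
    for legal_summary, simple_versions in summary_to_simple.items():
        for simple_text in simple_versions:
            # The model reads the simple text and learns to produce the legal summary.
            pairs.append({"input": simple_text, "target": legal_summary})
    return pairs

# Invented example: one legal summary with five simple language conversions.
summary_to_simple = {
    "The applicant alleged ill-treatment by police officers during his arrest.": [
        "a cop beat me up when i got arested",
        "the police hit me alot when they took me",
        "police man hurt me bad wen he cought me",
        "they arrested me and the officer punched me",
        "i was beaten by polise when they grabed me",
    ],
}

pairs = build_training_pairs(summary_to_simple)

# One JSON object per line (JSONL), a common layout for fine-tuning data.
jsonl = "\n".join(json.dumps(pair) for pair in pairs)
```

Each legal summary appears five times as a target, once for every simple language version, so the fine-tuned model sees several ways of describing the same legal issue.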
Prediction
I want to note at the outset that the Prediction tool, although based on current academic research, is meant to be used as a form of navigation toward the human rights most likely to have been harmed, not as a determination of what your human rights violation actually is.
The prediction model, a RoBERTa Multilabel Classifier, was trained on case facts paired with the relevant human rights articles. During training it learns relationships between patterns of words and labels, where each label is a number representing a human rights article. For any new input, it assesses how closely the text matches the patterns it learned from facts that did or did not show a violation, and estimates how likely each label is to apply. In effect, the model performs a very sophisticated similarity assessment.
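To make the "multilabel" part concrete, here is a minimal sketch of the final decision step, assuming the usual multilabel setup in which the model outputs one raw score (logit) per article and each score is converted to its own probability independently. The article list, the scores, and the 0.5 threshold are invented for illustration and are not taken from the tool.

```python
import math

def sigmoid(x):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_articles(logits, article_labels, threshold=0.5):
    """Score every article independently, so several can be flagged at once."""
    probabilities = {
        article: sigmoid(logit)
        for article, logit in zip(article_labels, logits)
    }
    flagged = [a for a, p in probabilities.items() if p >= threshold]
    return probabilities, flagged

# Hypothetical labels and raw scores for a single input text.
article_labels = ["Article 3", "Article 5", "Article 6", "Article 8"]
logits = [2.0, -1.0, 0.4, -3.0]

probabilities, flagged = predict_articles(logits, article_labels)
for article, p in probabilities.items():
    print(f"{article}: {p:.1%}")
```

With these invented scores, Article 3 and Article 6 are flagged. Because every label has its own probability, the percentages do not need to add up to 100.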
This task is normally also the job of a lawyer, as they must first identify what the specific law in question is (in this case, which human right specifically) before they engage with their case law research.
Another interesting way the Prediction tool could be used is as a form of accountability to the Court itself. A new judgment delivered by the European Court of Human Rights could be entered into the model to assess if it has been decided consistently with previous case law. If a discrepancy is found, this can lead to further investigation that may reveal valuable insights.
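The consistency check described here can be sketched as a simple comparison between the articles the model flags for a judgment's facts and the articles the Court actually found to be violated. The function name and the article numbers below are hypothetical, and a mismatch is only a prompt for closer reading, not evidence of an error by the Court or by the model.

```python
def compare_with_judgment(predicted_articles, decided_articles):
    """Report where the model's flagged articles and the judgment differ."""
    predicted = set(predicted_articles)
    decided = set(decided_articles)
    return {
        "agreed": sorted(predicted & decided),
        "predicted_only": sorted(predicted - decided),
        "decided_only": sorted(decided - predicted),
    }

# Invented example: the model flags Articles 3 and 13,
# while the judgment finds violations of Articles 3 and 6.
report = compare_with_judgment(predicted_articles=[3, 13], decided_articles=[3, 6])

# Any article on only one side marks the judgment for further investigation.
needs_review = bool(report["predicted_only"] or report["decided_only"])
```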
Semantic Search
The Semantic Search function computes the underlying representation of the input sentence and compares it to all the cases in its saved index. This index was created at the paragraph level: each paragraph of a case had its representation (you can think of this as a list of numbers) saved as a vector. The representation therefore captures the meaning of the whole text rather than just the words it contains. For instance, if your library consisted of court documents about the police and a user searched it for "cop" in a normal keyword search system, it is unlikely that any result would be found. With a semantic search, however, "cop" can be semantically linked to "police officer."
In the same manner, this search can span an entire sentence. For example, if you enter the following into the European Court of Human Rights database, it will lead to zero results: "a cop beat me up."
But if you type the same sentence into the prediction tool's Semantic Search, it still returns useful cases that describe police officers using force.
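The difference between the two kinds of search can be shown with a toy example. The sketch below assumes cosine similarity as the comparison measure and uses tiny hand-made three-number vectors in place of real embeddings; the article does not say which embedding model or similarity measure the tool uses, so both are assumptions, as are the paragraph texts.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vector, index, top_k=1):
    """Rank every stored paragraph by similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy paragraph-level index: (paragraph text, hand-made vector).
index = [
    ("Police officers used force during the applicant's arrest.", [0.9, 0.1, 0.0]),
    ("The applicant's property was expropriated without compensation.", [0.0, 0.2, 0.9]),
    ("The hearing was postponed repeatedly over several years.", [0.1, 0.9, 0.1]),
]

# Stand-in for the vector a real embedding model would produce for "a cop beat me up".
query_vector = [0.8, 0.2, 0.1]

# Keyword search: no paragraph contains the literal word "cop".
keyword_hits = [text for text, _ in index if "cop" in text.lower().split()]

# Semantic search: the closest vector wins, whatever words were used.
results = semantic_search(query_vector, index, top_k=1)
```

Here the keyword search comes back empty, while the semantic search ranks the paragraph about police officers using force first, because its vector points in nearly the same direction as the query vector.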
Supporting Marginalized Communities
Many individuals face substantial barriers to accessing legal assistance due to language differences, limited education, or socio-economic challenges. This tool is specifically designed to assist those who might struggle with traditional legal processes. By accepting inputs in simple language—even with spelling and grammatical errors—it ensures that people with varying levels of literacy can still gain insights into their legal issues.
We are actively seeking collaborations with NGOs and community organizations to extend the tool's reach to those who might benefit most. By partnering with groups that work directly with marginalized populations, we aim to provide a supportive resource that can help individuals recognize and address human rights violations they may have experienced.