Introduction
Artificial intelligence (AI), particularly machine learning (ML)-based natural language processing (NLP), is increasingly used in legal contexts. Studies report significant efficiency gains in organizations such as the US Department of Labor and the Supreme Court of Brazil. However, current ML applications in law focus primarily on named entity recognition, sentiment analysis, and case classification, and overlook the core legal concepts of rights and duties. Legal documents describe power interactions grounded in the interests of the parties involved, and these interactions lead to the imposition of rights and duties. This study asks whether ML can effectively capture these principal legal dimensions and thereby improve explainability and accuracy in legal text analysis. The research draws on the interest theory of rights, which focuses on the protection of interests, and on the Hohfeldian taxonomy, which categorizes legal relations into rights, duties, privileges, and no-rights. Compared with the will theory, these theories offer a foundational, less politically charged, and conceptually clear framework that is better suited to ML.
Literature Review
Two main theories of rights exist: the interest theory and the will theory. The paper opts for the interest theory because it avoids complexities related to the ability to exercise power and make rational choices, making it more inclusive and less susceptible to political influences compared to the will theory. Hohfeld's taxonomy is chosen for its clear and irreducible dimensions (rights, duties, privileges, no-rights) which provide a fundamental and refined classification of legal relations. Alternative models like Honoré's and Salmond's are criticized for conflating concepts or not sufficiently distinguishing relations at a fundamental level. The paper focuses on the first-order Hohfeldian relations and how they can be identified digitally.
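The first-order Hohfeldian relations named above can be written down as a small data structure. The sketch below is illustrative only and is not taken from the paper: the names `Relation`, `correlative`, `opposite`, and `pair_label` are hypothetical, and the pairings follow the standard reading of Hohfeld's scheme (a right correlates with a duty, a privilege with a no-right; a right is opposed to a no-right, a duty to a privilege).

```python
from enum import Enum


class Relation(Enum):
    """The four first-order Hohfeldian relations."""
    RIGHT = "right"
    DUTY = "duty"
    PRIVILEGE = "privilege"
    NO_RIGHT = "no-right"


# Correlatives: the same relation seen from the other party's side.
_CORRELATIVE = {
    Relation.RIGHT: Relation.DUTY,
    Relation.DUTY: Relation.RIGHT,
    Relation.PRIVILEGE: Relation.NO_RIGHT,
    Relation.NO_RIGHT: Relation.PRIVILEGE,
}

# Opposites: the relation whose presence negates this one for the same party.
_OPPOSITE = {
    Relation.RIGHT: Relation.NO_RIGHT,
    Relation.NO_RIGHT: Relation.RIGHT,
    Relation.DUTY: Relation.PRIVILEGE,
    Relation.PRIVILEGE: Relation.DUTY,
}


def correlative(relation: Relation) -> Relation:
    """Return the correlative of a relation (e.g. duty -> right)."""
    return _CORRELATIVE[relation]


def opposite(relation: Relation) -> Relation:
    """Return the opposite of a relation (e.g. duty -> privilege)."""
    return _OPPOSITE[relation]


def pair_label(relation: Relation) -> str:
    """Map a relation to a two-way label of the kind a classifier could use.

    The label strings are an assumption chosen for this illustration.
    """
    if relation in (Relation.RIGHT, Relation.DUTY):
        return "rights-duties"
    return "privileges-no-rights"
```

Because both mappings are involutions, applying `correlative` or `opposite` twice returns the original relation, which gives a cheap consistency check on any labeling scheme built on top of them.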
Methodology
The study proposes a philosophical heuristic based on the Golden Rule ('Do unto others as you would have them do unto you') to identify duties and their correlative rights. This heuristic is operationalized in three studies.
Study 1 uses masked language modeling (ALBERT) to predict masked words in sentences reformulated according to the heuristic. For example, 'The man murdered the police officer' becomes 'A man would [MASK] like to be murdered', where the model is expected to predict 'not'.
Study 2 employs custom sentence embedding formulations using the Universal Sentence Encoder (USE) and cosine similarity to compare sentences. It measures how similar a test sentence is to 'wanted' versus 'unwanted' sentence pairs. Synonyms and antonyms of 'wish' are used to create vectors that capture different senses and minimize homonymy issues. A logistic regression classifier applied after PCA is used for prediction.
Study 3 uses various sentence transformers (including BERT pre-trained on legal texts) and UMAP for dimensionality reduction to classify sentences into 'rights-duties' and 'privileges-no-rights' categories. Logistic regression is used for classification, and accuracy is evaluated.
The dataset used for Studies 2 and 3 consists of sentences extracted from UK anti-religious discrimination legislation, labeled by a legal expert.
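To make the heuristic more concrete, here is a minimal sketch of two pieces of the pipeline: building the masked prompt used in Study 1, and comparing a sentence vector against 'wanted' and 'unwanted' reference vectors by cosine similarity, as in Study 2. It is not the authors' implementation. The function names are hypothetical, the three-dimensional vectors are toy stand-ins for real sentence embeddings (no encoder is loaded), and a simple mean-similarity comparison replaces the PCA plus logistic regression step described above.

```python
import math
from typing import List, Sequence


def masked_prompt(past_participle: str) -> str:
    """Build the Golden Rule prompt for a masked language model.

    The template wording follows the example in the text; a model is
    expected to fill [MASK] with 'not' for unwanted acts.
    """
    return f"A man would [MASK] like to be {past_participle}"


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(vec: Sequence[float], refs: List[Sequence[float]]) -> float:
    """Average cosine similarity between vec and a set of reference vectors."""
    return sum(cosine(vec, r) for r in refs) / len(refs)


def classify(vec: Sequence[float],
             wanted_refs: List[Sequence[float]],
             unwanted_refs: List[Sequence[float]]) -> str:
    """Label a sentence vector by whichever reference set it is closer to."""
    wanted = mean_similarity(vec, wanted_refs)
    unwanted = mean_similarity(vec, unwanted_refs)
    return "wanted" if wanted > unwanted else "unwanted"


# Toy reference vectors (assumed values, not real embeddings).
WANTED = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
UNWANTED = [[-1.0, 0.1, 0.0], [-0.8, 0.3, 0.1]]

if __name__ == "__main__":
    print(masked_prompt("murdered"))
    print(classify([-0.7, 0.2, 0.0], WANTED, UNWANTED))
```

In a real setting the toy vectors would be replaced by embeddings from a sentence encoder, and the similarity scores would typically be fed to a trained classifier rather than compared directly.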
Key Findings
Study 1 (masked language modeling) achieved 86.5% accuracy in classifying sentences as 'wanted' or 'unwanted' based on the heuristic. Study 2 (custom embeddings) achieved 79.5% accuracy using the summed vector representation of the axiom, and 72% accuracy using individual vector comparisons and a logistic regression classifier. Study 3 (classifying Hohfeldian relations) yielded varying accuracies across different language models, ranging from 79.3% to 92.5% for logistic regression classification and from 85.0% to 92.5% when directly using the language model for prediction. The 'paraphrase-mpnet-base-v2' model performed best (92.5%). The BERT model pre-trained on legal texts achieved 91.7% accuracy.
Discussion
The results demonstrate the feasibility of using ML to identify and classify fundamental legal concepts such as rights and duties. The accuracies, while not perfect, are promising, particularly for the classification of Hohfeldian relations. The surprising outperformance of the paraphrase-mpnet-base-v2 model over the legal BERT model may be due to imprecise legal language in existing corpora. The study's success in applying a philosophical heuristic to digital analysis highlights the potential of bridging the humanities, social sciences, and computational science to build legal reasoning tools. The harm-aversion-based heuristic offers a neutral starting point for cross-cultural analysis, unlike approaches that rely on specific ethical frameworks.
Conclusion
This paper makes a unique contribution by demonstrating the potential of ML, particularly Sentence-BERT (SBERT), to delineate Hohfeldian relations and the interest theory of rights. The use of a universal ethical heuristic and the high accuracy achieved in classifying legal relations pave the way for more explainable and ethically sound legal analytics tools. Future research should focus on handling more complex sentences, expanding the corpus to broader legal domains, and integrating the proposed method into a holistic legal analytics software library.
Limitations
Limitations include the simplicity of the sentences used, potential ambiguity in some sentences for Study 1, and the limited size and potentially biased sample of sentences for Study 2. Further work should incorporate more complex sentences, larger datasets, and methods to handle ambiguities using correlativity and oppositional properties of Hohfeldian relations. The ethical implications of applying ML to legal analysis should also be considered further.