J. Law Epistemic Stud. (July - December 2025) 3(2): 13-18 17
outlined in the study's variable matrix, the methodological operationalization included both qualitative indicators (such as type of system and type of hallucination) and quantitative ones (such as frequency, entropy, AUROC, and Kappa). This mixed-methods framework enabled a nuanced understanding of the phenomenon, integrating structural, semantic, and functional dimensions.
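The entropy indicator can be made concrete with a minimal sketch. Following the approach of Farquhar et al. (2024), several answers are sampled for the same prompt, answers that share a meaning are clustered (they use bidirectional entailment for that step, which is assumed to have been done here), and the entropy of the resulting cluster distribution is computed. The cluster labels below are illustrative, not the study's data.

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Entropy over meaning-clusters of sampled answers.

    cluster_ids: one cluster label per sampled answer, where answers
    that entail each other share a label (clustering step assumed).
    """
    n = len(cluster_ids)
    return sum(-(c / n) * math.log(c / n)
               for c in Counter(cluster_ids).values())

# Ten samples that collapse to one meaning: entropy 0 (high confidence)
print(semantic_entropy(["A"] * 10))  # → 0.0
# Samples spread over several distinct meanings: higher entropy,
# which the study treats as a hallucination warning signal
print(semantic_entropy(["A", "B", "C", "D", "A"]))
```

A fixed threshold on this value is then enough to separate confident answers from likely confabulations.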
Table 3. Performance metrics of automatic hallucination detection methods

Detection method             AUROC   Kappa (vs. human coding)
Semantic entropy             0.78    0.72
SEPs (Kossen et al., 2024)   0.75    0.68
For instance, semantic entropy behaved as a reliable predictor: the systems with higher entropy values (ChatGPT 4 and Llama 2) also exhibited higher hallucination rates. The AUROC scores demonstrated the technical viability of automating alerts for potential errors with high reliability. Furthermore, the Kappa coefficient showed that automatic detection can approximate human judgment criteria, thereby reducing the burden of expert review. This kind of comprehensive analysis strengthens the study's internal validity and provides a robust foundation for practical and normative recommendations.
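Both metrics in Table 3 are standard and can be computed with stdlib Python alone. In the sketch below, the human coding labels, detector scores, and alert threshold are invented for illustration only; they are not the study's data.

```python
from itertools import product

def auroc(labels, scores):
    """Probability that a randomly chosen hallucination (label 1)
    receives a higher detector score than a randomly chosen correct
    answer (label 0); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))

def cohen_kappa(a, b):
    """Agreement between two binary coders, corrected for chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n       # observed agreement
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)       # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical human coding (1 = hallucination) and entropy scores
human = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [2.1, 0.3, 1.8, 1.5, 1.6, 0.9, 2.4, 0.2]
flags = [1 if s > 1.0 else 0 for s in scores]  # assumed alert threshold

print(auroc(human, scores))       # → 0.9375
print(cohen_kappa(human, flags))  # → 0.75
```

The same pairing of a rank-based score (AUROC) with an agreement statistic against human coders (Kappa) is what allows the study to argue that automated alerts approximate expert review.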
The results are consistent with those of Magesh et al. (2024), who identified error rates in Lexis+ AI and Westlaw AI ranging from 17% to 33%, and with the work of Farquhar et al. (2024) on the performance of semantic entropy as a predictor of confabulations. The novelty of this study lies in the systematic combination of manual coding, classical entropy, and SEPs, alongside the use of an IRAC-based corpus specifically designed for verifiable legal evaluation.
From a technical perspective, the data suggest that the RAG architecture reduces, but does not eliminate, the risk of legal hallucinations. The adoption of SEPs in real-world legal environments could facilitate the integration of automatic alert mechanisms, thereby minimizing the risk of errors in sensitive legal documents.
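Kossen et al. (2024) obtain SEP predictions from a lightweight linear probe on the model's hidden states, which avoids the repeated sampling that plain semantic entropy requires. The alert mechanism envisioned above could then reduce to a threshold on the probe's output; the weights, hidden states, and threshold below are all made up for illustration.

```python
def sep_predict(hidden_state, weights, bias):
    """Linear probe: predicted semantic entropy from one hidden state."""
    return sum(w * h for w, h in zip(weights, hidden_state)) + bias

ALERT_THRESHOLD = 1.0  # assumed operating point, tuned per deployment

def review_alert(hidden_state, weights, bias):
    """Flag an output for human review when predicted entropy is high."""
    return sep_predict(hidden_state, weights, bias) > ALERT_THRESHOLD

# Toy probe parameters and hidden states (illustrative values only)
w, b = [0.8, -0.2, 0.5], 0.1
print(review_alert([2.0, 0.5, 0.3], w, b))  # → True, route to a reviewer
print(review_alert([0.2, 1.0, 0.1], w, b))  # → False
```

Because the probe runs in a single forward pass, this kind of check could be applied to every generated passage of a legal document at negligible cost.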
From a legal and epistemic standpoint, hallucinations compromise fundamental epistemic rights: access to reliable knowledge, transparency of sources, and protection against deception. This breach is particularly critical in judicial proceedings, where the inadvertent use of fabricated citations may result in disciplinary sanctions, procedural nullities, or a violation of due process guarantees.
It is important to note that the study is limited to U.S. federal law and English-language systems. Future research should replicate the analysis in multi-jurisdictional contexts (e.g., Latin American or Continental European law) and with multilingual models. Additionally, it is recommended to evaluate the actual incorporation of SEPs into law firm workflows and their impact on decision-making processes.
Conclusions
This comparative quasi-experimental study found that legal artificial intelligence systems differ significantly in the frequency, type, and detectability of legal hallucinations. Rates were higher in general-purpose models such as ChatGPT 4 (60%) and Llama 2 (85%) than in specialized tools like Lexis+ AI (25%) and Westlaw AI (30%), with fabricated legal citations as the most common error. The study validated the use of semantic entropy and Semantic Entropy Probes as efficient mechanisms for detecting inconsistencies and reducing costs without compromising accuracy, thereby enabling the generation of real-time alerts. It warns that these hallucinations constitute an emerging form of epistemic injustice: they simulate authority without ensuring truthfulness, thereby undermining fundamental rights and increasing technical, ethical, and procedural risks. The recommendations include requiring internal validation in legal AI tools, establishing ethical and regulatory protocols, auditing databases and methodologies, and incorporating the epistemic rights framework into regulation to ensure reliable, transparent, and fair use of these technologies.
References
Bench-Capon, T., Prakken, H., & Sartor, G. (2022). Artificial intelligence and legal reasoning: Past, present and future. Artificial Intelligence, 303, 103644. https://doi.org/10.1016/j.artint.2021.103644
Dahl, M., Magesh, V., Suzgun, M., & Ho, D. E. (2024). Large legal fictions: Profiling legal hallucinations in large language models. Journal of Legal Analysis, 16(1), 64–93. https://doi.org/10.1093/jla/laae003
Farquhar, S., Kossen, J., Kuhn, L., & Gal, Y. (2024). Detecting hallucinations in large language models using semantic entropy. Nature, 630(8017), 625–630. https://doi.org/10.1038/s41586-024-07421-0
Fricker, M. (2007). Epistemic injustice: Power and the ethics of knowing. Oxford University Press.
Kay, J., Kasirzadeh, A., & Mohamed, S. (2024). Epistemic injustice in generative AI. arXiv. https://doi.org/10.48550/arXiv.2408.11441
Kossen, J., Han, J., Razzak, M., Schut, L., Malik, S., & Gal, Y. (2024). Semantic entropy probes: Robust and cheap hallucination detection in LLMs. arXiv. https://doi.org/10.48550/arXiv.2406.15927
Langton, R. (2010). Epistemic injustice: Power and the ethics of knowing. https://www.jstor.org/stable/40602716
Latif, Y. A. (2025). Hallucinations in large language models and their influence on legal reasoning: Examining the