(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 14, No. 3, 2023
783 | P a g e
www.ijacsa.thesai.org
VII. CONCLUSION AND FUTURE WORK
In this paper, a corpus of Indian court judgments annotated with
seven distinct entity types is presented; it can be used to
identify legal named entities. To create the annotated dataset, a
variety of annotation tools were reviewed, and 30 publicly
available court documents were manually annotated. Using the
dataset, a spaCy NER model was also trained on top of the
pre-trained pipelines en_core_sci_sm and en_core_web_trf. The
model achieves an F1-score of almost 60%, which is taken as an
indication of the dataset's quality. The dataset is expected to
be useful for further NLP tasks on Indian judicial material, such
as relationship extraction, knowledge graph modelling, and
extractive summarization.
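The F1-score reported above is an entity-level figure. The paper does not spell out its scoring scheme, so the following is a minimal sketch assuming the common exact-match convention (as in CoNLL-style evaluation and comparable to the `ents_p`, `ents_r` and `ents_f` scores spaCy reports): a predicted entity counts as correct only if its start offset, end offset and label all match a gold annotation. The function name `entity_prf` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical entity representation: (start_char, end_char, label).
Entity = Tuple[int, int, str]


def entity_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match entities."""
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # A prediction is a true positive only if boundaries AND label match a gold entity.
    tp = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: one label error and one spurious prediction.
gold = [(0, 5, "COURT"), (10, 20, "JUDGE"), (30, 40, "DATE")]
pred = [(0, 5, "COURT"), (10, 20, "PETITIONER"), (30, 40, "DATE"), (50, 55, "STATUTE")]
p, r, f = entity_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Under this convention a span with correct boundaries but the wrong label earns no credit, which is one reason entity-level F1 on a small, fine-grained legal corpus can sit well below token-level accuracy.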
In terms of future work, the authors will explore approaches
for extending and further optimizing the dataset, and will
perform additional experiments with more recent state-of-the-art
approaches. They also plan to produce a CSV version of the
dataset, which will simplify the data format, improve
compatibility with other tools, and facilitate data
pre-processing and analysis.
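The paper does not describe the layout of the planned CSV release. As a minimal sketch, assuming each judgment is held in memory as its text plus character-offset entity spans, the function below writes one row per entity mention using only Python's standard `csv` module. The record fields (`id`, `text`, `entities`), the column names and the labels `COURT` and `DATE` are hypothetical placeholders, not the dataset's actual schema or its seven entity types.

```python
import csv
import io
from typing import List, Tuple, TypedDict


# Hypothetical in-memory form of one annotated judgment.
class AnnotatedDoc(TypedDict):
    id: str
    text: str
    # Character offsets: start inclusive, end exclusive, plus a placeholder label.
    entities: List[Tuple[int, int, str]]


def documents_to_csv(docs: List[AnnotatedDoc]) -> str:
    """Flatten annotated documents into CSV text, one row per entity mention."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["doc_id", "start", "end", "label", "text"])
    for doc in docs:
        for start, end, label in doc["entities"]:
            # Keep the surface string so the CSV is readable without the full judgment.
            writer.writerow([doc["id"], start, end, label, doc["text"][start:end]])
    return buffer.getvalue()


docs: List[AnnotatedDoc] = [
    {
        "id": "judgment_001",
        "text": "Appeal heard by the Supreme Court of India on 5 May 2015.",
        "entities": [(20, 42, "COURT"), (46, 56, "DATE")],
    }
]
out = documents_to_csv(docs)
print(out)
```

One row per mention keeps the file compact and easy to load into spreadsheet tools or dataframes; a token-level layout with BIO tags would instead feed sequence-labelling pipelines directly, at the cost of a much larger file.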