TY - GEN
T1 - Implementation of a Customized Named Entity Recognition (NER) Model in Document Categorization
AU - Hernandez-Lareda, Freddy
AU - Auccahuasi, Wilver
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Institutions accumulate large amounts of documentation about their processes and other important information, and classifying these documents is a recurring management task. Applications based on Artificial Intelligence (AI) and Natural Language Processing (NLP) now make technological solutions to this problem possible. In this work, we propose a methodology for developing a customized Named Entity Recognition (NER) model and implement it to classify documents and to verify whether the resulting classification corresponds to a manual classification. To evaluate the classifier's performance, the customized NER model was used to automatically classify 1049 documents from an institution's knowledge management corpus. The results show that the presented model achieves 95% positive classifications. The methodology is designed to be replicated and scaled according to the different needs of organizations.
KW - document classification
KW - Named Entity Recognition
KW - Natural Language Processing
UR - http://www.scopus.com/inward/record.url?scp=85217366004&partnerID=8YFLogxK
U2 - 10.1109/ICACRS62842.2024.10841691
DO - 10.1109/ICACRS62842.2024.10841691
M3 - Conference contribution
AN - SCOPUS:85217366004
T3 - 3rd International Conference on Automation, Computing and Renewable Systems, ICACRS 2024 - Proceedings
SP - 776
EP - 784
BT - 3rd International Conference on Automation, Computing and Renewable Systems, ICACRS 2024 - Proceedings
T2 - 3rd International Conference on Automation, Computing and Renewable Systems, ICACRS 2024
Y2 - 4 December 2024 through 6 December 2024
ER -