Linguist group

8/14/2023

RGCL is delighted to share that a team of our academics and a PhD student has recently been awarded the Best Paper Award for the Qur’an QA 2022 shared task. This took place at the 5th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT5) at the 13th Language Resources and Evaluation Conference (LREC 2022). This is the first best paper award for the recently established RIGHT Lab. The organisers evaluated the papers based on different metrics.

Authors: Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov

Title: DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain

The task of machine reading comprehension (MRC) is a useful benchmark to evaluate the natural language understanding of machines. It has gained popularity in the natural language processing (NLP) field mainly due to the large number of datasets released for many languages. However, research in MRC has been understudied in several domains, including religious texts. The goal of the Qur’an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on the Qur’an. This paper describes the DTW entry to the Qur’an QA 2022 shared task. Our methodology uses transfer learning to take advantage of available Arabic MRC data. We further improve the results using various ensemble learning strategies. Our approach provided a partial Reciprocal Rank (pRR) score of 0.49 on the test set, demonstrating its strong performance on the task.

This entry was posted in Conferences, news and tagged #Award, #conference by riilp.

Speaker: Dr Ahmed Hamdi, La Rochelle University

Title: Content Analysis of Digital Text with Special Focus on Named Entity Recognition and Linking

Abstract: Digital humanity institutions are steadily contributing an increasing amount of digital documents (either born-digital or digitised). Billions of digital documents are usually scanned and archived as images, which represent a substantial resource for natural language processing (NLP) tasks. The analysis of digital documents therefore requires text extraction using optical character recognition (OCR) systems. Several studies have shown that named entities (NEs) are heavily used to index documents, since they are the first point of entry in a search system for document retrieval. For this reason, NEs can be given a higher semantic value than other words. In order to improve the quality of user searches in a system, it is thus necessary to ensure the quality of these particular terms. However, most digital documents are indexed through their OCRed version, which includes numerous errors that may hinder access to them. In my talk, I will speak about named entity recognition (NER) and entity linking (EL) of digital text, the impact of OCR errors on the performance of NER and EL systems, as well as existing strategies and solutions to deal with OCR noise.
