PolEval is a SemEval-inspired evaluation campaign for natural language processing tools for Polish. Submitted tools compete against one another in tasks selected by the organizers, using the available data, and are evaluated according to pre-established procedures. The PolEval session in the FedCSIS program will consist of two parts: presentations of selected papers related to the topics below and presentations of submissions to the 2022/2023 edition of the PolEval competition.
PolEval 2022/2023 tasks
Task 1: Punctuation prediction from conversational language
Speech transcripts generated by Automatic Speech Recognition (ASR) systems typically do not contain any punctuation or capitalization. In longer stretches of automatically recognized speech, the lack of punctuation reduces the overall clarity of the output text. The primary purpose of punctuation restoration (PR), punctuation prediction (PP), and capitalization restoration (CR) as distinct natural language processing (NLP) tasks is to improve the legibility of ASR-generated text and, possibly, of other types of text without punctuation. For the purposes of this task, we define PR as the restoration of originally available punctuation in read speech transcripts (which was the goal of a separate task in the PolEval 2021 competition) and PP as the prediction of possible punctuation in transcripts of spoken/conversational language. Aside from their intrinsic value, PR, PP, and CR may improve the performance of other NLP tasks such as named entity recognition (NER), part-of-speech (POS) tagging, semantic parsing, and spoken dialog segmentation.
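One common way to frame PR, PP, and CR (an assumption here, not necessarily the setup or data format used in PolEval) is as per-word classification: each word of the lowercased, unpunctuated input receives a label naming the punctuation mark that follows it, plus a flag saying whether it should be capitalized. The sketch below derives such ASR-like input and targets from punctuated text and rebuilds the text from them; the function names, the "O" (no punctuation) label, and the regex tokenization are all hypothetical.

```python
import re

# Hypothetical tokenization: a "word" is a run of \w characters; every other
# non-space character is treated as a punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
WORD_RE = re.compile(r"\w+")


def to_asr_style(text: str) -> tuple[list[str], list[str], list[bool]]:
    """Strip punctuation and case from `text`, keeping them as per-word targets.

    Returns (words, labels, caps): lowercased words (ASR-like input), the
    punctuation mark that follows each word ("O" for none), and whether each
    word started with a capital letter.
    """
    words: list[str] = []
    labels: list[str] = []
    caps: list[bool] = []
    for tok in TOKEN_RE.findall(text):
        if WORD_RE.fullmatch(tok):
            words.append(tok.lower())
            labels.append("O")
            caps.append(tok[0].isupper())
        elif words and labels[-1] == "O":
            # Attach the first punctuation mark after a word to that word.
            labels[-1] = tok
    return words, labels, caps


def restore(words: list[str], labels: list[str], caps: list[bool]) -> str:
    """Rebuild text from words plus (predicted) punctuation and case labels."""
    out = []
    for word, punct, cap in zip(words, labels, caps):
        if cap:
            word = word[0].upper() + word[1:]
        out.append(word if punct == "O" else word + punct)
    return " ".join(out)


words, labels, caps = to_asr_style("Dzień dobry, jak się masz?")
print(" ".join(words))                # dzień dobry jak się masz
print(labels)                         # ['O', ',', 'O', 'O', '?']
print(restore(words, labels, caps))   # Dzień dobry, jak się masz?
```

Under this framing, a system only has to predict `labels` and `caps` for each word of the ASR output; `restore` then produces the readable text.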
Task 2: Abbreviation disambiguation
Abbreviations are often overlooked in NLP pipelines. However, they still need to be handled, especially in applications such as machine translation, named entity recognition, and text-to-speech systems.
Task 3: Passage retrieval
Passage retrieval is a crucial part of modern open-domain question answering systems, which rely on precise and efficient retrieval components to find passages containing correct answers.
Traditionally, lexical methods such as TF-IDF or BM25 have powered retrieval systems. They are fast, interpretable, and require no training (and therefore no training set). However, they can only return a document if it contains at least one keyword present in the query. Moreover, their text understanding is limited because they ignore word order.
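To make these limitations concrete, here is a minimal sketch of Okapi BM25 scoring (using the smoothed IDF found in Lucene) over lowercased, whitespace-tokenized text. The function name, the defaults k1 = 1.2 and b = 0.75, and the toy Polish documents are illustrative assumptions, not part of the PolEval task definition. A document with no exact term overlap with the query scores zero, and two documents that differ only in word order score identically.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)  # term counts; word order is discarded here
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # exact match only: no overlap means no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["pies goni kota", "kota goni pies", "kot śpi na macie"]
print(bm25_scores("kot", docs))        # first two scores are 0.0
print(bm25_scores("pies kota", docs))  # first two scores are equal
```

In this toy example the query "kot" does not match the inflected form "kota", so exact-match lexical retrieval misses it unless lemmatization or stemming is applied before indexing.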
All submissions should be made using the PolEval Challenge platform.
Presentations and publication
Authors of selected submissions will be invited to prepare extended versions of their reports for publication in the conference proceedings and for presentation at FedCSIS 2023. The selection will be made by a jury on the basis of the final evaluation results, the quality of the submitted reports, and the originality of the presented methods.
Participants of the PolEval challenge may submit papers describing their systems with no submission fee.
Important dates
- March 1, 2023: Announcement of final test data
- March 15, 2023: End of accepting applications from participants and announcement of results
- July 9, 2023: Deadline for submitting invited papers
- July 11, 2023: Author notification
- July 31, 2023: Final paper submission and registration
- September 20, 2023: FedCSIS conference session