Description:
Text. -- SeqCondenser: Inductive Representation Learning of Sequences by Sampling Characteristic Functions -- Is Prompting What Term Extraction Needs? -- Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis. -- Explaining Metaphors in the French Language by Solving Analogies using a Knowledge Graph. -- The Aranea Corpora Family: Ten+ Years of Processing Web-Crawled Data. -- Continual Learning Under Language Shift. -- Neural Spell-Checker: Beyond Words with Synthetic Data Generation. -- CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature. -- New Human-Annotated Dataset of Czech Health Records for Training Medical Concept Recognition Models. -- Analyzing Biases in Popular Answer Selection Datasets on Neural-based QA Models. -- Using Neural Coherence Models to Assess Discourse Coherence. -- Named Entity Linking in English-Czech Parallel Corpus. -- TamSiPara: A Tamil – Sinhala Parallel Corpus. -- Automatic Ellipsis Reconstruction in Coordinated German Sentences Based on Text-To-Text Transfer Transformers. -- Better Low-Resource Machine Translation with Smaller Vocabularies. -- Bella Turca: A Large-Scale Dataset of Diverse Text Sources for Turkish Language Modeling. -- Evaluation Metrics in LLM Code Generation. -- Kernel Least Squares Transformations for Cross-lingual Semantic Spaces. -- Unsupervised Extraction of Morphological Categories for Morphemes. -- Introducing LCC’s NavProc 1.0 Corpus: Annotated Procedural Texts in the Naval Domain. -- Models and Strategies for Russian Word Sense Disambiguation: A Comparative Analysis. -- Open-Source Web Service with Morphological Dictionary–Supplemented Deep Learning for Morphosyntactic Analysis of Czech. -- Mistrík’s Readability Metric – an Online Library.
The two-volume set LNAI 15048 and 15049 constitutes the refereed proceedings of the 27th International Conference on Text, Speech, and Dialogue, TSD 2024, held in Brno, Czech Republic, during September 9–13, 2024. The 50 revised full papers presented in these proceedings were carefully reviewed and selected from 103 submissions. The papers are organized in the following topical sections: Part I: Text; Part II: Speech, Dialogue.