Approaches to corpus creation for low-resource language technology: the case of Southern Kurdish and Laki

S Ahmadi, Z Azin, S Belelli… - arXiv preprint arXiv …, 2023 - arxiv.org
One of the major challenges that under-represented and endangered language
communities face in language technology is the lack or paucity of language data. This is …

Language and Speech Technology for Central Kurdish Varieties

S Ahmadi, DQ Jaff, MMI Alam… - arXiv preprint arXiv …, 2024 - arxiv.org
Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a
dialect continuum and known for its diversity in language varieties. Previous studies …

Challenges in Kurdish text processing

KS Esmaili - arXiv preprint arXiv:1212.0074, 2012 - arxiv.org
Despite having a large number of speakers, the Kurdish language is among the less-
resourced languages. In this work we highlight the challenges and problems in providing the …

The Kurdish Language corpus: state of the art

K Jacksi, I Ali - Science Journal of University of Zakho, 2023 - sjuoz.uoz.edu.krd
The notable growth of the digital communities and different online news streams led to the
growing availability of online natural language content. However not all natural languages …

Corpora of social media in minority Uralic languages

T Arkhangelskiy - Proceedings of the Fifth International Workshop …, 2019 - aclanthology.org
This paper presents an ongoing project aimed at creation of corpora for minority Uralic
languages that contain texts posted on social media. Corpora for Udmurt and Erzya are fully …

Sorani Kurdish versus Kurmanji Kurdish: an empirical comparison

KS Esmaili, S Salavati - Proceedings of the 51st Annual Meeting of …, 2013 - aclanthology.org
Resource scarcity along with diversity–both in dialect and script–are the two primary
challenges in Kurdish language processing. In this paper we aim at addressing these two …

Building a corpus for the Zaza–Gorani language family

S Ahmadi - Proceedings of the 7th workshop on NLP for similar …, 2020 - aclanthology.org
Thanks to the growth of local communities and various news websites along with the
increasing accessibility of the Web, some of the endangered and less-resourced languages …

Developing a fine-grained corpus for a less-resourced language: the case of Kurdish

RO Abdulrahman, H Hassani, S Ahmadi - arXiv preprint arXiv:1909.11467, 2019 - arxiv.org
Kurdish is a less-resourced language consisting of different dialects written in various
scripts. Approximately 30 million people in different countries speak the language. The lack …

Toward computational processing of less resourced languages: Primarily experiments for Moroccan Amazigh language

FA Allah, S Boulaknadel - Text Mining. Rijeka: InTech, 2012 - books.google.com
The world is undergoing a huge transformation from industrial economies into an information
economy, in which the indices of value are shifting from material to non-material re …

Developing language technology tools and resources for a resource-poor language: Sindhi

R Motlani - Proceedings of the NAACL Student Research …, 2016 - aclanthology.org
Sindhi, an Indo-Aryan language with more than 75 million native speakers, is a
resource-poor language in terms of the availability of language technology tools and …
