Approaches to corpus creation for low-resource language technology: the case of Southern Kurdish and Laki
One of the major challenges that under-represented and endangered language
communities face in language technology is the lack or paucity of language data. This is …
Language and Speech Technology for Central Kurdish Varieties
Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a
dialect continuum and known for its diversity in language varieties. Previous studies …
Challenges in Kurdish text processing
KS Esmaili - arXiv preprint arXiv:1212.0074, 2012 - arxiv.org
Despite having a large number of speakers, the Kurdish language is among the less-
resourced languages. In this work we highlight the challenges and problems in providing the …
The Kurdish Language corpus: state of the art
The notable growth of the digital communities and different online news streams led to the
growing availability of online natural language content. However not all natural languages …
Corpora of social media in minority Uralic languages
T Arkhangelskiy - Proceedings of the Fifth International Workshop …, 2019 - aclanthology.org
This paper presents an ongoing project aimed at creation of corpora for minority Uralic
languages that contain texts posted on social media. Corpora for Udmurt and Erzya are fully …
Sorani Kurdish versus Kurmanji Kurdish: an empirical comparison
KS Esmaili, S Salavati - Proceedings of the 51st Annual Meeting of …, 2013 - aclanthology.org
Resource scarcity along with diversity, both in dialect and script, are the two primary
challenges in Kurdish language processing. In this paper we aim at addressing these two …
Building a corpus for the Zaza–Gorani language family
S Ahmadi - Proceedings of the 7th workshop on NLP for similar …, 2020 - aclanthology.org
Thanks to the growth of local communities and various news websites along with the
increasing accessibility of the Web, some of the endangered and less-resourced languages …
Developing a fine-grained corpus for a less-resourced language: the case of Kurdish
Kurdish is a less-resourced language consisting of different dialects written in various
scripts. Approximately 30 million people in different countries speak the language. The lack …
Toward computational processing of less resourced languages: Primarily experiments for Moroccan Amazigh language
FA Allah, S Boulaknadel - Text Mining. Rijeka: InTech, 2012 - books.google.com
The world is undergoing a huge transformation from industrial economies into an information
economy, in which the indices of value are shifting from material to non-material re …
Developing language technology tools and resources for a resource-poor language: Sindhi
R Motlani - Proceedings of the NAACL Student Research …, 2016 - aclanthology.org
Sindhi, an Indo-Aryan language with more than 75 million native speakers, is a
resource-poor language in terms of the availability of language technology tools and …