Search | arXiv e-print repository

Showing 1–17 of 17 results for author: Kutlu, M

Searching in archive cs.
  1. arXiv:2405.10133  [pdf, other]

    cs.CL

    Turkronicles: Diachronic Resources for the Fast Evolving Turkish Language

    Authors: Togay Yazar, Mucahid Kutlu, İsa Kerem Bayırlı

    Abstract: Over the past century, the Turkish language has undergone substantial changes, primarily driven by governmental interventions. In this work, our goal is to investigate the evolution of the Turkish language since the establishment of Türkiye in 1923. Thus, we first introduce Turkronicles which is a diachronic corpus for Turkish derived from the Official Gazette of Türkiye. Turkronicles contains 45,…

    Submitted 16 May, 2024; originally announced May 2024.

  2. arXiv:2311.18054  [pdf, other]

    cs.CL cs.AI

    I Know You Did Not Write That! A Sampling Based Watermarking Method for Identifying Machine Generated Text

    Authors: Kaan Efe Keleş, Ömer Kaan Gürbüz, Mucahid Kutlu

    Abstract: Potential harms of Large Language Models such as mass misinformation and plagiarism can be partially mitigated if there exists a reliable way to detect machine generated text. In this paper, we propose a new watermarking method to detect machine-generated texts. Our method embeds a unique pattern within the generated text, ensuring that while the content remains coherent and natural to human reade…

    Submitted 11 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  3. arXiv:2301.08062  [pdf, other]

    cs.IR

    New Metrics to Encourage Innovation and Diversity in Information Retrieval Approaches

    Authors: Mehmet Deniz Türkmen, Matthew Lease, Mucahid Kutlu

    Abstract: In evaluation campaigns, participants often explore variations of popular, state-of-the-art baselines as a low-risk strategy to achieve competitive results. While effective, this can lead to local "hill climbing" rather than more radical and innovative departure from standard methods. Moreover, if many participants build on similar baselines, the overall diversity of approaches considered may be l…

    Submitted 30 January, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: 16 pages, 6 figures, to be published in ECIR 2023

  4. arXiv:2207.11500  [pdf, ps, other]

    cs.CL cs.CY

    Catch Me If You Can: Deceiving Stance Detection and Geotagging Models to Protect Privacy of Individuals on Twitter

    Authors: Dilara Dogan, Bahadir Altun, Muhammed Said Zengin, Mucahid Kutlu, Tamer Elsayed

    Abstract: The recent advances in natural language processing have yielded many exciting developments in text analysis and language understanding models; however, these models can also be used to track people, bringing severe privacy concerns. In this work, we investigate what individuals can do to avoid being detected by those models while using social media platforms. We ground our investigation in two exp…

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: This paper is accepted at the 17th International Conference on Web and Social Media (ICWSM) 2023

  5. arXiv:2207.11497  [pdf, ps, other]

    cs.IR

    Patent Search Using Triplet Networks Based Fine-Tuned SciBERT

    Authors: Utku Umur Acikalin, Mucahid Kutlu

    Abstract: In this paper, we propose a novel method for the prior-art search task. We fine-tune SciBERT transformer model using Triplet Network approach, allowing us to represent each patent with a fixed-size vector. This also enables us to conduct efficient vector similarity computations to rank patents in query time. In our experiments, we show that our proposed method outperforms baseline methods.

    Submitted 23 July, 2022; originally announced July 2022.

    Comments: This paper is accepted at the 3rd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2022

  6. arXiv:2109.12987  [pdf, other]

    cs.CL cs.IR cs.LG cs.SI

    Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News

    Authors: Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Watheq Mansour, Bayan Hamdan, Zien Sheikh Ali, Nikolay Babulkov, Alex Nikolov, Gautam Kishore Shahi, Julia Maria Struß, Thomas Mandl, Mucahid Kutlu, Yavuz Selim Kartal

    Abstract: We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 a…

    Submitted 23 September, 2021; originally announced September 2021.

    Comments: Check-Worthiness Estimation, Fact-Checking, Veracity, Evidence-based Verification, Detecting Previously Fact-Checked Claims, Social Media Verification, Computational Journalism, COVID-19

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

    Journal ref: CLEF-2021

  7. arXiv:2106.09775  [pdf, other]

    cs.CL cs.IR

    An Information Retrieval Approach to Building Datasets for Hate Speech Detection

    Authors: Md Mustafizur Rahman, Dinesh Balakrishnan, Dhiraj Murthy, Mucahid Kutlu, Matthew Lease

    Abstract: Building a benchmark dataset for hate speech detection presents various challenges. Firstly, because hate speech is relatively rare, random sampling of tweets to annotate is very inefficient in finding hate speech. To address this, prior datasets often include only tweets matching known "hate words". However, restricting data to a pre-defined vocabulary may exclude portions of the real-world pheno…

    Submitted 9 November, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted as a full paper at 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. (https://openreview.net/group?id=NeurIPS.cc/2021/Track/Datasets_and_Benchmarks/Round2)

  8. arXiv:2012.13292  [pdf, other]

    cs.IR

    Understanding and Predicting Characteristics of Test Collections in Information Retrieval

    Authors: Md Mustafizur Rahman, Mucahid Kutlu, Matthew Lease

    Abstract: Research community evaluations in information retrieval, such as NIST's Text REtrieval Conference (TREC), build reusable test collections by pooling document rankings submitted by many teams. Naturally, the quality of the resulting test collection thus greatly depends on the number of participating teams and the quality of their submitted runs. In this work, we investigate: i) how the number of pa…

    Submitted 5 June, 2022; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: Accepted as a full paper at iConference 2022

  9. arXiv:2005.09649  [pdf, other]

    cs.SI cs.CL cs.CY

    Embeddings-Based Clustering for Target Specific Stances: The Case of a Polarized Turkey

    Authors: Ammar Rashed, Mucahid Kutlu, Kareem Darwish, Tamer Elsayed, Cansın Bayrak

    Abstract: On June 24, 2018, Turkey conducted a highly consequential election in which the Turkish people elected their president and parliament in the first election under a new presidential system. During the election period, the Turkish people extensively shared their political opinions on Twitter. One aspect of polarization among the electorate was support for or opposition to the reelection of Recep Tay…

    Submitted 24 February, 2022; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: text overlap with arXiv:1909.10213

    Journal ref: ICWSM, vol. 15, no. 1, pp. 537-548, May 2021

  10. arXiv:2004.08166  [pdf, ps, other]

    cs.CL

    Too Many Claims to Fact-Check: Prioritizing Political Claims Based on Check-Worthiness

    Authors: Yavuz Selim Kartal, Busra Guvenen, Mucahid Kutlu

    Abstract: The massive amount of misinformation spreading on the Internet on a daily basis has enormous negative impacts on societies. Therefore, we need automated systems helping fact-checkers in the combat against misinformation. In this paper, we propose a model prioritizing the claims based on their check-worthiness. We use BERT model with additional features including domain-specific controversial topic…

    Submitted 14 February, 2021; v1 submitted 17 April, 2020; originally announced April 2020.

  11. arXiv:1909.10213  [pdf, ps, other]

    cs.SI

    Embedding-based Qualitative Analysis of Polarization in Turkey

    Authors: Mucahid Kutlu, Kareem Darwish, Cansin Bayrak, Ammar Rashed, Tamer Elsayed

    Abstract: On June 24, 2018, Turkey conducted a highly consequential election in which the Turkish people elected their president and parliament in the first election under a new presidential system. During the election period, the Turkish people extensively shared their political opinions on Twitter. One aspect of polarization among the electorate was support for or opposition to the reelection of Recep Tay…

    Submitted 23 September, 2019; originally announced September 2019.

  12. arXiv:1807.06655  [pdf, other]

    cs.SI

    Devam vs. Tamam: 2018 Turkish Elections

    Authors: Mucahid Kutlu, Kareem Darwish, Tamer Elsayed

    Abstract: On June 24, 2018, Turkey held a historical election, transforming its parliamentary system to a presidential one. One of the main questions for Turkish voters was whether to start this new political era with reelecting its long-time political leader Recep Tayyip Erdogan or not. In this paper, we analyzed 108M tweets posted in the two months leading to the election to understand the groups that sup…

    Submitted 17 July, 2018; originally announced July 2018.

  13. arXiv:1806.00755  [pdf, other]

    cs.IR

    Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately and Affordably

    Authors: Mucahid Kutlu, Tyler McDonnell, Aashish Sheshadri, Tamer Elsayed, Matthew Lease

    Abstract: Crowdsourcing offers an affordable and scalable means to collect relevance judgments for IR test collections. However, crowd assessors may show higher variance in judgment quality than trusted assessors. In this paper, we investigate how to effectively utilize both groups of assessors in partnership. We specifically investigate how agreement in judging is correlated with three factors: relevance c…

    Submitted 9 June, 2018; v1 submitted 3 June, 2018; originally announced June 2018.

  14. arXiv:1802.00323  [pdf, other]

    cs.IR

    Correlation and Prediction of Evaluation Metrics in Information Retrieval

    Authors: Mucahid Kutlu, Vivek Khetan, Matthew Lease

    Abstract: Because researchers typically do not have the time or space to present more than a few evaluation metrics in any published study, it can be difficult to assess relative effectiveness of prior methods for unreported metrics when baselining a new method or conducting a systematic meta-review. While sharing of study data would help alleviate this, recent attempts to encourage consistent sharing have…

    Submitted 1 February, 2018; originally announced February 2018.

  15. Efficient Test Collection Construction via Active Learning

    Authors: Md Mustafizur Rahman, Mucahid Kutlu, Tamer Elsayed, Matthew Lease

    Abstract: To create a new IR test collection at low cost, it is valuable to carefully select which documents merit human relevance judgments. Shared task campaigns such as NIST TREC pool document rankings from many participating systems (and often interactive runs as well) in order to identify the most likely relevant documents for human judging. However, if one's primary goal is merely to build a test coll…

    Submitted 4 August, 2020; v1 submitted 17 January, 2018; originally announced January 2018.

    Comments: Accepted as a full paper in ICTIR 2020. https://ictir2020.org/accepted-papers/

    ACM Class: H.3.3

  16. arXiv:1708.05517  [pdf, other]

    cs.IR

    EveTAR: Building a Large-Scale Multi-Task Test Collection over Arabic Tweets

    Authors: Maram Hasanain, Reem Suwaileh, Tamer Elsayed, Mucahid Kutlu, Hind Almerekhi

    Abstract: This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent fr…

    Submitted 21 August, 2017; v1 submitted 18 August, 2017; originally announced August 2017.

  17. arXiv:1701.07810  [pdf, other]

    cs.IR

    Intelligent Topic Selection for Low-Cost Information Retrieval Evaluation: A New Perspective on Deep vs. Shallow Judging

    Authors: Mucahid Kutlu, Tamer Elsayed, Matthew Lease

    Abstract: While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today's massive document collections. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics needed for…

    Submitted 19 September, 2017; v1 submitted 26 January, 2017; originally announced January 2017.