Search | arXiv e-print repository

Showing 1–20 of 20 results for author: Milios, E

  1. arXiv:2406.11171  [pdf, other]

    cs.CV cs.CL cs.LG

    SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations

    Authors: Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad

    Abstract: Despite their remarkable successes, state-of-the-art large language models (LLMs), including vision-and-language models (VLMs) and unimodal language models (ULMs), fail to understand precise semantics. For example, semantically equivalent sentences expressed using different lexical compositions elicit diverging representations. The degree of this divergence and its impact on encoded semantics is n…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Added the dataset link to the abstract

    MSC Class: 68T45; 68T50 ACM Class: I.2.7; I.2.10

  2. arXiv:2404.16365  [pdf, other]

    cs.CL cs.AI

    VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations

    Authors: Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad

    Abstract: Despite their remarkable successes, state-of-the-art language models face challenges in grasping certain important semantic details. This paper introduces the VISLA (Variance and Invariance to Semantic and Lexical Alterations) benchmark, designed to evaluate the semantic and lexical understanding of language models. VISLA presents a 3-way semantic (in)equivalence task with a triplet of sentences a…

    Submitted 25 April, 2024; originally announced April 2024.

  3. arXiv:2404.05496  [pdf, other]

    eess.SY

    Stability Mechanisms for Predictive Safety Filters

    Authors: Elias Milios, Kim Peter Wabersich, Felix Berkel, Lukas Schwenkel

    Abstract: Predictive safety filters enable the integration of potentially unsafe learning-based control approaches and humans into safety-critical systems. In addition to simple constraint satisfaction, many control problems involve additional stability requirements that may vary depending on the specific use case or environmental context. In this work, we address this problem by augmenting predictive safet…

    Submitted 8 April, 2024; originally announced April 2024.

  4. arXiv:2403.14895  [pdf, other]

    cs.CL cs.AI

    Stance Reasoner: Zero-Shot Stance Detection on Social Media with Explicit Reasoning

    Authors: Maksym Taranukhin, Vered Shwartz, Evangelos Milios

    Abstract: Social media platforms are rich sources of opinionated content. Stance detection allows the automatic extraction of users' opinions on various topics from such content. We focus on zero-shot stance detection, where the model's success relies on (a) having knowledge about the target topic; and (b) learning general reasoning strategies that can be employed for new topics. We present Stance Reasoner,…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to COLING 2024

  5. arXiv:2403.12678  [pdf, other]

    cs.CL cs.AI

    Empowering Air Travelers: A Chatbot for Canadian Air Passenger Rights

    Authors: Maksym Taranukhin, Sahithya Ravi, Gabor Lukacs, Evangelos Milios, Vered Shwartz

    Abstract: The Canadian air travel sector has seen a significant increase in flight delays, cancellations, and other issues concerning passenger rights. Recognizing this demand, we present a chatbot to assist passengers and educate them about their rights. Our system breaks a complex user input into simple queries which are used to retrieve information from a collection of documents detailing air travel regu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: under review

  6. arXiv:2310.20558  [pdf, other]

    cs.CL cs.AI

    Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

    Authors: Aman Jaiswal, Evangelos Milios

    Abstract: Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting with long input. Various complex methods have claimed to overcome this limit, but recent research questions the efficacy of these models across different classi…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 11 pages, 6 figures, submitted to NeurIPS 23

    ACM Class: I.2.7

  7. arXiv:2309.01015  [pdf, other]

    cs.IR cs.LG

    MPTopic: Improving topic modeling via Masked Permuted pre-training

    Authors: Xinche Zhang, Evangelos Milios

    Abstract: Topic modeling is pivotal in discerning hidden semantic structures within texts, thereby generating meaningful descriptive keywords. While innovative techniques like BERTopic and Top2Vec have recently emerged in the forefront, they manifest certain limitations. Our analysis indicates that these methods might not prioritize the refinement of their clustering mechanism, potentially compromising the…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: 12 pages, will submit to ECIR 2024

  8. arXiv:2306.11832  [pdf, other]

    cs.IR cs.CL

    QuOTeS: Query-Oriented Technical Summarization

    Authors: Juan Ramirez-Orta, Eduardo Xamena, Ana Maguitman, Axel J. Soto, Flavia P. Zanoto, Evangelos Milios

    Abstract: When writing an academic paper, researchers often spend considerable time reviewing and summarizing papers to extract relevant citations and data to compose the Introduction and Related Work sections. To address this problem, we propose QuOTeS, an interactive system designed to retrieve sentences related to a summary of the research from a collection of potential references and hence ass…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted at ICDAR 2023

  9. arXiv:2211.16752  [pdf, other]

    cs.LG

    DimenFix: A novel meta-dimensionality reduction method for feature preservation

    Authors: Qiaodan Luo, Leonardo Christino, Fernando V Paulovich, Evangelos Milios

    Abstract: Dimensionality reduction has become an important research topic as demand for interpreting high-dimensional datasets has been increasing rapidly in recent years. There have been many dimensionality reduction methods with good performance in preserving the overall relationship among data points when mapping them to a lower-dimensional space. However, these existing methods fail to incorporate the d…

    Submitted 30 November, 2022; originally announced November 2022.

  10. arXiv:2204.00585  [pdf, other]

    cs.HC

    A Theoretical Approach for Structuring and Analysing Knowledge Provenance for Visual Analytics

    Authors: Leonardo Christino, Sima Rezaeipourfarsangi, Evangelos Milios, Fernando V. Paulovich

    Abstract: The primary goal of Visual Analytics (VA) is to enable user-guided knowledge generation. Theoretical VA works to explain how the different aspects of a VA tool bring forth new insights through user interactivity, which itself can be captured through tracking methods for reproduction or evaluation. However, the process of automatically capturing the user's thought process, such as intent and insigh…

    Submitted 27 October, 2023; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: 14 pages, submitted to Computer Graphics Forum 2023

    ACM Class: I.2.4; I.2.6

  11. arXiv:2109.06264  [pdf, other]

    cs.CL

    Post-OCR Document Correction with large Ensembles of Character Sequence-to-Sequence Models

    Authors: Juan Ramirez-Orta, Eduardo Xamena, Ana Maguitman, Evangelos Milios, Axel J. Soto

    Abstract: In this paper, we propose a novel method based on character sequence-to-sequence models to correct documents already processed with Optical Character Recognition (OCR) systems. The main contribution of this paper is a set of strategies to accurately process strings much longer than the ones used to train the sequence model while being sample- and resource-efficient, supported by thorough experimen…

    Submitted 24 January, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

  12. arXiv:2106.03953  [pdf, other]

    cs.CL cs.LG

    Neural Abstractive Unsupervised Summarization of Online News Discussions

    Authors: Ignacio Tampe Palma, Marcelo Mendoza, Evangelos Milios

    Abstract: Summarization has usually relied on gold standard summaries to train extractive or abstractive models. Social media brings a hurdle to summarization techniques since it requires addressing a multi-document multi-author approach. We address this challenging task by introducing a novel method that generates abstractive summaries of online news discussions. Our method extends a BERT-based architectur…

    Submitted 18 June, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

  13. arXiv:2104.05741  [pdf, other]

    cs.LG cs.CL

    Active learning for medical code assignment

    Authors: Martha Dais Ferreira, Michal Malyska, Nicola Sahar, Riccardo Miotto, Fernando Paulovich, Evangelos Milios

    Abstract: Machine Learning (ML) is widely used to automatically extract meaningful information from Electronic Health Records (EHR) to support operational, clinical, and financial decision-making. However, ML models require a large number of annotated examples to provide satisfactory results, which is not possible in most healthcare scenarios due to the high cost of clinician-labeled data. Active Learning (…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted in the ACM CHIL 2021 workshop track

  14. arXiv:2104.02604  [pdf, other]

    q-bio.BM cs.LG q-bio.QM

    Using Molecular Embeddings in QSAR Modeling: Does it Make a Difference?

    Authors: María Virginia Sabando, Ignacio Ponzoni, Evangelos E. Milios, Axel J. Soto

    Abstract: With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinde…

    Submitted 28 July, 2021; v1 submitted 20 March, 2021; originally announced April 2021.

    Journal ref: Briefings in Bioinformatics, Volume 23, Issue 1, January 2022, bbab365

  15. arXiv:2008.10022  [pdf]

    cs.CL cs.CY cs.IR cs.SI

    COVID-19 Pandemic: Identifying Key Issues using Social Media and Natural Language Processing

    Authors: Oladapo Oyebode, Chinenye Ndulue, Dinesh Mulchandani, Banuchitra Suruliraj, Ashfaq Adib, Fidelia Anulika Orji, Evangelos Milios, Stan Matwin, Rita Orji

    Abstract: The COVID-19 pandemic has affected people's lives in many ways. Social media data can reveal public perceptions and experience with respect to the pandemic, and also reveal factors that hamper or support efforts to curb global spread of the disease. In this paper, we analyzed COVID-19-related comments collected from six social media platforms using Natural Language Processing (NLP) techniques. We…

    Submitted 23 August, 2020; originally announced August 2020.

    Comments: 12 pages, 7 figures, 3 tables

    Journal ref: Journal of Healthcare Informatics Research. 2022

  16. arXiv:2007.01379  [pdf, other]

    cs.CL cs.LG

    Detecting Ongoing Events Using Contextual Word and Sentence Embeddings

    Authors: Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé, Ana Maguitman, Evangelos Milios

    Abstract: This paper introduces the Ongoing Event Detection (OED) task, which is a specific Event Detection task where the goal is to detect ongoing event mentions only, as opposed to historical, future, hypothetical, or other forms or events that are neither fresh nor current. Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an…

    Submitted 5 February, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

  17. arXiv:2001.11631  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    Enhancement of Short Text Clustering by Iterative Classification

    Authors: Md Rashadul Hasan Rakib, Norbert Zeh, Magdalena Jankowska, Evangelos Milios

    Abstract: Short text clustering is a challenging task due to the lack of signal contained in such short texts. In this work, we propose iterative classification as a method to boost the clustering quality (e.g., accuracy) of short texts. Given a clustering of short texts obtained using an arbitrary clustering algorithm, iterative classification applies outlier removal to obtain outlier-free clusters. Then…

    Submitted 30 January, 2020; originally announced January 2020.

    Comments: 30 pages, 2 figures

  18. arXiv:1702.03470  [pdf, ps, other]

    cs.CL

    Vector Embedding of Wikipedia Concepts and Entities

    Authors: Ehsan Sherkat, Evangelos Milios

    Abstract: Using deep learning for different machine learning tasks such as image classification and word embedding has recently gained many attentions. Its appealing performance reported across specific Natural Language Processing (NLP) tasks in comparison with other approaches is the reason for its popularity. Word embedding is the task of mapping words or phrases to a low dimensional numerical vector. In…

    Submitted 11 February, 2017; originally announced February 2017.

  19. arXiv:1611.06950  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    Statistical Learning for OCR Text Correction

    Authors: Jie Mei, Aminul Islam, Yajing Wu, Abidalrahman Moh'd, Evangelos E. Milios

    Abstract: The accuracy of Optical Character Recognition (OCR) is crucial to the success of subsequent applications used in text analyzing pipeline. Recent models of OCR post-processing significantly improve the quality of OCR-generated text, but are still prone to suggest correction candidates from limited observations while insufficiently accounting for the characteristics of OCR errors. In this paper, we…

    Submitted 21 November, 2016; originally announced November 2016.

  20. arXiv:1311.5978  [pdf, other]

    cs.SI physics.soc-ph

    Event Evolution Tracking from Streaming Social Posts

    Authors: Pei Lee, Laks V. S. Lakshmanan, Evangelos E. Milios

    Abstract: Online social post streams such as Twitter timelines and forum discussions have emerged as important channels for information dissemination. They are noisy, informal, and surge quickly. Real life events, which may happen and evolve every minute, are perceived and circulated in post streams by social users. Intuitively, an event can be viewed as a dense cluster of posts with a life cycle sharing th…

    Submitted 23 November, 2013; originally announced November 2013.