Search | arXiv e-print repository

Showing 1–21 of 21 results for author: Moreo, A

Searching in archive cs.
  1. arXiv:2403.15123  [pdf, other]

    cs.LG stat.ML

    Quantification using Permutation-Invariant Networks based on Histograms

    Authors: Olaya Pérez-Mon, Alejandro Moreo, Juan José del Coz, Pablo González

    Abstract: Quantification, also known as class prevalence estimation, is the supervised learning task in which a model is trained to predict the prevalence of each class in a given bag of examples. This paper investigates the application of deep neural networks to tasks of quantification in scenarios where it is possible to apply a symmetric supervised approach that eliminates the need for classification as…

    Submitted 22 March, 2024; originally announced March 2024.

  2. arXiv:2403.11265  [pdf, other]

    cs.LG cs.AI cs.CL

    Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation

    Authors: Silvia Corbara, Alejandro Moreo

    Abstract: Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else. It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author. In…

    Submitted 17 March, 2024; originally announced March 2024.

  3. arXiv:2401.00490  [pdf, other]

    cs.LG stat.ML

    Kernel Density Estimation for Multiclass Quantification

    Authors: Alejandro Moreo, Pablo González, Juan José del Coz

    Abstract: Several disciplines, like the social sciences, epidemiology, sentiment analysis, or market research, are interested in knowing the distribution of the classes in a population rather than the individual labels of the members thereof. Quantification is the supervised machine learning task concerned with obtaining accurate predictors of class prevalence, and to do so particularly in the presence of l…

    Submitted 2 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: fixed broken references to appendices

  4. arXiv:2311.02237  [pdf, other]

    cs.LG

    Explainable Authorship Identification in Cultural Heritage Applications: Analysis of a New Perspective

    Authors: Mattia Setzu, Silvia Corbara, Anna Monreale, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: While a substantial amount of work has recently been devoted to enhance the performance of computational Authorship Identification (AId) systems, little to no attention has been paid to endowing AId systems with the ability to explain the reasons behind their predictions. This lacking substantially hinders the practical employment of AId methodologies, since the predictions returned by such system…

    Submitted 3 November, 2023; originally announced November 2023.

  5. arXiv:2310.09210  [pdf, other]

    cs.LG

    Regularization-Based Methods for Ordinal Quantification

    Authors: Mirko Bunse, Alejandro Moreo, Fabrizio Sebastiani, Martin Senz

    Abstract: Quantification, i.e., the task of training predictors of the class prevalence values in sets of unlabeled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary and multiclass problems in which the classes are not ordered. Here, we study the ordinal case, i.e., the case in which a total order is defin…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: 45 pages

  6. arXiv:2310.04565  [pdf, other]

    cs.LG cs.AI

    Binary Quantification and Dataset Shift: An Experimental Investigation

    Authors: Pablo González, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift,…

    Submitted 6 October, 2023; originally announced October 2023.

  7. arXiv:2301.09862  [pdf, other]

    cs.LG cs.AI

    Same or Different? Diff-Vectors for Authorship Analysis

    Authors: Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: We investigate the effects on authorship identification tasks of a fundamental shift in how to conceive the vectorial representations of documents that are given as input to a supervised learner. In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the clas…

    Submitted 24 January, 2023; originally announced January 2023.

  8. arXiv:2211.08063  [pdf, other]

    cs.LG

    Multi-Label Quantification

    Authors: Alejandro Moreo, Manuel Francisco, Fabrizio Sebastiani

    Abstract: Quantification, variously called "supervised prevalence estimation" or "learning to quantify", is the supervised learning task of generating predictors of the relative frequencies (a.k.a. "prevalence values") of the classes of interest in unlabelled data samples. While many quantification methods have been proposed in the past for binary problems and, to a lesser extent, single-label multiclass pr…

    Submitted 15 November, 2022; originally announced November 2022.

  9. arXiv:2111.11249  [pdf, ps, other]

    cs.LG cs.IR

    LeQua@CLEF2022: Learning to Quantify

    Authors: Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes…

    Submitted 11 December, 2021; v1 submitted 22 November, 2021; originally announced November 2021.

  10. arXiv:2110.14764  [pdf, other]

    cs.CL cs.LG

    Generalized Funnelling: Ensemble Learning and Heterogeneous Document Embeddings for Cross-Lingual Text Classification

    Authors: Alejandro Moreo, Andrea Pedrotti, Fabrizio Sebastiani

    Abstract: Funnelling (Fun) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document…

    Submitted 7 February, 2022; v1 submitted 17 September, 2021; originally announced October 2021.

  11. arXiv:2110.14203  [pdf, other]

    cs.CL cs.LG

    Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution

    Authors: Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, i.e., on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over othe…

    Submitted 27 October, 2021; originally announced October 2021.

  12. Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach

    Authors: Alessandro Fabris, Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Algorithms and models are increasingly deployed to inform decisions about people, inevitably affecting their lives. As a consequence, those in charge of developing these models must carefully evaluate their impact on different groups of people and favour group fairness, that is, ensure that groups determined by sensitive demographic attributes, such as race or sex, are not treated unjustly. To ach…

    Submitted 27 March, 2023; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in the Journal of Artificial Intelligence Research

    Journal ref: Journal of Artificial Intelligence Research (JAIR) 76 (2023) 1117-1180

  13. arXiv:2106.11057  [pdf, other]

    cs.LG cs.AI cs.CL

    QuaPy: A Python-Based Framework for Quantification

    Authors: Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani

    Abstract: QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation), written in Python. Quantification is the task of training quantifiers via supervised learning, where a quantifier is a predictor that estimates the relative frequencies (a.k.a. prevalence values) of the classes of interest in a sample of unlabelled data. While quantification can be trivially…

    Submitted 18 June, 2021; originally announced June 2021.

  14. arXiv:2011.08091  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Tweet Sentiment Quantification: An Experimental Re-Evaluation

    Authors: Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called "prevalence") of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts. This task is especially important when these texts are tweets, since the final goal of most sentiment classification efforts…

    Submitted 17 September, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

  15. arXiv:2011.02552  [pdf, ps, other]

    cs.LG cs.AI cs.IR

    Re-Assessing the "Classify and Count" Quantification Method

    Authors: Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Learning to quantify (a.k.a. quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of obtaining class prevalence estimates, is often a biased estimator, and thus delivers suboptimal quantification accuracy; following this observation, several me…

    Submitted 22 January, 2021; v1 submitted 4 November, 2020; originally announced November 2020.

    Comments: This is the final version of the paper, identical to the one that is going to appear on the Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021)

    Journal ref: Final version published in Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021), Lucca, IT, 2021, pp. 75-91

  16. arXiv:2006.12289  [pdf, ps, other]

    cs.CL cs.IR

    MedLatinEpi and MedLatinLit: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts

    Authors: Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani, Mirko Tavoni

    Abstract: We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such,…

    Submitted 11 September, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Journal ref: Forthcoming in the ACM Journal of Computing and Cultural Heritage, 2021

  17. arXiv:1911.11506  [pdf, other]

    cs.LG cs.CL stat.ML

    Word-Class Embeddings for Multiclass Text Classification

    Authors: Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani

    Abstract: Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc emb…

    Submitted 26 November, 2019; originally announced November 2019.

    Journal ref: Final version published in Data Mining and Knowledge Discovery 35(3), 911-963, 2021

  18. arXiv:1904.07965  [pdf, ps, other]

    cs.LG cs.IR stat.ML

    Cross-Lingual Sentiment Quantification

    Authors: Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Sentiment Quantification (i.e., the task of estimating the relative frequency of sentiment-related classes – such as Positive and Negative – in a set of unlabelled documents) is an important topic in sentiment analysis, as the study of sentiment-related quantities and trends across a population is often of higher interest than the analysis of individual instances. In thi…

    Submitted 7 July, 2020; v1 submitted 16 April, 2019; originally announced April 2019.

    Comments: Identical to previous version, but for the abstract, which is now identical to the one in the published version

    Journal ref: Final version published in IEEE Intelligent Systems 35(3):106-114, 2020

  19. arXiv:1903.12110  [pdf, other]

    cs.IR cs.LG

    Building Automated Survey Coders via Interactive Machine Learning

    Authors: Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Software systems trained via machine learning to automatically classify open-ended answers (a.k.a. verbatims) are by now a reality. Still, their adoption in the survey coding industry has been less widespread than it might have been. Among the factors that have hindered a more massive takeup of this technology are the effort involved in manually coding a sufficient amount of training data, the fac…

    Submitted 28 March, 2019; originally announced March 2019.

    Comments: To appear in the International Journal of Market Research

    Journal ref: Final version published in International Journal of Market Research, 61(4):408-429, 2019

  20. arXiv:1901.11459  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification

    Authors: Andrea Esuli, Alejandro Moreo, Fabrizio Sebastiani

    Abstract: Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when naively classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus n…

    Submitted 16 April, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: 28 pages, 4 figures

    Journal ref: Final version published in ACM Transactions on Information Systems 37(3), 37:1-37:30, 2019

  21. arXiv:1810.09311  [pdf, other]

    cs.CL cs.LG stat.ML

    Revisiting Distributional Correspondence Indexing: A Python Reimplementation and New Experiments

    Authors: Alejandro Moreo, Andrea Esuli, Fabrizio Sebastiani

    Abstract: This paper introduces PyDCI, a new implementation of Distributional Correspondence Indexing (DCI) written in Python. DCI is a transfer learning method for cross-domain and cross-lingual text classification for which we had provided an implementation (here called JaDCI) built on top of JaTeCS, a Java framework for text classification. PyDCI is a stand-alone version of DCI that exploits scikit-learn…

    Submitted 19 October, 2018; originally announced October 2018.