Search | arXiv e-print repository

LongHealth: A Question Answering Benchmark with Long Clinical Documents

Authors: Lisa Adams, Felix Busch, Tianyu Han, Jean-Baptiste Excoffier, Matthieu Ortala, Alexander Löser, Hugo JWL. Aerts, Jakob Nikolas Kather, Daniel Truhn, Keno Bressem

Abstract: Background: Recent advancements in large language models (LLMs) offer potential benefits in healthcare, particularly in processing extensive patient records. However, existing benchmarks do not fully assess LLMs' capability in handling real-world, lengthy clinical data. Methods: We present the LongHealth benchmark, comprising 20 detailed fictional patient cases across various diseases, with each… ▽ More Background: Recent advancements in large language models (LLMs) offer potential benefits in healthcare, particularly in processing extensive patient records. However, existing benchmarks do not fully assess LLMs' capability in handling real-world, lengthy clinical data. Methods: We present the LongHealth benchmark, comprising 20 detailed fictional patient cases across various diseases, with each case containing 5,090 to 6,754 words. The benchmark challenges LLMs with 400 multiple-choice questions in three categories: information extraction, negation, and sorting, challenging LLMs to extract and interpret information from large clinical documents. Results: We evaluated nine open-source LLMs with a minimum of 16,000 tokens and also included OpenAI's proprietary and cost-efficient GPT-3.5 Turbo for comparison. The highest accuracy was observed for Mixtral-8x7B-Instruct-v0.1, particularly in tasks focused on information retrieval from single and multiple patient documents. However, all models struggled significantly in tasks requiring the identification of missing information, highlighting a critical area for improvement in clinical data interpretation. Conclusion: While LLMs show considerable potential for processing long clinical documents, their current accuracy levels are insufficient for reliable clinical use, especially in scenarios requiring the identification of missing information. The LongHealth benchmark provides a more realistic assessment of LLMs in a healthcare setting and highlights the need for further model refinement for safe and effective clinical application. We make the benchmark and evaluation code publicly available. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 11 pages, 3 figures, 5 tables

arXiv:2304.08247 [pdf, ps, other]

MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data

Authors: Tianyu Han, Lisa C. Adams, Jens-Michalis Papaioannou, Paul Grundmann, Tom Oberhauser, Alexander Löser, Daniel Truhn, Keno K. Bressem

Abstract: As large language models (LLMs) like OpenAI's GPT series continue to make strides, we witness the emergence of artificial intelligence applications in an ever-expanding range of fields. In medicine, these LLMs hold considerable promise for improving medical workflows, diagnostics, patient care, and education. Yet, there is an urgent need for open-source models that can be deployed on-premises to s… ▽ More As large language models (LLMs) like OpenAI's GPT series continue to make strides, we witness the emergence of artificial intelligence applications in an ever-expanding range of fields. In medicine, these LLMs hold considerable promise for improving medical workflows, diagnostics, patient care, and education. Yet, there is an urgent need for open-source models that can be deployed on-premises to safeguard patient privacy. In our work, we present an innovative dataset consisting of over 160,000 entries, specifically crafted to fine-tune LLMs for effective medical applications. We investigate the impact of fine-tuning these datasets on publicly accessible pre-trained LLMs, and subsequently, we juxtapose the performance of pre-trained-only models against the fine-tuned models concerning the examinations that future medical doctors must pass to achieve certification. △ Less

Submitted 4 October, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

arXiv:2303.08179 [pdf, other]

doi 10.1016/j.eswa.2023.121598

MEDBERT.de: A Comprehensive German BERT Model for the Medical Domain

Authors: Keno K. Bressem, Jens-Michalis Papaioannou, Paul Grundmann, Florian Borchert, Lisa C. Adams, Leonhard Liu, Felix Busch, Lina Xu, Jan P. Loyen, Stefan M. Niehues, Moritz Augustin, Lennart Grosser, Marcus R. Makowski, Hugo JWL. Aerts, Alexander Löser

Abstract: This paper presents medBERTde, a pre-trained German BERT model specifically designed for the German medical domain. The model has been trained on a large corpus of 4.7 Million German medical documents and has been shown to achieve new state-of-the-art performance on eight different medical benchmarks covering a wide range of disciplines and medical document types. In addition to evaluating the ove… ▽ More This paper presents medBERTde, a pre-trained German BERT model specifically designed for the German medical domain. The model has been trained on a large corpus of 4.7 Million German medical documents and has been shown to achieve new state-of-the-art performance on eight different medical benchmarks covering a wide range of disciplines and medical document types. In addition to evaluating the overall performance of the model, this paper also conducts a more in-depth analysis of its capabilities. We investigate the impact of data deduplication on the model's performance, as well as the potential benefits of using more efficient tokenization methods. Our results indicate that domain-specific models such as medBERTde are particularly useful for longer texts, and that deduplication of training data does not necessarily lead to improved performance. Furthermore, we found that efficient tokenization plays only a minor role in improving model performance, and attribute most of the improved performance to the large amount of training data. To encourage further research, the pre-trained model weights and new benchmarks based on radiological data are made publicly available for use by the scientific community. △ Less

Submitted 24 March, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: Keno K. Bressem and Jens-Michalis Papaioannou and Paul Grundmann contributed equally

Journal ref: Expert Systems with Applications 2024;237(21):121598

arXiv:2210.08500 [pdf, other]

This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text

Authors: Betty van Aken, Jens-Michalis Papaioannou, Marcel G. Naik, Georgios Eleftheriadis, Wolfgang Nejdl, Felix A. Gers, Alexander Löser

Abstract: The use of deep neural models for diagnosis prediction from clinical text has shown promising results. However, in clinical practice such models must not only be accurate, but provide doctors with interpretable and helpful results. We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention with both of these abilities. ProtoPatient makes predictions based on… ▽ More The use of deep neural models for diagnosis prediction from clinical text has shown promising results. However, in clinical practice such models must not only be accurate, but provide doctors with interpretable and helpful results. We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention with both of these abilities. ProtoPatient makes predictions based on parts of the text that are similar to prototypical patients - providing justifications that doctors understand. We evaluate the model on two publicly available clinical datasets and show that it outperforms existing baselines. Quantitative and qualitative evaluations with medical doctors further demonstrate that the model provides valuable explanations for clinical decision support. △ Less

Submitted 16 October, 2022; originally announced October 2022.

Comments: AACL-IJCNLP 2022 Main Conference (Long Paper)

arXiv:2208.01912 [pdf, other]

Cross-Lingual Knowledge Transfer for Clinical Phenotyping

Authors: Jens-Michalis Papaioannou, Paul Grundmann, Betty van Aken, Athanasios Samaras, Ilias Kyparissidis, George Giannakoulas, Felix Gers, Alexander Löser

Abstract: Clinical phenotyping enables the automatic extraction of clinical conditions from patient records, which can be beneficial to doctors and clinics worldwide. However, current state-of-the-art models are mostly applicable to clinical notes written in English. We therefore investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language and… ▽ More Clinical phenotyping enables the automatic extraction of clinical conditions from patient records, which can be beneficial to doctors and clinics worldwide. However, current state-of-the-art models are mostly applicable to clinical notes written in English. We therefore investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language and have a small amount of in-domain data available. We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains such as cardiology, oncology and the ICU. Our results reveal two strategies that outperform the state-of-the-art: Translation-based methods in combination with domain-specific encoders and cross-lingual encoders plus adapters. We find that these strategies perform especially well for classifying rare phenotypes and we advise on which method to prefer in which situation. Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness. △ Less

Submitted 3 August, 2022; originally announced August 2022.

Comments: LREC 2022 submmision: January 2022

ACM Class: I.2.7; J.3

Journal ref: Proceedings of the Language Resources and Evaluation Conference. 2022; 900-909

arXiv:2111.15512 [pdf, other]

What Do You See in this Patient? Behavioral Testing of Clinical NLP Models

Authors: Betty van Aken, Sebastian Herrmann, Alexander Löser

Abstract: Decision support systems based on clinical notes have the potential to improve patient care by pointing doctors towards overseen risks. Predicting a patient's outcome is an essential part of such systems, for which the use of deep neural networks has shown promising results. However, the patterns learned by these networks are mostly opaque and previous work revealed flaws regarding the reproductio… ▽ More Decision support systems based on clinical notes have the potential to improve patient care by pointing doctors towards overseen risks. Predicting a patient's outcome is an essential part of such systems, for which the use of deep neural networks has shown promising results. However, the patterns learned by these networks are mostly opaque and previous work revealed flaws regarding the reproduction of unintended biases. We thus introduce an extendable testing framework that evaluates the behavior of clinical outcome models regarding changes of the input. The framework helps to understand learned patterns and their influence on model decisions. In this work, we apply it to analyse the change in behavior with regard to the patient characteristics gender, age and ethnicity. Our evaluation of three current clinical NLP models demonstrates the concrete effects of these characteristics on the models' decisions. They show that model behavior varies drastically even when fine-tuned on the same data and that allegedly best-performing models have not always learned the most medically plausible patterns. △ Less

Submitted 30 November, 2021; originally announced November 2021.

Comments: NeurIPS 2021 Research2Clinics Workshop, Bridging the Gap: From Machine Learning Research to Clinical Practice

arXiv:2108.00775 [pdf, other]

Self-supervised Answer Retrieval on Clinical Notes

Authors: Paul Grundmann, Sebastian Arnold, Alexander Löser

Abstract: Retrieving answer passages from long documents is a complex task requiring semantic understanding of both discourse and document context. We approach this challenge specifically in a clinical scenario, where doctors retrieve cohorts of patients based on diagnoses and other latent medical aspects. We introduce CAPR, a rule-based self-supervision objective for training Transformer language models fo… ▽ More Retrieving answer passages from long documents is a complex task requiring semantic understanding of both discourse and document context. We approach this challenge specifically in a clinical scenario, where doctors retrieve cohorts of patients based on diagnoses and other latent medical aspects. We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching. In addition, we contribute a novel retrieval dataset based on clinical notes to simulate this scenario on a large corpus of clinical notes. We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders. From our extensive evaluation on MIMIC-III and three other healthcare datasets, we report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages. This makes the model powerful especially in zero-shot scenarios where only limited training data is available. △ Less

Submitted 2 August, 2021; originally announced August 2021.

arXiv:2103.15509 [pdf, other]

Automatic Clustering in Hyrise

Authors: Alexander Löser

Abstract: Physical data layout is an important performance factor for modern databases. Clustering, i.e., storing similar values in proximity, can lead to performance gains in several ways. We present an automated model to determine beneficial clustering columns and a clustering algorithm for the column-oriented, memory-resident database Hyrise. To automatically select clustering columns, the model analyzes… ▽ More Physical data layout is an important performance factor for modern databases. Clustering, i.e., storing similar values in proximity, can lead to performance gains in several ways. We present an automated model to determine beneficial clustering columns and a clustering algorithm for the column-oriented, memory-resident database Hyrise. To automatically select clustering columns, the model analyzes the database's workload and provides estimates by how much certain clustering columns would impact the workload's latency. We evaluate the precision of the model's estimates, as well as the overall quality of its clustering suggestions. To apply a determined clustering configuration, we developed an online clustering algorithm. The clustering algorithm supports an arbitrary number of clustering dimensions. We show that the algorithm is robust against concurrently running data modifying queries. We obtain a 5% latency reduction for the TPC-H benchmark when clustering the lineitem table and a 4% latency reduction for the TPC-DS benchmark when clustering the store_sales table. △ Less

Submitted 29 March, 2021; originally announced March 2021.

arXiv:2102.04110 [pdf, other]

Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration

Authors: Betty van Aken, Jens-Michalis Papaioannou, Manuel Mayrdorfer, Klemens Budde, Felix A. Gers, Alexander Löser

Abstract: Outcome prediction from clinical text can prevent doctors from overlooking possible risks and help hospitals to plan capacities. We simulate patients at admission time, when decision support can be especially valuable, and contribute a novel admission to discharge task with four common outcome prediction targets: Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-sta… ▽ More Outcome prediction from clinical text can prevent doctors from overlooking possible risks and help hospitals to plan capacities. We simulate patients at admission time, when decision support can be especially valuable, and contribute a novel admission to discharge task with four common outcome prediction targets: Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction. The ideal system should infer outcomes based on symptoms, pre-conditions and risk factors of a patient. We evaluate the effectiveness of language models to handle this scenario and propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources. We further present a simple method to incorporate ICD code hierarchy into the models. We show that our approach improves performance on the outcome tasks against several baselines. A detailed analysis reveals further strengths of the model, including transferability, but also weaknesses such as handling of vital values and inconsistencies in the underlying data. △ Less

Submitted 8 February, 2021; originally announced February 2021.

Comments: EACL 2021

arXiv:2011.04507 [pdf, other]

VisBERT: Hidden-State Visualizations for Transformers

Authors: Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers

Abstract: Explainability and interpretability are two important concepts, the absence of which can and should impede the application of well-performing neural networks to real-world problems. At the same time, they are difficult to incorporate into the large, black-box models that achieve state-of-the-art results in a multitude of NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) is… ▽ More Explainability and interpretability are two important concepts, the absence of which can and should impede the application of well-performing neural networks to real-world problems. At the same time, they are difficult to incorporate into the large, black-box models that achieve state-of-the-art results in a multitude of NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) is one such black-box model. It has become a staple architecture to solve many different NLP tasks and has inspired a number of related Transformer models. Understanding how these models draw conclusions is crucial for both their improvement and application. We contribute to this challenge by presenting VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering. Instead of analyzing attention weights, we focus on the hidden states resulting from each encoder block within the BERT model. This way we can observe how the semantic representations are transformed throughout the layers of the model. VisBERT enables users to get insights about the model's internal state and to explore its inference steps or potential shortcomings. The tool allows us to identify distinct phases in BERT's transformations that are similar to a traditional NLP pipeline and offer insights during failed predictions. △ Less

Submitted 9 November, 2020; originally announced November 2020.

Comments: Published in WWW '20: Companion Proceedings of the Web Conference 2020

Journal ref: Companion Proceedings of the Web Conference 2020

arXiv:2002.00835 [pdf, other]

doi 10.1145/3366423.3380208

Learning Contextualized Document Representations for Healthcare Answer Retrieval

Authors: Sebastian Arnold, Betty van Aken, Paul Grundmann, Felix A. Gers, Alexander Löser

Abstract: We present Contextual Discourse Vectors (CDV), a distributed document representation for efficient answer retrieval from long healthcare documents. Our approach is based on structured query tuples of entities and aspects from free text and medical taxonomies. Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical ent… ▽ More We present Contextual Discourse Vectors (CDV), a distributed document representation for efficient answer retrieval from long healthcare documents. Our approach is based on structured query tuples of entities and aspects from free text and medical taxonomies. Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse. We use our continuous representations to resolve queries with short latency using approximate nearest neighbor search on sentence level. We apply the CDV model for retrieving coherent answer passages from nine English public health resources from the Web, addressing both patients and medical professionals. Because there is no end-to-end training data available for all application scenarios, we train our model with self-supervised data from Wikipedia. We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking and is able to adapt to heterogeneous domains without additional fine-tuning. △ Less

Submitted 3 February, 2020; originally announced February 2020.

Comments: The Web Conference 2020 (WWW '20)

arXiv:1911.02453 [pdf, other]

From Symmetry to Asymmetry: Generalizing TSP Approximations by Parametrization

Authors: Lukas Behrendt, Katrin Casel, Tobias Friedrich, J. A. Gregor Lagodzinski, Alexander Löser, Marcus Wilhelm

Abstract: We generalize the tree doubling and Christofides algorithm, the two most common approximations for TSP, to parameterized approximations for ATSP. The parameters we consider for the respective parameterizations are upper bounded by the number of asymmetric distances in the given instance, which yields algorithms to efficiently compute constant factor approximations also for moderately asymmetric TS… ▽ More We generalize the tree doubling and Christofides algorithm, the two most common approximations for TSP, to parameterized approximations for ATSP. The parameters we consider for the respective parameterizations are upper bounded by the number of asymmetric distances in the given instance, which yields algorithms to efficiently compute constant factor approximations also for moderately asymmetric TSP instances. As generalization of the Christofides algorithm, we derive a parameterized 2.5-approximation, where the parameter is the size of a vertex cover for the subgraph induced by the asymmetric edges. Our generalization of the tree doubling algorithm gives a parameterized 3-approximation, where the parameter is the number of asymmetric edges in a given minimum spanning arborescence. Both algorithms are also stated in the form of additive lossy kernelizations, which allows to combine them with known polynomial time approximations for ATSP. Further, we combine them with a notion of symmetry relaxation which allows to trade approximation guarantee for runtime. We complement our results by experimental evaluations, which show that generalized tree-doubling frequently outperforms generalized Christofides with respect to parameter size. △ Less

Submitted 26 February, 2020; v1 submitted 6 November, 2019; originally announced November 2019.

arXiv:1909.04925 [pdf, other]

doi 10.1145/3357384.3358028

How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations

Authors: Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers

Abstract: Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which… ▽ More Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation layer. Our qualitative analysis of hidden state visualizations provides additional insights into BERT's reasoning process. Our results show that the transformations within BERT go through phases that are related to traditional pipeline tasks. The system can therefore implicitly incorporate task-specific information into its token representations. Furthermore, our analysis reveals that fine-tuning has little impact on the models' semantic abilities and that prediction errors can be recognized in the vector representations of even early layers. △ Less

Submitted 11 September, 2019; originally announced September 2019.

Comments: Accepted at CIKM 2019

arXiv:1902.04793 [pdf, other]

SECTOR: A Neural Model for Coherent Topic Segmentation and Classification

Authors: Sebastian Arnold, Rudolf Schneider, Philippe Cudré-Mauroux, Felix A. Gers, Alexander Löser

Abstract: When searching for information, a human reader first glances over a document, spots relevant sections and then focuses on a few sentences for resolving her intention. However, the high variance of document structure complicates to identify the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting document… ▽ More When searching for information, a human reader first glances over a document, spots relevant sections and then focuses on a few sentences for resolving her intention. However, the high variance of document structure complicates to identify the salient topic of a given section at a glance. To tackle this challenge, we present SECTOR, a model to support machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section. Our deep neural network architecture learns a latent topic embedding over the course of a document. This can be leveraged to classify local topics from plain text and segment a document at topic shifts. In addition, we contribute WikiSection, a publicly available dataset with 242k labeled sections in English and German from two distinct domains: diseases and cities. From our extensive evaluation of 20 architectures, we report a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain, scored by our SECTOR LSTM model with bloom filter embeddings and bidirectional segmentation. This is a significant improvement of 29.5 points F1 compared to state-of-the-art CNN classifiers with baseline segmentation. △ Less

Submitted 13 February, 2019; originally announced February 2019.

Comments: Author's final version, accepted for publication at TACL, 2019

arXiv:1809.07572 [pdf, other]

Challenges for Toxic Comment Classification: An In-Depth Error Analysis

Authors: Betty van Aken, Julian Risch, Ralf Krestel, Alexander Löser

Abstract: Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task's challenges others still remain unsolved and directions for further research are needed. To this end, we compare different deep learning and shallow approaches on a new, large comment dataset and propose an ensemble that outperforms all… ▽ More Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task's challenges others still remain unsolved and directions for further research are needed. To this end, we compare different deep learning and shallow approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to perform an extensive error analysis, which reveals open challenges for state-of-the-art methods and directions towards pending future research. These challenges include missing paradigmatic context and inconsistent dataset labels. △ Less

Submitted 20 September, 2018; originally announced September 2018.

Comments: ALW2: 2nd Workshop on Abusive Language Online to be held at EMNLP 2018 (Brussels, Belgium), October 31st, 2018

arXiv:1805.09648 [pdf, other]

Crowd-Labeling Fashion Reviews with Quality Control

Authors: Iurii Chernushenko, Felix A. Gers, Alexander Löser, Alessandro Checco

Abstract: We present a new methodology for high-quality labeling in the fashion domain with crowd workers instead of experts. We focus on the Aspect-Based Sentiment Analysis task. Our methods filter out inaccurate input from crowd workers but we preserve different worker labeling to capture the inherent high variability of the opinions. We demonstrate the quality of labeled data based on Facebook's FastText… ▽ More We present a new methodology for high-quality labeling in the fashion domain with crowd workers instead of experts. We focus on the Aspect-Based Sentiment Analysis task. Our methods filter out inaccurate input from crowd workers but we preserve different worker labeling to capture the inherent high variability of the opinions. We demonstrate the quality of labeled data based on Facebook's FastText framework as a baseline. △ Less

Submitted 5 April, 2018; originally announced May 2018.

arXiv:1803.04884 [pdf, other]

IDEL: In-Database Entity Linking with Neural Embeddings

Authors: Torsten Kilias, Alexander Löser, Felix A. Gers, Richard Koopmanschap, Ying Zhang, Martin Kersten

Abstract: We present a novel architecture, In-Database Entity Linking (IDEL), in which we integrate the analytics-optimized RDBMS MonetDB with neural text mining abilities. Our system design abstracts core tasks of most neural entity linking systems for MonetDB. To the best of our knowledge, this is the first defacto implemented system integrating entity-linking in a database. We leverage the ability of Mon… ▽ More We present a novel architecture, In-Database Entity Linking (IDEL), in which we integrate the analytics-optimized RDBMS MonetDB with neural text mining abilities. Our system design abstracts core tasks of most neural entity linking systems for MonetDB. To the best of our knowledge, this is the first defacto implemented system integrating entity-linking in a database. We leverage the ability of MonetDB to support in-database-analytics with user defined functions (UDFs) implemented in Python. These functions call machine learning libraries for neural text mining, such as TensorFlow. The system achieves zero cost for data shipping and transformation by utilizing MonetDB's ability to embed Python processes in the database kernel and exchange data in NumPy arrays. IDEL represents text and relational data in a joint vector space with neural embeddings and can compensate errors with ambiguous entity representations. For detecting matching entities, we propose a novel similarity function based on joint neural embeddings which are learned via minimizing pairwise contrastive ranking loss. This function utilizes a high dimensional index structures for fast retrieval of matching entities. Our first implementation and experiments using the WebNLG corpus show the effectiveness and the potentials of IDEL. △ Less

Submitted 13 March, 2018; originally announced March 2018.

Comments: This manuscript is a preprint for a paper submitted to VLDB2018

arXiv:1710.09788 [pdf, other]

FashionBrain Project: A Vision for Understanding Europe's Fashion Data Universe

Authors: Alessandro Checco, Gianluca Demartini, Alexander Loeser, Ines Arous, Mourad Khayati, Matthias Dantone, Richard Koopmanschap, Svetlin Stalinov, Martin Kersten, Ying Zhang

Abstract: A core business in the fashion industry is the understanding and prediction of customer needs and trends. Search engines and social networks are at the same time a fundamental bridge and a costly middleman between the customer's purchase intention and the retailer. To better exploit Europe's distinctive characteristics e.g., multiple languages, fashion and cultural differences, it is pivotal to re… ▽ More A core business in the fashion industry is the understanding and prediction of customer needs and trends. Search engines and social networks are at the same time a fundamental bridge and a costly middleman between the customer's purchase intention and the retailer. To better exploit Europe's distinctive characteristics e.g., multiple languages, fashion and cultural differences, it is pivotal to reduce retailers' dependence to search engines. This goal can be achieved by harnessing various data channels (manufacturers and distribution networks, online shops, large retailers, social media, market observers, call centers, press/magazines etc.) that retailers can leverage in order to gain more insight about potential buyers, and on the industry trends as a whole. This can enable the creation of novel on-line shopping experiences, the detection of influencers, and the prediction of upcoming fashion trends. In this paper, we provide an overview of the main research challenges and an analysis of the most promising technological solutions that we are investigating in the FashionBrain project. △ Less

Submitted 26 October, 2017; originally announced October 2017.

arXiv:1707.07499 [pdf, other]

Analysing Errors of Open Information Extraction Systems

Authors: Rudolf Schneider, Tom Oberhauser, Tobias Klatt, Felix A. Gers, Alexander Löser

Abstract: We report results on benchmarking Open Information Extraction (OIE) systems using RelVis, a toolkit for benchmarking Open Information Extraction systems. Our comprehensive benchmark contains three data sets from the news domain and one data set from Wikipedia with overall 4522 labeled sentences and 11243 binary or n-ary OIE relations. In our analysis on these data sets we compared the performance… ▽ More We report results on benchmarking Open Information Extraction (OIE) systems using RelVis, a toolkit for benchmarking Open Information Extraction systems. Our comprehensive benchmark contains three data sets from the news domain and one data set from Wikipedia with overall 4522 labeled sentences and 11243 binary or n-ary OIE relations. In our analysis on these data sets we compared the performance of four popular OIE systems, ClausIE, OpenIE 4.2, Stanford OpenIE and PredPatt. In addition, we evaluated the impact of five common error classes on a subset of 749 n-ary tuples. From our deep analysis we unreveal important research directions for a next generation of OIE systems. △ Less

Submitted 24 July, 2017; originally announced July 2017.

Comments: Accepted at Building Linguistically Generalizable NLP Systems at EMNLP 2017

arXiv:1608.06757 [pdf, other]

Robust Named Entity Recognition in Idiosyncratic Domains

Authors: Sebastian Arnold, Felix A. Gers, Torsten Kilias, Alexander Löser

Abstract: Named entity recognition often fails in idiosyncratic domains. That causes a problem for depending tasks, such as entity linking and relation extraction. We propose a generic and robust approach for high-recall named entity recognition. Our approach is easy to train and offers strong generalization over diverse domain-specific language, such as news documents (e.g. Reuters) or biomedical text (e.g… ▽ More Named entity recognition often fails in idiosyncratic domains. That causes a problem for depending tasks, such as entity linking and relation extraction. We propose a generic and robust approach for high-recall named entity recognition. Our approach is easy to train and offers strong generalization over diverse domain-specific language, such as news documents (e.g. Reuters) or biomedical text (e.g. Medline). Our approach is based on deep contextual sequence learning and utilizes stacked bidirectional LSTM networks. Our model is trained with only few hundred labeled sentences and does not rely on further external knowledge. We report from our results F1 scores in the range of 84-94% on standard datasets. △ Less

Submitted 24 August, 2016; originally announced August 2016.

Comments: 8 pages, 1 figure

Showing 1–20 of 20 results for author: Löser, A