-
ROCKER: A Refinement Operator for Key Discovery
Authors:
Tommaso Soru,
Edgard Marx,
Axel-Cyrille Ngonga Ngomo
Abstract:
The Linked Data principles provide a decentralized approach for publishing structured data in the RDF format on the Web. In contrast to structured data published in relational databases, where a key is often provided explicitly, finding a set of properties that uniquely identifies a resource is a non-trivial task. Still, finding keys is of central importance for many applications such as resource deduplication, link discovery, logical data compression and data integration. In this paper, we address this research gap by specifying a refinement operator, dubbed ROCKER, which we prove to be finite, proper and non-redundant. We combine the theoretical characteristics of this operator with two monotonicities of keys to obtain a time-efficient approach for detecting keys, i.e., sets of properties that describe resources uniquely. We then use a hash index to compute the discriminability score efficiently, which ensures that our approach can scale to very large knowledge bases. Results show that ROCKER yields more accurate results, has a comparable runtime, and consumes less memory than existing state-of-the-art techniques.
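The abstract combines three ideas: a refinement operator over property sets, the monotonicity of keys, and a hash index for scoring. Below is a minimal sketch of how these fit together. It makes three assumptions. A resource is modelled as a Python dict from property to a set of values. "Discriminability" is taken to mean the fraction of resources whose value combination over the property set is unique. The refinement (append only properties that come later in a fixed order, level by level) is an illustrative stand-in, not ROCKER's actual operator. The names `discriminability`, `find_minimal_keys`, `threshold` and the sample data are hypothetical.

```python
from collections import Counter

def signature(resource, props):
    # One frozenset per property keeps multi-valued RDF properties hashable;
    # a missing property contributes an empty set.
    return tuple(frozenset(resource.get(p, ())) for p in props)

def discriminability(resources, props):
    """Fraction of resources whose value combination over `props` is unique."""
    sigs = [signature(r, props) for r in resources.values()]
    counts = Counter(sigs)  # hash index: signature -> number of resources sharing it
    return sum(1 for s in sigs if counts[s] == 1) / len(sigs)

def find_minimal_keys(resources, properties, threshold=1.0):
    """Level-wise search from smaller to larger property sets (hypothetical sketch)."""
    props = sorted(properties)
    keys, frontier = [], [()]
    while frontier:
        next_frontier = []
        for cand in frontier:
            # Only append properties after the last one, so each set is generated once.
            start = props.index(cand[-1]) + 1 if cand else 0
            for p in props[start:]:
                refined = cand + (p,)
                # Monotonicity: a superset of a key is a key, hence never minimal.
                if any(set(k) <= set(refined) for k in keys):
                    continue
                if discriminability(resources, refined) >= threshold:
                    keys.append(refined)
                else:
                    # Only non-keys are refined further.
                    next_frontier.append(refined)
        frontier = next_frontier
    return keys

# Hypothetical toy data, not taken from the paper.
people = {
    "ex:a": {"name": {"Alice"}, "city": {"Leipzig"}, "age": {"30"}},
    "ex:b": {"name": {"Bob"}, "city": {"Leipzig"}, "age": {"30"}},
    "ex:c": {"name": {"Alice"}, "city": {"Berlin"}, "age": {"25"}},
}
```

On this toy data, no single property is a key. `find_minimal_keys(people, ["age", "city", "name"])` returns `[("age", "name"), ("city", "name")]`. The three-property set is pruned because it is a superset of a key that was already found.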
Submitted 11 May, 2017;
originally announced May 2017.
-
An Evaluation of Models for Runtime Approximation in Link Discovery
Authors:
Kleanthi Georgala,
Micheal Hoffmann,
Axel-Cyrille Ngonga Ngomo
Abstract:
Time-efficient link discovery is of central importance to implement the vision of the Semantic Web. Some of the fastest link discovery approaches rely internally on planning to execute link specifications. In recent work, linear models have been used to estimate the runtime of the fastest planners. However, no other category of models has been studied for this purpose so far. In this paper, we study non-linear functions for runtime estimation. In particular, we study exponential and mixed models for estimating the runtimes of planners. To this end, we evaluate three different runtime models on six datasets using 400 link specifications. We show that exponential and mixed models achieve better fits on the training data but are preferable only in some cases. Our evaluation also shows that using better runtime approximation models has a positive impact on the overall execution of link specifications.
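To make the linear-versus-exponential comparison concrete, here is a minimal sketch. It fits both model families to runtimes that depend on a single numeric feature and compares their mean squared error. The single feature, the synthetic data and the function names are assumptions for illustration only. The paper's actual features, its mixed models and its six datasets are not reproduced here.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_exponential(xs, ys):
    """y = A * exp(b*x), fitted as a linear model on ln(y); requires y > 0."""
    log_model = fit_linear(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(log_model(x))

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic runtimes growing exponentially with the feature (not data from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(0.5 * x) for x in xs]
linear, exponential = fit_linear(xs, ys), fit_exponential(xs, ys)
```

Here the exponential model fits the training data almost exactly and the linear one does not. As the abstract notes, a better training fit does not by itself make a model preferable, so such a comparison would also need to be repeated on held-out link specifications. Note also that the log-transform fit minimizes squared error in log space, not the MSE reported above.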
Submitted 1 December, 2016;
originally announced December 2016.
-
Annex: Radon - Rapid Discovery of Topological Relations
Authors:
Mohamed Ahmed Sherif,
Kevin Dreßler,
Panayiotis Smeros,
Axel-Cyrille Ngonga Ngomo
Abstract:
Datasets containing geospatial resources are increasingly being represented according to the Linked Data principles. Several time-efficient approaches for discovering links between RDF resources have been developed in recent years. However, the time-efficient discovery of topological relations between geospatial resources has received little attention. We address this research gap by presenting Radon, a novel approach for the rapid computation of topological relations between geospatial resources. Our approach uses a sparse tiling index in combination with minimum bounding boxes to reduce the computation time of topological relations. Our evaluation of Radon's runtime on 45 datasets and in more than 800 experiments shows that it outperforms the state of the art by up to 3 orders of magnitude while maintaining an F-measure of 100%. Moreover, our experiments suggest that Radon scales well when implemented in parallel.
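The abstract credits the speed-up to a sparse tiling index combined with minimum bounding boxes (MBBs). The sketch below shows one way such a filter can cut down the pairs that need an exact topological test. Only tiles that some source MBB actually covers get an index entry, and a candidate pair survives only if the two MBBs intersect. Several details are assumptions rather than Radon's actual design: geometries are plain lists of `(x, y)` points, the grid uses a fixed `tile_size`, and the function names are made up. The exact topological check that would follow, which needs a real geometry library, is omitted.

```python
import math
from collections import defaultdict

def mbb(points):
    """Minimum bounding box (min_x, min_y, max_x, max_y) of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def tiles(box, tile_size):
    """Grid tiles (i, j) overlapped by a bounding box."""
    x0, y0, x1, y1 = box
    for i in range(math.floor(x0 / tile_size), math.floor(x1 / tile_size) + 1):
        for j in range(math.floor(y0 / tile_size), math.floor(y1 / tile_size) + 1):
            yield (i, j)

def boxes_intersect(a, b):
    # Closed boxes: touching edges count, since "touches" is a topological relation too.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def candidate_pairs(sources, targets, tile_size=1.0):
    """Pairs whose MBBs intersect; only these would need an exact topological test."""
    index = defaultdict(list)  # sparse: only tiles covered by some source get an entry
    src_boxes = {sid: mbb(g) for sid, g in sources.items()}
    for sid, box in src_boxes.items():
        for t in tiles(box, tile_size):
            index[t].append(sid)
    pairs = set()
    for tid, geom in targets.items():
        tbox = mbb(geom)
        for t in tiles(tbox, tile_size):
            for sid in index.get(t, ()):  # .get avoids creating empty tiles
                if boxes_intersect(src_boxes[sid], tbox):
                    pairs.add((sid, tid))
    return pairs

# Hypothetical toy geometries.
sources = {"s1": [(0, 0), (1, 0), (1, 1), (0, 1)], "s2": [(5, 5), (6, 5), (6, 6)]}
targets = {"t1": [(0.5, 0.5), (2, 0.5), (2, 2)], "t2": [(10, 10), (11, 11)]}
```

With these inputs, `candidate_pairs(sources, targets)` yields only `{("s1", "t1")}`. Target `t2` never reaches a box comparison, because none of its tiles are in the index.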
Submitted 18 November, 2016;
originally announced November 2016.
-
Self-Wiring Question Answering Systems
Authors:
Ricardo Usbeck,
Jonathan Huthmann,
Nico Duldhardt,
Axel-Cyrille Ngonga Ngomo
Abstract:
Question answering (QA) has seen a resurgence over the past years. This resurgence has led to a multitude of QA systems being developed by both companies and research institutions. While a few components of QA systems are reused across implementations, most systems do not leverage the full potential of component reuse. Hence, developing QA systems is still a tedious and time-consuming process. We address the challenge of accelerating the creation of novel or tailored QA systems by presenting a concept for a self-wiring approach to composing QA systems. Our approach will allow the reuse of existing, web-based QA systems or modules while developing new QA platforms. To this end, it will rely on QA modules being described using the Web Ontology Language. Based on these descriptions, our approach will be able to compose QA systems automatically in a data-driven manner.
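The core of self-wiring is chaining described modules from a question to an answer. The sketch below illustrates that idea in a toy setting. It assumes each module is described only by one input type and one output type, as a plain Python stand-in for the OWL descriptions the abstract mentions. It then picks the shortest chain by breadth-first search, which is a simplification of the data-driven composition the paper proposes. All module and type names are hypothetical.

```python
from collections import deque

# Hypothetical module descriptions: name -> (input type, output type).
MODULES = {
    "ner": ("Question", "AnnotatedQuestion"),
    "disambiguation": ("AnnotatedQuestion", "LinkedQuestion"),
    "query_builder": ("LinkedQuestion", "SparqlQuery"),
    "executor": ("SparqlQuery", "Answer"),
    "keyword_search": ("Question", "Answer"),
}

def compose(modules, start, goal):
    """Breadth-first search for the shortest module chain turning `start` into `goal`."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, chain = queue.popleft()
        if current == goal:
            return chain
        # Sorted iteration makes the result deterministic among equal-length chains.
        for name, (inp, out) in sorted(modules.items()):
            if inp == current and out not in seen:
                seen.add(out)
                queue.append((out, chain + [name]))
    return None  # no chain of described modules reaches the goal type
```

With all modules available, `compose(MODULES, "Question", "Answer")` returns `["keyword_search"]`. If that module is removed, it returns the four-step pipeline `["ner", "disambiguation", "query_builder", "executor"]`. A data-driven variant would rank candidate chains by measured quality rather than by length.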
Submitted 8 November, 2016; v1 submitted 6 November, 2016;
originally announced November 2016.
-
LITMUS: An Open Extensible Framework for Benchmarking RDF Data Management Solutions
Authors:
Harsh Thakkar,
Mohnish Dubey,
Gezim Sejdiu,
Axel-Cyrille Ngonga Ngomo,
Jeremy Debattista,
Christoph Lange,
Jens Lehmann,
Sören Auer,
Maria-Esther Vidal
Abstract:
Developments in the context of Open, Big, and Linked Data have led to an enormous growth of structured data on the Web. To keep pace with the efficient consumption and management of data growing at this rate, many data management solutions have been developed for specific tasks and applications. We present LITMUS, a framework for benchmarking data management solutions. LITMUS goes beyond classical storage benchmarking frameworks by allowing the performance of frameworks to be analysed across query languages. In this position paper, we present the conceptual architecture of LITMUS as well as the considerations that led to this architecture.
Submitted 9 August, 2016;
originally announced August 2016.