-
Acknowledgements.
People whom the author would like to thank for their assistance in the creation of the book "Foundations and Trends in Databases: Information Extraction," Vol. 1, are mentioned.
-
Acknowledgments.
People whom the author would like to thank for their assistance in the creation of his book are mentioned.
-
Adaptive Query Processing.
As the data management field has diversified to consider settings in which queries are increasingly complex, statistics are less available, or data is stored remotely, there has been an acknowledgment that the traditional optimize-then-execute paradigm is insufficient. This has led to a plethora of new techniques, generally placed under the common banner of adaptive query processing, that focus on using runtime feedback to modify query processing in a way that provides better response time or more efficient CPU utilization. In this survey paper, we identify many of the common issues, themes, and approaches that pervade this work, and the settings in which each piece of work is most appropriate. Our goal with this paper is to be a 'value-add' over the existing papers on the material, providing not only a brief overview of each technique, but also a basic framework for understanding the field of adaptive query processing in general. We focus primarily on intra-query adaptivity of long-running, but not full-fledged streaming, queries. We conclude with a discussion of open research problems that are of high importance. (ABSTRACT FROM AUTHOR) Copyright of Foundations & Trends in Databases is the property of Now Publishers and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract.
-
Architecture of a Database System.
Database Management Systems (DBMSs) are a ubiquitous and critical component of modern computing, and the result of decades of research and development in both academia and industry. Historically, DBMSs were among the earliest multi-user server systems to be developed, and thus pioneered many systems design techniques for scalability and reliability now in use in many other contexts. While many of the algorithms and abstractions used by a DBMS are textbook material, there has been relatively sparse coverage in the literature of the systems design issues that make a DBMS work. This paper presents an architectural discussion of DBMS design principles, including process models, parallel architecture, storage system design, transaction system implementation, query processor and optimizer architectures, and typical shared components and utilities. Successful commercial and open-source systems are used as points of reference, particularly when multiple alternative designs have been adopted by different groups. (ABSTRACT FROM AUTHOR)
-
Chapter 1: Introduction.
The introductory chapter surveys the material covered within the issue, including the identification of the dimensions of adaptive query processing and the approaches used to adapt query execution.
-
Chapter 1: Introduction.
Chapter 1 of the book "Foundations and Trends in Databases: Information Extraction," Vol. 1, is presented. It describes the concept of information extraction as the automatic extraction of structured information like entities, relationships between entities, and attributes describing entities from unstructured sources. It also discusses the evolution of the techniques of structure extraction.
-
Chapter 2: Background: Conventional Optimization Techniques.
Chapter 2 of the book "Foundations and Trends in Databases" is presented. It discusses key concepts in single-pass, non-adaptive query optimization in relational databases, laying the foundations for discussing adaptive query processing techniques. Moreover, the impact of cost estimates on the optimization process is analyzed. Strategies that have been proposed to make plan selection robust to erroneous estimates are also presented.
-
Chapter 2: Entity Extraction: Rule-based Methods.
Chapter 2 of the book "Foundations and Trends in Databases: Information Extraction," Vol. 1, is presented. It discusses the rule-based methods for entity extraction which consist of two parts including a collection of rules and a set of policies to control the firings of multiple rules. It also presents the basic form of rules and the rule-consolidation policies.
-
Chapter 3: Entity Extraction: Statistical Methods.
Chapter 3 of the book "Foundations and Trends in Databases: Information Extraction," Vol. 1, is presented. It discusses the statistical methods for entity extraction which convert the extraction task to a problem of designing a decomposition of the unstructured text. It also explains that a common method of creating text chunks is through natural language parsing techniques that identify noun chunks in a sentence.
-
Chapter 3: Foundations of Adaptive Query Processing.
Chapter 3 of the book "Foundations and Trends in Databases" is presented. It introduces some of the foundations of adaptive query processing. Three new operators are presented that play a central role in several adaptive techniques by allowing for greater scheduling flexibility and more opportunities for adaptation. A framework called the adaptivity loop is discussed to illustrate adaptive query processing techniques. Furthermore, the behavior of an adaptive technique is analyzed using traditional query plans.
-
Chapter 4: Adaptive Selection Ordering.
Chapter 4 of the book "Foundations and Trends in Databases" is presented. It presents the adaptive greedy (A-Greedy) technique that was proposed for evaluating selection ordering queries over data streams. A-Greedy continuously monitors the selectivities of the query predicates using a random sample. It takes correlations in the data into account, and approximation guarantees can be provided on its performance.
-
Chapter 4: Relationship Extraction.
Chapter 4 of the book "Foundations and Trends in Databases: Information Extraction," Vol. 1, is presented. It explains the need to relate extracted entities in extraction tasks, since the unstructured source is not naturally partitioned into records. It also focuses on the extraction of binary relationships, although in general relationships can be multi-way, involving three or more entities.
-
Chapter 5: Adaptive Join Processing: Overview.
Chapter 5 of the book "Foundations and Trends in Databases" is presented. It discusses adaptation schemes that are more complicated to design and analyze than those for selection ordering. The execution plan space is larger and more complex than the plan space for selection ordering; the complexity arises from the large number of join orders. It is noted that database systems typically use execution plans that mix blocking operators and pipelines, which can be adapted independently using different techniques.
-
Chapter 5: Management of Information Extraction Systems.
Chapter 5 of the book "Foundations and Trends in Databases: Information Extraction," Vol. 1, is presented. It discusses the various issues in the management of information extraction systems. It also explains the techniques for optimizing the incremental process of extraction and detecting when data drift causes extractors to fail.
-
Chapter 6: Adaptive Join Processing: History-Independent Pipelined Execution.
Chapter 6 of the book "Foundations and Trends in Databases" is presented. It discusses several techniques for adapting multi-way join queries when the execution is pipelined and history-independent. StreaMon is a specific adaptive system that chooses the probing sequence using the A-Greedy algorithm, while SteMs provide an adaptation framework that allows a rich space of routing policies. The biggest advantage of history-independent execution is a plan space that is easier to analyze and for which algorithms are easier to design.
-
Chapter 6: Concluding Remarks.
Chapter 6 of the book "Foundations and Trends in Databases: Information Extraction," Vol. 1, is presented. It explains that many applications depend on the automatic extraction of structure from unstructured data for better means of querying, organizing, and analyzing data. It also discusses the concluding remarks about the topics discussed in this book.
-
Chapter 7: Adaptive Join Processing: History-Dependent Pipelined Execution.
Chapter 7 of the book "Foundations and Trends in Databases" is presented. It discusses a variety of schemes for adaptive query processing that use trees of binary join operators for query execution. It also analyzes some of the techniques from the perspective of the adaptivity loop. It shows that corrective query processing uses conventional plans but may choose multiple plans over the course of the entire execution, and that its cost model resembles that of a conventional query optimizer.
-
Chapter 8: Adaptive Join Processing: Non-pipelined Execution.
Chapter 8 of the book "Foundations and Trends in Databases" is presented. It discusses adaptation techniques for non-pipelined operators in Database Management System (DBMS) applications. Plan staging is a method widely used by DBMS applications to get predictable query performance. Mid-query reoptimization is an adaptation technique that has become popular and is used in many query processing systems, including those designed with a focus on adaptation.
-
Chapter 9: Summary and Open Questions.
Chapter 9 of the book "Foundations and Trends in Databases" is presented. It discusses declarative query processing as a fundamental value proposition of the Database Management System (DBMS). The traditional optimize-then-execute model faces serious problems due to errors in estimating cardinalities, increasing query complexity, and new DBMS environments. With the prominent exception of competitive execution, which invests resources in exploration, most adaptive query processing strategies focus on exploitation when trying to detect a better alternative.
-
Information Extraction.
The automatic extraction of information from unstructured sources has opened up new avenues for querying, organizing, and analyzing data by drawing upon the clean semantics of structured databases and the abundance of unstructured data. The field of information extraction has its genesis in the natural language processing community, where the primary impetus came from competitions centered around the recognition of named entities like people and organization names from news articles. As society became more data oriented with easy online access to both structured and unstructured data, new applications of structure extraction emerged. Now, there is interest in converting our personal desktops to structured databases, converting the knowledge in scientific publications to structured records, and harnessing the Internet for structured fact-finding queries. Consequently, there are many different communities of researchers bringing in techniques from machine learning, databases, information retrieval, and computational linguistics for various aspects of the information extraction problem. This review surveys over two decades of information extraction research from these diverse communities. We create a taxonomy of the field along various dimensions derived from the nature of the extraction task, the techniques used for extraction, the variety of input resources exploited, and the type of output produced. We elaborate on rule-based and statistical methods for entity and relationship extraction. In each case we highlight the different kinds of models for capturing the diversity of clues driving the recognition process and the algorithms for training and efficiently deploying the models.
We survey techniques for optimizing the various steps in an information extraction pipeline, adapting to dynamic data, integrating with existing entities and handling uncertainty in the extraction process. (ABSTRACT FROM AUTHOR)
-
References.
References for the articles published in the journal "Foundations and Trends in Databases" are presented.
-
References.
The sources cited within this issue are presented, including "Mining reference tables for automatic text segmentation," by E. Agichtein and V. Ganti; "Learning information extraction rules: An inductive logic programming approach," by J. Aitken; and "Querying text databases for efficient information extraction," by E. Agichtein and L. Gravano.