Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries | Proceedings of the ACM on Management of Data

research-article

Open access

Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries

Authors:

Zhifeng Bao

Proceedings of the ACM on Management of Data, Volume 1, Issue 4

Article No.: 247, Pages 1 - 26

https://doi.org/10.1145/3626734

Published: 12 December 2023

Abstract

With the expansion of modern database services, multi-user access has become a crucial feature in many practical application scenarios, including enterprise applications and e-commerce platforms. However, when multiple users submit queries within a short time frame, two potential issues arise: redundant computation and query concurrency. Most existing multi-query optimization methods, which aim to improve query processing efficiency, have not adequately addressed these two problems, especially in the setting where multiple queries are executed concurrently. To this end, we propose a novel method named Lemo for the multi-query optimization problem. Specifically, we propose a novel value network that predicts the latencies of concurrent queries and serves as the foundation model for query plan generation. Furthermore, we introduce a shared buffer manager component that caches the intermediate results of sub-queries. The shared buffer manager applies a novel replacement policy to maintain the cached buffer, with the objective of maximizing the opportunity to reuse cached sub-queries. Based on the shared buffer, the value network can incorporate the cached results into cost estimation to further guide Lemo in generating query plans, thus avoiding redundant computation. Lemo has been integrated into PostgreSQL, and experiments on real datasets show that it outperforms all the baselines in efficiency.
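To make the idea of caching sub-query results and feeding cache state into cost estimation more concrete, here is a minimal Python sketch. It is not Lemo's implementation: the abstract does not describe the replacement policy or the value network in detail, so everything below is an assumption made for illustration. The eviction score (reuse count times saved cost, divided by size), the plain summation standing in for the learned value network, and all names (`SubqueryCache`, `CachedResult`, `estimated_cost`, `reuse_cost`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedResult:
    size: int          # space used by the intermediate result (arbitrary units)
    saved_cost: float  # estimated cost avoided each time the result is reused
    hits: int = 0      # number of times the cached result was reused


class SubqueryCache:
    """Toy shared buffer for sub-query results (hypothetical, not Lemo's)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.entries: Dict[str, CachedResult] = {}

    def _score(self, entry: CachedResult) -> float:
        # Assumed heuristic: expected reuse benefit per unit of space.
        return (entry.hits + 1) * entry.saved_cost / entry.size

    def lookup(self, signature: str) -> bool:
        """Return True on a cache hit and record the reuse."""
        entry = self.entries.get(signature)
        if entry is None:
            return False
        entry.hits += 1
        return True

    def insert(self, signature: str, size: int, saved_cost: float) -> bool:
        """Cache a result, evicting the lowest-scoring entries if needed."""
        if size <= 0 or size > self.capacity:
            return False
        if signature in self.entries:
            return True
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda s: self._score(self.entries[s]))
            self.used -= self.entries.pop(victim).size
        self.entries[signature] = CachedResult(size, saved_cost)
        self.used += size
        return True


def estimated_cost(subqueries: Dict[str, float], cache: SubqueryCache,
                   reuse_cost: float = 1.0) -> float:
    """Stand-in for a cost model: cached sub-queries cost only reuse_cost."""
    return sum(reuse_cost if sig in cache.entries else cost
               for sig, cost in subqueries.items())
```

In this sketch, a plan whose sub-queries are already cached receives a lower estimate, which is the kind of signal the abstract says guides plan generation away from redundant computation. A learned model would replace both the score and the summation.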

Cited By

Yu W, Luo S, Yu Z, Cong G (2024). CAMAL: Optimizing LSM-trees via Active Learning. Proceedings of the ACM on Management of Data 2:4 (1-26). Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3677138
Kang R, Song S (2024). Optimizing Time Series Queries with Versions. Proceedings of the ACM on Management of Data 2:3 (1-27). Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3654962
Cong G, Yang J, Zhao Y, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S (2024). Machine Learning for Databases: Foundations, Paradigms, and Open problems. Companion of the 2024 International Conference on Management of Data (622-629). Online publication date: 9-Jun-2024.
https://dl.acm.org/doi/10.1145/3626246.3654686

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Query optimization

Published In

Proceedings of the ACM on Management of Data Volume 1, Issue 4

PACMMOD

December 2023

1317 pages

EISSN:2836-6573

DOI:10.1145/3637468

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

ARC
National Research Foundation Singapore under its AI Singapore Programme
MOE Tier-2 grant

Bibliometrics

Total Citations: 3
Total Downloads: 721
Downloads (Last 12 months): 721
Downloads (Last 6 weeks): 76

Reflects downloads up to 06 Oct 2024
