Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries | Proceedings of the ACM on Management of Data
research-article
Open access

Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries

Published: 12 December 2023

Abstract

With the expansion of modern database services, multi-user access has become a crucial feature in practical application scenarios such as enterprise applications and e-commerce platforms. However, when multiple users submit queries within a short time frame, issues such as redundant computation and query concurrency can arise. Most existing multi-query optimization methods, which aim to improve query processing efficiency, have not adequately addressed these two problems, especially when multiple queries are executed concurrently. To this end, we propose a novel method named Lemo for the multi-query optimization problem. Specifically, we propose a novel value network that predicts the latencies of concurrent queries and serves as the foundation model for query plan generation. Furthermore, we introduce a shared buffer manager component to cache the intermediate results of sub-queries. The shared buffer manager applies a novel replacement policy to maintain the cached buffer, with the objective of maximizing the opportunity to reuse cached sub-queries. Based on the shared buffer, our proposed value network can incorporate the cached results into cost estimation to further guide Lemo in generating query plans, thus avoiding redundant computation. Lemo has been integrated into PostgreSQL, and experiments conducted on real datasets with PostgreSQL show that it outperforms all the baselines in efficiency.
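The interplay between a shared buffer of sub-query results and cache-aware cost estimation can be illustrated with a minimal sketch. This is not Lemo's implementation: the abstract does not specify the replacement policy or the value network, so the scoring heuristic below (recomputation cost saved per unit of space, weighted by reuse count) is an assumption, and the names `SharedBuffer`, `CachedResult`, and `plan_cost` are hypothetical. A plain sum of sub-query costs stands in for the learned latency predictor.

```python
from dataclasses import dataclass, field


@dataclass
class CachedResult:
    rows: list          # materialized intermediate result of a sub-query
    size: int           # space the result occupies (here: its row count)
    saved_cost: float   # assumed cost of recomputing the sub-query
    hits: int = 0       # number of times the entry has been reused


@dataclass
class SharedBuffer:
    capacity: int                                  # total space, in rows
    entries: dict = field(default_factory=dict)    # sub-query key -> CachedResult

    def used(self) -> int:
        return sum(e.size for e in self.entries.values())

    def score(self, e: CachedResult) -> float:
        # Assumed heuristic, NOT the paper's policy: cost saved per unit
        # of space, weighted by how often the entry was reused.
        return (1 + e.hits) * e.saved_cost / max(e.size, 1)

    def get(self, key: str):
        e = self.entries.get(key)
        if e is None:
            return None
        e.hits += 1
        return e.rows

    def put(self, key: str, rows: list, saved_cost: float) -> bool:
        new = CachedResult(rows, len(rows), saved_cost)
        if new.size > self.capacity:
            return False                 # result can never fit
        self.entries.pop(key, None)      # replace any stale copy
        # Evict the lowest-scoring entries until the new result fits.
        while self.used() + new.size > self.capacity:
            victim = min(self.entries, key=lambda k: self.score(self.entries[k]))
            del self.entries[victim]
        self.entries[key] = new
        return True


def plan_cost(subqueries: dict, buffer: SharedBuffer) -> float:
    """Sum sub-query costs, charging nothing for sub-queries already cached."""
    return sum(cost for key, cost in subqueries.items() if key not in buffer.entries)
```

In this sketch, a plan whose sub-queries are already in the buffer receives a lower estimated cost, which is the general mechanism by which cached results can steer plan selection away from redundant computation.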

Information

Published In

Proceedings of the ACM on Management of Data, Volume 1, Issue 4
PACMMOD
December 2023
1317 pages
EISSN: 2836-6573
DOI: 10.1145/3637468
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. concurrent query optimization
  2. databases
  3. machine learning

Qualifiers

  • Research-article

Funding Sources

  • ARC
  • National Research Foundation Singapore under its AI Singapore Programme
  • MOE Tier-2 grant
