Search | arXiv e-print repository

Showing 1–50 of 98 results for author: Gai, K

Searching in archive cs. Search in all archives.
  1. T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

    Authors: Yili Li, Jing Yu, Keke Gai, Bang Liu, Gang Xiong, Qi Wu

    Abstract: Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate their similarity scores, which are then sorted to obtain retrieval results. This method considers the matching between each candidate video and the query, but it incurs a significant time cost and will increase notably with the increase of candidates. Generative models are common in nat…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2408.07989  [pdf, other]

    cs.CV cs.AI

    IIU: Independent Inference Units for Knowledge-based Visual Question Answering

    Authors: Yili Li, Jing Yu, Keke Gai, Gang Xiong

    Abstract: Knowledge-based visual question answering requires external knowledge beyond visible content to answer the question correctly. One limitation of existing methods is that they focus more on modeling the inter-modal and intra-modal correlations, which entangles complex multimodal clues by implicit embeddings and lacks interpretability and generalization ability. The key challenge to solve the above…

    Submitted 15 August, 2024; originally announced August 2024.

  3. arXiv:2408.05430  [pdf, other]

    cs.IR cs.LG

    HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou

    Authors: Xu Wang, Jiangxia Cao, Zhiyi Fu, Kun Gai, Guorui Zhou

    Abstract: In this paper, we present the practical problems and the lessons learned at short-video services from Kuaishou. In industry, a widely-used multi-task framework is the Mixture-of-Experts (MoE) paradigm, which always introduces some shared and specific experts for each task and then uses gate networks to measure related experts' contributions. Although the MoE achieves remarkable improvements, we st…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: Work in progress

  4. arXiv:2408.00329  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    OTAD: An Optimal Transport-Induced Robust Model for Agnostic Adversarial Attack

    Authors: Kuo Gai, Sicong Wang, Shihua Zhang

    Abstract: Deep neural networks (DNNs) are vulnerable to small adversarial perturbations of the inputs, posing a significant challenge to their reliability and robustness. Empirical methods such as adversarial training can defend against particular attacks but remain vulnerable to more powerful attacks. Alternatively, Lipschitz networks provide certified robustness to unseen perturbations but lack sufficient…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 14 pages, 2 figures

  5. TWIN V2: Scaling Ultra-Long User Behavior Sequence Modeling for Enhanced CTR Prediction at Kuaishou

    Authors: Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, Kai Zheng, Chenbin Zhang, Yanan Niu, Yang Song, Kun Gai

    Abstract: The significance of modeling long-term user interests for CTR prediction tasks in large-scale recommendation systems is progressively gaining attention among researchers and practitioners. Existing work, such as SIM and TWIN, typically employs a two-stage approach to model long-term user behavior sequences for efficiency concerns. The first stage rapidly retrieves a subset of sequences related to…

    Submitted 16 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024

  6. arXiv:2407.15613  [pdf, other]

    cs.CV

    Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

    Authors: Xiangyan Qu, Jing Yu, Keke Gai, Jiamin Zhuang, Yuanmin Tang, Gang Xiong, Gaopeng Gou, Qi Wu

    Abstract: Recent work shows that documents from encyclopedias serve as helpful auxiliary information for zero-shot learning. Existing methods align the entire semantics of a document with corresponding images to transfer knowledge. However, they disregard that semantic information is not equivalent between them, resulting in a suboptimal alignment. In this work, we propose a novel network to extract multi-v…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM International Conference on Multimedia (MM) 2024

  7. arXiv:2407.02540  [pdf, other]

    stat.ML cs.AI cs.LG

    Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function

    Authors: Kuo Gai, Shihua Zhang

    Abstract: In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find the analytical solution of a three-layer network with a matrix exponential activation function, i.e., $$ f(X)=W_3\exp(W_2\exp(W_1X)), X\in \mathbb{C}^{d\times d} $$ have analytical solutions for the equations $$ Y_1=f(X_1),Y_2=f(X_2) $$ for…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 8 pages, 1 figure

  8. arXiv:2407.01607  [pdf, other]

    cs.LG cs.IR stat.ML

    Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction

    Authors: Zhongxiang Fan, Zhaocheng Liu, Jian Liang, Dongying Kong, Han Li, Peng Jiang, Shuang Li, Kun Gai

    Abstract: This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary is…

    Submitted 27 June, 2024; originally announced July 2024.

  9. arXiv:2406.11277  [pdf, other]

    cs.CL

    Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector

    Authors: Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen

    Abstract: Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4. In this paper, we propose an autonomous LLM-based agent framework, called HaluAgent, which enables relatively smaller LLMs (e.g. Baichuan2-Chat 7B) to actively select suitable tools for detecting multiple hallucination types such as text, c…

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2405.15208  [pdf, other]

    cs.CL cs.AI

    Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs

    Authors: Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong

    Abstract: Large language models have demonstrated exceptional capability in natural language understanding and generation. However, their generation speed is limited by the inherently sequential nature of their decoding process, posing challenges for real-time applications. This paper introduces Lexical Unit Decoding (LUD), a novel decoding methodology implemented in a data-driven manner, accelerating the d…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at LREC-COLING 2024

  11. arXiv:2405.14677  [pdf, other]

    cs.CV cs.LG

    RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

    Authors: Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Di Zhang, Yang Song, Kun Gai, Yadong Mu

    Abstract: Customizing diffusion models to generate identity-preserving images from user-provided reference images is an intriguing new problem. The prevalent approaches typically require training on extensive domain-specific images to achieve identity preservation, which lacks flexibility across different use cases. To address this issue, we exploit classifier guidance, a training-free technique that steers…

    Submitted 23 May, 2024; originally announced May 2024.

  12. Modeling User Fatigue for Sequential Recommendation

    Authors: Nian Li, Xin Ban, Cheng Ling, Chen Gao, Lantao Hu, Peng Jiang, Kun Gai, Yong Li, Qingmin Liao

    Abstract: Recommender systems filter out information that meets user interests. However, users may be tired of the recommendations that are too similar to the content they have been exposed to in a short historical period, which is the so-called user fatigue. Despite the significance for a better user experience, user fatigue is seldom explored by existing recommenders. In fact, there are three main challen…

    Submitted 22 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: SIGIR 2024

  13. arXiv:2405.04844  [pdf, ps, other]

    cs.IR

    Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems

    Authors: Kai Zheng, Haijun Zhao, Rui Huang, Beichuan Zhang, Na Mou, Yanan Niu, Yang Song, Hongning Wang, Kun Gai

    Abstract: The Probability Ranking Principle (PRP) has been considered as the foundational standard in the design of information retrieval (IR) systems. The principle requires an IR module's returned list of results to be ranked with respect to the underlying user interests, so as to maximize the results' utility. Nevertheless, we point out that it is inappropriate to indiscriminately apply PRP through eve…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by WWW 2024

  14. arXiv:2405.04108  [pdf, other]

    cs.CR cs.AI

    A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model

    Authors: Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo

    Abstract: Recent booming development of Generative Artificial Intelligence (GenAI) has facilitated an emerging model commercialization for the purpose of reinforcement on model performance, such as licensing or trading Deep Neural Network (DNN) models. However, DNN model trading may trigger concerns of the unauthorized replications or misuses over the model, so that the benefit of the model ownership will b…

    Submitted 7 May, 2024; originally announced May 2024.

  15. arXiv:2405.03988  [pdf, other]

    cs.IR cs.AI

    Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

    Authors: Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai

    Abstract: Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures

  16. arXiv:2405.02696  [pdf, other]

    cs.CR cs.AI

    DiffuseTrace: A Transparent and Flexible Watermarking Scheme for Latent Diffusion Model

    Authors: Liangqi Lei, Keke Gai, Jing Yu, Liehuang Zhu

    Abstract: Latent Diffusion Models (LDMs) enable a wide range of applications but raise ethical concerns regarding illegal utilization. Adding watermarks to generative model outputs is a vital technique employed for copyright tracking and mitigating potential risks associated with AI-generated content. However, post-hoc watermarking techniques are susceptible to evasion. Existing watermarking methods for LDMs…

    Submitted 4 May, 2024; originally announced May 2024.

  17. arXiv:2405.00985  [pdf, other]

    cs.LG cs.AI math.OC math.ST

    Progressive Feedforward Collapse of ResNet Training

    Authors: Sicong Wang, Kuo Gai, Shihua Zhang

    Abstract: Neural collapse (NC) is a simple and symmetric phenomenon for deep neural networks (DNNs) at the terminal phase of training, where the last-layer features collapse to their class means and form a simplex equiangular tight frame aligning with the classifier vectors. However, the relationship of the last-layer features to the data and intermediate layers during training remains unexplored. To this e…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures

    MSC Class: 68T07 ACM Class: I.2.0

  18. M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework

    Authors: Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai

    Abstract: Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, the practical recommendation usually faces multiple domains and tasks simultaneously, which cannot be well-addressed by current methods. To this end, we introduce M3oE, an adaptive Multi-…

    Submitted 12 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  19. arXiv:2404.11095  [pdf, other]

    cs.CL cs.AI

    Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues

    Authors: Jiao Ou, Jiayu Wu, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai

    Abstract: Aligning large language models (LLMs) with human expectations requires high-quality instructional dialogues, which can be achieved by raising diverse, in-depth, and insightful instructions that deepen interactions. Existing methods target instructions from real instruction dialogues as a learning goal and fine-tune a user simulator for posing instructions. However, the user simulator struggles to…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 27 pages, 3 figures, 12 tables

  20. arXiv:2404.08675  [pdf, other]

    cs.IR cs.AI cs.CL

    RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm

    Authors: Yabin Zhang, Wenhui Yu, Erhan Zhang, Xu Chen, Lantao Hu, Peng Jiang, Kun Gai

    Abstract: ChatGPT has achieved remarkable success in natural language understanding. Considering that recommendation is indeed a conversation between users and the system with items as words, which has similar underlying pattern with ChatGPT, we design a new chat framework in item index level for the recommendation task. Our novelty mainly contains three parts: model, training and inference. For the model p…

    Submitted 6 April, 2024; originally announced April 2024.

  21. Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention

    Authors: Ziru Liu, Shuchang Liu, Zijian Zhang, Qingpeng Cai, Xiangyu Zhao, Kesen Zhao, Lantao Hu, Peng Jiang, Kun Gai

    Abstract: In the landscape of Recommender System (RS) applications, reinforcement learning (RL) has recently emerged as a powerful tool, primarily due to its proficiency in optimizing long-term rewards. Nevertheless, it suffers from instability in the learning process, stemming from the intricate interactions among bootstrapping, off-policy training, and function approximation. Moreover, in multi-reward rec…

    Submitted 10 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: SIGIR 2024

  22. arXiv:2402.10618   

    cs.CL

    Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement

    Authors: Yihong Tang, Jiao Ou, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai

    Abstract: The advent of Large Language Models (LLMs) has propelled dialogue generation into new realms, particularly in the field of role-playing systems (RPSs). While enhanced with ordinary role-relevant training dialogues, existing LLM-based RPSs still struggle to align with roles when handling intricate and trapped queries in boundary scenarios. In this paper, we design the Modular ORchestrated Trap-sett…

    Submitted 15 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: After our paper was submitted to the conference for review, it was found that there were major problems, so it was revised by more than 80%, which can basically be regarded as new work

  23. arXiv:2402.03161  [pdf, other]

    cs.CV cs.CL

    Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

    Authors: Yang Jin, Zhicheng Sun, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

    Abstract: In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos. Compared to static images, video poses unique challenges for effective large-scale pre-training due to the modeling of its spatiotemporal dynamics. In this paper, we address such limitations in video-language pre-training…

    Submitted 3 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  24. Future Impact Decomposition in Request-level Recommendations

    Authors: Xiaobei Wang, Shuchang Liu, Xueliang Wang, Qingpeng Cai, Lantao Hu, Han Li, Peng Jiang, Kun Gai, Guangming Xie

    Abstract: In recommender systems, reinforcement learning solutions have shown promising results in optimizing the interaction sequence between users and the system over the long-term performance. For practical reasons, the policy's actions are typically designed as recommending a list of items to handle users' frequent and continuous browsing requests more efficiently. In this list-wise recommendation scena…

    Submitted 18 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 12 pages, 8 figures

    ACM Class: H.3.3

  25. arXiv:2401.05895  [pdf, other]

    cs.LG cs.AI cs.CR cs.DC

    Binary Linear Tree Commitment-based Ownership Protection for Distributed Machine Learning

    Authors: Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu

    Abstract: Distributed machine learning enables parallel training of extensive datasets by delegating computing tasks across multiple workers. Despite the cost reduction benefits of distributed machine learning, the dissemination of final model weights often leads to potential conflicts over model ownership as workers struggle to substantiate their involvement in the training computation. To address the abov…

    Submitted 11 January, 2024; originally announced January 2024.

  26. arXiv:2311.08302  [pdf, other]

    cs.IR

    Inverse Learning with Extremely Sparse Feedback for Recommendation

    Authors: Guanyu Lin, Chen Gao, Yu Zheng, Yinfeng Li, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li

    Abstract: Modern personalized recommendation services often rely on user feedback, either explicit or implicit, to improve the quality of services. Explicit feedback refers to behaviors like ratings, while implicit feedback refers to behaviors like user clicks. However, in the scenario of full-screen video viewing experiences like Tiktok and Reels, the click action is absent, resulting in unclear feedback f…

    Submitted 20 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: WSDM 2024

  27. arXiv:2311.08272  [pdf, other]

    cs.IR cs.LG

    Mixed Attention Network for Cross-domain Sequential Recommendation

    Authors: Guanyu Lin, Chen Gao, Yu Zheng, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li, Meng Wang

    Abstract: In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, which suffers from data sparsity issues, especially for new users. One promising line of work is the cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recent proposed cross-domain…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: WSDM 2024

  28. arXiv:2311.08154  [pdf, other]

    cs.CL cs.AI

    Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios

    Authors: Lei Lin, Jiayi Fu, Pengli Liu, Qingyang Li, Yan Gong, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

    Abstract: Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes the repetitiveness and local optimality. To address this shortcoming, ensemble-optimization tries to obtain multiple reasoning paths to get the final answer assembly. However, current ensemble-optimizatio…

    Submitted 24 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: Accepted by Findings of ACL 2024

  29. arXiv:2311.05863  [pdf, other]

    cs.CR cs.CV

    Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service

    Authors: Yuanmin Tang, Jing Yu, Keke Gai, Xiangyan Qu, Yue Hu, Gang Xiong, Qi Wu

    Abstract: Recent advances in vision-language pre-trained models (VLPs) have significantly increased visual understanding and cross-modal analysis capabilities. Companies have emerged to provide multi-modal Embedding as a Service (EaaS) based on VLPs (e.g., CLIP-based VLPs), which cost a large amount of training data and resources for high-performance service. However, existing studies indicate that EaaS is…

    Submitted 9 November, 2023; originally announced November 2023.

  30. arXiv:2311.01677  [pdf, other]

    cs.CL cs.AI

    DialogBench: Evaluating LLMs as Human-like Dialogue Systems

    Authors: Jiao Ou, Junda Lu, Che Liu, Yihong Tang, Fuzheng Zhang, Di Zhang, Kun Gai

    Abstract: Large language models (LLMs) have achieved remarkable breakthroughs in new dialogue capabilities by leveraging instruction tuning, which refreshes human impressions of dialogue systems. The long-standing goal of dialogue systems is to be human-like enough to establish long-term connections with users. Therefore, there has been an urgent need to evaluate LLMs as human-like dialogue systems. In this…

    Submitted 29 March, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted at NAACL 2024 (main conference)

  31. arXiv:2310.13367  [pdf, other]

    cs.LG cs.AI cs.DC

    VFedMH: Vertical Federated Learning for Training Multiple Heterogeneous Models

    Authors: Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo, Bin Xiao

    Abstract: Vertical federated learning has garnered significant attention as it allows clients to train machine learning models collaboratively without sharing local data, which protects the client's local private data. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this chall…

    Submitted 8 February, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

  32. arXiv:2310.10462  [pdf, other]

    cs.LG

    Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems

    Authors: Yunli Wang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Dongying Kong, Han Li, Kun Gai

    Abstract: Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems, and learning-to-rank is an important way to optimize the models in cascade ranking. Previous works on learning-to-rank usually focus on letting the model learn the complete order or top-k order, and adopt the corresponding rank metrics (e.g. OPA and NDCG@k) as optimization targ…

    Submitted 21 February, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 12 pages, accepted by WWW 2024

  33. arXiv:2310.07488  [pdf, other]

    cs.CL cs.AI cs.LG

    KwaiYiiMath: Technical Report

    Authors: Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, Shengnan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

    Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning. In this report, we introduce the KwaiYiiMath which enhances the mathematical reasoning abilities of KwaiYiiBase1, by applying Supervised Fine-Tuning (SFT) and Reinforced Lea…

    Submitted 19 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: technical report. arXiv admin note: text overlap with arXiv:2306.16636 by other authors

  34. arXiv:2310.07301  [pdf, other]

    cs.CL

    Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models

    Authors: Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Wayne Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai

    Abstract: Humans often interact with large language models (LLMs) in multi-turn interaction to obtain desired answers or more information. However, most existing studies overlook the multi-turn instruction following ability of LLMs, in terms of training dataset, training method, and evaluation benchmark. In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LL…

    Submitted 23 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  35. arXiv:2310.03984  [pdf, other]

    cs.IR cs.LG

    AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement

    Authors: Zhenghai Xue, Qingpeng Cai, Tianyou Zuo, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, Bo An

    Abstract: Growing attention has been paid to Reinforcement Learning (RL) algorithms when optimizing long-term user engagement in sequential recommendation tasks. One challenge in large-scale online recommendation systems is the constant and complicated changes in users' behavior patterns, such as interaction rates and retention tendencies. When formulated as a Markov Decision Process (MDP), the dynamics and…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Preprint. Under Review

  36. arXiv:2309.16141  [pdf, other]

    cs.CV

    Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search

    Authors: Yuanmin Tang, Jing Yu, Keke Gai, Yujing Wang, Yue Hu, Gang Xiong, Qi Wu

    Abstract: Cross-Modal sponsored search displays multi-modal advertisements (ads) when consumers look for desired products by natural language queries in search engines. Since multi-modal ads bring complementary details for query-ads matching, the ability to align ads-specific information in both images and texts is crucial for accurate and flexible sponsored search. Conventional research mainly studies from…

    Submitted 27 September, 2023; originally announced September 2023.

  37. arXiv:2309.16137  [pdf, other]

    cs.CV

    Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

    Authors: Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Yue Hu, Qi Wu

    Abstract: Different from Composed Image Retrieval task that requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intent that could be related to domain, scene, object, and attribute. The key challenge for ZS-CIR tasks is to learn a more accurate image representation that has adaptive…

    Submitted 15 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Journal ref: AAAI 2024

  38. arXiv:2309.13375  [pdf, other]

    cs.IR

    Generative Retrieval with Semantic Tree-Structured Item Identifiers via Contrastive Learning

    Authors: Zihua Si, Zhongxiang Sun, Jiale Chen, Guozhang Chen, Xiaoxue Zang, Kai Zheng, Yang Song, Xiao Zhang, Jun Xu, Kun Gai

    Abstract: The retrieval phase is a vital component in recommendation systems, requiring the model to be effective and efficient. Recently, generative retrieval has become an emerging paradigm for document retrieval, showing notable performance. These methods enjoy merits like being end-to-end differentiable, suggesting their viability in recommendation. However, these methods fall short in efficiency and ef…

    Submitted 7 July, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: 9 main pages

  39. arXiv:2309.12645  [pdf, other]

    cs.IR

    KuaiSim: A Comprehensive Simulator for Recommender Systems

    Authors: Kesen Zhao, Shuchang Liu, Qingpeng Cai, Xiangyu Zhao, Ziru Liu, Dong Zheng, Peng Jiang, Kun Gai

    Abstract: Reinforcement Learning (RL)-based recommender systems (RSs) have garnered considerable attention due to their ability to learn optimal recommendation policies and maximize long-term user rewards. However, deploying RL models directly in online environments and generating authentic data through A/B tests can pose challenges and require substantial resources. Simulators offer an alternative approach…

    Submitted 19 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

  40. arXiv:2309.04669  [pdf, other]

    cs.CV

    Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

    Authors: Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

    Abstract: Recently, the remarkable advance of the Large Language Model (LLM) has inspired researchers to transfer its extraordinary reasoning capability to both vision and language data. However, the prevailing approaches primarily regard the visual input as a prompt and focus exclusively on optimizing the text generation process conditioned upon vision content by a frozen LLM. Such an inequitable treatment…

    Submitted 22 March, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: ICLR 2024

  41. arXiv:2308.06212  [pdf, other]

    cs.IR cs.CL

    A Large Language Model Enhanced Conversational Recommender System

    Authors: Yue Feng, Shuchang Liu, Zhenghai Xue, Qingpeng Cai, Lantao Hu, Peng Jiang, Kun Gai, Fei Sun

    Abstract: Conversational recommender systems (CRSs) aim to recommend high-quality items to users through a dialogue interface. It usually contains multiple sub-tasks, such as user preference elicitation, recommendation, explanation, and item information search. To develop effective CRSs, there are some challenges: 1) how to properly manage sub-tasks; 2) how to effectively solve different sub-tasks; and 3) h…

    Submitted 11 August, 2023; originally announced August 2023.

  42. Understanding and Modeling Passive-Negative Feedback for Short-video Sequential Recommendation

    Authors: Yunzhu Pan, Chen Gao, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Depeng Jin, Yong Li

    Abstract: Sequential recommendation is one of the most important tasks in recommender systems, which aims to recommend the next interacted item with historical behaviors as input. Traditional sequential recommendation always mainly considers the collected positive feedback such as click, purchase, etc. However, in short-video platforms such as TikTok, video viewing behavior may not always represent positive…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted by RecSys'23

  43. Graph Contrastive Learning with Generative Adversarial Network

    Authors: Cheng Wu, Chaokun Wang, Jingcao Xu, Ziyang Liu, Kai Zheng, Xiaowei Wang, Yang Song, Kun Gai

    Abstract: Graph Neural Networks (GNNs) have demonstrated promising results on exploiting node representations for many downstream tasks through supervised end-to-end training. To deal with the widespread label scarcity issue in real-world applications, Graph Contrastive Learning (GCL) is leveraged to train GNNs with limited or even no labels by maximizing the mutual information between nodes in its augmente…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: KDD 2023

  44. arXiv:2306.04095  [pdf, other]

    cs.IR

    PANE-GNN: Unifying Positive and Negative Edges in Graph Neural Networks for Recommendation

    Authors: Ziyang Liu, Chaokun Wang, Jingcao Xu, Cheng Wu, Kai Zheng, Yang Song, Na Mou, Kun Gai

    Abstract: Recommender systems play a crucial role in addressing the issue of information overload by delivering personalized recommendations to users. In recent years, there has been a growing interest in leveraging graph neural networks (GNNs) for recommender systems, capitalizing on advancements in graph representation learning. These GNN-based models primarily focus on analyzing users' positive feedback…

    Submitted 7 June, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  45. arXiv:2306.03552  [pdf, other]

    cs.LG cs.AI

    State Regularized Policy Optimization on Data with Dynamics Shift

    Authors: Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An

    Abstract: In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address such issue by training context encoders to identify environment parameters. Data with dynamics shift are separated according to their environment parameters to train the corresponding policy. Howeve…

    Submitted 21 February, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted at NeurIPS 2023

  46. Generative Flow Network for Listwise Recommendation

    Authors: Shuchang Liu, Qingpeng Cai, Zhankui He, Bowen Sun, Julian McAuley, Dong Zheng, Peng Jiang, Kun Gai

    Abstract: Personalized recommender systems fulfill the daily demands of customers and boost online businesses. The goal is to learn a policy that can generate a list of items that matches the user's demand or interest. While most existing methods learn a pointwise scoring model that predicts the ranking score of each individual item, recent research shows that the listwise approach can further improve the r…

    Submitted 9 June, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

    Comments: 11 pages, 5 figures, 7 tables

    Journal ref: KDD '23, August 6-10, 2023, Long Beach, CA, USA

  47. Multi-behavior Self-supervised Learning for Recommendation

    Authors: Jingcao Xu, Chaokun Wang, Cheng Wu, Yang Song, Kai Zheng, Xiaowei Wang, Changping Wang, Guorui Zhou, Kun Gai

    Abstract: Modern recommender systems often deal with a variety of user interactions, e.g., click, forward, purchase, etc., which requires the underlying recommender engines to fully understand and leverage multi-behavior data from users. Despite recent efforts towards making use of heterogeneous data, multi-behavior recommendation still faces great challenges. Firstly, sparse target signals and noisy auxili…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: SIGIR 2023

  48. When Search Meets Recommendation: Learning Disentangled Search Representation for Recommendation

    Authors: Zihua Si, Zhongxiang Sun, Xiao Zhang, Jun Xu, Xiaoxue Zang, Yang Song, Kun Gai, Ji-Rong Wen

    Abstract: Modern online service providers such as online shopping platforms often provide both search and recommendation (S&R) services to meet different user needs. Rarely has there been any effective means of incorporating user behavior data from both S&R services. Most existing approaches either simply treat S&R behaviors separately, or jointly optimize them by aggregating data from both services, ignori…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted by SIGIR 2023

  49. Exploration and Regularization of the Latent Action Space in Recommendation

    Authors: Shuchang Liu, Qingpeng Cai, Bowen Sun, Yuhao Wang, Ji Jiang, Dong Zheng, Kun Gai, Peng Jiang, Xiangyu Zhao, Yongfeng Zhang

    Abstract: In recommender systems, reinforcement learning solutions have effectively boosted recommendation performance because of their ability to capture long-term user-system interaction. However, the action space of the recommendation policy is a list of items, which could be extremely large with a dynamic candidate item pool. To overcome this challenge, we propose a hyper-actor and critic learning frame…

    Submitted 7 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: Proceedings of the ACM Web Conference 2023 (WWW '23), May 1-5, 2023, Austin, TX, USA

  50. Multi-Task Recommendations with Reinforcement Learning

    Authors: Ziru Liu, Jiejie Tian, Qingpeng Cai, Xiangyu Zhao, Jingtong Gao, Shuchang Liu, Dayou Chen, Tonghao He, Dong Zheng, Peng Jiang, Kun Gai

    Abstract: In recent years, Multi-task Learning (MTL) has yielded immense success in Recommender System (RS) applications. However, current MTL-based recommendation models tend to disregard the session-wise patterns of user-item interactions because they are predominantly constructed based on item-wise datasets. Moreover, balancing multiple objectives has always been a challenge in this field, which is typic…

    Submitted 9 March, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: TheWebConf2023