Search | arXiv e-print repository

Advancing Re-Ranking with Multimodal Fusion and Target-Oriented Auxiliary Tasks in E-Commerce Search

Authors: Enqiang Xu, Xinhui Li, Zhigong Zhou, Jiahao Ji, Jinyuan Zhao, Dadong Miao, Songlin Wang, Lin Liu, Sulong Xu

Abstract: In the rapidly evolving field of e-commerce, the effectiveness of search re-ranking models is crucial for enhancing user experience and driving conversion rates. Despite significant advancements in feature representation and model architecture, the integration of multimodal information remains underexplored. This study addresses this gap by investigating the computation and fusion of textual and visual information in the context of re-ranking. We propose Advancing Re-Ranking with Multimodal Fusion and Target-Oriented Auxiliary Tasks (ARMMT), which integrates an attention-based multimodal fusion technique and an auxiliary ranking-aligned task to enhance item representation and improve targeting capabilities. This method not only enriches the understanding of product attributes but also enables more precise and personalized recommendations. Experimental evaluations on JD.com's search platform demonstrate that ARMMT achieves state-of-the-art performance in multimodal information integration, evidenced by a 0.22% increase in the Conversion Rate (CVR), significantly contributing to Gross Merchandise Volume (GMV). This pioneering approach has the potential to revolutionize e-commerce re-ranking, leading to elevated user satisfaction and business growth.

Submitted 11 August, 2024; originally announced August 2024.

arXiv:2408.01456 [pdf]

Using GeoGebra to discover the motion of device in a well-known physical experimental instrument -- Looking into vibration-damping devices in the scanning tunneling microscope

Authors: Chengtian Liang, Enqi Xu

Abstract: In this paper, we take a vibration-damping device in the well-known physical experimental instrument, the scanning tunneling microscope, as the study base, and with the help of GeoGebra software, we explain in detail the principle of damping the vibration of the damper in the magnetic field to realize the vibration-damping function of the whole device and establish a clear physical picture and a correct and comprehensive knowledge. This question also shows the process of clarifying the meaning of the problem with the help of software tools.

Submitted 26 July, 2024; originally announced August 2024.

Comments: 5 pages, 8 figures

arXiv:2407.15476 [pdf, other]

MODRL-TA: A Multi-Objective Deep Reinforcement Learning Framework for Traffic Allocation in E-Commerce Search

Authors: Peng Cheng, Huimu Wang, Jinyuan Zhao, Yihao Wang, Enqiang Xu, Yu Zhao, Zhuojian Xiao, Songlin Wang, Guoyu Tang, Lin Liu, Sulong Xu

Abstract: Traffic allocation is a process of redistributing natural traffic to products by adjusting their positions in the post-search phase, aimed at effectively fostering merchant growth, precisely meeting customer demands, and ensuring the maximization of interests across various parties within e-commerce platforms. Existing methods based on learning to rank neglect the long-term value of traffic allocation, whereas approaches of reinforcement learning suffer from balancing multiple objectives and the difficulties of cold starts within real-world data environments. To address the aforementioned issues, this paper proposes a multi-objective deep reinforcement learning framework consisting of multi-objective Q-learning (MOQ), a decision fusion algorithm (DFM) based on the cross-entropy method (CEM), and a progressive data augmentation system (PDA). Specifically, MOQ constructs ensemble RL models, each dedicated to an objective, such as click-through rate, conversion rate, etc. These models individually determine the position of items as actions, aiming to estimate the long-term value of multiple objectives from an individual perspective. Then we employ DFM to dynamically adjust weights among objectives to maximize long-term value, addressing temporal dynamics in objective preferences in e-commerce scenarios. Initially, PDA trained MOQ with simulated data from offline logs. As experiments progressed, it strategically integrated real user interaction data, ultimately replacing the simulated dataset to alleviate distributional shifts and the cold start problem.
Experimental results on real-world online e-commerce systems demonstrate the significant improvements of MODRL-TA, and we have successfully deployed MODRL-TA on an e-commerce search platform.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2406.16170 [pdf, other]

SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

Authors: Xiaodong Yang, Huiyuan Chen, Yuchen Yan, Yuxin Tang, Yuying Zhao, Eric Xu, Yiwei Cai, Hanghang Tong

Abstract: The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address this issue, the recently proposed Sampled Softmax Cross-Entropy (SSM) compares one positive sample with multiple negative samples, leading to better performance. Our comprehensive experiments confirm that recommender systems consistently benefit from multiple negative samples during training. Furthermore, we introduce a Simplified Sampled Softmax Cross-Entropy Loss (SimCE), which simplifies the SSM using its upper bound. Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.11131 [pdf, other]

Are Large Language Models a Good Replacement of Taxonomies?

Authors: Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen

Abstract: Large language models (LLMs) demonstrate an impressive ability to internalize knowledge and answer natural language questions. Although previous studies validate that LLMs perform well on general knowledge while presenting poor performance on long-tail nuanced knowledge, the community is still doubtful about whether the traditional knowledge graphs should be replaced by LLMs. In this paper, we ask if the schema of knowledge graph (i.e., taxonomy) is made obsolete by LLMs. Intuitively, LLMs should perform well on common taxonomies and at taxonomy levels that are common to people. Unfortunately, there lacks a comprehensive benchmark that evaluates the LLMs over a wide range of taxonomies from common to specialized domains and at levels from root to leaf so that we can draw a confident conclusion. To narrow the research gap, we constructed a novel taxonomy hierarchical structure discovery benchmark named TaxoGlimpse to evaluate the performance of LLMs over taxonomies. TaxoGlimpse covers ten representative taxonomies from common to specialized domains with in-depth experiments of different levels of entities in this taxonomy from root to leaf. Our comprehensive experiments of eighteen state-of-the-art LLMs under three prompting settings validate that LLMs can still not well capture the knowledge of specialized taxonomies and leaf-level entities.

Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: Accepted by VLDB 2024

arXiv:2406.04744 [pdf, other]

CRAG -- Comprehensive RAG Benchmark

Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)'s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions only answer 63% questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.05606 [pdf, other]

doi 10.1145/3626772.3661343

Optimizing E-commerce Search: Toward a Generalizable and Rank-Consistent Pre-Ranking Model

Authors: Enqiang Xu, Yiming Qiu, Junyang Bai, Ping Zhang, Dadong Miao, Songlin Wang, Guoyu Tang, Lin Liu, Mingming Li

Abstract: In large e-commerce platforms, search systems are typically composed of a series of modules, including recall, pre-ranking, and ranking phases. The pre-ranking phase, serving as a lightweight module, is crucial for filtering out the bulk of products in advance for the downstream ranking module. Industrial efforts on optimizing the pre-ranking model have predominantly focused on enhancing ranking consistency, model structure, and generalization towards long-tail items. Beyond these optimizations, meeting the system performance requirements presents a significant challenge. Contrasting with existing industry works, we propose a novel method: a Generalizable and RAnk-ConsistEnt Pre-Ranking Model (GRACE), which achieves: 1) Ranking consistency by introducing multiple binary classification tasks that predict whether a product is within the top-k results as estimated by the ranking model, which facilitates the addition of learning objectives on common point-wise ranking models; 2) Generalizability through contrastive learning of representation for all products by pre-training on a subset of ranking product embeddings; 3) Ease of implementation in feature construction and online deployment. Our extensive experiments demonstrate significant improvements in both offline metrics and online A/B test: a 0.75% increase in AUC and a 1.28% increase in CVR.

Submitted 13 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

ACM Class: H.3.3

arXiv:2404.06078 [pdf, other]

End-to-end training of Multimodal Model and ranking Model

Authors: Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang Song, Na Mou, Shen Jiang, Han Li

Abstract: Traditional recommender systems heavily rely on ID features, which often encounter challenges related to cold-start and generalization. Modeling pre-extracted content features can mitigate these issues, but is still a suboptimal solution due to the discrepancies between training tasks and model parameters. End-to-end training presents a promising solution for these problems, yet most of the existing works mainly focus on retrieval models, leaving the multimodal techniques under-utilized. In this paper, we propose an industrial multimodal recommendation framework named EM3: End-to-end training of Multimodal Model and ranking Model, which sufficiently utilizes multimodal information and allows personalized ranking tasks to directly train the core modules in the multimodal model to obtain more task-oriented content features, without overburdening resource consumption. First, we propose Fusion-Q-Former, which consists of transformers and a set of trainable queries, to fuse different modalities and generate fixed-length and robust multimodal embeddings. Second, in our sequential modeling for user content interest, we utilize Low-Rank Adaptation technique to alleviate the conflict between huge resource consumption and long sequence length. Third, we propose a novel Content-ID-Contrastive learning task to complement the advantages of content and ID by aligning them with each other, obtaining more task-oriented content embeddings and more generalized ID embeddings.
In experiments, we implement EM3 on different ranking models in two scenarios, achieving significant improvements in both offline evaluation and online A/B test, verifying the generalizability of our method. Ablation studies and visualization are also performed. Furthermore, we also conduct experiments on two public datasets to show that our proposed method outperforms the state-of-the-art methods.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 9 pages, 8 figures

arXiv:2403.10045 [pdf, other]

Towards Adversarially Robust Dataset Distillation by Curvature Regularization

Authors: Eric Xue, Yijiang Li, Haoyang Liu, Yifan Shen, Haohan Wang

Abstract: Dataset distillation (DD) allows datasets to be distilled to fractions of their original size while preserving the rich distributional information so that models trained on the distilled datasets can achieve a comparable accuracy while saving significant computational loads. Recent research in this area has been focusing on improving the accuracy of models trained on distilled datasets. In this paper, we aim to explore a new perspective of DD. We study how to embed adversarial robustness in distilled datasets, so that models trained on these datasets maintain the high accuracy and meanwhile acquire better adversarial robustness. We propose a new method that achieves this goal by incorporating curvature regularization into the distillation process with much less computational overhead than standard adversarial training. Extensive empirical experiments suggest that our method not only outperforms standard adversarial training on both accuracy and robustness with less computation overhead but is also capable of generating robust distilled datasets that can withstand various adversarial attacks.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 17 pages, 3 figures

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04735 [pdf, other]

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

Authors: Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon

Abstract: Vision-extended LLMs have made significant strides in Visual Question Answering (VQA). Despite these advancements, VLLMs still encounter substantial difficulties in handling queries involving long-tail entities, with a tendency to produce erroneous or hallucinated responses. In this work, we introduce a novel evaluative benchmark named SnapNTell, specifically tailored for entity-centric VQA. This task aims to test the models' capabilities in identifying entities and providing detailed, entity-specific knowledge. We have developed the SnapNTell Dataset, distinct from traditional VQA datasets: (1) It encompasses a wide range of categorized entities, each represented by images and explicitly named in the answers; (2) It features QA pairs that require extensive knowledge for accurate responses. The dataset is organized into 22 major categories, containing 7,568 unique entities in total. For each entity, we curated 10 illustrative images and crafted 10 knowledge-intensive QA pairs. To address this novel task, we devised a scalable, efficient, and transparent retrieval-augmented multimodal LLM. Our approach markedly outperforms existing methods on the SnapNTell dataset, achieving a 66.5% improvement in the BLEURT score. We will soon make the dataset and the source code publicly accessible.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03937 [pdf, ps, other]

Settling the Competition Complexity of Additive Buyers over Independent Items

Authors: Mahsa Derakhshan, Emily Ryu, S. Matthew Weinberg, Eric Xue

Abstract: The competition complexity of an auction setting is the number of additional bidders needed such that the simple mechanism of selling items separately (with additional bidders) achieves greater revenue than the optimal but complex (randomized, prior-dependent, Bayesian-truthful) optimal mechanism without the additional bidders. Our main result settles the competition complexity of $n$ bidders with additive values over $m < n$ independent items at $\Theta(\sqrt{nm})$. The $O(\sqrt{nm})$ upper bound is due to [BW19], and our main result improves the prior lower bound of $\Omega(\ln n)$ to $\Omega(\sqrt{nm})$. Our main result follows from an explicit construction of a Bayesian IC auction for $n$ bidders with additive values over $m<n$ independent items drawn from the Equal Revenue curve truncated at $\sqrt{nm}$ ($\mathcal{ER}_{\le \sqrt{nm}}$), which achieves revenue that exceeds $\text{SRev}_{n+\sqrt{nm}}(\mathcal{ER}_{\le \sqrt{nm}}^m)$. Along the way, we show that the competition complexity of $n$ bidders with additive values over $m$ independent items is exactly equal to the minimum $c$ such that $\text{SRev}_{n+c}(\mathcal{ER}_{\le p}^m) \geq \text{Rev}_n(\mathcal{ER}_{\le p}^m)$ for all $p$ (that is, some truncated Equal Revenue witnesses the worst-case competition complexity).
Interestingly, we also show that the untruncated Equal Revenue curve does not witness the worst-case competition complexity when $n > m$: $\text{SRev}_n(\mathcal{ER}^m) = nm+O_m(\ln (n)) \leq \text{SRev}_{n+O_m(\ln (n))}(\mathcal{ER}^m)$, and therefore our result can only follow by considering all possible truncations.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 50 pages

arXiv:2401.03673 [pdf, other]

Comparing discriminating abilities of evaluation metrics in link prediction

Authors: Xinshan Jiao, Shuyan Wan, Qian Liu, Yilin Bi, Yan-Li Lee, En Xu, Dong Hao, Tao Zhou

Abstract: Link prediction aims to predict the potential existence of links between two unconnected nodes within a network based on the known topological characteristics. Evaluation metrics are used to assess the effectiveness of algorithms in link prediction. The discriminating ability of these evaluation metrics is vitally important for accurately evaluating link prediction algorithms. In this study, we propose an artificial network model, based on which one can adjust a single parameter to monotonically and continuously tune the prediction accuracy of the specifically designed link prediction algorithm. Building upon this foundation, we show a framework to depict the effectiveness of evaluating metrics by focusing on their discriminating ability. Specifically, a quantitative comparison in the abilities of correctly discerning varying prediction accuracies was conducted encompassing nine evaluation metrics: Precision, Recall, F1-Measure, Matthews Correlation Coefficient (MCC), Balanced Precision (BP), the Area Under the receiver operating characteristic Curve (AUC), the Area Under the Precision-Recall curve (AUPR), Normalized Discounted Cumulative Gain (NDCG), and the Area Under the magnified ROC (AUC-mROC). The results indicate that the discriminating abilities of the three metrics, AUC, AUPR, and NDCG, are significantly higher than those of other metrics.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11800 [pdf, other]

Non-Excludable Bilateral Trade Between Groups

Authors: Yixuan Even Xu, Hanrui Zhang, Vincent Conitzer

Abstract: Bilateral trade is one of the most natural and important forms of economic interaction: A seller has a single, indivisible item for sale, and a buyer is potentially interested. The two parties typically have different, privately known valuations for the item, and ideally, they would like to trade if the buyer values the item more than the seller. The celebrated impossibility result by Myerson and Satterthwaite shows that any mechanism for this setting must violate at least one important desideratum. In this paper, we investigate a richer paradigm of bilateral trade, with many self-interested buyers and sellers on both sides of a single trade who cannot be excluded from the trade. We show that this allows for more positive results. In fact, we establish a dichotomy in the possibility of trading efficiently. If in expectation, the buyers value the item more, we can achieve efficiency in the limit. If this is not the case, then efficiency cannot be achieved in general. En route, we characterize trading mechanisms that encourage truth-telling, which may be of independent interest. We also evaluate our trading mechanisms experimentally, and the experiments align with our theoretical results.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 14 pages, 2 figures, 1 table, aaai 2024

arXiv:2312.11029 [pdf, other]

Picsou: Enabling Efficient Cross-Consensus Communication

Authors: Reginald Frank, Micah Murray, Suyash Gupta, Ethan Xu, Natacha Crooks, Manos Kapritsos

Abstract: Replicated state machines (RSMs) cannot effectively communicate today as there is no formal framework or efficient protocol to do so. To address this issue, we introduce a new primitive, the Cross-Cluster Consistent Broadcast (C3B) and present PICSOU, a practical C3B implementation. PICSOU draws inspiration from networking and TCP to allow two RSMs to communicate with constant metadata overhead in the failure-free case and minimal number of message resends in the case of failures. PICSOU is flexible and allows both crash fault-tolerant and byzantine fault-tolerant protocols to communicate. At the heart of PICSOU's good performance and generality lies a novel technique we call QUACKs (quorum acknowledgements) that allow nodes in each RSM to precisely determine when messages have definitely been received, or definitely been lost. Our results are promising: we obtain up to 24x better performance than existing all-to-all solutions.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.09058 [pdf, other]

Learning Coalition Structures with Games

Authors: Yixuan Even Xu, Chun Kai Ling, Fei Fang

Abstract: Coalitions naturally exist in many real-world systems involving multiple decision makers such as ridesharing, security, and online ad auctions, but the coalition structure among the agents is often unknown. We propose and study an important yet previously overseen problem -- Coalition Structure Learning (CSL), where we aim to carefully design a series of games for the agents and infer the underlying coalition structure by observing their interactions in those games. We establish a lower bound on the sample complexity -- defined as the number of games needed to learn the structure -- of any algorithms for CSL and propose the Iterative Grouping (IG) algorithm for designing normal-form games to achieve the lower bound. We show that IG can be extended to other succinct games such as congestion games and graphical games. Moreover, we solve CSL in a more restrictive and practical setting: auctions. We show a variant of IG to solve CSL in the auction setting even if we cannot design the bidder valuations. Finally, we conduct experiments to evaluate IG in the auction setting and the results align with our theoretical analysis.

Submitted 18 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 13 pages, 4 figures, 3 tables, AAAI 2024

arXiv:2312.01487 [pdf, other]

BetterMinton Service: Analyzing the Badminton Service using Open Kinetic Chain

Authors: Eden Cong-He Xu, Lung-Pan Cheng

Abstract: We present a badminton training system that focuses on the backhand short service. Unlike the prior motor skill training systems which focus on the trainee's posture, our system analyzes the process of moving joints with the open kinetic chain (OKC), which helps align movement and minimize muscle use for better joint control. We process the users' mocap data to visually show their last service process comparing to 4 ideal OKC characteristics that we collected from a 6-sub-elite formative study as well as recommended contact posture. We validate our system through a 12-user study that measures serving accuracy, qualitative feedback, and skeletal data with users at various skill levels and open source our skeletal analysis model for future use. While the participants' overall service accuracy was not significantly improved, our results show that our system helps participants in the short term to fine-tune their service motion closer to our ideal 4 OKC characteristics.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 14 pages, The 9th Annual Conference of Taiwanese Association of Computer-Human Interaction

arXiv:2310.06194 [pdf, other]

Stability and Regret bounds on Distributed Truncated Predictive Control for Networked Dynamical Systems

Authors: Eric Xu, Guannan Qu

Abstract: This work is primarily concerned about the distributed control of networked linear time-invariant (LTI) systems. In particular, we propose a truncated predictive control algorithm based on $\kappa$-hop neighbourhoods of the agents of the networked system. We establish stability and regret bounds for the proposed algorithm, which shows that the regret decays exponentially when the temporal prediction horizon $k$ and the spatial radius $\kappa$ increases.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 25 pages, 2 figures, submitted to ACC 2024

arXiv:2310.05995 [pdf, other]

A One-Size-Fits-All Approach to Improving Randomness in Paper Assignment

Authors: Yixuan Even Xu, Steven Jecmen, Zimeng Song, Fei Fang

Abstract: The assignment of papers to reviewers is a crucial part of the peer review processes of large publication venues, where organizers (e.g., conference program chairs) rely on algorithms to perform automated paper assignment. As such, a major challenge for the organizers of these processes is to specify paper assignment algorithms that find appropriate assignments with respect to various desiderata. Although the main objective when choosing a good paper assignment is to maximize the expertise of each reviewer for their assigned papers, several other considerations make introducing randomization into the paper assignment desirable: robustness to malicious behavior, the ability to evaluate alternative paper assignments, reviewer diversity, and reviewer anonymity. However, it is unclear in what way one should randomize the paper assignment in order to best satisfy all of these considerations simultaneously. In this work, we present a practical, one-size-fits-all method for randomized paper assignment intended to perform well across different motivations for randomness. We show theoretically and experimentally that our method outperforms currently-deployed methods for randomized paper assignment on several intuitive randomness metrics, demonstrating that the randomized assignments produced by our method are general-purpose.

Submitted 18 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: 24 pages, 8 figures, 3 tables, NeurIPS 2023 spotlight

arXiv:2310.03954 [pdf, other]

Many-Body-Expansion Based on Variational Quantum Eigensolver and Deflation for Dynamical Correlation

Authors: Enhua Xu, Yuma Shimomoto, Seiichiro L. Ten-no, Takashi Tsuchimochi

Abstract: In this study, we utilize the many-body expansion (MBE) framework to decompose electronic structures into fragments by incrementing the virtual orbitals. Our work aims to accurately solve the ground and excited state energies of each fragment using the variational quantum eigensolver and deflation algorithms. Although our approach is primarily based on unitary coupled cluster singles and doubles (UCCSD) and a generalization thereof, we also introduce modifications and approximations to conserve quantum resources in MBE by partially generalizing the UCCSD operator and neglecting the relaxation of the reference states. As a proof of concept, we investigate the potential energy surfaces for the bond-breaking processes of the ground state of two molecules ($\rm H_2O$ and $\rm N_2$) and calculate the ground and excited state energies of three molecules (LiH, CH$^+$, and $\rm H_2O$). The results demonstrate that our approach can, in principle, provide reliable descriptions in all tests, including strongly correlated systems, when appropriate approximations are chosen. Additionally, we perform model simulations to investigate the impact of shot noise on the total MBE energy and show that precise energy estimation is crucial for lower-order MBE fragments.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2308.10168 [pdf, other]

Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?

Authors: Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, Xin Luna Dong

Abstract: Since the recent prosperity of Large Language Models (LLMs), there have been interleaved discussions regarding how to reduce hallucinations from LLM responses, how to increase the factuality of LLMs, and whether Knowledge Graphs (KGs), which store the world knowledge in a symbolic form, will be replaced with LLMs. In this paper, we try to answer these questions from a new angle: How knowledgeable are LLMs? To answer this question, we constructed Head-to-Tail, a benchmark that consists of 18K question-answer (QA) pairs regarding head, torso, and tail facts in terms of popularity. We designed an automated evaluation method and a set of metrics that closely approximate the knowledge an LLM confidently internalizes. Through a comprehensive evaluation of 16 publicly available LLMs, we show that existing LLMs are still far from being perfect in terms of their grasp of factual knowledge, especially for facts of torso-to-tail entities.

Submitted 2 April, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: To appear in NAACL 2024

arXiv:2307.15051 [pdf]

Matching Patients to Clinical Trials with Large Language Models

Authors: Qiao Jin, Zifeng Wang, Charalampos S. Floudas, Fangyuan Chen, Changlin Gong, Dara Bracken-Clarke, Elisabetta Xue, Yifan Yang, Jimeng Sun, Zhiyong Lu

Abstract: Clinical trials are often hindered by the challenge of patient recruitment. In this work, we introduce TrialGPT, a first-of-its-kind large language model (LLM) framework to assist patient-to-trial matching. Given a patient note, TrialGPT predicts the patient's eligibility on a criterion-by-criterion basis and then consolidates these predictions to assess the patient's eligibility for the target trial. We evaluate the trial-level prediction performance of TrialGPT on three publicly available cohorts of 184 patients with over 18,000 trial annotations. We also engaged three physicians to label over 1,000 patient-criterion pairs to assess its criterion-level prediction accuracy. Experimental results show that TrialGPT achieves a criterion-level accuracy of 87.3% with faithful explanations, close to the expert performance (88.7%-90.0%). The aggregated TrialGPT scores are highly correlated with human eligibility judgments, and they outperform the best-competing models by 32.6% to 57.2% in ranking and excluding clinical trials. Furthermore, our user study reveals that TrialGPT can significantly reduce the screening time (by 42.6%) in a real-life clinical trial matching task. These results and analyses have demonstrated promising opportunities for clinical trial matching with LLMs such as TrialGPT.

Submitted 27 April, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

Comments: Under review

arXiv:2307.02852 [pdf, other]

TDLE: 2-D LiDAR Exploration With Hierarchical Planning Using Regional Division

Authors: Xuyang Zhao, Chengpu Yu, Erpei Xu, Yixuan Liu

Abstract: Exploration systems are critical for enhancing the autonomy of robots. Due to the unpredictability of the future planning space, existing methods either adopt an inefficient greedy strategy or require a lot of resources to obtain a global solution. In this work, we address the challenge of obtaining global exploration routes with minimal computing resources. A hierarchical planning framework dynamically divides the planning space into subregions and arranges their orders to provide global guidance for exploration. Indicators that are compatible with the subregion order are used to choose specific exploration targets, thereby considering estimates of spatial structure and extending the planning space to unknown regions. Extensive simulations and field tests demonstrate the efficacy of our method in comparison to existing 2D LiDAR-based approaches. Our code has been made public for further investigation.

Submitted 6 July, 2023; originally announced July 2023.

Comments: Accepted in IEEE International Conference on Automation Science and Engineering (CASE) 2023

arXiv:2306.11820 [pdf, other]

Nearly Optimal Committee Selection For Bias Minimization

Authors: Yang Cai, Eric Xue

Abstract: We study the model of metric voting proposed by Feldman et al. [2020]. In this model, experts and candidates are located in a metric space, and each candidate possesses a quality that is independent of her location. An expert evaluates each candidate as the candidate's quality less a bias term--the distance between the candidate and the expert in the metric space. The expert then votes for her favorite candidate. The goal is to select a voting rule and a committee of experts to mitigate the bias. More specifically, given $m$ candidates, what is the minimum number of experts needed to ensure that the voting rule selects a candidate whose quality is at most $\varepsilon$ worse than the best one? Our first main result is a new way to select the committee using exponentially less experts compared to the method proposed in Feldman et al. [2020]. Our second main result is a novel construction that substantially improves the lower bound on the committee size. Indeed, our upper and lower bounds match in terms of $m$, the number of candidates, and $\varepsilon$, the desired accuracy, for general convex normed spaces, and differ by a multiplicative factor that only depends on the dimension of the underlying normed space but is independent of other parameters of the problem.
We extend the nearly matching upper and lower bounds to the setting in which each expert returns a ranking of her top $k$ candidates and we wish to choose $\ell$ candidates with cumulative quality at most $\varepsilon$ worse than that of the best set of $\ell$ candidates, settling an open problem of Feldman et al. [2020]. Finally, we consider the setting where there are multiple rounds of voting. We show that by introducing another round of voting, the number of experts needed to guarantee the selection of an $\varepsilon$-optimal candidate becomes independent of the number of candidates.

Submitted 22 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.10648 [pdf, other]

doi 10.1145/3589334.3645418

Bidder Selection Problem in Position Auctions: A Fast and Simple Algorithm via Poisson Approximation

Authors: Nickolai Gravin, Yixuan Even Xu, Renfei Zhou

Abstract: In the Bidder Selection Problem (BSP) there is a large pool of $n$ potential advertisers competing for ad slots on the user's web page. Due to strict computational restrictions, the advertising platform can run a proper auction only for a fraction $k<n$ of advertisers. We consider the basic optimization problem underlying BSP: given $n$ independent prior distributions, how to efficiently find a subset of $k$ with the objective of either maximizing expected social welfare or revenue of the platform. We study BSP in the classic multi-winner model of position auctions for welfare and revenue objectives using the optimal (respectively, VCG mechanism, or Myerson's auction) format for the selected set of bidders. Previous PTAS results for BSP optimization were only known for single-item auctions and in case of [Segev and Singla 2021] for $l$-unit auctions. More importantly, all of these PTASes were computational complexity results with impractically large running times, which defeats the purpose of using these algorithms under severe computational constraints. We propose a novel Poisson relaxation of BSP for position auctions that immediately implies that 1) BSP is polynomial-time solvable up to a vanishingly small error as the problem size $k$ grows; 2) there is a PTAS for position auctions after combining our relaxation with the trivial brute force algorithm. Unlike all previous PTASes, we implemented our algorithm and did extensive numerical experiments on practically relevant input sizes.
First, our experiments corroborate the previous experimental findings of Mehta et al. that a few simple heuristics used in practice perform surprisingly well in terms of approximation factor. Furthermore, our algorithm outperforms Greedy both in running time and approximation on medium and large-sized instances.

Submitted 27 April, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 19 pages; in WWW 2024

MSC Class: 68W25

arXiv:2306.10336 [pdf, other]

Fair Causal Feature Selection

Authors: Zhaolong Ling, Enqi Xu, Peng Zhou, Liang Du, Kui Yu, Xindong Wu

Abstract: Fair feature selection for classification decision tasks has recently garnered significant attention from researchers. However, existing fair feature selection algorithms fall short of providing a full explanation of the causal relationship between features and sensitive attributes, potentially impacting the accuracy of fair feature identification. To address this issue, we propose a Fair Causal Feature Selection algorithm, called FairCFS. Specifically, FairCFS constructs a localized causal graph that identifies the Markov blankets of class and sensitive variables, to block the transmission of sensitive information for selecting fair causal features. Extensive experiments on seven public real-world datasets validate that FairCFS has comparable accuracy compared to eight state-of-the-art feature selection algorithms, while presenting more superior fairness.

Submitted 18 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

arXiv:2306.04718 [pdf, other]

Scalable Neural Symbolic Regression using Control Variables

Authors: Xieting Chu, Hongjue Zhao, Enze Xu, Hairong Qi, Minghan Chen, Huajie Shao

Abstract: Symbolic regression (SR) is a powerful technique for discovering the analytical mathematical expression from data, finding various applications in natural sciences due to its good interpretability of results. However, existing methods face scalability issues when dealing with complex equations involving multiple variables. To address this challenge, we propose ScaleSR, a scalable symbolic regression model that leverages control variables to enhance both accuracy and scalability. The core idea is to decompose multi-variable symbolic regression into a set of single-variable SR problems, which are then combined in a bottom-up manner. The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs). Second, the data generator is used to generate samples for a certain variable by controlling the input variables. Thirdly, single-variable symbolic regression is applied to estimate the corresponding mathematical expression. Lastly, we repeat steps 2 and 3 by gradually adding variables one by one until completion. We evaluate the performance of our method on multiple benchmark datasets. Experimental results demonstrate that the proposed ScaleSR significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables. Moreover, it can substantially reduce the search space for symbolic regression. The source code will be made publicly available upon publication.

Submitted 9 July, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

arXiv:2304.12281 [pdf, other]

HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video

Authors: Jia-Wei Liu, Yan-Pei Cao, Tianyuan Yang, Eric Zhongcong Xu, Jussi Keppo, Ying Shan, Xiaohu Qie, Mike Zheng Shou

Abstract: We introduce HOSNeRF, a novel 360° free-viewpoint rendering method that reconstructs neural radiance fields for dynamic human-object-scene from a single monocular in-the-wild video. Our method enables pausing the video at any frame and rendering all scene details (dynamic humans, objects, and backgrounds) from arbitrary viewpoints. The first challenge in this task is the complex object motions in human-object interactions, which we tackle by introducing the new object bones into the conventional human skeleton hierarchy to effectively estimate large object deformations in our dynamic human-object model. The second challenge is that humans interact with different objects at different times, for which we introduce two new learnable object state embeddings that can be used as conditions for learning our human-object representation and scene representation, respectively. Extensive experiments show that HOSNeRF significantly outperforms SOTA approaches on two challenging datasets by a large margin of 40% ~ 50% in terms of LPIPS. The code, data, and compelling examples of 360° free-viewpoint renderings from single videos will be released in https://showlab.github.io/HOSNeRF.

Submitted 24 April, 2023; originally announced April 2023.

Comments: Project page: https://showlab.github.io/HOSNeRF

arXiv:2303.13091 [pdf, other]

Limits of Predictability in Top-N Recommendation

Authors: En Xu, Zhiwen Yu, Ying Zhang, Bin Guo, Lina Yao

Abstract: Top-N recommendation aims to recommend each consumer a small set of N items from a large collection of items, and its accuracy is one of the most common indexes to evaluate the performance of a recommendation system. While a large number of algorithms are proposed to push the Top-N accuracy by learning the user preference from their history purchase data, a predictability question is naturally raised - whether there is an upper limit of such Top-N accuracy. This work investigates such predictability by studying the degree of regularity from a specific set of user behavior data. Quantifying the predictability of Top-N recommendations requires simultaneously quantifying the limits on the accuracy of the N behaviors with the highest probability. This greatly increases the difficulty of the problem. To achieve this, we firstly excavate the associations among N behaviors with the highest probability and describe the user behavior distribution based on the information theory. Then, we adopt the Fano inequality to scale and obtain the Top-N predictability. Extensive experiments are conducted on the real-world data where significant improvements are observed compared to the state-of-the-art methods. We have not only completed the predictability calculation for N targets but also obtained predictability that is much closer to the true value than existing methods. We expect our results to assist these research areas where the quantitative requirement of Top-N predictability is required.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2302.09539 [pdf]

Chemical Environment Adaptive Learning for Optical Band Gap Prediction of Doped Graphitic Carbon Nitride Nanosheets

Authors: Chen Chen, Enze Xu, Defu Yang, Chenggang Yan, Tao Wei, Hanning Chen, Yong Wei, Minghan Chen

Abstract: This study presents a novel Machine Learning Algorithm, named Chemical Environment Graph Neural Network (ChemGNN), designed to accelerate materials property prediction and advance new materials discovery. Graphitic carbon nitride (g-C3N4) and its doped variants have gained significant interest for their potential as optical materials. Accurate prediction of their band gaps is crucial for practical applications, however, traditional quantum simulation methods are computationally expensive and challenging to explore the vast space of possible doped molecular structures. The proposed ChemGNN leverages the learning ability of current graph neural networks (GNNs) to satisfactorily capture the characteristics of atoms' local chemical environment underlying complex molecular structures. Our benchmark results demonstrate more than 100% improvement in band gap prediction accuracy over existing GNNs on g-C3N4. Furthermore, the general ChemGNN model can precisely foresee band gaps of various doped g-C3N4 structures, making it a valuable tool for performing high-throughput prediction in materials design and development.

Submitted 18 September, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: 25 pages, 8 figures

arXiv:2212.03412 [pdf, other]

Artificial Intelligence Security Competition (AISC)

Authors: Yinpeng Dong, Peng Chen, Senyou Deng, Lianji L, Yi Sun, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yangyi Dong, Enhui Xu, Jincai Xu, Shu Xu, Xuelin Fu, Changfeng Sun, Haoliang Han, Xuchong Zhang, Shen Chen, Zhimin Sun, Junyi Cao, Taiping Yao, Shouhong Ding, Yu Wu, Jian Lin, Tianpeng Wu, et al. (27 additional authors not shown)

Abstract: The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.

Submitted 6 December, 2022; originally announced December 2022.

Comments: Technical report of AISC

arXiv:2210.10370 [pdf, other]

On the Perturbation Function of Ranking and Balance for Weighted Online Bipartite Matching

Authors: Jingxun Liang, Zhihao Gavin Tang, Yixuan Even Xu, Yuhao Zhang, Renfei Zhou

Abstract: Ranking and Balance are arguably the two most important algorithms in the online matching literature. They achieve the same optimal competitive ratio of $1-1/e$ for the integral version and fractional version of online bipartite matching by Karp, Vazirani, and Vazirani (STOC 1990) respectively. The two algorithms have been generalized to weighted online bipartite matching problems, including vertex-weighted online bipartite matching and AdWords, by utilizing a perturbation function. The canonical choice of the perturbation function is $f(x)=1-e^{x-1}$ as it leads to the optimal competitive ratio of $1-1/e$ in both settings. We advance the understanding of the weighted generalizations of Ranking and Balance in this paper, with a focus on studying the effect of different perturbation functions. First, we prove that the canonical perturbation function is the \emph{unique} optimal perturbation function for vertex-weighted online bipartite matching. In stark contrast, all perturbation functions achieve the optimal competitive ratio of $1-1/e$ in the unweighted setting. Second, we prove that the generalization of Ranking to AdWords with unknown budgets using the canonical perturbation function is at most $0.624$ competitive, refuting a conjecture of Vazirani (2021). More generally, as an application of the first result, we prove that no perturbation function leads to the prominent competitive ratio of $1-1/e$ by establishing an upper bound of $1-1/e-0.0003$.
Finally, we propose the online budget-additive welfare maximization problem that is intermediate between AdWords and AdWords with unknown budgets, and we design an optimal $1-1/e$ competitive algorithm by generalizing Balance.

Submitted 5 July, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: Conference version to appear at the European Symposium on Algorithms (ESA 2023). 16 pages, 2 figures, 8 pages appendix

arXiv:2210.05880 [pdf]

Pathology Steered Stratification Network for Subtype Identification in Alzheimer's Disease

Authors: Enze Xu, Jingwen Zhang, Jiadi Li, Qianqian Song, Defu Yang, Guorong Wu, Minghan Chen

Abstract: Alzheimer's disease (AD) is a heterogeneous, multifactorial neurodegenerative disorder characterized by beta-amyloid, pathologic tau, and neurodegeneration. There are no effective treatments for Alzheimer's disease at a late stage, urging for early intervention. However, existing statistical inference approaches of AD subtype identification ignore the pathological domain knowledge, which could lead to ill-posed results that are sometimes inconsistent with the essential neurological principles. Integrating systems biology modeling with machine learning, we propose a novel pathology steered stratification network (PSSN) that incorporates established domain knowledge in AD pathology through a reaction-diffusion model, where we consider non-linear interactions between major biomarkers and diffusion along brain structural network. Trained on longitudinal multimodal neuroimaging data, the biological model predicts long-term trajectories that capture individual progression pattern, filling in the gaps between sparse imaging data available. A deep predictive neural network is then built to exploit spatiotemporal dynamics, link neurological examinations with clinical profiles, and generate subtype assignment probability on an individual basis. We further identify an evolutionary disease graph to quantify subtype transition probabilities through extensive simulations. Our stratification achieves superior performance in both inter-cluster heterogeneity and intra-cluster homogeneity of various clinical scores. Applying our approach to enriched samples of aging populations, we identify six subtypes spanning AD spectrum, where each subtype exhibits a distinctive biomarker pattern that is consistent with its clinical outcome. PSSN provides insights into pre-symptomatic diagnosis and practical guidance on clinical treatments, which may be further generalized to other neurodegenerative diseases.

Submitted 25 August, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

arXiv:2209.06098 [pdf]

doi 10.1038/s41586-023-06076-7

Indefinite and Bidirectional Near Infrared Nanocrystal Photoswitching

Authors: Changhwan Lee, Emma Z. Xu, Kevin W. C. Kwock, Ayelet Teitelboim, Yawei Liu, Natalie Fardian-Melamed, Cassio C. S. Pedroso, Hye Sun Park, Jongwoo Kim, Stefanie D. Pritzl, Sang Hwan Nam, Theobald Lohmueller, Peter Ercius, Yung Doug Suh, Bruce E Cohen, Emory M Chan, P. James Schuck

Abstract: Materials whose luminescence can be switched by optical stimulation drive technologies ranging from superresolution imaging [1-4], nanophotonics [5], and optical data storage [6-8], to targeted pharmacology, optogenetics, and chemical reactivity [9]. These photoswitchable probes, including organic fluorophores and proteins, are prone to photodegradation, and often require phototoxic doses of ultraviolet (UV) or visible light. Colloidal inorganic nanoparticles have significant stability advantages over existing photoswitchable materials, but the ability to switch emission bidirectionally, particularly with NIR light, has not been reported with nanoparticles. Here, we present 2-way, near-infrared (NIR) photoswitching of avalanching nanoparticles (ANPs), showing full optical control of upconverted emission using phototriggers in the NIR-I and NIR-II spectral regions useful for subsurface imaging. Employing single-step photodarkening [10-13] and photobrightening [12, 14-18], we demonstrate indefinite photoswitching of individual nanoparticles (>1000 cycles over 7 h) in ambient or aqueous conditions without measurable photodegradation. Critical steps of the photoswitching mechanism are elucidated by modeling and by measuring the photon avalanche properties of single ANPs in both bright and dark states. Unlimited, reversible photoswitching of ANPs enables indefinitely rewritable 2D and 3D multi-level optical patterning of ANPs, as well as optical nanoscopy with sub-Å localization superresolution that allows us to distinguish individual ANPs within tightly packed clusters.

Submitted 13 September, 2022; originally announced September 2022.

Comments: 15 pages, 5 figures

arXiv:2209.04260 [pdf, other]

doi 10.1103/PhysRevD.106.063026

Search for relativistic fractionally charged particles in space

Authors: DAMPE Collaboration, F. Alemanno, C. Altomare, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De-Benedittis, I. De Mitri, F. de Palma, M. Deliyergiyev, A. Di Giovanni, M. Di Santo, et al. (126 additional authors not shown)

Abstract: More than a century after the performance of the oil drop experiment, the possible existence of fractionally charged particles (FCPs) still remains unsettled. The search for FCPs is crucial for some extensions of the Standard Model in particle physics. Most of the previously conducted searches for FCPs in cosmic rays were based on experiments underground or at high altitudes. However, there have been few searches for FCPs in cosmic rays carried out in orbit other than AMS-01 flown by a space shuttle and BESS by a balloon at the top of the atmosphere. In this study, we conduct an FCP search in space based on on-orbit data obtained using the DArk Matter Particle Explorer (DAMPE) satellite over a period of five years. Unlike underground experiments, which require an FCP energy of the order of hundreds of GeV, our FCP search starts at only a few GeV. An upper limit of $6.2\times 10^{-10}~\mathrm{cm^{-2}\,sr^{-1}\,s^{-1}}$ is obtained for the flux. Our results demonstrate that DAMPE exhibits higher sensitivity than experiments of similar types by three orders of magnitude, which more stringently restricts the conditions for the existence of FCPs in primary cosmic rays.

Submitted 9 September, 2022; originally announced September 2022.

Comments: 19 pages, 6 figures, accepted by PRD

Report number: 106, 063026

Journal ref: Physical Review D 106.6 (2022): 063026

arXiv:2208.12533 [pdf, ps, other]

Non-standard Golay Complementary Sequence Pair over QAM

Authors: Erzhong Xue, Zilong Wang, Guang Gong

Abstract: We generalize the three-stage process for constructing and enumerating Golay array and sequence pairs given in 2008 by Frank Fiedler et al. [A multi-dimensional approach to the construction and enumeration of Golay complementary sequences, Journal of Combinatorial Theory, Series A 115 (2008) 753-776] to $4^{q}$-QAM constellation based on para-unitary matrix method, which partly solves their open questions. Our work not only includes the main part of known results of Golay complementary sequences over $4^{q}$-QAM based on Boolean functions and standard Golay sequence pairs over QPSK, but also generates new Golay complementary arrays (sequences) over $4^{q}$-QAM based on non-standard Golay array pairs over QPSK.

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2208.02559 [pdf, other]

doi 10.1209/0295-5075/acc19e

Equivalence between Time Series Predictability and Bayes Error Rate

Authors: En Xu, Tao Zhou, Zhiwen Yu, Zhuo Sun, Bin Guo

Abstract: Predictability is an emerging metric that quantifies the highest possible prediction accuracy for a given time series, being widely utilized in assessing known prediction algorithms and characterizing intrinsic regularities in human behaviors. Lately, increasing criticisms aim at the inaccuracy of the estimated predictability, caused by the original entropy-based method. In this brief report, we strictly prove that the time series predictability is equivalent to a seemingly unrelated metric called Bayes error rate that explores the lowest error rate unavoidable in classification. This proof bridges two independently developed fields, and thus each can immediately benefit from the other. For example, based on three theoretical models with known and controllable upper bounds of prediction accuracy, we show that the estimation based on Bayes error rate can largely solve the inaccuracy problem of predictability.

Submitted 4 August, 2022; originally announced August 2022.

Comments: 1 Figure, 1 Table, 5 Pages

arXiv:2207.06782 [pdf, other]

Boolean Functions of Binary Type-II and Type-II/III Complementary Array Pair

Authors: Erzhong Xue, Zilong Wang, Jinjin Chai

Abstract: The sequence pairs of length $2^{m}$ projected from complementary array pairs of Type-II of size $\mathbf{2}^{(m)}$ and of mixed Type-II/III of size $\mathbf{2}^{(m-1)}\times2$ are complementary sequence pairs of Type-II and Type-III respectively. An exhaustive search for binary Type-II and Type-III complementary sequence pairs of small lengths $2^{m}$ ($m=1,2,3,4$) shows that they are all projected from the aforementioned complementary array pairs, whose algebraic normal forms satisfy specified expressions. It is natural to ask whether the conclusion holds for all $m$. In this paper, we prove that these expressions of algebraic normal forms determine all the binary complementary array pairs of Type-II of size $\mathbf{2}^{(m)}$ and mixed Type-II/III of size $\mathbf{2}^{(m-1)}\times2$ respectively.

Submitted 31 July, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

arXiv:2207.04374 [pdf, ps, other]

The $q$-ary Golay complementary arrays of size $\mathbf{2}^{(m)}$ are standard

Authors: Erzhong Xue, Zilong Wang

Abstract: Finding non-standard binary Golay complementary sequences (GCSs) of length $2^{m}$, or theoretically proving their nonexistence, is still an open problem. Since it has been shown that all the standard $q$-ary (where $q$ is even) GCSs of length $2^m$ can be obtained from a standard $q$-ary Golay complementary array pair (GAP) of dimension $m$ and size $2\times 2 \times \cdots \times 2$ (abbreviated to size $\mathbf{2}^{(m)}$), it is natural to ask whether all the $q$-ary GAPs of size $\mathbf{2}^{(m)}$ are standard. We give a positive answer to this question.

Submitted 9 July, 2022; originally announced July 2022.

arXiv:2207.01622 [pdf, other]

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

Authors: Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rongcheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Abstract: In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for four Ego4D challenge tasks, including Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR). Especially, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer Egocentric VLP from pretraining dataset, pretraining objective, and development set. Based on the above three designs, we develop a pretrained video-language model that is able to transfer its egocentric video-text representation or video-only representation to several video downstream tasks. Our Egocentric VLP achieves 10.46 R@1 at IoU=0.3 on NLQ, 10.33 mAP on MQ, 74% Acc on OSCC, and 0.67 sec error on PNR. The code is available at https://github.com/showlab/EgoVLP.

Submitted 3 August, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: Preprint. 4 pages, 2 figures, 5 tables. Code: https://github.com/showlab/EgoVLP. The Ego4D challenge technical report of EgoVLP arXiv:2206.01670. See EPIC challenge technical report arXiv:2207.01334 for overlap

arXiv:2207.01334 [pdf, other]

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022

Authors: Kevin Qinghong Lin, Alex Jinpeng Wang, Rui Yan, Eric Zhongcong Xu, Rongcheng Tu, Yanru Zhu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Wei Liu, Mike Zheng Shou

Abstract: In this report, we propose a video-language pretraining (VLP) based solution \cite{kevin2022egovlp} for the EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge. Especially, we exploit the recently released Ego4D dataset \cite{grauman2021ego4d} to pioneer Egocentric VLP from pretraining dataset, pretraining objective, and development set. Based on the above three designs, we develop a pretrained video-language model that is able to transfer its egocentric video-text representation to MIR benchmark. Furthermore, we devise an adaptive multi-instance max-margin loss to effectively fine-tune the model and equip the dual-softmax technique for reliable inference. Our best single model obtains strong performance on the challenge test set with 47.39% mAP and 61.44% nDCG. The code is available at https://github.com/showlab/EgoVLP.

Submitted 3 August, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: To appear in CVPRW22. 5 pages, 2 figures, 2 tables. Code: https://github.com/showlab/EgoVLP. The EPIC challenge technical report of EgoVLP arXiv:2206.01670. See Ego4D challenge technical report arXiv:2207.01622

arXiv:2206.01670 [pdf, other]

Egocentric Video-Language Pretraining

Authors: Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rongcheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou

Abstract: Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention. Best performing works rely on large-scale, 3rd-person video-text datasets, such as HowTo100M. In this work, we exploit the recently released Ego4D dataset to pioneer Egocentric VLP along three directions. (i) We create EgoClip, a 1st-person video-text pretraining dataset comprising 3.8M clip-text pairs well-chosen from Ego4D, covering a large variety of human daily activities. (ii) We propose a novel pretraining objective, dubbed EgoNCE, which adapts video-text contrastive learning to the egocentric domain by mining egocentric-aware positive and negative samples. (iii) We introduce EgoMCQ, a development benchmark that is close to EgoClip and hence can support effective validation and fast exploration of our design decisions in EgoClip and EgoNCE. Furthermore, we demonstrate strong performance on five egocentric downstream tasks across three datasets: video-text retrieval on EPIC-KITCHENS-100; action recognition on Charades-Ego; natural language query, moment query, and object state change classification on Ego4D challenge benchmarks. The dataset and code are available at https://github.com/showlab/EgoVLP.

Submitted 12 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: Accepted by NeurIPS 2022. Double champions at Ego4D and EPIC-Kitchens, CVPR 2022 challenges. 23 pages, 13 figures, 12 tables. Code: https://github.com/showlab/EgoVLP

arXiv:2202.09747 [pdf, other]

PGE: Robust Product Graph Embedding Learning for Error Detection

Authors: Kewei Cheng, Xian Li, Yifan Ethan Xu, Xin Luna Dong, Yizhou Sun

Abstract: Although product graphs (PGs) have gained increasing attention in recent years for their successful applications in product search and recommendations, the extensive power of PGs can be limited by the inevitable involvement of various kinds of errors. Thus, it is critical to validate the correctness of triples in PGs to improve their reliability. Knowledge graph (KG) embedding methods have strong error detection abilities. Yet, existing KG embedding methods may not be directly applicable to a PG due to its distinct characteristics: (1) a PG contains rich textual signals, which necessitates a joint exploration of both text information and graph structure; (2) a PG contains a large number of attribute triples, in which attribute values are represented by free texts. Since free texts are too flexible to define entities in KGs, the traditional way of mapping entities to their embeddings using ids is no longer appropriate for attribute value representation; (3) noisy triples in a PG mislead the embedding learning and significantly hurt the performance of error detection. To address the aforementioned challenges, we propose an end-to-end noise-tolerant embedding learning framework, PGE, to jointly leverage both text information and graph structure in PG to learn embeddings for error detection. Experimental results on a real-world product graph demonstrate the effectiveness of the proposed framework compared with the state-of-the-art approaches.

Submitted 20 February, 2022; originally announced February 2022.

arXiv:2201.12384 [pdf]

Developing a Machine-Learning Algorithm to Diagnose Age-Related Macular Degeneration

Authors: Ananya Dua, Pham Hung Minh, Sajid Fahmid, Shikhar Gupta, Sophia Zheng, Vanessa Moyo, Yanran Elisa Xue

Abstract: Today, more than 12 million people over the age of 40 suffer from ocular diseases. Most commonly, older patients are susceptible to age-related macular degeneration, an eye disease that causes blurring of the central vision due to the deterioration of the retina. The former can only be detected through complex and expensive imaging software, markedly a visual field test; this leaves a significant population with untreated eye disease and holds them at risk for complete vision loss. The use of machine learning algorithms has been proposed for treating eye disease. However, the development of these models is limited by a lack of understanding regarding appropriate model and training parameters to maximize model performance. In our study, we address these points by generating 6 models, each with a learning rate of 1 * 10^n where n is 0, -1, -2, ... -6, and calculating an F1 score for each of the models. Our analysis shows that sample imbalance is a key challenge in the training of machine learning models and can result in deceptive improvements in training cost that do not translate to true improvements in model predictive performance. Considering the wide-ranging impact of the disease and its adverse effects, we developed a machine learning algorithm to treat the same. We trained our model on varying eye disease datasets consisting of over 5000 patients, and the pictures of their infected eyes. In the future, we hope this model is used extensively, especially in areas that are under-resourced, to better diagnose eye disease and improve well-being for humanity.

Submitted 28 January, 2022; originally announced January 2022.

Comments: 7 pages, 7 figures

arXiv:2112.08860 [pdf, other]

doi 10.1016/j.scib.2021.12.015

Search for gamma-ray spectral lines with the DArk Matter Particle Explorer

Authors: Francesca Alemanno, Qi An, Philipp Azzarello, Felicia Carla Tiziana Barbato, Paolo Bernardini, Xiao-Jun Bi, Ming-Sheng Cai, Elisabetta Casilli, Enrico Catanzani, Jin Chang, Deng-Yi Chen, Jun-Ling Chen, Zhan-Fang Chen, Ming-Yang Cui, Tian-Shu Cui, Yu-Xing Cui, Hao-Ting Dai, Antonio De Benedittis, Ivan De Mitri, Francesco de Palma, Maksym Deliyergiyev, Margherita Di Santo, Qi Ding, Tie-Kuang Dong, Zhen-Xing Dong, et al. (121 additional authors not shown)

Abstract: The DArk Matter Particle Explorer (DAMPE) is well suited for searching for monochromatic and sharp $\gamma$-ray structures in the GeV$-$TeV range thanks to its unprecedented high energy resolution. In this work, we search for $\gamma$-ray line structures using five years of DAMPE data. To improve the sensitivity, we develop two types of dedicated data sets (including the BgoOnly data, used for the first time in data analysis for calorimeter-based gamma-ray observatories) and adopt the signal-to-noise ratio optimized regions of interest (ROIs) for different DM density profiles. No line signals or candidates are found between 10 and 300 GeV in the Galaxy. The constraints on the velocity-averaged cross section for $\chi\chi \to \gamma\gamma$ and the decay lifetime for $\chi \to \gamma\nu$, both at 95% confidence level, have been calculated and the systematic uncertainties have been taken into account. Compared with the previous Fermi-LAT results, though DAMPE has an acceptance smaller by a factor of $\sim 10$, similar constraints on the DM parameters are achieved and below 100 GeV the lower limits on the decay lifetime are even stronger by a factor of a few. Our results demonstrate the potential of high-energy-resolution observations on dark matter detection.

Submitted 6 December, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: 14 pages, 8 figures. Update the content to keep up with the published version

Journal ref: Science Bulletin, Volume 67, Issue 7, 15 April 2022, Pages 679-684

arXiv:2111.14448 [pdf, other]

doi 10.1145/3503161.3548027

AVA-AVD: Audio-Visual Speaker Diarization in the Wild

Authors: Eric Zhongcong Xu, Zeyang Song, Satoshi Tsutsui, Chao Feng, Mang Ye, Mike Zheng Shou

Abstract: Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor environments like meeting rooms or news studios, which are quite different from in-the-wild videos in many scenarios such as movies, documentaries, and audience sitcoms. To develop diarization methods for these challenging videos, we create the AVA Audio-Visual Diarization (AVA-AVD) dataset. Our experiments demonstrate that adding AVA-AVD to the training set can produce significantly better diarization models for in-the-wild videos even though the dataset is relatively small. Moreover, this benchmark is challenging due to the diverse scenes, complicated acoustic conditions, and completely off-screen speakers. As a first step towards addressing the challenges, we design the Audio-Visual Relation Network (AVR-Net), which introduces a simple yet effective modality mask to capture discriminative information based on face visibility. Experiments show that our method not only outperforms state-of-the-art methods but is also more robust as the ratio of off-screen speakers varies. Our data and code have been made publicly available at https://github.com/showlab/AVA-AVD.

Submitted 16 July, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: ACMMM 2022

arXiv:2110.07058 [pdf, other]

Ego4D: Around the World in 3,000 Hours of Egocentric Video

Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/

Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

arXiv:2110.00123 [pdf, other]

doi 10.3847/2041-8213/ac2de6

Observations of Forbush Decreases of cosmic ray electrons and positrons with the Dark Matter Particle Explorer

Authors: Francesca Alemanno, Qi An, Philipp Azzarello, Felicia Carla Tiziana Barbato, Paolo Bernardini, XiaoJun Bi, MingSheng Cai, Elisabetta Casilli, Enrico Catanzani, Jin Chang, DengYi Chen, JunLing Chen, ZhanFang Chen, MingYang Cui, TianShu Cui, YuXing Cui, HaoTing Dai, Antonio De Benedittis, Ivan De Mitri, Francesco de Palma, Maksym Deliyergiyev, Margherita Di Santo, Qi Ding, TieKuang Dong, ZhenXing Dong, et al. (124 additional authors not shown)

Abstract: The Forbush Decrease (FD) represents the rapid decrease of the intensities of charged particles accompanying coronal mass ejections (CMEs) or high-speed streams from coronal holes. It has been mainly explored with ground-based neutron monitor networks, which indirectly measure the integrated intensities of all species of cosmic rays by counting secondary neutrons produced from interactions between atmospheric atoms and cosmic rays. Space-based experiments can resolve the species of particles, but the energy ranges are limited by the relatively small acceptances, except for the most abundant particles like protons and helium. Therefore, the FD of cosmic ray electrons and positrons has so far been investigated only by the PAMELA experiment in the low energy range ($<5$ GeV) with limited statistics. In this paper, we study the FD event that occurred in September 2017, using the electron and positron data recorded by the Dark Matter Particle Explorer. The evolution of the FDs from 2 GeV to 20 GeV with a time resolution of 6 hours is given. We observe two solar energetic particle events in the time profile of the intensity of cosmic rays; the earlier and weaker one has not been shown in the neutron monitor data. Furthermore, both the amplitude and recovery time of the fluxes of electrons and positrons show clear energy dependence, which is important in probing the disturbances of the interplanetary environment by coronal mass ejections.

Submitted 30 September, 2021; originally announced October 2021.

Comments: This article is dedicated to the 72nd anniversary of People's Republic of China

arXiv:2107.08189 [pdf, other]

BEDS-Bench: Behavior of EHR-models under Distributional Shift--A Benchmark

Authors: Anand Avati, Martin Seneviratne, Emily Xue, Zhen Xu, Balaji Lakshminarayanan, Andrew M. Dai

Abstract: Machine learning has recently demonstrated impressive progress in predictive accuracy across a wide array of tasks. Most ML approaches focus on generalization performance on unseen data that are similar to the training data (In-Distribution, or IND). However, real-world applications and deployments of ML rarely enjoy the comfort of encountering examples that are always IND. In such situations, most ML models commonly display erratic behavior on Out-of-Distribution (OOD) examples, such as assigning high confidence to wrong predictions, or vice versa. The implications of such unusual model behavior are further exacerbated in the healthcare setting, where patient health can potentially be put at risk. It is crucial to study the behavior and robustness properties of models under distributional shift, understand common failure modes, and take mitigation steps before the model is deployed. Having a benchmark that shines light upon these aspects of a model is a first and necessary step in addressing the issue. Recent work and interest in increasing model robustness in OOD settings have focused more on the image modality, while the Electronic Health Record (EHR) modality is still largely under-explored. We aim to bridge this gap by releasing BEDS-Bench, a benchmark for quantifying the behavior of ML models over EHR data under OOD settings. We use two open-access, de-identified EHR datasets to construct several OOD data settings to run tests on, and measure relevant metrics that characterize crucial aspects of a model's OOD behavior.
We evaluate several learning algorithms under BEDS-Bench and find that all of them show poor generalization performance under distributional shift in general. Our results highlight the need and the potential to improve robustness of EHR models under distributional shift, and BEDS-Bench provides one way to measure progress towards that goal.

Submitted 17 July, 2021; originally announced July 2021.

Showing 1–50 of 85 results for author: Xu, E