Search | arXiv e-print repository

Showing 1–50 of 154 results for author: Sun, A

Searching in archive cs.
  1. arXiv:2408.04392  [pdf, other]

    cs.CL

    Open-domain Implicit Format Control for Large Language Model Generation

    Authors: Yiqun Yao, Wenjia Ma, Xuezhi Fang, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang

    Abstract: Controlling the format of outputs generated by large language models (LLMs) is a critical functionality in various applications. Current methods typically employ constrained decoding with rule-based automata or fine-tuning with manually crafted format instructions, both of which struggle with open-domain format requirements. To address this limitation, we introduce a novel framework for controlled…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 6 pages

  2. arXiv:2408.03025  [pdf, other]

    cs.IR

    The Crowd in MOOCs: A Study of Learning Patterns at Scale

    Authors: Xin Zhou, Aixin Sun, Jie Zhang, Donghui Lin

    Abstract: The increasing availability of learning activity data in Massive Open Online Courses (MOOCs) enables us to conduct a large-scale analysis of learners' learning behavior. In this paper, we analyze a dataset of 351 million learning activities from 0.8 million unique learners enrolled in over 1.6 thousand courses within two years. Specifically, we mine and identify the learning patterns of the crowd…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 16 pages

  3. arXiv:2408.01027  [pdf, ps, other]

    cs.GT

    Randomized Strategyproof Mechanisms with Best of Both Worlds Fairness and Efficiency

    Authors: Ankang Sun, Bo Chen

    Abstract: We study the problem of mechanism design for allocating a set of indivisible items among agents with private preferences on items. We are interested in such a mechanism that is strategyproof (where agents' best strategy is to report their true preferences) and is expected to ensure fairness and efficiency to a certain degree. We first present an impossibility result that a deterministic mechanism…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 27 pages

    ACM Class: F.2.2

  4. arXiv:2407.16891  [pdf, other]

    cs.CY cs.CL

    Cultural Value Differences of LLMs: Prompt, Language, and Model Size

    Authors: Qishuai Zhong, Yike Yun, Aixin Sun

    Abstract: Our study aims to identify behavior patterns in cultural values exhibited by large language models (LLMs). The studied variants include question ordering, prompting language, and model size. Our experiments reveal that each tested LLM can efficiently behave with different cultural values. More interestingly: (i) LLMs exhibit relatively consistent cultural values when presented with prompts in a si…

    Submitted 17 June, 2024; originally announced July 2024.

    Comments: 20 pages

  5. arXiv:2407.14928  [pdf, other]

    cs.SE cs.HC

    Influencer: Empowering Everyday Users in Creating Promotional Posts via AI-infused Exploration and Customization

    Authors: Xuye Liu, Annie Sun, Pengcheng An, Tengfei Ma, Jian Zhao

    Abstract: Creating promotional posts on social platforms enables everyday users to disseminate their creative outcomes, engage in community exchanges, or generate additional income from micro-businesses. However, creating eye-catching posts combining both original, appealing images and articulate, effective captions can be rather challenging and time-consuming for everyday users who are mostly design novice…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 18 pages

  6. arXiv:2407.06597  [pdf, other]

    cs.AI

    TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

    Authors: Renjie Liang, Li Li, Chongzhi Zhang, Jing Wang, Xizhou Zhu, Aixin Sun

    Abstract: In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we dev…

    Submitted 23 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  7. arXiv:2407.02783  [pdf, ps, other]

    cs.CL cs.AI

    52B to 1T: Lessons Learned via Tele-FLM Series

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: For the Tele-FLM-52B tech report, see also 2404.16645

  8. arXiv:2407.01523  [pdf, other]

    cs.CV cs.CL

    MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

    Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

    Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark co…

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2407.00316  [pdf, other]

    cs.CV

    OccFusion: Rendering Occluded Humans with Generative Diffusion Priors

    Authors: Adam Sun, Tiange Xiang, Scott Delp, Li Fei-Fei, Ehsan Adeli

    Abstract: Most existing human rendering methods require every part of the human to be fully visible throughout the input video. However, this assumption does not hold in real-life settings where obstructions are common, resulting in only partial visibility of the human. Considering this, we present OccFusion, an approach that utilizes efficient 3D Gaussian splatting supervised by pretrained 2D diffusion mod…

    Submitted 29 June, 2024; originally announced July 2024.

  10. arXiv:2406.15501  [pdf]

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), seriously deteriorate the performance of the time synchronization system. A multiple-path scheme is regarded as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still…

    Submitted 19 June, 2024; originally announced June 2024.

  11. arXiv:2406.09869  [pdf, ps, other]

    cs.SD eess.AS

    MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

    Authors: Jiatong Shi, Xutai Ma, Hirofumi Inaguma, Anna Sun, Shinji Watanabe

    Abstract: Speech discrete representation has proven effective in various downstream applications due to its superior compression rate of the waveform, fast convergence during training, and compatibility with other modalities. Discrete units extracted from self-supervised learning (SSL) models have emerged as a prominent approach for obtaining speech discrete representation. However, while discrete units hav…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech2024

  12. arXiv:2406.04551  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

    Authors: Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano

    Abstract: With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases. Recent work has found that state-of-the-art models struggle to depict everyday objects with the true diversity of the real world and have notable gaps between geographic regions. In this work, we aim to increase the diversity of generated images of common objects…

    Submitted 2 August, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.03488  [pdf, other]

    cs.DC

    Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training

    Authors: Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun

    Abstract: The emergence of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role. As LLMs' training sequence length extends to 32k or even 128k, the current pipeline parallel methods face severe bottlenecks, including high memory footprints and substantial pipeline bubbles, greatly hindering model scalability and training throug…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures, 6 tables

  14. arXiv:2404.16645  [pdf, other]

    cs.CL cs.AI

    Tele-FLM Technical Report

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a…

    Submitted 25 April, 2024; originally announced April 2024.

  15. arXiv:2404.14962  [pdf, ps, other]

    cs.IT

    Short Regular Girth-8 QC-LDPC Codes From Exponent Matrices with Vertical Symmetry

    Authors: Guohua Zhang, Aijing Sun, Ling Liu, Yi Fang

    Abstract: To address the challenge of constructing short girth-8 quasi-cyclic (QC) low-density parity-check (LDPC) codes, a novel construction framework based on vertical symmetry (VS) is proposed. Basic properties of the VS structure are presented. With the aid of these properties, existing explicit constructions for column weights from three to five which can be transformed into the VS structure are sorte…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures; This paper has been accepted by IEEE ISIT2024

  16. arXiv:2404.13945  [pdf, other]

    cs.SE

    How do LLMs Support Deep Learning Testing? A Comprehensive Study Through the Lens of Image Mutation

    Authors: Liwen Wang, Yuanyuan Yuan, Ao Sun, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang

    Abstract: Visual deep learning (VDL) systems have shown significant success in real-world applications like image recognition, object detection, and autonomous driving. To evaluate the reliability of VDL, a mainstream approach is software testing, which requires diverse and controllable mutations over image semantics. The rapid development of multi-modal large language models (MLLMs) has introduced revoluti…

    Submitted 5 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  17. Beyond Collaborative Filtering: A Relook at Task Formulation in Recommender Systems

    Authors: Aixin Sun

    Abstract: Recommender Systems (RecSys) have become indispensable in numerous applications, profoundly influencing our everyday experiences. Despite their practical significance, academic research in RecSys often abstracts the formulation of research tasks from real-world contexts, aiming for a clean problem formulation and more generalizable findings. However, it is observed that there is a lack of collecti…

    Submitted 23 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: Published in ACM SIGWEB Newsletter, Spring 2024: https://dl.acm.org/doi/10.1145/3663752.3663756

    Journal ref: SIGWEB Newsl. 2024, Spring, Article 4 (Spring 2024), 11 pages

  18. arXiv:2404.05089  [pdf, other]

    cs.CL cs.LG

    SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts

    Authors: Alexandre Muzio, Alex Sun, Churan He

    Abstract: The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoEs) models, known for their dynamic allocation of computational resources based on input. Despite their promise, MoEs face challenges, particularly in terms of memory requirements. To address this, our work introduces SEER-MoE, a novel two-stage framework for reducing both the memory footprint and compute requiremen…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 8+3 pages

  19. arXiv:2403.17173  [pdf, other]

    cs.CV

    Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships

    Authors: Rangel Daroya, Aaron Sun, Subhransu Maji

    Abstract: Modeling and visualizing relationships between tasks or datasets is an important step towards solving various meta-tasks such as dataset discovery, multi-tasking, and transfer learning. However, many relationships, such as containment and transferability, are naturally asymmetric and current approaches for representation and visualization (e.g., t-SNE) do not readily support this. We propose Task2…

    Submitted 29 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  20. arXiv:2403.09347  [pdf, other]

    cs.DC cs.LG

    BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

    Authors: Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun

    Abstract: Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences. One potential solution for the long sequence problem is to utilize distributed clusters to parallelize the computation of attention modules across mult…

    Submitted 6 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 13 pages, 7 figures

  21. arXiv:2403.02181  [pdf, other]

    cs.CL cs.AI cs.LG

    Not All Layers of LLMs Are Necessary During Inference

    Authors: Siqi Fan, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Shuo Shang, Aixin Sun, Yequan Wang, Zhongyuan Wang

    Abstract: Due to the large number of parameters, the inference phase of Large Language Models (LLMs) is resource-intensive. However, not all requests posed to LLMs are equally difficult to handle. Through analysis, we show that for some tasks, LLMs can achieve results comparable to the final output at some intermediate layers. That is, not all layers of LLMs are necessary during inference. If we can predict…

    Submitted 9 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  22. arXiv:2402.17221  [pdf, ps, other]

    math.PR cs.DS

    Sharpened localization of the trailing point of the Pareto record frontier

    Authors: James Allen Fill, Daniel Naiman, Ao Sun

    Abstract: For $d\ge2$ and iid $d$-dimensional observations $X^{(1)},X^{(2)},\dots$ with independent Exponential$(1)$ coordinates, we revisit the study by Fill and Naiman (Electron. J. Probab., 2020) of the boundary (relative to the closed positive orthant), or "frontier", $F_n$ of the closed Pareto record-setting (RS) region \[ \mbox{RS}_n:=\{0\le x\in{\mathbb R}^d:x\not\prec X^{(i)}\mbox{\ for all…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 32 pages, 2 figures. arXiv admin note: text overlap with arXiv:1901.05621

    MSC Class: 60D05 (Primary) 60F05; 60F15; G0G70; 60G17 (Secondary)

  23. arXiv:2402.17220  [pdf, ps, other]

    math.PR cs.DS

    On the probability of a Pareto record

    Authors: James Allen Fill, Ao Sun

    Abstract: Given a sequence of independent random vectors taking values in ${\mathbb R}^d$ and having common continuous distribution function $F$, say that the $n^{\rm \scriptsize th}$ observation sets a (Pareto) record if it is not dominated (in every coordinate) by any preceding observation. Let $p_n(F) \equiv p_{n, d}(F)$ denote the probability that the $n^{\rm \scriptsize th}$ observation sets a record.…

    Submitted 5 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 16 pages, 1 figure; this revision responds to three anonymous reviews; paper accepted to Probability in the Engineering and Informational Sciences

    MSC Class: 60G70 (Primary) 60D05 (Secondary)

  24. arXiv:2402.14440  [pdf, other]

    cs.IR

    Recommender for Its Purpose: Repeat and Exploration in Food Delivery Recommendations

    Authors: Jiayu Li, Aixin Sun, Weizhi Ma, Peijie Sun, Min Zhang

    Abstract: Recommender systems have been widely used for various scenarios, such as e-commerce, news, and music, providing online contents to help and enrich users' daily life. Different scenarios hold distinct and unique characteristics, calling for domain-specific investigations and corresponding designed recommender systems. Therefore, in this paper, we focus on food delivery recommendations to unveil uni…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures

  25. arXiv:2402.11451  [pdf, other]

    cs.CL cs.AI

    SciAgent: Tool-augmented Language Models for Scientific Reasoning

    Authors: Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen

    Abstract: Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the…

    Submitted 20 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  26. arXiv:2402.07329  [pdf, other]

    cs.CV

    The Bias of Harmful Label Associations in Vision-Language Models

    Authors: Caner Hazirbas, Alicia Sun, Yonathan Efroni, Mark Ibrahim

    Abstract: Despite the remarkable performance of foundation vision-language models, the shared representation space for text and vision can also encode harmful label associations detrimental to fairness. While prior work has uncovered bias in vision-language models' (VLMs) classification performance across geography, work has been limited along the important axis of harmful label associations due to a lack o…

    Submitted 15 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  27. arXiv:2402.05945  [pdf, other]

    cs.LG cs.AI

    Eliminating Information Leakage in Hard Concept Bottleneck Models with Supervised, Hierarchical Concept Learning

    Authors: Ao Sun, Yuanyuan Yuan, Pingchuan Ma, Shuai Wang

    Abstract: Concept Bottleneck Models (CBMs) aim to deliver interpretable and interventionable predictions by bridging features and labels with human-understandable concepts. While recent CBMs show promising potential, they suffer from information leakage, where unintended information beyond the concepts (either when concepts are represented with probabilities or binary states) are leaked to the subsequent la…

    Submitted 2 February, 2024; originally announced February 2024.

  28. arXiv:2401.14194  [pdf, other]

    cs.CL

    Parameter-Efficient Conversational Recommender System as a Language Processing Task

    Authors: Mathieu Ravaut, Hao Zhang, Lu Xu, Aixin Sun, Yong Liu

    Abstract: Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation. Prior work often utilizes external knowledge graphs for items' semantic information, a language model for dialogue generation, and a recommendation module for ranking relevant items. This combination of multiple components suffers from a cumbersome t…

    Submitted 24 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 9 pages, 4 figures, 8 tables, EACL 2024 conference, fixed typo

  29. arXiv:2401.08232  [pdf, other]

    cs.CV

    Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

    Authors: Chongzhi Zhang, Mingyuan Zhang, Zhiyang Teng, Jiayi Li, Xizhou Zhu, Lewei Lu, Ziwei Liu, Aixin Sun

    Abstract: Natural Language Video Localization (NLVL), grounding phrases from natural language descriptions to corresponding video segments, is a complex yet critical task in video understanding. Despite ongoing advancements, many existing solutions lack the capability to globally capture temporal dynamics of the video data. In this study, we present a novel approach to NLVL that aims to address this issue.…

    Submitted 16 January, 2024; originally announced January 2024.

  30. arXiv:2401.07342  [pdf, other]

    eess.AS cs.LG

    Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms

    Authors: Anchen Sun, Juan J Londono, Batya Elbaum, Luis Estrada, Roberto Jose Lazo, Laura Vitale, Hugo Gonzalez Villasanti, Riccardo Fusaroli, Lynn K Perry, Daniel S Messinger

    Abstract: Young children spend substantial portions of their waking hours in noisy preschool classrooms. In these environments, children's vocal interactions with teachers are critical contributors to their language outcomes, but manually transcribing these interactions is prohibitive. Using audio from child- and teacher-worn recorders, we propose an automated framework that uses open source software both t…

    Submitted 10 April, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures, 3 tables, The paper has been accepted to 2024 IEEE International Conference on Development and Learning (ICDL) as a full oral presentation and will appear in the IEEE ICDL proceedings

  31. arXiv:2401.00431  [pdf, other]

    cs.CV

    Wild2Avatar: Rendering Humans Behind Occlusions

    Authors: Tiange Xiang, Adam Sun, Scott Delp, Kazuki Kozuka, Li Fei-Fei, Ehsan Adeli

    Abstract: Rendering the visual appearance of moving humans from occluded monocular videos is a challenging task. Most existing research renders 3D humans under ideal conditions, requiring a clear and unobstructed scene. Those methods cannot be used to render humans in real-world scenes where obstacles may block the camera's view and lead to partial occlusions. In this work, we present Wild2Avatar, a neural…

    Submitted 31 December, 2023; originally announced January 2024.

  32. arXiv:2312.11286  [pdf, ps, other]

    cs.GT

    Envy-free House Allocation under Uncertain Preferences

    Authors: Haris Aziz, Isaiah Iliffe, Bo Li, Angus Ritossa, Ankang Sun, Mashbat Suzuki

    Abstract: We study the envy-free house allocation problem when agents have uncertain preferences over items and consider several well-studied preference uncertainty models. The central problem that we focus on is computing an allocation that has the highest probability of being envy-free. We show that each model leads to a distinct set of algorithmic and complexity results, including detailed results on (in…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: To appear in the proceedings of AAAI 2024

  33. arXiv:2312.05187  [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  34. arXiv:2312.04515  [pdf, other]

    cs.CL

    Efficient Monotonic Multihead Attention

    Authors: Xutai Ma, Anna Sun, Siqi Ouyang, Hirofumi Inaguma, Paden Tomasello

    Abstract: We introduce the Efficient Monotonic Multihead Attention (EMMA), a state-of-the-art simultaneous translation model with numerically-stable and unbiased monotonic alignment estimation. In addition, we present improved training and inference strategies, including simultaneous fine-tuning from an offline translation model and reduction of monotonic alignment variance. The experimental results demonst…

    Submitted 7 December, 2023; originally announced December 2023.

  35. arXiv:2310.10570  [pdf, other]

    cs.CL

    On Context Utilization in Summarization with Large Language Models

    Authors: Mathieu Ravaut, Aixin Sun, Nancy F. Chen, Shafiq Joty

    Abstract: Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries. Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens. However, in question answering, language models exhibit uneven utilization of their input context. They tend to favor the initial and final segments, resulting in a U-shaped perfo…

    Submitted 14 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ACL 2024. 9 pages, 7 figures, 3 tables

  36. arXiv:2310.05634  [pdf, other]

    cs.CL

    Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution

    Authors: Xinze Li, Yixin Cao, Liangming Pan, Yubo Ma, Aixin Sun

    Abstract: Although achieving great success, Large Language Models (LLMs) usually suffer from unreliable hallucinations. Although language attribution can be a potential solution, there are no suitable benchmarks and evaluation metrics to attribute LLMs to structured knowledge. In this paper, we define a new task of Knowledge-aware Language Model Attribution (KaLMA) that improves upon three core concerns wit…

    Submitted 23 May, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ACL Findings 2024

  37. arXiv:2310.02720  [pdf, other]

    cs.SD eess.AS

    Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

    Authors: Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun

    Abstract: Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech signals. In contrast, this paper aims to incorporate multi-resolution information into speech self-supervised representation learning. We introduce a SSL model that l…

    Submitted 30 January, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR2024 as spotlight

  38. arXiv:2309.10989  [pdf, other]

    cs.CV

    COSE: A Consistency-Sensitivity Metric for Saliency on Image Classification

    Authors: Rangel Daroya, Aaron Sun, Subhransu Maji

    Abstract: We present a set of metrics that utilize vision priors to effectively assess the performance of saliency methods on image classification tasks. To understand behavior in deep learning models, many methods provide visual saliency maps emphasizing image regions that most contribute to a model prediction. However, there is limited work on analyzing the reliability of saliency methods in explaining mo…

    Submitted 19 September, 2023; originally announced September 2023.

  39. arXiv:2309.08837  [pdf, other]

    cs.SD eess.AS

    FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

    Authors: Jianzong Wang, Xulong Zhang, Aolan Sun, Ning Cheng, Jing Xiao

    Abstract: This paper integrates graph-to-sequence into an end-to-end text-to-speech framework for syntax-aware modelling with syntactic information of input text. Specifically, the input text is parsed by a dependency parsing module to form a syntactic graph. The syntactic graph is then encoded by a graph encoder to extract the syntactic hidden information, which is concatenated with phoneme embedding and i…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted by The 35th IEEE International Conference on Tools with Artificial Intelligence. (ICTAI 2023)

  40. arXiv:2309.07001  [pdf, other]

    cs.CE cs.AI stat.AP

    Modeling the Evolutionary Trends in Corporate ESG Reporting: A Study based on Knowledge Management Model

    Authors: Ziyuan Xia, Anchen Sun, Xiaodong Cai, Saixing Zeng

    Abstract: Environmental, social, and governance (ESG) reports are globally recognized as a keystone in sustainable enterprise development. However, current literature has not concluded the development of topics and trends in ESG contexts in the twenty-first century. Therefore, we selected 1114 ESG reports from firms in the technology industry to analyze the evolutionary trends of ESG topics by text mining.…

    Submitted 25 May, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: 29 pages, 10 figures, 3 tables

  41. arXiv:2309.06789  [pdf, other]

    cs.IR

    An Image Dataset for Benchmarking Recommender Systems with Raw Pixels

    Authors: Yu Cheng, Yunzhu Pan, Jiaqi Zhang, Yongxin Ni, Aixin Sun, Fajie Yuan

    Abstract: Recommender systems (RS) have achieved significant success by leveraging explicit identification (ID) features. However, the full potential of content features, especially the pure image pixel features, remains relatively unexplored. The limited availability of large, diverse, and content-driven image recommendation datasets has hindered the use of raw images as item representations. In this regar…

    Submitted 17 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

  42. arXiv:2309.03852  [pdf, other]

    cs.CL cs.AI

    FLM-101B: An Open LLM and How to Train It with $100K Budget

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang

    Abstract: Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.…

    Submitted 17 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

  43. arXiv:2308.13561  [pdf, other]

    cs.HC cs.CV

    Project Aria: A New Tool for Egocentric Multi-Modal AI Research

    Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira, et al. (49 additional authors not shown)

    Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, mul…

    Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

  44. arXiv:2308.11596  [pdf, other]

    cs.CL

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

    Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s…

    Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    ACM Class: I.2.7

  45. arXiv:2308.04622  [pdf, other]

    cs.CV

    Rendering Humans from Object-Occluded Monocular Videos

    Authors: Tiange Xiang, Adam Sun, Jiajun Wu, Ehsan Adeli, Li Fei-Fei

    Abstract: 3D understanding and rendering of moving humans from monocular videos is a challenging task. Despite recent progress, the task remains difficult in real-world scenarios, where obstacles may block the camera view and cause partial occlusions in the captured videos. Existing methods cannot handle such defects due to two reasons. First, the standard rendering strategy relies on point-point mapping, w…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: ICCV 2023, project page: https://cs.stanford.edu/~xtiange/projects/occnerf/

  46. arXiv:2307.16382  [pdf, other]

    cs.LG cs.CL

    Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?

    Authors: Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan

    Abstract: Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance at specific tasks. Previous works, however, suggest that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memoriza…

    Submitted 15 April, 2024; v1 submitted 30 July, 2023; originally announced July 2023.

  47. arXiv:2307.16090  [pdf, other]

    physics.flu-dyn cs.LG

    Rapid Flood Inundation Forecast Using Fourier Neural Operator

    Authors: Alexander Y. Sun, Zhi Li, Wonhyun Lee, Qixing Huang, Bridget R. Scanlon, Clint Dawson

    Abstract: Flood inundation forecast provides critical information for emergency planning before and during flood events. Real-time flood inundation forecast tools are still lacking. High-resolution hydrodynamic modeling has become more accessible in recent years; however, predicting flood extents at the street and building levels in real time is still computationally demanding. Here we present a hybrid proc…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: Artificial Intelligence (AI) and Humanitarian Assistance and Disaster Recovery (HADR) workshop, ICCV 2023 in Paris, France

  48. arXiv:2307.15061  [pdf, other]

    cs.CV cs.RO

    The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation

    Authors: Lingdong Kong, Yaru Niu, Shaoyuan Xie, Hanjiang Hu, Lai Xing Ng, Benoit R. Cottereau, Ding Zhao, Liangjun Zhang, Hesheng Wang, Wei Tsang Ooi, Ruijie Zhu, Ziyang Song, Li Liu, Tianzhu Zhang, Jun Yu, Mohan Jing, Pengwei Li, Xiaohua Qi, Cheng Jin, Yingfeng Chen, Jie Hou, Jie Zhang, Zhen Kan, Qiang Ling, Liang Peng, et al. (18 additional authors not shown)

    Abstract: Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather conditions, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, suffer inevitably from real-world corruptions and perturbations and struggle to provide reliable depth predictions under such cases. In this paper, we summari…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: Technical Report; 65 pages, 34 figures, 24 tables; Code at https://github.com/ldkong1205/RoboDepth

  49. arXiv:2307.09985  [pdf, other]

    cs.IR cs.AI

    Our Model Achieves Excellent Performance on MovieLens: What Does it Mean?

    Authors: Yu-chen Fan, Yitong Ji, Jie Zhang, Aixin Sun

    Abstract: A typical benchmark dataset for recommender system (RecSys) evaluation consists of user-item interactions generated on a platform within a time period. The interaction generation mechanism partially explains why a user interacts with (e.g., like, purchase, rate) an item, and the context of when a particular interaction happened. In this study, we conduct a meticulous analysis of the MovieLens data…

    Submitted 24 March, 2024; v1 submitted 19 July, 2023; originally announced July 2023.

  50. arXiv:2307.06506  [pdf, other]

    cs.DL cs.SI

    Research Explosion: More Effort to Climb onto Shoulders of the Giant

    Authors: Guoxiu He, Aixin Sun, Wei Lu

    Abstract: Fast-growing scientific publications present challenges to the scientific community. In this paper, we describe their implications to researchers. As references form explicit foundations for researchers to conduct a study, we investigate the evolution in reference patterns based on 60.8 million papers published from 1960 to 2015. The results demonstrate that recent papers contain more references t…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: 21 pages, 24 figures