Search | arXiv e-print repository

arXiv:2407.06939 [pdf, other]

Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

Authors: Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke, Yang Luo, Jinxin Zhu, Yansen Han, Bingyi Lu, Xuan Gu, Qinyuan Liu, Yaping Zhao, Qiting Ye, Chenxiao Dou, Yansong Chua, Volodymyr Kuzma , et al. (20 additional authors not shown)

Abstract: In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface within that environment. We organized a NeurIPS 2023 competition featuring both simulation and real-world components to evaluate solutions to this task. Our baselines on the most challenging version of this task, using real perception in simulation, achieved only a 0.8% success rate; by the end of the competition, the best participants achieved a 10.8% success rate, a 13x improvement. We observed that the most successful teams employed a variety of methods, yet two common threads emerged among the best solutions: enhancing error detection and recovery, and improving the integration of perception with decision-making processes. In this paper, we detail the results and methodologies used, both in simulation and real-world settings. We discuss the lessons learned and their implications for future research. Additionally, we compare performance in real and simulated environments, emphasizing the necessity for robust generalization to novel settings.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06370 [pdf, other]

An extreme thermal cycling reliability test of ATLAS ITk Strips barrel modules

Authors: A. Tishelman-Charny, A. Affolder, F. Capocasa, E. Duden, V. Fadeyev, M. Gignac, C. Helling, H. Herde, J. Johnson, D. Lynn, M. Morii, A. Mitra, L. Poley, G. Sciolla, S. Stucci, P. Sharma, G. Van Nieuwenhuizen, E. Wallin, A. Wang, S. Wonsak

Abstract: At the end of Run 3 of the Large Hadron Collider (LHC), the accelerator complex will be upgraded to the High-Luminosity LHC (HL-LHC) in order to increase the total amount of data provided to its experiments. To cope with the increased rates of data, radiation, and pileup, the ATLAS detector will undergo a substantial upgrade, including a replacement of the Inner Detector with a future Inner Tracker, called the ITk. The ITk will be composed of pixel and strip sub-detectors, where the strips portion will be composed of 17,888 silicon strip detector modules. During the HL-LHC running period, the ITk will be cooled and warmed a number of times from about ${-35}^\circ$C to room temperature as part of the operational cycle, including warm-ups during yearly shutdowns. To ensure ITk Strips modules are functional after these expected temperature changes, and to ensure modules are mechanically robust, each module must undergo ten thermal cycles and pass a set of electrical and mechanical criteria before it is placed on a local support structure. This paper describes the thermal cycling Quality Control (QC) procedure, and results from the barrel pre-production phase (about 5% of the production volume). Additionally, in order to assess the headroom of the nominal QC procedure of 10 cycles and to ensure modules don't begin failing soon after, four representative ITk Strips barrel modules were thermally cycled 100 times - this study is also described.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06066 [pdf, other]

Revisiting the Ultraviolet Tail of the Primordial Gravitational Wave

Authors: Shi Pi, Misao Sasaki, Ao Wang, Jianing Wang

Abstract: High-frequency primordial gravitational waves (PGWs) with wave numbers larger than the Hubble parameter at the end of inflation originate from the ultraviolet (UV) modes, which are never stretched out of the horizon. Such a UV tail of the PGW energy spectrum has a spurious logarithmic divergence. We study the origin of such a divergence, and find that it comes from the instantaneous inflation-to-post-inflation transition, which can be removed by considering a finite duration. For the first time, we obtain a semi-analytical expression for the PGW energy spectrum. We find that the UV tail decays exponentially, while the decay rate depends solely on the transition rate. When there is a stiff post-inflationary stage, the enhanced PGW displays a characteristic spectral shape of power-law increase and exponential decay. We propose a fitting formula which can be used for signal searching.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 23 pages, 13 figures

Report number: YITP-24-39

arXiv:2407.05769 [pdf, other]

Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework

Authors: Hao Jing, Anhong Wang, Lijun Zhao, Yakun Yang, Donghan Bu, Jing Zhang, Yifan Zhang, Junhui Hou

Abstract: In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information. However, traditional sampling methods used in preprocessing often ignore semantic features, leading to detail loss and ground point interference in 3D object detection. To address this, we propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-branch Sampling (SMS) module and multi-view consistency constraints. The SMS module includes random sampling, Density Equalization Sampling (DES) for enhancing distant objects, and Ground Abandonment Sampling (GAS) to focus on non-ground points. The sampled multi-view points are processed through a Consistent KeyPoint Selection (CKPS) module to generate consistent keypoint masks for efficient proposal sampling. The first-stage detector uses multi-branch parallel learning with multi-view consistency loss for feature aggregation, while the second-stage detector fuses multi-view data through a Multi-View Fusion Pooling (MVFP) module to precisely predict 3D objects. The experimental results on the KITTI 3D object detection benchmark dataset show that our method achieves excellent detection performance improvement for a variety of backbones, especially for low-performance backbones with simple network structures.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.01351 [pdf, other]

Probing the connection between IceCube neutrinos and MOJAVE AGN

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi , et al. (399 additional authors not shown)

Abstract: Active Galactic Nuclei (AGN) are prime candidate sources of the high-energy, astrophysical neutrinos detected by IceCube. This is demonstrated by the real-time multi-messenger detection of the blazar TXS 0506+056 and the recent evidence of neutrino emission from NGC 1068 from a separate time-averaged study. However, the production mechanism of the astrophysical neutrinos in AGN is not well established, a question that can be addressed via correlation studies with photon observations. For neutrinos produced due to photohadronic interactions in AGN, in addition to a correlation of neutrinos with high-energy photons, there would also be a correlation of neutrinos with photons emitted at radio wavelengths. In this work, we perform an in-depth stacking study of the correlation between 15 GHz radio observations of AGN reported in the MOJAVE XV catalog, and ten years of neutrino data from IceCube. We also use a time-dependent approach which improves the statistical power of the stacking analysis. No significant correlation was found in either analysis, and upper limits are reported. When compared to the IceCube diffuse flux, at 100 TeV and for a spectral index of 2.5, the upper limits derived are $\sim3\%$ and $\sim9\%$ for the time-averaged and time-dependent case, respectively.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 14 pages, 7 figures

arXiv:2407.01314 [pdf, other]

Search for a light sterile neutrino with 7.5 years of IceCube DeepCore data

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi , et al. (399 additional authors not shown)

Abstract: We present a search for an eV-scale sterile neutrino using 7.5 years of data from the IceCube DeepCore detector. The analysis uses a sample of 21,914 events with energies between 5 and 150 GeV to search for sterile neutrinos through atmospheric muon neutrino disappearance. Improvements in event selection and treatment of systematic uncertainties provide greater statistical power compared to previous DeepCore sterile neutrino searches. Our results are compatible with the absence of mixing between active and sterile neutrino states, and we place constraints on the mixing matrix elements $|U_{\mu 4}|^2 < 0.0534$ and $|U_{\tau 4}|^2 < 0.0574$ at 90% CL under the assumption that $\Delta m^2_{41}\geq 1\;\mathrm{eV^2}$. These null results add to the growing tension between anomalous appearance results and constraints from disappearance searches in the 3+1 sterile neutrino landscape.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 11 pages, 5 figures. To be submitted to Physical Review D

arXiv:2407.00320 [pdf, other]

LiteSearch: Efficacious Tree Search for LLM

Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Dian Yu, Haitao Mi, Jinsong Su, Dong Yu

Abstract: Recent research suggests that tree search algorithms (e.g. Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to deploy in practical applications. This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget (maximum number of children) calculation to tackle this issue. By considering the search progress towards the final answer (history) and the guidance from a value network (future) trained without any step-wise annotations, our algorithm iteratively selects the most promising tree node before expanding it within the boundaries of the allocated computational budget. Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach not only offers competitive performance but also enjoys significantly lower computational costs compared to baseline methods.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00287 [pdf, ps, other]

Even- and odd-parity stabilities of black holes in Einstein-Aether gravity

Authors: Antonio De Felice, Shinji Mukohyama, Shinji Tsujikawa, Anzhong Wang, Chao Zhang

Abstract: In Einstein-Aether theories with a timelike unit vector field, we study the linear stability of static and spherically symmetric black holes against both even- and odd-parity perturbations. For this purpose, we formulate a gauge-invariant black hole perturbation theory in the background Aether-orthogonal frame where the spacelike property of hypersurfaces orthogonal to the timelike Aether field is always maintained even inside the metric horizon. Using a short-wavelength approximation with large radial and angular momenta, we show that, in general, there are three dynamical degrees of freedom arising from the even-parity sector besides two propagating degrees of freedom present in the odd-parity sector. The propagation speeds of even-parity perturbations and their no-ghost conditions coincide with those of tensor, vector, and scalar perturbations on the Minkowski background, while the odd sector contains tensor and vector modes with the same propagation speeds as those in the even-parity sector (and hence as those on the Minkowski background). Thus, the consistent study of black hole perturbations in the Aether-orthogonal frame on static and spherically symmetric backgrounds does not add new small-scale stability conditions to those known for the Minkowski background in the literature.

Submitted 28 June, 2024; originally announced July 2024.

Comments: 12 pages, no figures

Report number: YITP-24-74, IPMU24-0028, WUCG-24-06

arXiv:2406.16860 [pdf, other]

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations, offering new insights into different models and architectures -- self-supervised, strongly supervised, or combinations thereof -- based on experiments with over 20 vision encoders. We critically examine existing MLLM benchmarks, addressing the difficulties involved in consolidating and interpreting results from various tasks, and introduce a new vision-centric benchmark, CV-Bench. To further improve visual grounding, we propose the Spatial Vision Aggregator (SVA), a dynamic and spatially-aware connector that integrates high-resolution vision features with LLMs while reducing the number of tokens. Additionally, we discuss the curation of high-quality visual instruction-tuning data from publicly available sources, emphasizing the importance of data source balancing and distribution ratio. Collectively, Cambrian-1 not only achieves state-of-the-art performance but also serves as a comprehensive, open cookbook for instruction-tuned MLLMs. We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes. We hope our release will inspire and accelerate advancements in multimodal systems and visual representation learning.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Website at https://cambrian-mllm.github.io

arXiv:2406.16251 [pdf, other]

Probing critical spin fluctuations with a composite magnetoelectric method: A case study on a Kitaev spin liquid candidate Na$_3$Co$_2$SbO$_6$

Authors: Xinrun Mi, Xintong Li, Long Zhang, Aifeng Wang, Yuan Li, Yisheng Chai, Mingquan He

Abstract: In correlated quantum materials, divergent critical fluctuations near the quantum critical point are often closely associated with exotic quantum phases of matter, such as unconventional superconductivity and quantum spin liquids. Here we present a simple yet highly sensitive composite magnetoelectric (ME) method for detecting the critical spin fluctuations in quantum magnets. The ME signal is proportional to the magnetostriction coefficient, which directly probes the product of magnetization and spin-spin correlation. As a demonstration, the composite ME method is applied to a Kitaev quantum spin liquid candidate Na$_3$Co$_2$SbO$_6$, which shows signs of magnetic field-induced quantum criticality. Notably, the ME signal prominently diverges at the magnetic field-induced tricritical points, particularly at a tricritical point that lies in close proximity to a zero-temperature quantum critical point. A crucial aspect of these tricritical points is their tunability through the modification of the in-plane magnetic field's direction. The direction of the magnetic field can thus serve as a handy yet important tuning parameter, alongside pressure and chemical doping, for searching for quantum critical points in quantum magnets with pronounced magnetic anisotropy.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 6 pages, 4 figures

arXiv:2406.13323 [pdf, other]

An alkali-referenced vector spectrum analyzer for visible-light integrated photonics

Authors: Baoqi Shi, Ming-Yang Zheng, Yunkai Zhao, Yi-Han Luo, Jinbao Long, Wei Sun, Wenbo Ma, Xiu-Ping Xie, Lan Gao, Chen Shen, Anting Wang, Wei Liang, Qiang Zhang, Junqiu Liu

Abstract: Integrated photonics has reformed our information society by offering on-chip optical signal synthesis, processing and detection with reduced size, weight and power consumption. As such, it has been successfully established in the near-infrared (NIR) telecommunication bands. With the soaring demand in miniaturized systems for biosensing, quantum information and transportable atomic clocks, extensive endeavors have been stacked on translating integrated photonics into the visible spectrum, i.e. visible-light integrated photonics. Various innovative visible-light integrated devices have been demonstrated, such as lasers, frequency combs, and atom traps, highlighting the capacity and prospect to create chip-based optical atomic clocks that can make timing and frequency metrology ubiquitous. A pillar to the development of visible-light integrated photonics is characterization techniques featuring high frequency resolution and wide spectral coverage, which however remain elusive. Here, we demonstrate a vector spectrum analyzer (VSA) for visible-light integrated photonics, offering spectral bandwidth from 766 to 795 nm and frequency resolution of 415 kHz. The VSA is rooted on a widely chirping, high-power, narrow-linewidth, mode-hop-free laser around 780 nm, which is frequency-doubled from the near-infrared via an efficient, broadband CPLN waveguide. The VSA is further referenced to hyperfine structures of rubidium and potassium atoms, enabling 8.1 MHz frequency accuracy. We apply our VSA to showcase the characterization of loss, dispersion and phase response of passive integrated devices, as well as densely spaced spectra of mode-locked lasers. Combining operation in the NIR and visible spectra, our VSA allows characterization bandwidth exceeding an octave and can be an invaluable diagnostic tool for spectroscopy, nonlinear optical processing, imaging and quantum interfaces to atomic devices.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12874 [pdf, other]

The Design, Implementation, and Performance of the LZ Calibration Systems

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer , et al. (179 additional authors not shown)

Abstract: LUX-ZEPLIN (LZ) is a tonne-scale experiment searching for direct dark matter interactions and other rare events. It is located at the Sanford Underground Research Facility (SURF) in Lead, South Dakota, USA. The core of the LZ detector is a dual-phase xenon time projection chamber (TPC), designed with the primary goal of detecting Weakly Interacting Massive Particles (WIMPs) via their induced low energy nuclear recoils. Surrounding the TPC, two veto detectors immersed in an ultra-pure water tank enable reducing background events to enhance the discovery potential. Intricate calibration systems are purposely designed to precisely understand the responses of these three detector volumes to various types of particle interactions and to demonstrate LZ's ability to discriminate between signals and backgrounds. In this paper, we present a comprehensive discussion of the key features, requirements, and performance of the LZ calibration systems, which play a crucial role in enabling LZ's WIMP-search and its broad science program. The thorough description of these calibration systems, with an emphasis on their novel aspects, is valuable for future calibration efforts in direct dark matter and other rare-event search experiments.

Submitted 20 June, 2024; v1 submitted 2 May, 2024; originally announced June 2024.

arXiv:2406.12723 [pdf, other]

BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

Authors: Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo Millan Arias, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang

Abstract: As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establishes several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by including taxonomic labels, raw nucleotide barcode sequences, assigned barcode index numbers, and geographical information. We propose three benchmark experiments to demonstrate the impact of the multi-modal data types on the classification and clustering accuracy. First, we pretrain a masked language model on the DNA barcode sequences of the BIOSCAN-5M dataset, and demonstrate the impact of using this large reference library on species- and genus-level classification performance. Second, we propose a zero-shot transfer learning task applied to images and DNA barcodes to cluster feature embeddings obtained from self-supervised learning, to investigate whether meaningful clusters can be derived from these representation embeddings. Third, we benchmark multi-modality by performing contrastive learning on DNA barcodes, image data, and taxonomic information. This yields a general shared embedding space enabling taxonomic classification using multiple types of information and modalities. The code repository of the BIOSCAN-5M Insect dataset is available at https://github.com/zahrag/BIOSCAN-5M.

Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12292 [pdf, other]

JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning

Authors: Boyu Chen, Peike Li, Yao Yao, Alex Wang

Abstract: Large models for text-to-music generation have achieved significant progress, facilitating the creation of high-quality and varied musical compositions from provided text prompts. However, input text prompts may not precisely capture user requirements, particularly when the objective is to generate music that embodies a specific concept derived from a designated reference collection. In this paper, we propose a novel method for customized text-to-music generation, which can capture the concept from a two-minute reference music and generate a new piece of music conforming to the concept. We achieve this by fine-tuning a pretrained text-to-music model using the reference music. However, directly fine-tuning all parameters leads to overfitting issues. To address this problem, we propose a Pivotal Parameters Tuning method that enables the model to assimilate the new concept while preserving its original generative capabilities. Additionally, we identify a potential concept conflict when introducing multiple concepts into the pretrained model. We present a concept enhancement strategy to distinguish multiple concepts, enabling the fine-tuned model to generate music incorporating either individual or multiple concepts simultaneously. Since we are the first to work on the customized music generation task, we also introduce a new dataset and evaluation protocol for the new task. Our proposed Jen1-DreamStyler outperforms several baselines in both qualitative and quantitative evaluations. Demos will be available at https://www.jenmusic.ai/research#DreamStyler.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.09321 [pdf, other]

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

Authors: Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang

Abstract: Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses for forbidden instructions, presenting severe misuse threats to LLMs. Up to now, research into jailbreak attacks and defenses is emerging; however, there is (surprisingly) no consensus on how to evaluate whether a jailbreak attempt is successful. In other words, the methods to assess the harmfulness of an LLM's response are varied, such as manual annotation or prompting GPT-4 in specific ways. Each approach has its own set of strengths and weaknesses, impacting their alignment with human values, as well as the time and financial cost. This diversity in evaluation presents challenges for researchers in choosing suitable evaluation methods and conducting fair comparisons across different jailbreak attacks and defenses. In this paper, we conduct a comprehensive analysis of jailbreak evaluation methodologies, drawing from nearly ninety jailbreak research works released between May 2023 and April 2024. Our study introduces a systematic taxonomy of jailbreak evaluators, offering in-depth insights into their strengths and weaknesses, along with the current status of their adaptation. Moreover, to facilitate subsequent research, we propose JailbreakEval, a user-friendly toolkit focusing on the evaluation of jailbreak attempts. It includes various well-known evaluators out-of-the-box, so that users can obtain evaluation results with only a single command. JailbreakEval also allows users to customize their own evaluation workflow in a unified framework with the ease of development and comparison. In summary, we regard JailbreakEval as a catalyst that simplifies the evaluation process in jailbreak research and fosters an inclusive standard for jailbreak evaluation within the community.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Our code is available at https://github.com/ThuCCSLab/JailbreakEval

arXiv:2406.08877 [pdf, other]

EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

Authors: Yuan-Ming Li, Wei-Jin Huang, An-Lan Wang, Ling-An Zeng, Jing-Ke Meng, Wei-Shi Zheng

Abstract: We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement--including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 33 pages, 9 figures

arXiv:2406.08274 [pdf, other]

The Camera and Readout for the Trinity Demonstrator and the EUSO-SPB2 Cherenkov Telescope

Authors: Mahdi Bagheri, Srikar Gadamsetty, Eliza Gazda, Eleanor Judd, Evgeny Kuznetsov, A. Nepomuk Otte, Mathew Potts, Oscar Romero Matamala, Noah Shapera, Joshua Sorell, Svanik Tandon, Andrew Wang

Abstract: We developed a modular silicon photomultiplier camera to detect Earth-skimming PeV to EeV tau neutrinos with the imaging atmospheric Cherenkov technique. We built two cameras, a 256-pixel camera with S14161-6050HS SiPMs for the Trinity Demonstrator located on Frisco Peak, Utah, and a 512-pixel camera with S14521-6050AN SiPMs for the EUSO-SPB2 Cherenkov Telescope. The front-end electronics are based on the eMUSIC ASIC, and the camera signals are sampled and digitized with the 100 MS/s and 12-bit AGET system. Both cameras are liquid-cooled. We detail the camera concept and the results from characterizing the SiPMs, bench testing, and calibrating the two cameras.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Submitted to Nuclear Instruments and Methods in Physics Research A

arXiv:2406.07601 [pdf, other]

IceCube Search for Neutrino Emission from X-ray Bright Seyfert Galaxies

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi , et al. (400 additional authors not shown)

Abstract: The recent IceCube detection of TeV neutrino emission from the nearby active galaxy NGC 1068 suggests that active galactic nuclei (AGN) could make a sizable contribution to the diffuse flux of astrophysical neutrinos. The absence of TeV $\gamma$-rays from NGC 1068 indicates neutrino production in the vicinity of the supermassive black hole, where the high radiation density leads to $\gamma$-ray attenuation. Therefore, any potential neutrino emission from similar sources is not expected to correlate with high-energy $\gamma$-rays. Disk-corona models predict neutrino emission from Seyfert galaxies to correlate with keV X-rays, as they are tracers of coronal activity. Using through-going track events from the Northern Sky recorded by IceCube between 2011 and 2021, we report results from a search for individual and aggregated neutrino signals from 27 additional Seyfert galaxies that are contained in the BAT AGN Spectroscopic Survey (BASS). Besides the generic single power-law, we evaluate the spectra predicted by the disk-corona model. Assuming all sources to be intrinsically similar to NGC 1068, our findings constrain the collective neutrino emission from X-ray bright Seyfert galaxies in the Northern Hemisphere, but, at the same time, show excesses of neutrinos that could be associated with the objects NGC 4151 and CGCG 420-015. These excesses result in a 2.7$\sigma$ significance with respect to background expectations.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 17 pages, 9 figures

arXiv:2406.06745 [pdf, other]

Universal properties of the evolution of the Universe in modified loop quantum cosmology

Authors: Jamal Saeed, Rui Pan, Christian Brown, Gerald Clevear, Anzhong Wang

Abstract: In this paper, we systematically study the evolution of the Universe in the framework of a modified loop quantum cosmological model (mLQC-I) with various inflationary potentials, including chaotic, Starobinsky, generalized Starobinsky, polynomials of the first and second kinds, generalized T-models and natural inflation. In all these models, the big bang singularity is represented by a quantum bounce, and the evolution of the Universe both before and after the bounce is universal and weakly depends on the inflationary potentials, as long as the evolution is dominated by the kinetic energy of the inflaton at the bounce. In particular, the evolution in the pre-bounce region can be universally divided into three different phases: pre-bouncing, pre-transition, and pre-de Sitter. The pre-bouncing phase occurs immediately before the quantum bounce, during which the evolution of the Universe is dominated by the kinetic energy of the inflaton. Thus, the equation of state of the inflaton is about one, w = 1. Soon, the inflation potential takes over, so w rapidly falls from one to negative one. This pre-transition phase is very short and quickly turns into the pre-de Sitter phase, whereby the effective cosmological constant with a Planck size takes over and dominates the rest of the contracting phase. In the entire pre-bounce regime, the evolution of the expansion factor and the inflaton can be approximated by analytical solutions, which are universal and independent of the inflation potentials.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 28 pages, 32 Figures

arXiv:2406.06684 [pdf, other]

Search for neutrino emission from hard X-ray AGN with IceCube

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi , et al. (401 additional authors not shown)

Abstract: Active Galactic Nuclei (AGN) are promising candidate sources of high-energy astrophysical neutrinos since they provide environments rich in matter and photon targets where cosmic ray interactions may lead to the production of gamma rays and neutrinos. We searched for high-energy neutrino emission from AGN using the $\textit{Swift}$-BAT Spectroscopic Survey (BASS) catalog of hard X-ray sources and 12 years of IceCube muon track data. First, upon performing a stacked search, no significant emission was found. Second, we searched for neutrinos from a list of 43 candidate sources and found an excess from the direction of two sources, Seyfert galaxies NGC 1068 and NGC 4151. We observed NGC 1068 at flux $\phi_{\nu_\mu+\bar{\nu}_\mu}$ = $4.02_{-1.52}^{+1.58} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV, with power-law spectral index, $\gamma$ = 3.10$^{+0.26}_{-0.22}$, consistent with previous IceCube results. The observation of a neutrino excess from the direction of NGC 4151 is at a post-trial significance of 2.9$\sigma$. If interpreted as an astrophysical signal, the excess observed from NGC 4151 corresponds to a flux $\phi_{\nu_\mu+\bar{\nu}_\mu}$ = $1.51_{-0.81}^{+0.99} \times 10^{-11}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ normalized at 1 TeV and $\gamma$ = 2.83$^{+0.35}_{-0.28}$.

Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.03035 [pdf, other]

Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control

Authors: Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo

Abstract: Pose-controllable character video generation is in high demand with extensive applications for fields such as automatic advertising and content creation on social media platforms. While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple character animation and body occlusion. Additionally, current methods require large-scale high-quality videos with stable backgrounds and temporal consistency as training datasets; otherwise, their performance will greatly deteriorate. These two issues hinder the practical utilization of character image animation tools. In this paper, we propose a practical and robust framework Follow-Your-Pose v2, which can be trained on noisy open-sourced videos readily available on the internet. Multi-condition guiders are designed to address the challenges of background stability, body occlusion in multi-character generation, and consistency of character appearance. Moreover, to fill the gap of fair evaluation of multi-character pose animation, we propose a new benchmark comprising approximately 4,000 frames. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics. Meanwhile, qualitative assessments reveal a significant improvement in the quality of generated video, particularly in scenarios involving complex backgrounds and body occlusion among multiple characters, suggesting the superiority of our approach.

Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02547 [pdf, ps, other]

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Authors: Alex Jinpeng Wang, Linjie Li, Yiqi Lin, Min Li, Lijuan Wang, Mike Zheng Shou

Abstract: Training models with longer in-context lengths is a significant challenge for multimodal models due to substantial GPU memory and computational costs. This exploratory study does not present state-of-the-art models; rather, it introduces an innovative method designed to increase in-context text length in multi-modality large language models (MLLMs) efficiently. We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens. This technique significantly reduces GPU memory usage and floating point operations (FLOPs) for both the training and inference stages. For instance, our method expands the pre-training in-context text length from 256 to 2048 tokens with nearly the same FLOPs for a 56 billion parameter MOE model. Experimental results demonstrate that a model trained with VisInContext delivers superior performance on common downstream benchmarks for in-context few-shot evaluation. Additionally, VisInContext is complementary to existing methods for increasing in-context text length and enhances document understanding capabilities, showing great potential in document QA tasks and sequential document retrieval.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 12 pages. The website is https://fingerrec.github.io/visincontext

arXiv:2406.02441 [pdf, other]

Probing the Scalar WIMP-Pion Coupling with the first LUX-ZEPLIN data

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. J. Bishop, G. M. Blockinger, B. Boxer , et al. (178 additional authors not shown)

Abstract: Weakly interacting massive particles (WIMPs) may interact with a virtual pion that is exchanged between nucleons. This interaction channel is important to consider in models where the spin-independent isoscalar channel is suppressed. Using data from the first science run of the LUX-ZEPLIN dark matter experiment, containing 60 live days of data in a 5.5 tonne fiducial mass of liquid xenon, we report the results on a search for WIMP-pion interactions. We observe no significant excess and set an upper limit of $1.5\times10^{-46}$ cm$^2$ at a 90% confidence level for a WIMP mass of 33 GeV/c$^2$ for this interaction.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01908 [pdf, other]

PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

Authors: Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

Abstract: Solving large-scale linear programming (LP) problems is an important task in various areas such as communication networks, power systems, finance and logistics. Recently, two distinct approaches have emerged to expedite LP solving: (i) First-order methods (FOMs); (ii) Learning to optimize (L2O). In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L2O method to solve large-scale LP problems. The new architecture PDHG-Net is designed by unrolling the recently emerged PDHG method into a neural network, combined with channel-expansion techniques borrowed from graph neural networks. We prove that the proposed PDHG-Net can recover the PDHG algorithm, and thus can approximate optimal solutions of LP instances with a polynomial number of neurons. We propose a two-stage inference approach: first use PDHG-Net to generate an approximate solution, and then apply the PDHG algorithm to further improve the solution. Experiments show that our approach can significantly accelerate LP solving, achieving up to a 3$\times$ speedup compared to FOMs for large-scale LP problems.

Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024
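
The second stage described in the PDHG-Net abstract (refining an approximate solution with PDHG) can be illustrated with a minimal sketch of the textbook PDHG iteration. It assumes an LP in the standard form min c^T x subject to Ax = b, x >= 0, and step sizes chosen so that tau * sigma * ||A||^2 < 1. The function name `pdhg_lp`, the warm-start arguments `x0` and `y0` (standing in for a network's prediction), and the toy problem are hypothetical and are not taken from the paper.

```python
import numpy as np


def pdhg_lp(c, A, b, x0=None, y0=None, iters=5000):
    """Plain PDHG for: min c^T x  s.t.  A x = b, x >= 0 (illustrative helper)."""
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    y = np.zeros(m) if y0 is None else np.asarray(y0, dtype=float).copy()
    # Equal step sizes with tau * sigma * ||A||_2^2 = 0.81 < 1.
    step = 0.9 / np.linalg.norm(A, 2)
    tau = sigma = step
    for _ in range(iters):
        # Primal step: gradient move on the Lagrangian, then project onto x >= 0.
        x_new = np.maximum(0.0, x - tau * (c - A.T @ y))
        # Dual step: uses the extrapolated primal point 2 * x_new - x.
        y = y + sigma * (b - A @ (2.0 * x_new - x))
        x = x_new
    return x, y


# Toy problem: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_opt, y_opt = pdhg_lp(c, A, b, x0=np.array([0.8, 0.2]))
```

Starting from a good `x0` is what would let the refinement stage stop after fewer iterations than a cold start; the iteration count used here is deliberately generous for a toy problem.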

arXiv:2406.01635 [pdf, other]

SwdFold: A Reweighting and Unfolding method based on Optimal Transport Theory

Authors: Chu-Cheng Pan, Xiang Dong, Yu-Chang Sun, Ao-Yan Cheng, Ao-Bo Wang, Yu-Xuan Hu, Hao Cai

Abstract: High-energy physics experiments rely heavily on precise measurements of energy and momentum, yet face significant challenges due to detector limitations, calibration errors, and the intrinsic nature of particle interactions. Traditional unfolding techniques have been employed to correct for these distortions, yet they often suffer from model dependency and stability issues. We present a novel method, SwdFold, which utilizes the principles of optimal transport to provide a robust, model-independent framework to estimate the probability density ratio for data unfolding. It not only unfolds the toy experimental events by reweighting simulated data distributions to closely match the true distributions but also maintains the integrity of physical features across various observables. We expect that it can enable more reliable predictions and comprehensive analyses as a high precision reweighting and unfolding tool in high-energy physics.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00959 [pdf, other]

Ta2Pd3Te5 topological thermometer

Authors: Yupeng Li, Anqi Wang, Senyang Pan, Dayu Yan, Guang Yang, Xingchen Guo, Yu Hong, Guangtong Liu, Fanming Qu, Zhijun Wang, Tian Qian, Jinglei Zhang, Youguo Shi, Li Lu, Jie Shen

Abstract: In recent decades, there has been a persistent pursuit of applications for surface/edge states in topological systems, driven by their dissipationless transport effects. However, there have been limited tangible breakthroughs in this field. This work demonstrates the remarkable properties of the topological insulator Ta2Pd3Te5 as a thermometer. This material exhibits a power-law correlation in temperature-dependent resistance at low temperatures, stemming from its Luttinger liquid behavior of edge states, while exhibiting semiconductor behavior at high temperatures. The power-law behavior effectively addresses the issue of infinite resistance in semiconductor thermometers at ultra-low temperatures, thereby playing a crucial role in enabling efficient thermometry in refrigerators supporting millikelvin temperatures or below. By employing chemical doping, adjusting thickness, and controlling gate voltage, its power-law behavior and semiconductor behavior can be effectively modulated. This enables efficient thermometry spanning from millikelvin temperatures to room temperature, and allows for precise local temperature measurement. Furthermore, this thermometer exhibits excellent temperature sensitivity and resolution, and can be fine-tuned to show small magnetoresistance. In summary, the Ta2Pd3Te5 thermometer, also referred to as a topological thermometer, exhibits outstanding performance and significant potential for measuring a wider range of temperatures compared to conventional low-temperature thermometers.

Submitted 2 June, 2024; originally announced June 2024.

Comments: 15 pages, 9 figures

arXiv:2406.00905 [pdf, other]

Exploration of mass splitting and muon/tau mixing parameters for an eV-scale sterile neutrino with IceCube

Authors: R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, C. Bellenghi , et al. (400 additional authors not shown)

Abstract: We present the first three-parameter fit to a 3+1 sterile neutrino model using 7.634 years of data from the IceCube Neutrino Observatory on $\nu_\mu+\overline{\nu}_\mu$ charged-current interactions in the energy range 500-9976 GeV. Our analysis is sensitive to the mass-squared splitting between the heaviest and lightest mass state ($\Delta m_{41}^2$), the mixing matrix element connecting muon flavor to the fourth mass state ($|U_{\mu4}|^2$), and the element connecting tau flavor to the fourth mass state ($|U_{\tau4}|^2$). Predicted propagation effects in matter enhance the signature through a resonance as atmospheric neutrinos from the Northern Hemisphere traverse the Earth to the IceCube detector at the South Pole. The result is consistent with the no-sterile neutrino hypothesis with a probability of 4.3 %. Profiling the likelihood of each parameter yields the 90 % confidence levels: $2.4\,\mathrm{eV}^{2} < \Delta m_{41}^2 < 9.6\,\mathrm{eV}^{2}$, $0.0081 < |U_{\mu4}|^2 < 0.10$, and $|U_{\tau4}|^2 < 0.035$, which narrows the allowed parameter-space for $|U_{\tau4}|^2$. However, the primary result of this analysis is the first map of the 3+1 parameter space exploring the interdependence of $\Delta m_{41}^2$, $|U_{\mu4}|^2$, and $|U_{\tau4}|^2$.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00622 [pdf, other]

Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering

Authors: Xingrui Wang, Wufei Ma, Angtian Wang, Shuo Chen, Adam Kortylewski, Alan Yuille

Abstract: For vision-language models (VLMs), understanding the dynamic properties of objects and their interactions within 3D scenes from video is crucial for effective reasoning. In this work, we introduce a video question answering dataset SuperCLEVR-Physics that focuses on the dynamics properties of objects. We concentrate on physical concepts -- velocity, acceleration, and collisions within 4D scenes, where the model needs to fully understand these dynamics properties and answer the questions built on top of them. From the evaluation of a variety of current VLMs, we find that these models struggle with understanding these dynamic properties due to the lack of explicit knowledge about the spatial structure in 3D and world dynamics in time variants. To demonstrate the importance of an explicit 4D dynamics representation of the scenes in understanding world dynamics, we further propose NS-4Dynamics, a Neural-Symbolic model for reasoning on 4D Dynamics properties under explicit scene representation from videos. Using scene rendering likelihood combined with a physical prior distribution, the 4D scene parser can estimate the dynamics properties of objects over time and interpret the observation into a 4D scene representation as world states. By further incorporating neural-symbolic reasoning, our approach enables advanced applications in future prediction, factual reasoning, and counterfactual reasoning. Our experiments show that our NS-4Dynamics surpasses previous VLMs in understanding the dynamics properties and answering questions about factual queries, future prediction, and counterfactual reasoning. Moreover, based on the explicit 4D scene representation, our model is effective in reconstructing the 4D scenes and re-simulating the future or counterfactual events.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00061 [pdf, other]

STAT: Shrinking Transformers After Training

Authors: Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle

Abstract: We present STAT: a simple algorithm to prune transformer models without any fine-tuning. STAT eliminates both attention heads and neurons from the network, while preserving accuracy by calculating a correction to the weights of the next layer. Each layer block in the network is compressed using a series of principled matrix factorizations that preserve the network structure. Our entire algorithm takes minutes to compress BERT, and less than three hours to compress models with 7B parameters using a single GPU. Using only several hundred data examples, STAT preserves the output of the network and improves upon existing gradient-free pruning methods. It is even competitive with methods that include significant fine-tuning. We demonstrate our method on both encoder and decoder architectures, including BERT, DistilBERT, and Llama-2 using benchmarks such as GLUE, Squad, WikiText2.

Submitted 29 May, 2024; originally announced June 2024.

arXiv:2405.20448 [pdf, other]

Knockout: A simple way to handle missing inputs

Authors: Minh Nguyen, Batuhan K. Karaman, Heejong Kim, Alan Q. Wang, Fengbei Liu, Mert R. Sabuncu

Abstract: Deep learning models can extract predictive and actionable information from complex inputs. The richer the inputs, the better these models usually perform. However, models that leverage rich inputs (e.g., multi-modality) can be difficult to deploy widely, because some inputs may be missing at inference. Current popular solutions to this problem include marginalization, imputation, and training multiple models. Marginalization can obtain calibrated predictions but it is computationally costly and therefore only feasible for low dimensional inputs. Imputation may result in inaccurate predictions because it employs point estimates for missing variables and does not work well for high dimensional inputs (e.g., images). Training multiple models whereby each model takes different subsets of inputs can work well but requires knowing missing input patterns in advance. Furthermore, training and retaining multiple models can be costly. We propose an efficient way to learn both the conditional distribution using full inputs and the marginal distributions. Our method, Knockout, randomly replaces input features with appropriate placeholder values during training. We provide a theoretical justification of Knockout and show that it can be viewed as an implicit marginalization strategy. We evaluate Knockout in a wide range of simulations and real-world datasets and show that it can offer strong empirical performance.

Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.
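
The training-time step named in the Knockout abstract (randomly replacing input features with placeholder values) can be illustrated with a minimal sketch. It assumes NumPy arrays, a single scalar placeholder, and an independent per-entry replacement probability `p`. The function name `knockout`, the placeholder value, and `p` are illustrative assumptions; the abstract does not specify how the authors choose "appropriate" placeholders.

```python
import numpy as np


def knockout(batch, placeholder=0.0, p=0.3, rng=None):
    """Replace each entry of `batch` with `placeholder` with probability p.

    Hypothetical helper for illustration; returns the corrupted batch and
    the boolean mask marking which entries were replaced.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(batch.shape) < p
    corrupted = np.where(mask, placeholder, batch)
    return corrupted, mask


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))          # toy batch: 4 samples, 3 features
corrupted, mask = knockout(features, placeholder=-1.0, p=0.5, rng=rng)
```

Under this reading, an input that is genuinely missing at inference would presumably be filled with the same placeholder, so the model meets a pattern it has already seen during training.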

arXiv:2405.20334 [pdf, other]

VividDream: Generating 3D Scene with Ambient Dynamics

Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of the static 3D scene from the sampled camera trajectories. We then optimize a canonical 4D scene representation using an animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Project page: https://vivid-dream-4d.github.io

arXiv:2405.20243 [pdf, other]

MANTA: A Negative-Triangularity NASEM-Compliant Fusion Pilot Plant

Authors: MANTA Collaboration, G. Rutherford, H. S. Wilson, A. Saltzman, D. Arnold, J. L. Ball, S. Benjamin, R. Bielajew, N. de Boucaud, M. Calvo-Carrera, R. Chandra, H. Choudhury, C. Cummings, L. Corsaro, N. DaSilva, R. Diab, A. R. Devitre, S. Ferry, S. J. Frank, C. J. Hansen, J. Jerkins, J. D. Johnson, P. Lunia, J. van de Lindt, S. Mackie , et al. (16 additional authors not shown)

Abstract: The MANTA (Modular Adjustable Negative Triangularity ARC-class) design study investigated how negative-triangularity (NT) may be leveraged in a compact fusion pilot plant (FPP) to take a "power-handling first" approach. The result is a pulsed, radiative, ELM-free tokamak that satisfies and exceeds the FPP requirements described in the 2021 National Academies of Sciences, Engineering, and Medicine report "Bringing Fusion to the U.S. Grid". A self-consistent integrated modeling workflow predicts a fusion power of 450 MW and a plasma gain of 11.5 with only 23.5 MW of power to the scrape-off layer (SOL). This low $P_\text{SOL}$ together with impurity seeding and high density at the separatrix results in a peak heat flux of just 2.8 MW/m$^{2}$. MANTA's high aspect ratio provides space for a large central solenoid (CS), resulting in ${\sim}$15 minute inductive pulses. In spite of the high B fields on the CS and the other REBCO-based magnets, the electromagnetic stresses remain below structural and critical current density limits. Iterative optimization of neutron shielding and tritium breeding blanket yields tritium self-sufficiency with a breeding ratio of 1.15, a blanket power multiplication factor of 1.11, toroidal field coil lifetimes of $3100 \pm 400$ MW-yr, and poloidal field coil lifetimes of at least $890 \pm 40$ MW-yr. Following balance of plant modeling, MANTA is projected to generate 90 MW of net electricity at an electricity gain factor of ${\sim}2.4$.
Systems-level economic analysis estimates an overnight cost of US\$3.4 billion, meeting the NASEM FPP requirement that this first-of-a-kind be less than US\$5 billion. The toroidal field coil cost and replacement time are the most critical upfront and lifetime cost drivers, respectively.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17537 [pdf, other]

BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

Authors: ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel X. Chang

Abstract: Measuring biodiversity is crucial for understanding ecosystem health. While prior works have developed machine learning models for the taxonomic classification of photographic images and DNA separately, in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space. This allows for accurate classification of both known and unknown insect species without task-specific fine-tuning, leveraging contrastive learning for the first time to fuse DNA and image data. Our method surpasses previous single-modality approaches in accuracy by over 11% on zero-shot learning tasks, showcasing its effectiveness in biodiversity studies.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 16 pages with 9 figures

arXiv:2405.17248 [pdf, other]

Transformer In-Context Learning for Categorical Data

Authors: Aaron T. Wang, Ricardo Henao, Lawrence Carin

Abstract: Recent research has sought to understand Transformers through the lens of in-context learning with functional data. We extend that line of work with the goal of moving closer to language models, considering categorical outcomes, nonlinear underlying models, and nonlinear attention. The contextual data are of the form $\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$ where each $c_i\in\{0,\dots,C-1\}$ is drawn from a categorical distribution that depends on covariates $x_i\in\mathbb{R}^d$. Contextual outcomes in the $m$th set of contextual data, $\textsf{C}_m$, are modeled in terms of latent function $f_m(x)\in\textsf{F}$, where $\textsf{F}$ is a functional class with $(C-1)$-dimensional vector output. The probability of observing class $c\in\{0,\dots,C-1\}$ is modeled in terms of the output components of $f_m(x)$ via the softmax. The Transformer parameters may be trained with $M$ contextual examples, $\{\textsf{C}_m\}_{m=1,M}$, and the trained model is then applied to new contextual data $\textsf{C}_{M+1}$ for new $f_{M+1}(x)\in\textsf{F}$. The goal is for the Transformer to constitute the probability of each category $c\in\{0,\dots,C-1\}$ for a new query $x_{N_{M+1}+1}$. We assume each component of $f_m(x)$ resides in a reproducing kernel Hilbert space (RKHS), specifying $\textsf{F}$. Analysis and an extensive set of experiments suggest that on its forward pass the Transformer (with attention defined by the RKHS kernel) implements a form of gradient descent of the underlying function, connected to the latent vector function associated with the softmax.
We present what is believed to be the first real-world demonstration of this few-shot-learning methodology, using the ImageNet dataset.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14782 [pdf, other]

Lessons from the Trenches on Reproducible Evaluation of Language Models

Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan , et al. (5 additional authors not shown)

Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons for researchers. First, we provide an overview of common challenges faced in language model evaluation. Second, we delineate best practices for addressing or lessening the impact of these challenges on research. Third, we present the Language Model Evaluation Harness (lm-eval): an open source library for independent, reproducible, and extensible evaluation of language models that seeks to address these issues. We describe the features of the library as well as case studies in which the library has been used to alleviate these methodological concerns.

Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14732 [pdf, other]

The Data Acquisition System of the LZ Dark Matter Detector: FADR

Authors: J. Aalbers, D. S. Akerib, A. K. Al Musalhi, F. Alder, C. S. Amarasinghe, A. Ames, T. J. Anderson, N. Angelides, H. M. Araújo, J. E. Armstrong, M. Arthurs, A. Baker, S. Balashov, J. Bang, E. E. Barillier, J. W. Bargemann, K. Beattie, T. Benson, A. Bhatti, A. Biekert, T. P. Biesiadzinski, H. J. Birch, E. Bishop, G. M. Blockinger, B. Boxer , et al. (190 additional authors not shown)

Abstract: The Data Acquisition System (DAQ) for the LUX-ZEPLIN (LZ) dark matter detector is described. The signals from 745 PMTs, distributed across three subsystems, are sampled with 100-MHz 32-channel digitizers (DDC-32s). A basic waveform analysis is carried out on the on-board Field Programmable Gate Arrays (FPGAs) to extract information about the observed scintillation and electroluminescence signals. This information is used to determine if the digitized waveforms should be preserved for offline analysis. The system is designed around the Kintex-7 FPGA. In addition to digitizing the PMT signals and providing basic event selection in real time, the flexibility provided by the use of FPGAs allows us to monitor the performance of the detector and the DAQ in parallel to normal data acquisition. The hardware and software/firmware of this FPGA-based Architecture for Data acquisition and Realtime monitoring (FADR) are discussed and performance measurements are described.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 18 pages, 24 figures

arXiv:2405.14458 [pdf, other]

YOLOv10: Real-Time End-to-End Object Detection

Authors: Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding

Abstract: Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability. This results in suboptimal efficiency, along with considerable potential for performance improvements. In this work, we aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and model architecture. To this end, we first present the consistent dual assignments for NMS-free training of YOLOs, which brings competitive performance and low inference latency simultaneously. Moreover, we introduce the holistic efficiency-accuracy driven model design strategy for YOLOs. We comprehensively optimize various components of YOLOs from both efficiency and accuracy perspectives, which greatly reduces the computational overhead and enhances the capability. The outcome of our effort is a new generation of YOLO series for real-time end-to-end object detection, dubbed YOLOv10.
Extensive experiments show that YOLOv10 achieves state-of-the-art performance and efficiency across various model scales. For example, our YOLOv10-S is 1.8$\times$ faster than RT-DETR-R18 at similar AP on COCO, while having 2.8$\times$ fewer parameters and FLOPs. Compared with YOLOv9-C, YOLOv10-B has 46\% less latency and 25\% fewer parameters for the same performance.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Code: https://github.com/THU-MIG/yolov10

arXiv:2405.14107 [pdf, other]

doi 10.1145/3643834.3661517

Towards Feature Engineering with Human and AI's Knowledge: Understanding Data Science Practitioners' Perceptions in Human&AI-Assisted Feature Engineering Design

Authors: Qian Zhu, Dakuo Wang, Shuai Ma, April Yi Wang, Zixin Chen, Udayan Khurana, Xiaojuan Ma

Abstract: As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Despite the existence of AI-generated recommendations for FE, there remains a limited understanding of how to effectively integrate and utilize humans' and AI's knowledge. To address this gap, we design a readily-usable prototype, human&AI-assisted FE in Jupyter notebooks. It harnesses the strengths of humans and AI to provide feature suggestions to users, seamlessly integrating these recommendations into practical workflows. Using the prototype as a research probe, we conducted an exploratory study to gain valuable insights into data science practitioners' perceptions, usage patterns, and their potential needs when presented with feature suggestions from both humans and AI. Through qualitative analysis, we discovered that the Creator of the feature (i.e., AI or human) significantly influences users' feature selection, and the semantic clarity of the suggested feature greatly impacts its adoption rate. Furthermore, our findings indicate that users perceive both differences and complementarity between features generated by humans and those generated by AI. Lastly, based on our study results, we derived a set of design recommendations for future human&AI FE design.
Our findings show the collaborative potential between humans and AI in the field of FE.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Computational Notebooks, Human-AI Collaboration, Feature Recommendation

arXiv:2405.14071 [pdf, ps, other]

Revisiting linear stability of black hole odd-parity perturbations in Einstein-Aether gravity

Authors: Shinji Mukohyama, Shinji Tsujikawa, Anzhong Wang

Abstract: In Einstein-Aether gravity, we revisit the issue of linear stabilities of black holes against odd-parity perturbations on a static and spherically symmetric background. In this theory, superluminal propagation is allowed and there is a preferred timelike direction along the unit Aether vector field. If we choose the usual spherically symmetric background coordinates with respect to the Killing time $t$ and the areal radius $r$, it may not be appropriate for unambiguously determining the black hole stability because the constant $t$ hypersurfaces are not necessarily always spacelike. Unlike past related works of black hole perturbations, we choose an Aether-orthogonal frame in which the timelike Aether field is orthogonal to spacelike hypersurfaces over the whole background spacetime. In the short wavelength limit, we show that no-ghost conditions as well as radial and angular propagation speeds coincide with those of vector and tensor perturbations on the Minkowski background. Thus, the odd-parity linear stability of black holes for large radial and angular momentum modes is solely determined by constant coefficients of the Aether derivative couplings.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 14 pages, no figures

Report number: YITP-24-65, IPMU24-0024, WUCG-24-05

arXiv:2405.14019 [pdf, other]

BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration

Authors: Alan Q. Wang, Rachit Saluja, Heejong Kim, Xinzi He, Adrian Dalca, Mert R. Sabuncu

Abstract: We present a keypoint-based foundation model for general purpose brain MRI registration, based on the recently-proposed KeyMorph framework. Our model, called BrainMorph, serves as a tool that supports multi-modal, pairwise, and scalable groupwise registration. BrainMorph is trained on a massive dataset of over 100,000 3D volumes, skull-stripped and non-skull-stripped, from nearly 16,000 unique healthy and diseased subjects. BrainMorph is robust to large misalignments, interpretable via interrogating automatically-extracted keypoints, and enables rapid and controllable generation of many plausible transformations with different alignment types and different degrees of nonlinearity at test-time. We demonstrate the superiority of BrainMorph in solving 3D rigid, affine, and nonlinear registration on a variety of multi-modal brain MRI scans of healthy and diseased subjects, in both the pairwise and groupwise setting. In particular, we show registration accuracy and speeds that surpass current state-of-the-art methods, especially in the context of large initial misalignments and large group settings. All code and models are available at https://github.com/alanqrwang/brainmorph.

Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2304.09941

arXiv:2405.11221 [pdf, other]

Real-time equilibrium reconstruction by neural network based on HL-3 tokamak

Authors: Guohui Zheng, Songfen Liu, Zongyu Yang, Rui Ma, Xinwen Gong, Ao Wang, Shuo Wang, Wulyu Zhong

Abstract: A neural network model, EFITNN, capable of real-time magnetic equilibrium reconstruction based on HL-3 tokamak magnetic measurement signals has been developed. The model processes inputs from 68 channels of magnetic measurement data gathered from 1159 HL-3 experimental discharges, including plasma current, loop voltage, and the poloidal magnetic fields measured by equilibrium probes. The outputs of the model feature eight key plasma parameters, alongside high-resolution ($129\times129$) reconstructions of the toroidal current density $J_{\text P}$ and poloidal magnetic flux profiles $Ψ_{rz}$. Moreover, the network's architecture employs a multi-task learning structure, which enables the sharing of weights and mutual correction among different outputs and increases the model's accuracy by up to 32%. The performance of EFITNN demonstrates remarkable consistency with the offline EFIT, achieving average $R^2 = 0.941, 0.997$ and $0.959$ for eight plasma parameters, $Ψ_{rz}$ and $J_{\text P}$, respectively. The model's robust generalization capabilities are particularly evident in its successful predictions of quasi-snowflake (QSF) divertor configurations and its adept handling of data from shot numbers or plasma current intervals not previously encountered during training. Compared to numerical methods, EFITNN significantly enhances computational efficiency with average computation time ranging from 0.08ms to 0.45ms, indicating its potential utility in real-time isoflux control and plasma profile management.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.09713 [pdf, other]

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Authors: Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

Abstract: Learning commonsense reasoning from visual contexts and scenes in the real world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically within dynamic, open-world, and structured context knowledge. We propose a new benchmark (SOK-Bench), consisting of 44K questions and 10K situations with instance-level annotations depicted in the videos. The reasoning process is required to understand and apply situated knowledge and general knowledge for problem-solving. To create such a dataset, we propose an automatic and scalable generation method to generate question-answer pairs, knowledge graphs, and rationales by instructing the combinations of LLMs and MLLMs. Concretely, we first extract observable situated entities, relations, and processes from videos for situated knowledge and then extend to open-world knowledge beyond the visible content. The task generation is facilitated through multiple dialogues as iterations and subsequently corrected and refined by our designed self-promptings and demonstrations. With a corpus of both explicit situated facts and implicit commonsense, we generate associated question-answer pairs and reasoning processes, finally followed by manual reviews for quality assurance.
We evaluated recent mainstream large vision-language models on the benchmark and drew several insightful conclusions. For more information, please refer to our benchmark at www.bobbywu.com/SOKBench.

Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: CVPR

arXiv:2405.08680 [pdf, other]

Generalized uncertainty principle distorted quintessence dynamics

Authors: Gaurav Bhandari, S. D. Pathak, Manabendra Sharma, Anzhong Wang

Abstract: In this paper, we invoke a generalized uncertainty principle (GUP) in the symmetry-reduced cosmological Hamiltonian for a universe driven by a quintessence scalar field with potential. Our study focuses on the semi-classical regime. In particular, we derive the GUP-distorted Friedmann, Raychaudhuri, and Klein-Gordon equations. This is followed by a systematic analysis of the qualitative dynamics for the choice of potential $V(φ)= V_0 \sinh^{-n}{(μφ)}$. This involves constructing an autonomous dynamical system of equations by choosing appropriate dynamical variables, followed by a qualitative study using linear stability theory. Our analysis shows that incorporating GUP significantly changes the existing fixed points compared to the limiting case without quantum effects, obtained by switching off the GUP.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 12. arXiv admin note: text overlap with arXiv:2404.09049

arXiv:2405.08672 [pdf, other]

EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adaptation methods to adapt these models to endoscopic depth estimation. We propose Endoscopic Depth Any Camera (EndoDAC), an efficient self-supervised depth estimation framework that adapts foundation models to endoscopic scenes. Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks to tailor the foundational model to the surgical domain, utilizing remarkably few trainable parameters. Given that camera information is not always accessible, we also introduce a self-supervised adaptation strategy that estimates camera intrinsics using the pose encoder. Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs. Experiments demonstrate that our approach obtains superior performance even with fewer training epochs and without knowledge of the ground truth camera intrinsics. Code is available at https://github.com/BeileiCui/EndoDAC.

Submitted 14 May, 2024; originally announced May 2024.

Comments: early accepted by MICCAI 2024

arXiv:2405.08575 [pdf, ps, other]

Complexity of codes for Ramsey positive sets

Authors: Allison Wang

Abstract: Sabok showed that the set of codes for $G_\delta$ Ramsey positive subsets of $[\omega]^\omega$ is $\mathbf{\Sigma}^1_2$-complete. We extend this result by providing sufficient conditions for the set of codes for $G_\delta$ Ramsey positive subsets of an arbitrary topological Ramsey space to be $\mathbf{\Sigma}^1_2$-complete.

Submitted 14 May, 2024; originally announced May 2024.

MSC Class: 03E15

arXiv:2405.08077 [pdf, other]

Methods and stability tests associated with the sterile neutrino search using improved high-energy $νにゅー_μみゅー$ event reconstruction in IceCube

Authors: IceCube Collaboration, R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise , et al. (398 additional authors not shown)

Abstract: We provide supporting details for the search for a 3+1 sterile neutrino using data collected over eleven years at the IceCube Neutrino Observatory. The analysis uses atmospheric muon-flavored neutrinos from 0.5 to 100 TeV that traverse the Earth to reach the IceCube detector, and finds a best-fit point at $\sin^2(2θ_{24}) = 0.16$ and $Δm^{2}_{41} = 3.5$ eV$^2$ with a goodness-of-fit p-value of 12\% and consistency with the null hypothesis of no oscillations to sterile neutrinos with a p-value of 3.1\%. Several improvements were made over past analyses, which are reviewed in this article, including upgrades to the reconstruction and the study of sources of systematic uncertainty. We provide details of the fit quality and discuss stability tests that split the data for separate samples, comparing results. We find that the fits are consistent between split data sets.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 18 pages, 17 figures, 2 tables. This long-form paper is a companion to the letter "A search for an eV-scale sterile neutrino using improved high-energy $ν_μ$ event reconstruction in IceCube."

arXiv:2405.08070 [pdf, other]

A search for an eV-scale sterile neutrino using improved high-energy $νにゅー_μみゅー$ event reconstruction in IceCube

Authors: IceCube Collaboration, R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, C. Argüelles, Y. Ashida, S. Athanasiadou, L. Ausborm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, S. Bash, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise , et al. (398 additional authors not shown)

Abstract: This Letter presents the result of a 3+1 sterile neutrino search using 10.7 years of IceCube data. We analyze atmospheric muon neutrinos that traverse the Earth with energies ranging from 0.5 to 100 TeV, incorporating significant improvements in modeling neutrino flux and detector response compared to earlier studies. Notably, for the first time, we categorize data into starting and through-going events, distinguishing neutrino interactions with vertices inside or outside the instrumented volume, to improve energy resolution. The best-fit point for a 3+1 model is found to be at $\sin^2(2θ_{24}) = 0.16$ and $Δm^{2}_{41} = 3.5$ eV$^2$, which agrees with previous iterations of this study. The result is consistent with the null hypothesis of no sterile neutrinos with a p-value of 3.1\%.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 9 pages, 3 figures. This letter is supported by the long-form paper "Methods and stability tests associated with the sterile neutrino search using improved high-energy $\nu_\mu$ event reconstruction in IceCube," also appearing on arXiv

arXiv:2405.07518 [pdf, other]

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain , et al. (5 additional authors not shown)

Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators has created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them. In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU), a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07060 [pdf, other]

Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People

Authors: Masaki Kuribayashi, Kohei Uehara, Allan Wang, Daisuke Sato, Simon Chu, Shigeo Morishima

Abstract: Visual Language Navigation (VLN) powered navigation robots have the potential to guide blind people by understanding and executing route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes described from human memory, which frequently contain stutters, errors, and omissions of detail, as opposed to those obtained by thinking out loud, such as in the Room-to-Room dataset. However, there is currently no benchmark that simulates instructions obtained from human memory in environments where blind people navigate. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. To collect natural language instructions, we conducted two studies, one with sighted passersby onsite and one with annotators online. Our analysis demonstrates that instruction data collected onsite were lengthier and contained more varied wording. Alongside our benchmark, we propose a VLN model better equipped to handle the scenario. Our proposed VLN model uses Large Language Models (LLMs) to parse instructions and generate Python code for robot control. We further show that the existing state-of-the-art model performed suboptimally on our benchmark. In contrast, our proposed method outperformed the state-of-the-art model by a fair margin. We found that future research should exercise caution when considering VLN technology for practical applications, as real-world scenarios have different characteristics than ones collected in traditional settings.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06201 [pdf, other]

PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement

Authors: Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Yingcong Chen, Dengbo He, Kaishun Wu

Abstract: Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) has attracted increasing attention in rPPG. However, when rPPG is extended to simultaneously measure more vital signs (e.g., respiration and blood oxygen saturation), achieving generalizability brings new challenges. Although partial features shared among different physiological signals can benefit multi-task learning, the sparse and imbalanced target label space brings the seesaw effect over task-specific feature learning. To resolve this problem, we designed an end-to-end Mixture of Low-rank Experts for multi-task remote Physiological measurement (PhysMLE), which is based on multiple low-rank experts with a novel router mechanism, thereby enabling the model to adeptly handle both specifications and correlations within tasks. Additionally, we introduced prior knowledge from physiology among tasks to overcome the imbalance of label space under real-world multi-task physiological measurement. For fair and comprehensive evaluations, this paper proposed a large-scale multi-task generalization benchmark, named Multi-Source Synsemantic Domain Generalization (MSSDG) protocol. Extensive experiments with MSSDG and intra-dataset have shown the effectiveness and efficiency of PhysMLE. In addition, a new dataset was collected and made publicly available to meet the needs of the MSSDG.

Submitted 9 May, 2024; originally announced May 2024.

Showing 1–50 of 1,300 results for author: Wang, A