-
Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
R. Akimoto,
H. Al-Ta'ani,
J. Alexander,
A. Angerami,
K. Aoki,
N. Apadula,
Y. Aramaki,
H. Asano,
E. C. Aschenauer,
E. T. Atomssa,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
B. Bannier,
K. N. Barish,
B. Bassalleck,
S. Bathe
, et al. (377 additional authors not shown)
Abstract:
The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.
Submitted 11 July, 2024;
originally announced July 2024.
-
Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit of the half-life of $^{100}$Mo $0νββ$ decay as $ T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90\% confidence level. The effective Majorana mass limit range is $m_{ββ}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.
Submitted 8 July, 2024;
originally announced July 2024.
-
Stochastic Processes: From Classical to Quantum
Authors:
Soon Hoe Lim
Abstract:
The main goal of these notes is to give an introduction to the mathematics of quantum noise and some of its applications in non-equilibrium statistical mechanics. We start with some reminders from the theory of classical stochastic processes. We then provide a brief overview of quantum mechanics and quantum field theory, from the viewpoint of quantum probability and adopting the language of Hudson and Parthasarathy. We introduce quantum stochastic processes on a boson Fock space and their calculus. Whenever possible, we make connections with the relevant concepts in classical probability theory. As an application of the theory, we introduce the theory of open quantum systems, with emphasis on the physics and modeling aspects of these systems.
Submitted 4 July, 2024;
originally announced July 2024.
-
The African Woman is Rhythmic and Soulful: Evaluation of Open-ended Generation for Implicit Biases
Authors:
Serene Lim
Abstract:
This study investigates the subtle and often concealed biases present in Large Language Models (LLMs), which, despite passing explicit bias tests, can still exhibit implicit biases akin to those observed in humans who profess egalitarian beliefs yet demonstrate underlying prejudices. The challenge of measuring such biases is exacerbated as LLMs become increasingly proprietary, restricting access to their internal mechanisms such as embeddings, which are crucial for applying traditional bias measures. To tackle these issues, this study introduces innovative measures of bias inspired by psychological methodologies: the LLM Implicit Association Test (IAT) Bias and the LLM Decision Bias. The LLM IAT Bias is a prompt-based method designed to unearth implicit biases by simulating the well-known psychological IAT but adapted for use with LLMs. The LLM Decision Bias measure is developed to detect subtle discrimination in decision-making tasks, focusing on how LLMs choose between individuals in various scenarios. Open-ended generation is also utilised through thematic analysis of word generations and storytelling. The experiments revealed biases across gender and racial domains, from discriminatory categorisations to exoticisation. Our findings indicate that the prompt-based measure of implicit bias not only correlates with traditional embedding-based methods but also more effectively predicts downstream behaviors, which are crucially measured by the LLM Decision Bias. This relationship underscores the importance of relative, rather than absolute, evaluations in assessing implicit biases, reflecting psychological insights into human bias assessment. This research contributes to the broader understanding of AI ethics and provides suggestions for continually assessing and mitigating biases in advanced AI systems, emphasising the need for more qualitative and downstream focus.
Submitted 1 July, 2024;
originally announced July 2024.
-
Flat Posterior Does Matter For Bayesian Transfer Learning
Authors:
Sungjun Lim,
Jeyoon Yeom,
Sooyon Kim,
Hoyoon Byun,
Jinho Kang,
Yohan Jung,
Jiyoung Jung,
Kyungwoo Song
Abstract:
The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning for BNNs has not been widely investigated and shows limited improvement. We hypothesize that this issue arises from the inability to find flat minima, which is crucial for generalization performance. To address this, we evaluate the sharpness of BNNs in various settings, revealing their insufficiency in seeking flat minima and the influence of flatness on BMA performance. Therefore, we propose Sharpness-aware Bayesian Model Averaging (SA-BMA), a Bayesian-fitting flat posterior seeking optimizer integrated with Bayesian transfer learning. SA-BMA calculates the divergence between posteriors in the parameter space, aligning with the nature of BNNs, and serves as a generalized version of existing sharpness-aware optimizers. We validate that SA-BMA improves generalization performance in few-shot classification and distribution shift scenarios by ensuring flatness.
Submitted 21 June, 2024;
originally announced June 2024.
-
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Authors:
Jia Syuen Lim,
Zhuoxiao Chen,
Mahsa Baktashmotlagh,
Zhi Chen,
Xin Yu,
Zi Huang,
Yadan Luo
Abstract:
Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlapping hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs. To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM. The code is available at https://github.com/jason-lim26/DiPEx.
Submitted 21 June, 2024;
originally announced June 2024.
-
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Authors:
Seungbeen Lee,
Seungwon Lim,
Seungju Han,
Giyeong Oh,
Hyungjoo Chae,
Jiwan Chung,
Minju Kim,
Beong-woo Kwak,
Yeonsoo Lee,
Dongha Lee,
Jinyoung Yeo,
Youngjae Yu
Abstract:
The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliability for precise personality measurements. To address this, we introduce TRAIT, a new tool consisting of 8K multi-choice questions designed to assess the personality of LLMs with validity and reliability. TRAIT is built on the psychometrically validated human questionnaire, Big Five Inventory (BFI) and Short Dark Triad (SD-3), enhanced with the ATOMIC10X knowledge graph for testing personality in a variety of real scenarios. TRAIT overcomes the reliability and validity issues when measuring personality of LLM with self-assessment, showing the highest scores across three metrics: refusal rate, prompt sensitivity, and option order sensitivity. It reveals notable insights into personality of LLM: 1) LLMs exhibit distinct and consistent personality, which is highly influenced by their training data (i.e., data used for alignment tuning), and 2) current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness, suggesting the need for further research in this direction.
Submitted 20 June, 2024;
originally announced June 2024.
-
Composing Object Relations and Attributes for Image-Text Matching
Authors:
Khoi Pham,
Chuong Huynh,
Ser-Nam Lim,
Abhinav Shrivastava
Abstract:
We study the visual semantic embedding problem for image-text matching. Most existing work utilizes a tailored cross-attention mechanism to perform local alignment across the two image and text modalities. This is computationally expensive, even though it is more powerful than the unimodal dual-encoder approach. This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges. Utilizing a graph attention network, our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system. Representing caption as a scene graph offers the ability to utilize the strong relational inductive bias of graph neural networks to learn object-attribute and object-object relations effectively. To train the model, we propose losses that align the image and caption both at the holistic level (image-caption) and the local level (image-object entity), which we show is key to the success of the model. Our model is termed Composition model for Object Relations and Attributes, CORA. Experimental results on two prominent image-text retrieval benchmarks, Flickr30K and MSCOCO, demonstrate that CORA outperforms existing state-of-the-art computationally expensive cross-attention methods regarding recall score while achieving fast computation speed of the dual encoder.
Submitted 17 June, 2024;
originally announced June 2024.
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2} \sim 6 \times 10^{26}$ years, corresponding to an effective Majorana mass of 15--29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the $10^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
Submitted 13 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
Submitted 12 June, 2024;
originally announced June 2024.
-
Battling Botpoop using GenAI for Higher Education: A Study of a Retrieval Augmented Generation Chatbots Impact on Learning
Authors:
Maung Thway,
Jose Recatala-Gomez,
Fun Siong Lim,
Kedar Hippalgaonkar,
Leonard W. T. Ng
Abstract:
Generative artificial intelligence (GenAI) and large language models (LLMs) have simultaneously opened new avenues for enhancing human learning and increased the prevalence of poor-quality information in student responses, termed Botpoop. This study introduces Professor Leodar, a custom-built, Singlish-speaking Retrieval Augmented Generation (RAG) chatbot designed to enhance education while reducing Botpoop. Deployed at Nanyang Technological University, Singapore, Professor Leodar offers a glimpse into the future of AI-assisted learning, offering personalized guidance, 24/7 availability, and contextually relevant information. Through a mixed-methods approach, we examine the impact of Professor Leodar on learning, engagement, and exam preparedness, with 97.1% of participants reporting positive experiences. These findings help define possible roles of AI in education and highlight the potential of custom GenAI chatbots. Our combination of chatbot development, in-class deployment and outcomes study offers a benchmark for GenAI educational tools and is a stepping stone for redefining the interplay between AI and human learning.
Submitted 21 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
UVIS: Unsupervised Video Instance Segmentation
Authors:
Shuaiyi Huang,
Saksham Suri,
Kamal Gupta,
Sai Saketh Rambhatla,
Ser-nam Lim,
Abhinav Shrivastava
Abstract:
Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance Segmentation (UVIS) framework that can perform video instance segmentation without any video annotations or dense label-based pretraining. Our key insight comes from leveraging the dense shape prior from the self-supervised vision foundation model DINO and the openset recognition ability from the image-caption supervised vision-language model CLIP. Our UVIS framework consists of three essential steps: frame-level pseudo-label generation, transformer-based VIS model training, and query-based tracking. To improve the quality of VIS predictions in the unsupervised setup, we introduce a dual-memory design. This design includes a semantic memory bank for generating accurate pseudo-labels and a tracking memory bank for maintaining temporal consistency in object tracks. We evaluate our approach on three standard VIS benchmarks, namely YoutubeVIS-2019, YoutubeVIS-2021, and Occluded VIS. Our UVIS achieves 21.1 AP on YoutubeVIS-2019 without any video annotations or dense pretraining, demonstrating the potential of our unsupervised VIS framework.
Submitted 10 June, 2024;
originally announced June 2024.
-
Artificial social influence via human-embodied AI agent interaction in immersive virtual reality (VR): Effects of similarity-matching during health conversations
Authors:
Sue Lim,
Ralf Schmälzle,
Gary Bente
Abstract:
Interactions with artificial intelligence (AI) based agents can positively influence human behavior and judgment. However, studies to date focus on text-based conversational agents (CA) with limited embodiment, restricting our understanding of how social influence principles, such as similarity, apply to AI agents (i.e., artificial social influence). We address this gap by leveraging the latest advances in AI (language models) and combining them with immersive virtual reality (VR). Specifically, we built VR-ECAs, or embodied conversational agents that can naturally converse with humans about health-related topics in a virtual environment. Then we manipulated human-agent similarity via gender matching and examined its effects on biobehavioral (i.e., gaze), social (e.g., agent likeability), and behavioral outcomes (i.e., healthy snack selection). We found that discussing health with opposite-gender agents enhanced gaze duration and the likelihood of healthy snack selection. In addition, female participants liked the VR-ECAs more than their male counterparts, regardless of the gender of the VR-ECAs. Finally, participants experienced greater presence while conversing with VR-embodied agents than chatting with text-only agents. Overall, our findings highlight embodiment as a crucial factor in how AI influences human behavior, and our paradigm enables new experimental research at the intersection of social influence, human-AI communication, and immersive virtual reality (VR).
Submitted 8 June, 2024;
originally announced June 2024.
-
Multidimensional optical singularities and their applications
Authors:
Soon Wei Daniel Lim,
Christina M. Spaegele,
Federico Capasso
Abstract:
Optical singularities, which are positions within an electromagnetic field where certain field parameters become undefined, hold significant potential for applications in areas such as super-resolution microscopy, sensing, and communication. This potential stems from their high field confinement and characteristic rapidly-changing field distributions. Although the systematic characterization of the first singularities dates back many decades, recent advancements in sub-wavelength wavefront control at optical frequencies have led to a renewed interest in the field, and have substantially expanded the range of known optical singularities and singular structures. However, the diversity in descriptions, mathematical formulations, and naming conventions can create confusion and impede accessibility to the field. This review aims to clarify the nomenclature by demonstrating that any singular field can be conceptualized as a collection of a finite set of principal, 'generic' singularities. These singularities are robust against small perturbations due to their topological nature. We underscore that the control over the principal properties of those singularities, namely, their protection against perturbations and their dimension, utilizes a consistent mathematical framework. Additionally, we provide an overview of current design techniques for both stable and approximate singularities and discuss their applications across various disciplines.
Submitted 2 June, 2024;
originally announced June 2024.
-
Experimental demonstration of a fault-tolerant qubit encoded on a hyperfine-coupled qudit
Authors:
Sumin Lim,
Mikhail Vaganov,
Junjie Liu,
Arzhang Ardavan
Abstract:
The realization of effective quantum error correction protocols remains a central challenge in the development of scalable quantum computers. Protocols employing redundancy over multiple physical qubits to encode a single error-protected logical qubit are theoretically effective, but imply a large resource overhead. Alternative, more hardware-efficient, approaches seek to deploy higher-dimensional quantum systems known as qudits. Recently, proposals have emerged for exploiting high-spin magnetic nuclei coupled to condensed matter electron spin qubits to implement fault-tolerant memories.
Here, we explore experimentally the simplest of these proposals, a logical qubit encoded on the four states of a I=3/2 nuclear spin hyperfine-coupled to a S=1/2 electron spin qubit; the encoding protects against the dominant decoherence mechanism in such systems, fluctuations of the quantizing magnetic field. We implement the encoding using electron-nuclear double resonance within a subspace of the spin levels in an ensemble of highly coherent manganese defects in zinc oxide. We explore the dynamics of the encoded state both under a controlled application of the fluctuation and under natural decoherence processes. Our results confirm the potential of these proposals for practical, implementable, fault tolerant quantum memories.
Submitted 31 May, 2024;
originally announced May 2024.
-
Stochastic Optimal Control for Diffusion Bridges in Function Spaces
Authors:
Byoungwoo Park,
Jungwon Choi,
Sungbin Lim,
Juho Lee
Abstract:
Recent advancements in diffusion models and diffusion bridges primarily focus on finite-dimensional spaces, yet many real-world problems necessitate operations in infinite-dimensional function spaces for more natural and interpretable formulations. In this paper, we present a theory of stochastic optimal control (SOC) tailored to infinite-dimensional spaces, aiming to extend diffusion-based algorithms to function spaces. Specifically, we demonstrate how Doob's $h$-transform, the fundamental tool for constructing diffusion bridges, can be derived from the SOC perspective and expanded to infinite dimensions. This expansion presents a challenge, as infinite-dimensional spaces typically lack closed-form densities. Leveraging our theory, we establish that solving the optimal control problem with a specific objective function choice is equivalent to learning diffusion-based generative models. We propose two applications: (1) learning bridges between two infinite-dimensional distributions and (2) generative models for sampling from an infinite-dimensional distribution. Our approach proves effective for diverse problems involving continuous function space representations, such as resolution-free images, time-series data, and probability density functions.
Submitted 2 June, 2024; v1 submitted 31 May, 2024;
originally announced May 2024.
-
Automated Real-World Sustainability Data Generation from Images of Buildings
Authors:
Peter J Bentley,
Soo Ling Lim,
Rajat Mathur,
Sid Narang
Abstract:
When data on building features is unavailable, the task of determining how to improve that building in terms of carbon emissions becomes infeasible. We show that from only a set of images, a Large Language Model with appropriate prompt engineering and domain knowledge can successfully estimate a range of building features relevant for sustainability calculations. We compare our novel image-to-data method with a ground truth comprising real building data for 47 apartments and achieve accuracy better than a human performing the same task. We also demonstrate that the method can generate tailored recommendations to the owner on how best to improve their properties and discuss methods to scale the approach.
Submitted 28 May, 2024;
originally announced May 2024.
-
LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence
Authors:
Zhuoling Li,
Xiaogang Xu,
Zhenhua Xu,
SerNam Lim,
Hengshuang Zhao
Abstract:
Due to the need to interact with the real world, embodied agents are required to possess comprehensive prior knowledge, long-horizon planning capability, and a swift response speed. Despite recent large language model (LLM) based agents achieving promising performance, they still exhibit several limitations. For instance, the output of LLMs is a descriptive sentence, which is ambiguous when determining specific actions. To address these limitations, we introduce the large auto-regressive model (LARM). LARM leverages both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format named auto-regressive node transmission structure and assemble a corresponding dataset. Adopting a two-phase training regimen, LARM successfully harvests enchanted equipment in Minecraft, which demands significantly more complex decision-making chains than the highest achievements of prior best methods. Besides, the speed of LARM is 6.8x faster.
Submitted 27 May, 2024;
originally announced May 2024.
-
Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval
Authors:
Young Kyun Jang,
Donghyun Kim,
Ser-nam Lim
Abstract:
"Learning to hash" is a practical solution for efficient retrieval, offering fast search speed and low storage cost. It is widely applied in various applications, such as image-text cross-modal search. In this paper, we explore the potential of enhancing the performance of learning to hash with the proliferation of powerful large pre-trained models, such as Vision-Language Pre-training (VLP) models. We introduce a novel method named Distillation for Cross-Modal Quantization (DCMQ), which leverages the rich semantic knowledge of VLP models to improve hash representation learning. Specifically, we use the VLP as a 'teacher' to distill knowledge into a 'student' hashing model equipped with codebooks. This process involves the replacement of supervised labels, which are composed of multi-hot vectors and lack semantics, with the rich semantics of VLP. In the end, we apply a transformation termed Normalization with Paired Consistency (NPC) to achieve a discriminative target for distillation. Further, we introduce a new quantization method, Product Quantization with Gumbel (PQG), that promotes balanced codebook learning, thereby improving the retrieval performance. Extensive benchmark testing demonstrates that DCMQ consistently outperforms existing supervised cross-modal hashing approaches, showcasing its significant potential.
Submitted 23 May, 2024;
originally announced May 2024.
-
Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
Authors:
Young Kyun Jang,
Ser-nam Lim
Abstract:
Modern retrieval systems often struggle with upgrading to new and more powerful models due to the incompatibility of embeddings between the old and new models. This necessitates a costly process known as backfilling, which involves re-computing the embeddings for a large number of data samples. In vision, Backward-compatible Training (BT) has been proposed to ensure that the new model aligns with the old model's embeddings. This paper extends the concept of vision-only BT to the field of cross-modal retrieval, marking the first attempt to address Cross-modal BT (XBT). Our goal is to achieve backward-compatibility between Vision-Language Pretraining (VLP) models, such as CLIP, for the cross-modal retrieval task. To address XBT challenges, we propose an efficient solution: a projection module that maps the new model's embeddings to those of the old model. This module, pretrained solely with text data, significantly reduces the number of image-text pairs required for XBT learning, and, once it is pretrained, it avoids using the old model during training. Furthermore, we utilize parameter-efficient training strategies that improve efficiency and preserve the off-the-shelf new model's knowledge by avoiding any modifications. Experimental results on cross-modal retrieval datasets demonstrate the effectiveness of XBT and its potential to enable backfill-free upgrades when a new VLP model emerges.
Submitted 23 May, 2024;
originally announced May 2024.
-
Address-Specific Sustainable Accommodation Choice Through Real-World Data Integration
Authors:
Peter J. Bentley,
Rajat Mathur,
Soo Ling Lim,
Sid Narang
Abstract:
Consumers wish to choose sustainable accommodation for their travels, and in the case of corporations, may be required to do so. Yet accommodation marketplaces provide no meaningful capability for sustainable choice: typically CO2 estimates are provided that are identical for all accommodation of the same type across an entire country. We propose a decision support system that enables real choice of sustainable accommodation. We develop a data-driven address-specific metric called EcoGrade, which integrates government approved datasets and uses interpolation where data is sparse. We validate the metric on 10,000 UK addresses in 10 cities, showing the match of our interpolations to reality is statistically significant. We show how the metric has been embedded into a decision support system for a global accommodation marketplace and tested by real users over several months with positive user feedback. In the EU, forty percent of final energy consumption is from buildings. We need to encourage all building owners to make their accommodation more efficient. The rental sector is one area where change can occur rapidly, as rented accommodation is renovated frequently. We anticipate our decision support system using EcoGrade will encourage this positive change.
Submitted 21 May, 2024;
originally announced May 2024.
-
Investigation of suppression of $Υ(nS)$ in relativistic heavy-ion collisions at RHIC and LHC energies
Authors:
Junlee Kim,
Jaebeom Park,
Byungsik Hong,
Juhee Hong,
Eun-Joo Kim,
Yongsun Kim,
MinJung Kweon,
Su Houng Lee,
Sanghoon Lim,
Jinjoo Seo
Abstract:
The primary purpose of studying quarkonium production in relativistic heavy-ion collisions is to understand the properties of the quark-gluon plasma. At various collision systems, measurements of quarkonium states of different binding energies, such as $Υ(nS)$, can provide comprehensive information. A model study has been performed to investigate the modification of $Υ(nS)$ production in Pb-Pb collisions at $\sqrt{s_{\mathrm{NN}}}=$ 5.02 TeV and Au-Au collisions at $\sqrt{s_{\mathrm{NN}}}=$ 200 GeV. The Monte-Carlo simulation study is performed with a publicly available hydrodynamic simulation package for the quark-gluon plasma medium and a theoretical calculation of temperature-dependent thermal width of $Υ(nS)$ considering the gluo-dissociation and inelastic parton scattering for dissociation inside the medium. In addition, we perform a systematic study with different descriptions of initial collision geometry and formation time of $Υ(nS)$ to investigate their impacts on yield modification. The model calculation with a varied parameter set can describe the experimental data of $Υ(nS)$ in Pb-Pb collisions at 5.02 TeV and $Υ(2S)$ in Au-Au collisions at 200 GeV but underestimates the modification of $Υ(1S)$ at the lower collision energy. The nuclear absorption mechanism is explored to understand the discrepancy between the data and simulation.
Submitted 19 May, 2024;
originally announced May 2024.
-
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Authors:
Young Kyun Jang,
Dat Huynh,
Ashish Shah,
Wen-Kai Chen,
Ser-Nam Lim
Abstract:
Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but their reliance on expensive manually-annotated datasets restricts their scalability and broader applicability. To address these issues, previous studies have proposed pseudo-word token-based Zero-Shot CIR (ZS-CIR) methods, which utilize a projection module to map images to word tokens. However, we conjecture that this approach has a downside: the projection module distorts the original image representation and confines the resulting composed embeddings to the text-side. In order to resolve this, we introduce a novel ZS-CIR method that uses Spherical Linear Interpolation (Slerp) to directly merge image and text representations by identifying an intermediate embedding of both. Furthermore, we introduce Text-Anchored-Tuning (TAT), a method that fine-tunes the image encoder while keeping the text encoder fixed. TAT closes the modality gap between images and text, making the Slerp process much more effective. Notably, the TAT method is not only efficient in terms of the scale of the training dataset and training time, but it also serves as an excellent initial checkpoint for training supervised CIR models, thereby highlighting its wider potential. The integration of the Slerp-based ZS-CIR with a TAT-tuned model enables our approach to deliver state-of-the-art retrieval performance across CIR benchmarks.
Submitted 1 May, 2024;
originally announced May 2024.
-
The Robotic MAAO 0.7m Telescope System: Performance and Standard Photometric System
Authors:
Gu Lim,
Dohyeong Kim,
Seonghun Lim,
Myungshin Im,
Hyeonho Choi,
Jaemin Park,
Keun-Hong Park,
Junyeong Park,
Chaudhary Muskaan,
Donghyun Kim,
Hayeong Jeong
Abstract:
We introduce a 0.7m telescope system at the Miryang Arirang Astronomical Observatory (MAAO), a public observatory in Miryang, Korea. System integration and a scheduling program enable the 0.7m telescope system to operate completely robotically during nighttime, eliminating the need for human intervention. Using the 0.7m telescope system, we obtain atmospheric extinction coefficients and the zero-point magnitudes by observing standard stars. As a result, we find that atmospheric extinctions are moderate but they can sometimes increase depending on the weather conditions. The measured 5-sigma limiting magnitudes reach down to BVRI=19.4-19.6 AB mag for a point source with a total integrated time of 10 minutes under clear weather conditions, demonstrating comparable performance with other observational facilities operating under similar specifications and sky conditions. We expect that the newly established MAAO 0.7m telescope system will contribute significantly to the observational studies of astronomy. Particularly, with its capability for robotic observations, this system, although its primary duty is for public viewing, can be extensively used for the time-series observation of transients.
Submitted 24 April, 2024;
originally announced April 2024.
-
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Authors:
Young Kyun Jang,
Donghyun Kim,
Zihang Meng,
Dat Huynh,
Ser-Nam Lim
Abstract:
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image. These specific triplets are not as commonly available as simple image-text pairs, limiting the widespread use of CIR and its scalability. On the other hand, zero-shot CIR can be relatively easily trained with image-caption pairs without considering the image-to-image relation, but this approach tends to yield lower accuracy. We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data and learn our large language model-based Visual Delta Generator (VDG) to generate text describing the visual difference (i.e., visual delta) between the two. VDG, equipped with fluent language knowledge and being model agnostic, can generate pseudo triplets to boost the performance of CIR models. Our approach significantly improves the existing supervised learning approaches and achieves state-of-the-art results on the CIR benchmarks.
Submitted 23 April, 2024;
originally announced April 2024.
-
Planet Hunters NGTS: New Planet Candidates from a Citizen Science Search of the Next Generation Transit Survey Public Data
Authors:
Sean M. O'Brien,
Megan E. Schwamb,
Samuel Gill,
Christopher A. Watson,
Matthew R. Burleigh,
Alicia Kendall,
David R. Anderson,
José I. Vines,
James S. Jenkins,
Douglas R. Alves,
Laura Trouille,
Solène Ulmer-Moll,
Edward M. Bryant,
Ioannis Apergis,
Matthew P. Battley,
Daniel Bayliss,
Nora L. Eisner,
Edward Gillen,
Michael R. Goad,
Maximilian N. Günther,
Beth A. Henderson,
Jeong-Eun Heo,
David G. Jackson,
Chris Lintott,
James McCormac
, et al. (13 additional authors not shown)
Abstract:
We present the results from the first two years of the Planet Hunters NGTS citizen science project, which searches for transiting planet candidates in data from the Next Generation Transit Survey (NGTS) by enlisting the help of members of the general public. Over 8,000 registered volunteers reviewed 138,198 light curves from the NGTS Public Data Releases 1 and 2. We utilize a user weighting scheme to combine the classifications of multiple users to identify the most promising planet candidates not initially discovered by the NGTS team. We highlight the five most interesting planet candidates detected through this search, which are all candidate short-period giant planets. This includes the TIC-165227846 system that, if confirmed, would be the lowest-mass star to host a close-in giant planet. We assess the detection efficiency of the project by determining the number of confirmed planets from the NASA Exoplanet Archive and TESS Objects of Interest (TOIs) successfully recovered by this search and find that 74% of confirmed planets and 63% of TOIs detected by NGTS are recovered by the Planet Hunters NGTS project. The identification of new planet candidates shows that the citizen science approach can provide a complementary method to the detection of exoplanets with ground-based surveys such as NGTS.
Submitted 23 April, 2024;
originally announced April 2024.
-
Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration
Authors:
Hans Jarett J. Ong,
Brian Godwin S. Lim
Abstract:
Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) operates under the assumption of a linear data generating process with non-Gaussian noise in determining the causal graph. Its assumption of unmeasured confounders being absent, however, poses practical limitations. In response, empirical research has shown that the reformulation of LiNGAM as a shortest path problem (LiNGAM-SPP) addresses this limitation. Within LiNGAM-SPP, mutual information is chosen to serve as the measure of independence. A challenge is introduced - parameter tuning is now needed due to its reliance on kNN mutual information estimators. The paper proposes a threefold enhancement to the LiNGAM-SPP framework.
First, the need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. This substitution is validated on a general data generating process and benchmark real-world data sets, outperforming existing methods especially when given a larger set of features. The incorporation of prior knowledge is then enabled by a node-skipping strategy implemented on the graph representation of all causal orderings to eliminate violations based on the provided input of relative orderings. Flexibility relative to existing approaches is achieved. Last among the three enhancements is the utilization of the distribution of paths in the graph representation of all causal orderings. From this, crucial properties of the true causal graph such as the presence of unmeasured confounders and sparsity may be inferred. To some extent, the expected performance of the causal discovery algorithm may be predicted. The refinements above advance the practicality and performance of LiNGAM-SPP, showcasing the potential of graph-search-based methodologies in advancing causal discovery.
Submitted 18 April, 2024;
originally announced April 2024.
-
Multiphoton super-resolution imaging via virtual structured illumination
Authors:
Sumin Lim,
Sungsam Kang,
Jin-Hee Hong,
Youngho Jin,
Kalpak Gupta,
Moonseok Kim,
Suhyun Kim,
Wonshik Choi,
Seokchan Yoon
Abstract:
Fluorescence imaging in thick biological tissues is challenging due to sample-induced aberration and scattering, which leads to severe degradation of image quality and resolution. Fluorescence imaging in reflection geometry further exacerbates this issue since the point spread function is distorted in both excitation and emission pathways. Here, we propose a novel approach termed adaptive optics virtual structured illumination microscopy (AO V-SIM) that enables super-resolution multiphoton imaging through a scattering medium in reflection geometry. Our approach exploits the incoherent reflection matrix obtained using a conventional point-scanning fluorescence microscope with an array detector. We introduce a V-SIM super-resolution reconstruction algorithm based on the incoherent reflection matrix. Furthermore, we introduce a software adaptive optics correction algorithm, AO V-SIM, which recovers an unattenuated and phase-corrected optical transfer function for both excitation and emission pathways. The effectiveness of our proposed method is experimentally validated through sub-diffraction-limited two-photon fluorescence imaging of various samples in the presence of strong aberration.
Submitted 17 April, 2024;
originally announced April 2024.
-
Hyperbolic Heterogeneous Graph Attention Networks
Authors:
Jongmin Park,
Seunghoon Han,
Soohwan Jeong,
Sungsu Lim
Abstract:
Most previous heterogeneous graph embedding models represent elements in a heterogeneous graph as vector representations in a low-dimensional Euclidean space. However, because heterogeneous graphs inherently possess complex structures, such as hierarchical or power-law structures, distortions can occur when representing them in Euclidean space. To overcome this limitation, we propose Hyperbolic Heterogeneous Graph Attention Networks (HHGAT) that learn vector representations in hyperbolic spaces with meta-path instances. We conducted experiments on three real-world heterogeneous graph datasets, demonstrating that HHGAT outperforms state-of-the-art heterogeneous graph embedding models in node classification and clustering tasks.
Submitted 15 April, 2024;
originally announced April 2024.
-
Securing Monolithic Kernels using Compartmentalization
Authors:
Soo Yee Lim,
Sidhartha Agrawal,
Xueyuan Han,
David Eyers,
Dan O'Keeffe,
Thomas Pasquier
Abstract:
Monolithic operating systems, where all kernel functionality resides in a single, shared address space, are the foundation of most mainstream computer systems. However, a single flaw, even in a non-essential part of the kernel (e.g., device drivers), can cause the entire operating system to fall under an attacker's control. Kernel hardening techniques might prevent certain types of vulnerabilities, but they fail to address a fundamental weakness: the lack of intra-kernel security that safely isolates different parts of the kernel. We survey kernel compartmentalization techniques that define and enforce intra-kernel boundaries and propose a taxonomy that allows the community to compare and discuss future work. We also identify factors that complicate comparisons among compartmentalized systems, suggest new ways to compare future approaches with existing work meaningfully, and discuss emerging research directions.
Submitted 12 April, 2024;
originally announced April 2024.
-
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Authors:
Hwiyeol Jo,
Taiwoo Park,
Nayoung Choi,
Changbong Kim,
Ohjoon Kwon,
Donghyeon Jeon,
Hyunwoo Lee,
Eui-Hyeon Lee,
Kyoungho Shin,
Sun Suk Lim,
Kyungmi Kim,
Jihye Lee,
Sun Kim
Abstract:
Although there has been a growing interest among industries to integrate generative LLMs into their services, limited experience and scarcity of resources act as a barrier in launching and servicing large-scale LLM-based conversational services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on the sensitiveness of user queries. We propose a taxonomy for sensitive search queries, outline our approaches, and present a comprehensive analysis report on sensitive queries from actual users.
Submitted 5 April, 2024;
originally announced April 2024.
-
Singular linear forms over global function fields
Authors:
Gukyeong Bang,
Taehyeong Kim,
Seonhee Lim
Abstract:
In this paper, we consider singular linear forms over global function fields of class number one and give an upper bound for the Hausdorff dimension of the set of singular linear forms by constructing an appropriate Margulis function over global function fields.
Submitted 11 April, 2024;
originally announced April 2024.
-
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Authors:
Bo He,
Hengduo Li,
Young Kyun Jang,
Menglin Jia,
Xuefei Cao,
Ashish Shah,
Abhinav Shrivastava,
Ser-Nam Lim
Abstract:
With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective model for long-term video understanding. Instead of trying to process more frames simultaneously like most existing work, we propose to process videos in an online manner and store past video information in a memory bank. This allows our model to reference historical video content for long-term analysis without exceeding LLMs' context length constraints or GPU memory limits. Our memory bank can be seamlessly integrated into current multimodal LLMs in an off-the-shelf manner. We conduct extensive experiments on various video understanding tasks, such as long-video understanding, video question answering, and video captioning, and our model can achieve state-of-the-art performances across multiple datasets. Code available at https://boheumd.github.io/MA-LMM/.
Submitted 24 April, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation
Authors:
Seoyeon Bae,
Yoon Kyung Lee,
Jungcheol Lee,
Jaeheon Kim,
Haeseong Jeon,
Seung-Hwan Lim,
Byung-Cheol Kim,
Sowon Hahn
Abstract:
A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to assist players in positioning within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.
Submitted 30 March, 2024;
originally announced April 2024.
-
ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification
Authors:
Sehee Lim,
Yejin Kim,
Chi-Hyun Choi,
Jy-yong Sohn,
Byung-Hoon Kim
Abstract:
Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) has garnered significant attention in recent years. Recognizing cognitive distortions from the interviewee's utterances can be an essential part of psychotherapy, especially for cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification performance with the aid of additional modules of (1) extracting the parts related to cognitive distortion, and (2) debating the reasoning steps by multiple agents. Our experimental results on a public dataset show that ERD improves the multi-class F1 score as well as the binary specificity score. Regarding the latter score, it turns out that our method is effective in debiasing the baseline method, which has a high false positive rate, especially when the summary of the multi-agent debate is provided to LLMs.
Submitted 21 March, 2024;
originally announced March 2024.
-
CMOS-compatible photonic integrated circuits on thin-film ScAlN
Authors:
Sihao Wang,
Veerendra Dhyani,
Sakthi Sanjeev Mohanraj,
Xiaodong Shi,
Binni Varghese,
Wing Wai Chung,
Ding Huang,
Zhi Shiuh Lim,
Qibin Zeng,
Huajun Liu,
Xianshu Luo,
Victor Leong,
Nanxi Li,
Di Zhu
Abstract:
Scandium aluminum nitride (ScAlN) has recently emerged as an attractive material for integrated photonics due to its favorable nonlinear optical properties and compatibility with CMOS fabrication. Despite the promising and versatile material properties, it is still an outstanding challenge to realize low-loss photonic circuits on thin-film ScAlN-on-insulator wafers. Here, we present a systematic study on the material quality of sputtered thin-film ScAlN produced in a CMOS-compatible 200 mm line, and an optimized fabrication process to yield 400 nm thick, fully etched waveguides. With surface polishing and annealing, we achieve micro-ring resonators with an intrinsic quality factor as high as $1.47\times 10^5$, corresponding to a propagation loss of 2.4 dB/cm. These results serve as a critical step towards developing future large-scale, low-loss photonic integrated circuits based on ScAlN.
Submitted 11 June, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Emotion Recognition Using Transformers with Masked Learning
Authors:
Seongjae Min,
Junseok Yang,
Sangjun Lim,
Junyong Lee,
Sangwon Lee,
Sejoon Lim
Abstract:
In recent years, deep learning has achieved innovative advancements in various fields, including the analysis of human emotions and behaviors. Initiatives such as the Affective Behavior Analysis in-the-wild (ABAW) competition have been particularly instrumental in driving research in this area by providing diverse and challenging datasets that enable precise evaluation of complex emotional states. This study leverages the Vision Transformer (ViT) and Transformer models to focus on the estimation of Valence-Arousal (VA), which signifies the positivity and intensity of emotions, recognition of various facial expressions, and detection of Action Units (AU) representing fundamental muscle movements. This approach transcends traditional Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) based methods, proposing a new Transformer-based framework that maximizes the understanding of temporal and spatial features. The core contributions of this research include the introduction of a learning technique through random frame masking and the application of Focal loss adapted for imbalanced data, enhancing the accuracy and applicability of emotion and behavior analysis in real-world settings. This approach is expected to contribute to the advancement of emotional computing and deep learning methodologies.
Submitted 23 March, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter
Authors:
Seunghyeon Lim,
Youngjae Yoo,
Jun Ki Lee,
Byoung-Tak Zhang
Abstract:
In this paper, we propose a novel method for plane clustering specialized in cluttered scenes using an RGB-D camera and validate its effectiveness through robot grasping experiments. Unlike existing methods, which focus on large-scale indoor structures, our approach, Multi-Object RANSAC, emphasizes cluttered environments that contain a wide range of objects with different scales. It enhances plane segmentation by generating subplanes in a Deep Plane Clustering (DPC) module, which are then merged with the final planes by post-processing. DPC rearranges the point cloud by voting layers to make subplane clusters, trained in a self-supervised manner using pseudo-labels generated from RANSAC. Multi-Object RANSAC demonstrates superior plane instance segmentation performance over other recent RANSAC applications. We conducted an experiment on robot suction-based grasping, comparing our method with a vision-based grasping network and RANSAC applications. The results from this real-world scenario showed its remarkable performance surpassing the baseline methods, highlighting its potential for advanced scene understanding and manipulation.
Submitted 19 March, 2024;
originally announced March 2024.
-
Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning
Authors:
Dongmin Park,
Zhaofang Qian,
Guangxing Han,
Ser-Nam Lim
Abstract:
Mitigating hallucinations of Large Vision Language Models (LVLMs) is crucial to enhance their reliability for general-purpose assistants. This paper shows that such hallucinations of LVLMs can be significantly exacerbated by preceding user-system dialogues. To precisely measure this, we first present an evaluation benchmark by extending popular multi-modal benchmark datasets with prepended hallucinatory dialogues powered by our novel Adversarial Question Generator (AQG), which can automatically generate image-related yet adversarial dialogues by adopting adversarial attacks on LVLMs. On our benchmark, the zero-shot performance of state-of-the-art LVLMs drops significantly for both the VQA and Captioning tasks. Next, we further reveal that this hallucination is mainly due to the prediction bias toward preceding dialogues rather than visual content. To reduce this bias, we propose Adversarial Instruction Tuning (AIT) that robustly fine-tunes LVLMs against hallucinatory dialogues. Extensive experiments show our proposed approach successfully reduces dialogue hallucination while maintaining performance.
Submitted 25 May, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
The Next Generation Virgo Cluster Survey (NGVS). XXVII. The Size and Structure of Globular Cluster Systems and their Connection to Dark Matter Halos
Authors:
Sungsoon Lim,
Eric W. Peng,
Patrick Côté,
Laura Ferrarese,
Joel C. Roediger,
Chengze Liu,
Chelsea Spengler,
Elisabeth Sola,
Pierre-Alain Duc,
Laura V. Sales,
John P. Blakeslee,
Jean-Charles Cuillandre,
Patrick R. Durrell,
Eric Emsellem,
Stephen D. J. Gwyn,
Ariane Lançon,
Francine R. Marleau,
J. Christopher Mihos,
Oliver Müller,
Thomas H. Puzia,
Rubén Sánchez-Janssen
Abstract:
We study the size and structure of globular cluster (GC) systems of 118 early-type galaxies from the NGVS, MATLAS, and ACSVCS surveys. Fitting Sérsic profiles, we investigate the relationship between effective radii of GC systems ($R_{e, \rm gc}$) and galaxy properties. GC systems are 2--4 times more extended than host galaxies across the entire stellar mass range of our sample ($10^{8.3} < M_* < 10^{11.6}~M_{\odot}$). The relationship between $R_{e, \rm gc}$ and galaxy stellar mass exhibits a characteristic "knee" at a stellar mass of $M_p \simeq 10^{10.8}$, similar to the galaxy $R_e$--stellar mass relationship. We present a new characterization of the traditional blue and red GC color sub-populations, describing them with respect to host galaxy $(g'-i')$ color ($Δ_{gi}$): GCs with similar colors to their hosts have a "red" $Δ_{gi}$, and those significantly bluer than their hosts have a "blue" $Δ_{gi}$. The GC populations with red $Δ_{gi}$, even in dwarf galaxies, are twice as extended as the stars, suggesting that formation or survival mechanisms favor the outer regions. We find a tight correlation between $R_{e, \rm gc}$ and the total number of GCs, with intrinsic scatter $\lesssim 0.1$ dex spanning two and three orders of magnitude in size and number, respectively. This holds for both red and blue subpopulations, albeit with different slopes. Assuming that $N_{GC, Total}$ correlates with $M_{200}$, we find that the red GC systems have effective radii of roughly 1--5\% $R_{\rm 200}$, while the blue GC systems in massive galaxies can have sizes as large as $\sim$10\% $R_{\rm 200}$. An environmental dependence of $R_{e, \rm gc}$ is also found, with lower density environments exhibiting more extended GC systems at fixed mass.
Submitted 14 March, 2024;
originally announced March 2024.
-
Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions
Authors:
Lan Wang,
Vishnu Boddeti,
Sernam Lim
Abstract:
We introduce a novel text-to-pose video editing method, ReimaginedAct. While existing video editing tasks are limited to changes in attributes, backgrounds, and styles, our method aims to predict open-ended human action changes in video. Moreover, our method can accept not only direct instructional text prompts but also `what if' questions to predict possible action changes. ReimaginedAct comprises video understanding, reasoning, and editing modules. First, an LLM is utilized to obtain a plausible answer for the instruction or question, which is then used for (1) prompting Grounded-SAM to produce bounding boxes of relevant individuals and (2) retrieving a set of pose videos that we have collected for editing human actions. The retrieved pose videos and the detected individuals are then utilized to alter the poses extracted from the original video. We also employ a timestep blending module to ensure the edited video retains its original content except where modifications are necessary. To facilitate research in text-to-pose video editing, we introduce a new evaluation dataset, WhatifVideo-1.0. This dataset includes videos of different scenarios spanning a range of difficulty levels, along with questions and text prompts. Experimental results demonstrate that existing video editing methods struggle with human action editing, while our approach can achieve effective action editing and even imaginary editing from counterfactual questions.
Submitted 11 March, 2024;
originally announced March 2024.
-
FSViewFusion: Few-Shots View Generation of Novel Objects
Authors:
Rukhshanda Hussain,
Hui Xian Grace Lim,
Borchun Chen,
Mubarak Shah,
Ser Nam Lim
Abstract:
Novel view synthesis has seen tremendous developments since the arrival of NeRFs. However, NeRF models overfit on a single scene, lacking generalization to out-of-distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text-to-image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. Our research reveals two interesting findings. First, we observe that Dreambooth can learn the high-level concept of a view, compared to arguably more complex strategies which involve finetuning diffusion models on large amounts of multi-view data. Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learnt. Motivated by this, we introduce a learning strategy, FSViewFusion, which inherits a specific view through only one image sample of a single scene, and transfers the knowledge to a novel object, learnt from few shots, using low-rank adapters. Through extensive experiments we demonstrate that our method, albeit simple, is efficient in generating reliable view samples for in-the-wild images. Code and models will be released.
Submitted 12 March, 2024; v1 submitted 10 March, 2024;
originally announced March 2024.
-
Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate
Authors:
Zijian Zhao,
Sola Woo,
Khandker Akif Aabrar,
Sharadindu Gopal Kirtania,
Zhouhang Jiang,
Shan Deng,
Yi Xiao,
Halid Mulaosmanovic,
Stefan Duenkel,
Dominik Kleimaier,
Steven Soss,
Sven Beyer,
Rajiv Joshi,
Scott Meninger,
Mohamed Mohamed,
Kijoon Kim,
Jongho Woo,
Suhwan Lim,
Kwangsoo Kim,
Wanki Kim,
Daewon Ha,
Vijaykrishnan Narayanan,
Suman Datta,
Shimeng Yu,
Kai Ni
Abstract:
In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-${V}_{TH}$ (LVT) state; ii) combined simulations and experimental demonstrations of dual-port design verify the disturb-free operation in a NAND string, overcoming a key challenge in single-port designs; iii) the proposed design can be incorporated in a highly scaled vertical NAND FeFET string and the pass gate can be incorporated into the existing 3D NAND with the negligible overhead of the pass gate interconnection through a global bottom pass gate contact in the substrate.
Submitted 7 March, 2024;
originally announced March 2024.
-
On the origin of topotactic reduction effect for superconductivity in infinite-layer nickelates
Authors:
Shengwei Zeng,
Chi Sin Tang,
Zhaoyang Luo,
Lin Er Chow,
Zhi Shiuh Lim,
Saurav Prakash,
Ping Yang,
Caozheng Diao,
Xiaojiang Yu,
Zhenxiang Xing,
Rong Ji,
Xinmao Yin,
Changjian Li,
X. Renshaw Wang,
Qian He,
Mark B. H. Breese,
A. Ariando,
Huajun Liu
Abstract:
Topotactic reduction utilizing metal hydrides as reagents emerges as an effective approach to achieve exceptionally low oxidation states of metal ions and unconventional coordination networks. This method opens avenues to the development of entirely new functional materials, with one notable example being the infinite-layer nickelate superconductors. However, the reduction effect on the atomic reconstruction and electronic structures -- crucial for superconductivity -- remains largely unresolved. We design two sets of control Nd$_{0.8}$Sr$_{0.2}$NiO$_2$ thin films and implement secondary ion mass spectroscopy to highlight the absence of reduction-induced hydrogen intercalation. X-ray absorption spectroscopy shows a significant linear dichroism with dominant Ni 3d$_{x^2-y^2}$ orbitals on superconducting samples, indicating a Ni single-band nature of infinite-layer nickelates. Consistent with the superconducting $T_c$, the Ni 3d orbital asymmetry manifests a dome-like reduction duration dependence. Our results unveil the critical role of reduction in modulating the Ni-3d orbital polarization and its impact on the superconducting properties.
Submitted 1 March, 2024;
originally announced March 2024.
-
UniMODE: Unified Monocular 3D Object Detection
Authors:
Zhuoling Li,
Xiaogang Xu,
SerNam Lim,
Hengshuang Zhao
Abstract:
Realizing unified monocular 3D object detection, including both indoor and outdoor scenes, holds great importance in applications like robot navigation. However, involving various scenarios of data to train models poses challenges due to their significantly different characteristics, e.g., diverse geometry properties and heterogeneous domain distributions. To address these challenges, we build a detector based on the bird's-eye-view (BEV) detection paradigm, where the explicit feature projection is beneficial to addressing the geometry learning ambiguity when employing multiple scenarios of data to train detectors. Then, we split the classical BEV detection architecture into two stages and propose an uneven BEV grid design to handle the convergence instability caused by the aforementioned challenges. Moreover, we develop a sparse BEV feature projection strategy to reduce computational cost and a unified domain alignment method to handle heterogeneous domains. Combining these techniques, a unified detector UniMODE is derived, which surpasses the previous state-of-the-art on the challenging Omni3D dataset (a large-scale dataset including both indoor and outdoor scenes) by 4.9% AP_3D, revealing the first successful generalization of a BEV detector to unified 3D object detection.
Submitted 9 May, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
The FLAMINGO simulation view of cluster progenitors observed in the epoch of reionization with JWST
Authors:
Seunghwan Lim,
Sandro Tacchella,
Joop Schaye,
Matthieu Schaller,
Jakob M. Helton,
Roi Kugel,
Roberto Maiolino
Abstract:
Motivated by the recent JWST discovery of galaxy overdensities during the Epoch of Reionization, we examine the physical properties of high-$z$ protoclusters and their evolution using the FLAMINGO simulation suite. We investigate the impact of the apertures used to define protoclusters, because the heterogeneous apertures used in the literature have limited our understanding of the population. Our results are insensitive to the uncertainties of the subgrid models at a given resolution, whereas further investigation into the dependence on numerical resolution is needed. When considering galaxies more massive than $M_\ast\,{\simeq}\,10^8\,{\rm M_\odot}$, the FLAMINGO simulations predict a dominant contribution from progenitors similar to those of the Coma cluster to the cosmic star-formation rate density during the reionization epoch. Our results indicate the onset of suppression of star formation in the protocluster environments as early as $z\,{\simeq}\,5$. The galaxy number density profiles are similar to NFW at $z\,{\lesssim}\,1$ while showing a steeper slope at earlier times before the formation of the core. Different from most previous simulations, the predicted star-formation history for individual protoclusters is in good agreement with observations. We demonstrate that, depending on the aperture, the integrated physical properties including the total (dark matter and baryonic) mass can be biased by a factor of 2 to 5 at $z\,{=}\,5.5$--$7$, and by an order of magnitude at $z\,{\lesssim}\,4$. This correction suffices to remove the ${\simeq}\,3\,σ$ tensions with the number density of structures found in recent JWST observations.
Submitted 27 February, 2024;
originally announced February 2024.
-
Analysis of Multi-Source Language Training in Cross-Lingual Transfer
Authors:
Seong Hoon Lim,
Taejun Yun,
Jinhyeon Kim,
Jihun Choi,
Taeuk Kim
Abstract:
The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there is still ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT, that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically prove their effectiveness.
Submitted 4 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Design and characterization of individual addressing optics based on multi-channel acousto-optic modulator for $^{171}$Yb$^+$ qubits
Authors:
Sungjoo Lim,
Seunghyun Baek,
Jacob Whitlow,
Marissa D'Onofrio,
Tianyi Chen,
Samuel Phiri,
Stephen Crain,
Kenneth R. Brown,
Jungsang Kim,
Junki Kim
Abstract:
We present the design and characterization of individual addressing optics based on a multi-channel acousto-optic modulator (AOM) for trapped ytterbium-171 ions. The design parameters of the individual addressing system were determined based on the tradeoff between the expected crosstalk and the required numerical aperture of the projection objective lens. The target beam diameter and separation were 1.90 $μ$m and 4.28 $μ$m, respectively. The individual beams shaped by the projection optics were characterized by an imaging sensor and a field probe ion. The resulting effective beam diameters and separations were approximately 2.34--2.36 $μ$m and 4.31 $μ$m, respectively, owing to residual aberration.
Submitted 30 March, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Constraining the stellar populations of ultra-diffuse galaxies in the MATLAS survey using spectral energy distribution fitting
Authors:
Maria Luisa Buzzo,
Duncan A. Forbes,
Thomas H. Jarrett,
Francine R. Marleau,
Pierre-Alain Duc,
Jean P. Brodie,
Aaron J. Romanowsky,
Jonah S. Gannon,
Steven R. Janssens,
Joel Pfeffer,
Anna Ferré-Mateu,
Lydia Haacke,
Warrick J. Couch,
Sungsoon Lim,
Rubén Sánchez-Janssen
Abstract:
We use spectral energy distribution (SED) fitting to place constraints on the stellar populations of 59 ultra-diffuse galaxies (UDGs) in the low-to-moderate density fields of the MATLAS survey. We use the routine PROSPECTOR, coupled with archival data in the optical from DECaLS, and near- and mid-infrared imaging from WISE, to recover the stellar masses, ages, metallicities and star formation timescales of the UDGs. We find that a subsample of the UDGs lies within the scatter of the mass-metallicity relation (MZR) for local classical dwarfs. However, another subsample is more metal-poor, being consistent with the evolving MZR at high-redshift. We investigate UDG positioning trends in the mass-metallicity plane as a function of surface brightness, effective radius, axis ratio, local volume density, mass-weighted age, star formation timescale, globular cluster (GC) counts and GC specific frequency. We find that our sample of UDGs can be separated into two main classes. Class A: Comprised of UDGs with lower stellar masses, prolonged star formation histories (SFHs), more elongated, inhabiting less dense environments, hosting fewer GCs, younger, consistent with the classical dwarf MZR, and fainter. Class B: UDGs with higher stellar masses, rapid SFHs, rounder, inhabiting the densest of our probed environments, hosting on average the most numerous GC systems, older, consistent with the high-redshift MZR (i.e., consistent with early-quenching), and brighter. The combination of these properties suggests that UDGs of Class A are consistent with a `puffed-up dwarf' formation scenario, while UDGs of Class B seem to be better explained by `failed galaxy' scenarios.
Submitted 19 February, 2024;
originally announced February 2024.