Search | arXiv e-print repository

arXiv:2407.08586 [pdf, other]

Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Ta'ani, J. Alexander, A. Angerami, K. Aoki, N. Apadula, Y. Aramaki, H. Asano, E. C. Aschenauer, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, B. Bannier, K. N. Barish, B. Bassalleck, S. Bathe, et al. (377 additional authors not shown)

Abstract: The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $\lambda$, the Lévy index of stability $\alpha$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $\lambda(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $\alpha(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $\alpha$ is significantly different from that of Gaussian ($\alpha=2$) or Cauchy ($\alpha=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $\eta'$ meson is included. In each centrality class, the best value of the in-medium $\eta'$ mass is compared to the mass of the $\eta$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 401 authors from 75 institutions, 20 pages, 15 figures, 2 tables. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2407.05618 [pdf, other]

Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (83 additional authors not shown)

Abstract: AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0\nu\beta\beta$ decay and report a new lower limit of the half-life of $^{100}$Mo $0\nu\beta\beta$ decay as $T^{0\nu}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90\% confidence level. The effective Majorana mass limit range is $m_{\beta\beta}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 7 pages, 4 figures

arXiv:2407.04005 [pdf, ps, other]

Stochastic Processes: From Classical to Quantum

Authors: Soon Hoe Lim

Abstract: The main goal of these notes is to give an introduction to the mathematics of quantum noise and some of its applications in non-equilibrium statistical mechanics. We start with some reminders from the theory of classical stochastic processes. We then provide a brief overview of quantum mechanics and quantum field theory, from the viewpoint of quantum probability and adopting the language of Hudson and Parthasarathy. We introduce quantum stochastic processes on a boson Fock space and their calculus. Whenever possible, we make connections with the relevant concepts in classical probability theory. As an application of the theory, we introduce the theory of open quantum systems, with emphasis on the physics and modeling aspects of these systems.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 52 pages

arXiv:2407.01270 [pdf]

The African Woman is Rhythmic and Soulful: Evaluation of Open-ended Generation for Implicit Biases

Authors: Serene Lim

Abstract: This study investigates the subtle and often concealed biases present in Large Language Models (LLMs), which, despite passing explicit bias tests, can still exhibit implicit biases akin to those observed in humans who profess egalitarian beliefs yet demonstrate underlying prejudices. The challenge of measuring such biases is exacerbated as LLMs become increasingly proprietary, restricting access to their internal mechanisms such as embeddings, which are crucial for applying traditional bias measures. To tackle these issues, this study introduces innovative measures of bias inspired by psychological methodologies: the LLM Implicit Association Test (IAT) Bias and the LLM Decision Bias. The LLM IAT Bias is a prompt-based method designed to unearth implicit biases by simulating the well-known psychological IAT but adapted for use with LLMs. The LLM Decision Bias measure is developed to detect subtle discrimination in decision-making tasks, focusing on how LLMs choose between individuals in various scenarios. Open-ended generation is also utilised through thematic analysis of word generations and storytelling. The experiments revealed biases across gender and racial domains, from discriminatory categorisations to exoticisation. Our findings indicate that the prompt-based measure of implicit bias not only correlates with traditional embedding-based methods but also more effectively predicts downstream behaviors, which are crucially measured by the LLM Decision Bias.
This relationship underscores the importance of relative, rather than absolute, evaluations in assessing implicit biases, reflecting psychological insights into human bias assessment. This research contributes to the broader understanding of AI ethics and provides suggestions for continually assessing and mitigating biases in advanced AI systems, emphasising the need for more qualitative and downstream focus.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.15664 [pdf, other]

Flat Posterior Does Matter For Bayesian Transfer Learning

Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song

Abstract: The large-scale pre-trained neural network has achieved notable success in enhancing performance for downstream tasks. Another promising approach for generalization is Bayesian Neural Network (BNN), which integrates Bayesian methods into neural network architectures, offering advantages such as Bayesian Model averaging (BMA) and uncertainty quantification. Despite these benefits, transfer learning for BNNs has not been widely investigated and shows limited improvement. We hypothesize that this issue arises from the inability to find flat minima, which is crucial for generalization performance. To address this, we evaluate the sharpness of BNNs in various settings, revealing their insufficiency in seeking flat minima and the influence of flatness on BMA performance. Therefore, we propose Sharpness-aware Bayesian Model Averaging (SA-BMA), a Bayesian-fitting flat posterior seeking optimizer integrated with Bayesian transfer learning. SA-BMA calculates the divergence between posteriors in the parameter space, aligning with the nature of BNNs, and serves as a generalized version of existing sharpness-aware optimizers. We validate that SA-BMA improves generalization performance in few-shot classification and distribution shift scenarios by ensuring flatness.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14924 [pdf, other]

DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Authors: Jia Syuen Lim, Zhuoxiao Chen, Mahsa Baktashmotlagh, Zhi Chen, Xin Yu, Zi Huang, Yadan Luo

Abstract: Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlapping hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs.
To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM. The code is available at https://github.com/jason-lim26/DiPEx.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 19 pages

arXiv:2406.14703 [pdf, other]

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

Authors: Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

Abstract: The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliability for precise personality measurements. To address this, we introduce TRAIT, a new tool consisting of 8K multi-choice questions designed to assess the personality of LLMs with validity and reliability. TRAIT is built on the psychometrically validated human questionnaire, Big Five Inventory (BFI) and Short Dark Triad (SD-3), enhanced with the ATOMIC10X knowledge graph for testing personality in a variety of real scenarios. TRAIT overcomes the reliability and validity issues when measuring personality of LLM with self-assessment, showing the highest scores across three metrics: refusal rate, prompt sensitivity, and option order sensitivity. It reveals notable insights into personality of LLM: 1) LLMs exhibit distinct and consistent personality, which is highly influenced by their training data (i.e., data used for alignment tuning), and 2) current prompting techniques have limited effectiveness in eliciting certain traits, such as high psychopathy or low conscientiousness, suggesting the need for further research in this direction.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Preprint; Under review

arXiv:2406.11820 [pdf, other]

Composing Object Relations and Attributes for Image-Text Matching

Authors: Khoi Pham, Chuong Huynh, Ser-Nam Lim, Abhinav Shrivastava

Abstract: We study the visual semantic embedding problem for image-text matching. Most existing work utilizes a tailored cross-attention mechanism to perform local alignment across the two image and text modalities. This is computationally expensive, even though it is more powerful than the unimodal dual-encoder approach. This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges. Utilizing a graph attention network, our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system. Representing caption as a scene graph offers the ability to utilize the strong relational inductive bias of graph neural networks to learn object-attribute and object-object relations effectively. To train the model, we propose losses that align the image and caption both at the holistic level (image-caption) and the local level (image-object entity), which we show is key to the success of the model. Our model is termed Composition model for Object Relations and Attributes, CORA. Experimental results on two prominent image-text retrieval benchmarks, Flickr30K and MSCOCO, demonstrate that CORA outperforms existing state-of-the-art computationally expensive cross-attention methods regarding recall score while achieving fast computation speed of the dual encoder.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted to CVPR'24

arXiv:2406.09698 [pdf, other]

Projected background and sensitivity of AMoRE-II

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (81 additional authors not shown)

Abstract: AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0\nu\beta\beta}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08301 [pdf, other]

Jet modification via $\pi^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri, et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $\Delta_{AA}$, as a function of the trigger-hadron azimuthal separation, $\Delta\phi$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.07796 [pdf]

Battling Botpoop using GenAI for Higher Education: A Study of a Retrieval Augmented Generation Chatbots Impact on Learning

Authors: Maung Thway, Jose Recatala-Gomez, Fun Siong Lim, Kedar Hippalgaonkar, Leonard W. T. Ng

Abstract: Generative artificial intelligence (GenAI) and large language models (LLMs) have simultaneously opened new avenues for enhancing human learning and increased the prevalence of poor-quality information in student response - termed Botpoop. This study introduces Professor Leodar, a custom-built, Singlish-speaking Retrieval Augmented Generation (RAG) chatbot designed to enhance educational while reducing Botpoop. Deployed at Nanyang Technological University, Singapore, Professor Leodar offers a glimpse into the future of AI-assisted learning, offering personalized guidance, 24/7 availability, and contextually relevant information. Through a mixed-methods approach, we examine the impact of Professor Leodar on learning, engagement, and exam preparedness, with 97.1% of participants reporting positive experiences. These findings help define possible roles of AI in education and highlight the potential of custom GenAI chatbots. Our combination of chatbot development, in-class deployment and outcomes study offers a benchmark for GenAI educational tools and is a stepping stone for redefining the interplay between AI and human learning.

Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 13 pages, 5 figures, SI with Annexes A, B and C upon request

arXiv:2406.06908 [pdf, other]

UVIS: Unsupervised Video Instance Segmentation

Authors: Shuaiyi Huang, Saksham Suri, Kamal Gupta, Sai Saketh Rambhatla, Ser-nam Lim, Abhinav Shrivastava

Abstract: Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance Segmentation (UVIS) framework that can perform video instance segmentation without any video annotations or dense label-based pretraining. Our key insight comes from leveraging the dense shape prior from the self-supervised vision foundation model DINO and the openset recognition ability from the image-caption supervised vision-language model CLIP. Our UVIS framework consists of three essential steps: frame-level pseudo-label generation, transformer-based VIS model training, and query-based tracking. To improve the quality of VIS predictions in the unsupervised setup, we introduce a dual-memory design. This design includes a semantic memory bank for generating accurate pseudo-labels and a tracking memory bank for maintaining temporal consistency in object tracks. We evaluate our approach on three standard VIS benchmarks, namely YoutubeVIS-2019, YoutubeVIS-2021, and Occluded VIS. Our UVIS achieves 21.1 AP on YoutubeVIS-2019 without any video annotations or dense pretraining, demonstrating the potential of our unsupervised VIS framework.

Submitted 10 June, 2024; originally announced June 2024.

Comments: CVPR2024 Workshop

arXiv:2406.05486 [pdf, other]

Artificial social influence via human-embodied AI agent interaction in immersive virtual reality (VR): Effects of similarity-matching during health conversations

Authors: Sue Lim, Ralf Schmälzle, Gary Bente

Abstract: Interactions with artificial intelligence (AI) based agents can positively influence human behavior and judgment. However, studies to date focus on text-based conversational agents (CA) with limited embodiment, restricting our understanding of how social influence principles, such as similarity, apply to AI agents (i.e., artificial social influence). We address this gap by leveraging the latest advances in AI (language models) and combining them with immersive virtual reality (VR). Specifically, we built VR-ECAs, or embodied conversational agents that can naturally converse with humans about health-related topics in a virtual environment. Then we manipulated human-agent similarity via gender matching and examined its effects on biobehavioral (i.e., gaze), social (e.g., agent likeability), and behavioral outcomes (i.e., healthy snack selection). We found that discussing health with opposite-gender agents enhanced gaze duration and the likelihood of healthy snack selection. In addition, female participants liked the VR-ECAs more than their male counterparts, regardless of the gender of the VR-ECAs. Finally, participants experienced greater presence while conversing with VR-embodied agents than chatting with text-only agents. Overall, our findings highlight embodiment as a crucial factor in how AI influences human behavior, and our paradigm enables new experimental research at the intersection of social influence, human-AI communication, and immersive virtual reality (VR).

Submitted 8 June, 2024; originally announced June 2024.

Comments: 11 pages, 4 figures, manuscript submitted to a journal

arXiv:2406.00784 [pdf, other]

Multidimensional optical singularities and their applications

Authors: Soon Wei Daniel Lim, Christina M. Spaegele, Federico Capasso

Abstract: Optical singularities, which are positions within an electromagnetic field where certain field parameters become undefined, hold significant potential for applications in areas such as super-resolution microscopy, sensing, and communication. This potential stems from their high field confinement and characteristic rapidly-changing field distributions. Although the systematic characterization of the first singularities dates back many decades, recent advancements in sub-wavelength wavefront control at optical frequencies have led to a renewed interest in the field, and have substantially expanded the range of known optical singularities and singular structures. However, the diversity in descriptions, mathematical formulations, and naming conventions can create confusion and impede accessibility to the field. This review aims to clarify the nomenclature by demonstrating that any singular field can be conceptualized as a collection of a finite set of principal, 'generic' singularities. These singularities are robust against small perturbations due to their topological nature. We underscore that the control over the principal properties of those singularities, namely, their protection against perturbations and their dimension, utilizes a consistent mathematical framework. Additionally, we provide an overview of current design techniques for both stable and approximate singularities and discuss their applications across various disciplines.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20827 [pdf, other]

Experimental demonstration of a fault-tolerant qubit encoded on a hyperfine-coupled qudit

Authors: Sumin Lim, Mikhail Vaganov, Junjie Liu, Arzhang Ardavan

Abstract: The realization of effective quantum error correction protocols remains a central challenge in the development of scalable quantum computers. Protocols employing redundancy over multiple physical qubits to encode a single error-protected logical qubit are theoretically effective, but imply a large resource overhead. Alternative, more hardware-efficient, approaches seek to deploy higher-dimensional quantum systems known as qudits. Recently, proposals have emerged for exploiting high-spin magnetic nuclei coupled to condensed matter electron spin qubits to implement fault-tolerant memories. Here, we explore experimentally the simplest of these proposals, a logical qubit encoded on the four states of a I=3/2 nuclear spin hyperfine-coupled to a S=1/2 electron spin qubit; the encoding protects against the dominant decoherence mechanism in such systems, fluctuations of the quantizing magnetic field. We implement the encoding using electron-nuclear double resonance within a subspace of the spin levels in an ensemble of highly coherent manganese defects in zinc oxide. We explore the dynamics of the encoded state both under a controlled application of the fluctuation and under natural decoherence processes. Our results confirm the potential of these proposals for practical, implementable, fault tolerant quantum memories.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20630 [pdf, other]

Stochastic Optimal Control for Diffusion Bridges in Function Spaces

Authors: Byoungwoo Park, Jungwon Choi, Sungbin Lim, Juho Lee

Abstract: Recent advancements in diffusion models and diffusion bridges primarily focus on finite-dimensional spaces, yet many real-world problems necessitate operations in infinite-dimensional function spaces for more natural and interpretable formulations. In this paper, we present a theory of stochastic optimal control (SOC) tailored to infinite-dimensional spaces, aiming to extend diffusion-based algorithms to function spaces. Specifically, we demonstrate how Doob's $h$-transform, the fundamental tool for constructing diffusion bridges, can be derived from the SOC perspective and expanded to infinite dimensions. This expansion presents a challenge, as infinite-dimensional spaces typically lack closed-form densities. Leveraging our theory, we establish that solving the optimal control problem with a specific objective function choice is equivalent to learning diffusion-based generative models. We propose two applications: (1) learning bridges between two infinite-dimensional distributions and (2) generative models for sampling from an infinite-dimensional distribution. Our approach proves effective for diverse problems involving continuous function space representations, such as resolution-free images, time-series data, and probability density functions.

Submitted 2 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.18064 [pdf]

Automated Real-World Sustainability Data Generation from Images of Buildings

Authors: Peter J Bentley, Soo Ling Lim, Rajat Mathur, Sid Narang

Abstract: When data on building features is unavailable, the task of determining how to improve that building in terms of carbon emissions becomes infeasible. We show that from only a set of images, a Large Language Model with appropriate prompt engineering and domain knowledge can successfully estimate a range of building features relevant for sustainability calculations. We compare our novel image-to-data method with a ground truth comprising real building data for 47 apartments and achieve accuracy better than a human performing the same task. We also demonstrate that the method can generate tailored recommendations to the owner on how best to improve their properties and discuss methods to scale the approach.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 6 pages

MSC Class: 68T07; 94A08

arXiv:2405.17424 [pdf, other]

LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence

Authors: Zhuoling Li, Xiaogang Xu, Zhenhua Xu, SerNam Lim, Hengshuang Zhao

Abstract: Due to the need to interact with the real world, embodied agents are required to possess comprehensive prior knowledge, long-horizon planning capability, and a swift response speed. Despite recent large language model (LLM) based agents achieving promising performance, they still exhibit several limitations. For instance, the output of LLMs is a descriptive sentence, which is ambiguous when determining specific actions. To address these limitations, we introduce the large auto-regressive model (LARM). LARM leverages both text and multi-view images as input and predicts subsequent actions in an auto-regressive manner. To train LARM, we develop a novel data format named auto-regressive node transmission structure and assemble a corresponding dataset. Adopting a two-phase training regimen, LARM successfully harvests enchanted equipment in Minecraft, which demands significantly more complex decision-making chains than the highest achievements of prior best methods. Besides, the speed of LARM is 6.8x faster.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14726 [pdf, other]

Distilling Vision-Language Pretraining for Efficient Cross-Modal Retrieval

Authors: Young Kyun Jang, Donghyun Kim, Ser-nam Lim

Abstract: ``Learning to hash'' is a practical solution for efficient retrieval, offering fast search speed and low storage cost. It is widely applied in various applications, such as image-text cross-modal search. In this paper, we explore the potential of enhancing the performance of learning to hash with the proliferation of powerful large pre-trained models, such as Vision-Language Pre-training (VLP) models. We introduce a novel method named Distillation for Cross-Modal Quantization (DCMQ), which leverages the rich semantic knowledge of VLP models to improve hash representation learning. Specifically, we use the VLP as a `teacher' to distill knowledge into a `student' hashing model equipped with codebooks. This process involves the replacement of supervised labels, which are composed of multi-hot vectors and lack semantics, with the rich semantics of VLP. In the end, we apply a transformation termed Normalization with Paired Consistency (NPC) to achieve a discriminative target for distillation. Further, we introduce a new quantization method, Product Quantization with Gumbel (PQG) that promotes balanced codebook learning, thereby improving the retrieval performance. Extensive benchmark testing demonstrates that DCMQ consistently outperforms existing supervised cross-modal hashing approaches, showcasing its significant potential.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14715 [pdf, other]

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

Authors: Young Kyun Jang, Ser-nam Lim

Abstract: Modern retrieval systems often struggle with upgrading to new and more powerful models due to the incompatibility of embeddings between the old and new models. This necessitates a costly process known as backfilling, which involves re-computing the embeddings for a large number of data samples. In vision, Backward-compatible Training (BT) has been proposed to ensure that the new model aligns with the old model's embeddings. This paper extends the concept of vision-only BT to the field of cross-modal retrieval, marking the first attempt to address Cross-modal BT (XBT). Our goal is to achieve backward-compatibility between Vision-Language Pretraining (VLP) models, such as CLIP, for the cross-modal retrieval task. To address XBT challenges, we propose an efficient solution: a projection module that maps the new model's embeddings to those of the old model. This module, pretrained solely with text data, significantly reduces the number of image-text pairs required for XBT learning, and, once it is pretrained, it avoids using the old model during training. Furthermore, we utilize parameter-efficient training strategies that improve efficiency and preserve the off-the-shelf new model's knowledge by avoiding any modifications. Experimental results on cross-modal retrieval datasets demonstrate the effectiveness of XBT and its potential to enable backfill-free upgrades when a new VLP model emerges.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12934 [pdf]

Address-Specific Sustainable Accommodation Choice Through Real-World Data Integration

Authors: Peter J. Bentley, Rajat Mathur, Soo Ling Lim, Sid Narang

Abstract: Consumers wish to choose sustainable accommodation for their travels, and in the case of corporations, may be required to do so. Yet accommodation marketplaces provide no meaningful capability for sustainable choice: typically CO2 estimates are provided that are identical for all accommodation of the same type across an entire country. We propose a decision support system that enables real choice of sustainable accommodation. We develop a data-driven address-specific metric called EcoGrade, which integrates government approved datasets and uses interpolation where data is sparse. We validate the metric on 10,000 UK addresses in 10 cities, showing the match of our interpolations to reality is statistically significant. We show how the metric has been embedded into a decision support system for a global accommodation marketplace and tested by real users over several months with positive user feedback. In the EU, forty percent of final energy consumption is from buildings. We need to encourage all building owners to make their accommodation more efficient. The rental sector is one area where change can occur rapidly, as rented accommodation is renovated frequently. We anticipate our decision support system using EcoGrade will encourage this positive change.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 8 pages

MSC Class: 68U35

ACM Class: E.m; H.m

arXiv:2405.11689 [pdf, other]

Investigation of suppression of $Υ(nS)$ in relativistic heavy-ion collisions at RHIC and LHC energies

Authors: Junlee Kim, Jaebeom Park, Byungsik Hong, Juhee Hong, Eun-Joo Kim, Yongsun Kim, MinJung Kweon, Su Houng Lee, Sanghoon Lim, Jinjoo Seo

Abstract: The primary purpose of studying quarkonium production in relativistic heavy-ion collisions is to understand the properties of the quark-gluon plasma. At various collision systems, measurements of quarkonium states of different binding energies, such as $Υ(nS)$, can provide comprehensive information. A model study has been performed to investigate the modification of $Υ(nS)$ production in Pb-Pb collisions at $\sqrt{s_{\mathrm{NN}}}=$ 5.02 TeV and Au-Au collisions at $\sqrt{s_{\mathrm{NN}}}=$ 200 GeV. The Monte-Carlo simulation study is performed with a publicly available hydrodynamic simulation package for the quark-gluon plasma medium and a theoretical calculation of temperature-dependent thermal width of $Υ(nS)$ considering the gluo-dissociation and inelastic parton scattering for dissociation inside the medium. In addition, we perform a systematic study with different descriptions of initial collision geometry and formation time of $Υ(nS)$ to investigate their impacts on yield modification. The model calculation with a varied parameter set can describe the experimental data of $Υ(nS)$ in Pb-Pb collisions at 5.02 TeV and $Υ(2S)$ in Au-Au collisions at 200 GeV but underestimates the modification of $Υ(1S)$ at the lower collision energy. The nuclear absorption mechanism is explored to understand the discrepancy between the data and simulation.

Submitted 19 May, 2024; originally announced May 2024.

Comments: 9 pages, 11 figures

arXiv:2405.00571 [pdf, other]

Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval

Authors: Young Kyun Jang, Dat Huynh, Ashish Shah, Wen-Kai Chen, Ser-Nam Lim

Abstract: Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but their reliance on expensive manually-annotated datasets restricts their scalability and broader applicability. To address these issues, previous studies have proposed pseudo-word token-based Zero-Shot CIR (ZS-CIR) methods, which utilize a projection module to map images to word tokens. However, we conjecture that this approach has a downside: the projection module distorts the original image representation and confines the resulting composed embeddings to the text-side. In order to resolve this, we introduce a novel ZS-CIR method that uses Spherical Linear Interpolation (Slerp) to directly merge image and text representations by identifying an intermediate embedding of both. Furthermore, we introduce Text-Anchored-Tuning (TAT), a method that fine-tunes the image encoder while keeping the text encoder fixed. TAT closes the modality gap between images and text, making the Slerp process much more effective. Notably, the TAT method is not only efficient in terms of the scale of the training dataset and training time, but it also serves as an excellent initial checkpoint for training supervised CIR models, thereby highlighting its wider potential. The integration of the Slerp-based ZS-CIR with a TAT-tuned model enables our approach to deliver state-of-the-art retrieval performance across CIR benchmarks.

Submitted 1 May, 2024; originally announced May 2024.
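
Note: the abstract above relies on Spherical Linear Interpolation (Slerp) to find an intermediate embedding between an image and a text representation. The following is only a minimal sketch of the textbook Slerp formula on plain Python lists; it is not the authors' implementation, and the names `slerp`, `image_emb`, `text_emb`, and the weight `t` are hypothetical choices made for illustration.

```python
import math

def slerp(a, b, t):
    """Textbook spherical linear interpolation between two vectors.

    Both inputs are unit-normalized first. t=0 returns a, t=1 returns b.
    Exactly antipodal inputs are not handled (the path is not unique there).
    """
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    a, b = unit(a), unit(b)
    # Clamp the cosine to guard against rounding slightly outside [-1, 1].
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)  # angle between the two unit vectors
    if theta < 1e-8:
        # Nearly parallel: fall back to normalized linear interpolation.
        return unit([(1 - t) * x + t * y for x, y in zip(a, b)])
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Hypothetical 2-D "embeddings", used only to exercise the formula.
image_emb = [1.0, 0.0]
text_emb = [0.0, 1.0]
mid = slerp(image_emb, text_emb, 0.5)
```

Unlike plain linear interpolation, the result stays on the unit sphere, which matters when retrieval uses cosine similarity.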

arXiv:2404.15884 [pdf, other]

The Robotic MAAO 0.7m Telescope System: Performance and Standard Photometric System

Authors: Gu Lim, Dohyeong Kim, Seonghun Lim, Myungshin Im, Hyeonho Choi, Jaemin Park, Keun-Hong Park, Junyeong Park, Chaudhary Muskaan, Donghyun Kim, Hayeong Jeong

Abstract: We introduce a 0.7m telescope system at the Miryang Arirang Astronomical Observatory (MAAO), a public observatory in Miryang, Korea. System integration and a scheduling program enable the 0.7m telescope system to operate completely robotically during nighttime, eliminating the need for human intervention. Using the 0.7m telescope system, we obtain atmospheric extinction coefficients and the zero-point magnitudes by observing standard stars. As a result, we find that atmospheric extinctions are moderate but they can sometimes increase depending on the weather conditions. The measured 5-sigma limiting magnitudes reach down to BVRI=19.4-19.6 AB mag for a point source with a total integrated time of 10 minutes under clear weather conditions, demonstrating comparable performance with other observational facilities operating under similar specifications and sky conditions. We expect that the newly established MAAO 0.7m telescope system will contribute significantly to the observational studies of astronomy. Particularly, with its capability for robotic observations, this system, although its primary duty is for public viewing, can be extensively used for the time-series observation of transients.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 14 pages, 10 figures, Accepted for publication in PASP

arXiv:2404.15516 [pdf, other]

Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval

Authors: Young Kyun Jang, Donghyun Kim, Zihang Meng, Dat Huynh, Ser-Nam Lim

Abstract: Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target image. These specific triplets are not as commonly available as simple image-text pairs, limiting the widespread use of CIR and its scalability. On the other hand, zero-shot CIR can be relatively easily trained with image-caption pairs without considering the image-to-image relation, but this approach tends to yield lower accuracy. We propose a new semi-supervised CIR approach where we search for a reference and its related target images in auxiliary data and learn our large language model-based Visual Delta Generator (VDG) to generate text describing the visual difference (i.e., visual delta) between the two. VDG, equipped with fluent language knowledge and being model agnostic, can generate pseudo triplets to boost the performance of CIR models. Our approach significantly improves the existing supervised learning approaches and achieves state-of-the-art results on the CIR benchmarks.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 15 pages

arXiv:2404.15395 [pdf, other]

doi 10.3847/1538-3881/ad32c8

Planet Hunters NGTS: New Planet Candidates from a Citizen Science Search of the Next Generation Transit Survey Public Data

Authors: Sean M. O'Brien, Megan E. Schwamb, Samuel Gill, Christopher A. Watson, Matthew R. Burleigh, Alicia Kendall, David R. Anderson, José I. Vines, James S. Jenkins, Douglas R. Alves, Laura Trouille, Solène Ulmer-Moll, Edward M. Bryant, Ioannis Apergis, Matthew P. Battley, Daniel Bayliss, Nora L. Eisner, Edward Gillen, Michael R. Goad, Maximilian N. Günther, Beth A. Henderson, Jeong-Eun Heo, David G. Jackson, Chris Lintott, James McCormac, et al. (13 additional authors not shown)

Abstract: We present the results from the first two years of the Planet Hunters NGTS citizen science project, which searches for transiting planet candidates in data from the Next Generation Transit Survey (NGTS) by enlisting the help of members of the general public. Over 8,000 registered volunteers reviewed 138,198 light curves from the NGTS Public Data Releases 1 and 2. We utilize a user weighting scheme to combine the classifications of multiple users to identify the most promising planet candidates not initially discovered by the NGTS team. We highlight the five most interesting planet candidates detected through this search, which are all candidate short-period giant planets. This includes the TIC-165227846 system that, if confirmed, would be the lowest-mass star to host a close-in giant planet. We assess the detection efficiency of the project by determining the number of confirmed planets from the NASA Exoplanet Archive and TESS Objects of Interest (TOIs) successfully recovered by this search and find that 74% of confirmed planets and 63% of TOIs detected by NGTS are recovered by the Planet Hunters NGTS project. The identification of new planet candidates shows that the citizen science approach can provide a complementary method to the detection of exoplanets with ground-based surveys such as NGTS.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 42 pages, 20 figures, 17 tables. To be published in AJ

Journal ref: AJ 167 (2024) 238

arXiv:2404.11922 [pdf, other]

Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration

Authors: Hans Jarett J. Ong, Brian Godwin S. Lim

Abstract: Effective causal discovery is essential for learning the causal graph from observational data. The linear non-Gaussian acyclic model (LiNGAM) operates under the assumption of a linear data generating process with non-Gaussian noise in determining the causal graph. Its assumption of unmeasured confounders being absent, however, poses practical limitations. In response, empirical research has shown that the reformulation of LiNGAM as a shortest path problem (LiNGAM-SPP) addresses this limitation. Within LiNGAM-SPP, mutual information is chosen to serve as the measure of independence. A challenge is introduced - parameter tuning is now needed due to its reliance on kNN mutual information estimators. The paper proposes a threefold enhancement to the LiNGAM-SPP framework. First, the need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. This substitution is validated on a general data generating process and benchmark real-world data sets, outperforming existing methods especially when given a larger set of features. The incorporation of prior knowledge is then enabled by a node-skipping strategy implemented on the graph representation of all causal orderings to eliminate violations based on the provided input of relative orderings. Flexibility relative to existing approaches is achieved. Last among the three enhancements is the utilization of the distribution of paths in the graph representation of all causal orderings. From this, crucial properties of the true causal graph such as the presence of unmeasured confounders and sparsity may be inferred. To some extent, the expected performance of the causal discovery algorithm may be predicted. The refinements above advance the practicality and performance of LiNGAM-SPP, showcasing the potential of graph-search-based methodologies in advancing causal discovery.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11849 [pdf]

Multiphoton super-resolution imaging via virtual structured illumination

Authors: Sumin Lim, Sungsam Kang, Jin-Hee Hong, Youngho Jin, Kalpak Gupta, Moonseok Kim, Suhyun Kim, Wonshik Choi, Seokchan Yoon

Abstract: Fluorescence imaging in thick biological tissues is challenging due to sample-induced aberration and scattering, which leads to severe degradation of image quality and resolution. Fluorescence imaging in reflection geometry further exacerbates this issue since the point spread function is distorted in both excitation and emission pathways. Here, we propose a novel approach termed adaptive optics virtual structured illumination microscopy (AO V-SIM) that enables super-resolution multiphoton imaging through a scattering medium in reflection geometry. Our approach exploits the incoherent reflection matrix obtained using a conventional point-scanning fluorescence microscope with an array detector. We introduce V-SIM super-resolution reconstruction algorithm based on the incoherent reflection matrix. Furthermore, we introduce a software adaptive optics correction algorithm, AO V-SIM, which recovers unattenuated and phase-corrected optical transfer function for both excitation and emission pathways. The effectiveness of our proposed method is experimentally validated through sub-diffraction-limited two-photon fluorescence imaging of various samples in the presence of strong aberration.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09456 [pdf, other]

doi 10.1145/3589335.3651522

Hyperbolic Heterogeneous Graph Attention Networks

Authors: Jongmin Park, Seunghoon Han, Soohwan Jeong, Sungsu Lim

Abstract: Most previous heterogeneous graph embedding models represent elements in a heterogeneous graph as vector representations in a low-dimensional Euclidean space. However, because heterogeneous graphs inherently possess complex structures, such as hierarchical or power-law structures, distortions can occur when representing them in Euclidean space. To overcome this limitation, we propose Hyperbolic Heterogeneous Graph Attention Networks (HHGAT) that learn vector representations in hyperbolic spaces with meta-path instances. We conducted experiments on three real-world heterogeneous graph datasets, demonstrating that HHGAT outperforms state-of-the-art heterogeneous graph embedding models in node classification and clustering tasks.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted in ACM THE WEB CONFERENCE 2024 short paper track

arXiv:2404.08716 [pdf, other]

Securing Monolithic Kernels using Compartmentalization

Authors: Soo Yee Lim, Sidhartha Agrawal, Xueyuan Han, David Eyers, Dan O'Keeffe, Thomas Pasquier

Abstract: Monolithic operating systems, where all kernel functionality resides in a single, shared address space, are the foundation of most mainstream computer systems. However, a single flaw, even in a non-essential part of the kernel (e.g., device drivers), can cause the entire operating system to fall under an attacker's control. Kernel hardening techniques might prevent certain types of vulnerabilities, but they fail to address a fundamental weakness: the lack of intra-kernel security that safely isolates different parts of the kernel. We survey kernel compartmentalization techniques that define and enforce intra-kernel boundaries and propose a taxonomy that allows the community to compare and discuss future work. We also identify factors that complicate comparisons among compartmentalized systems, suggest new ways to compare future approaches with existing work meaningfully, and discuss emerging research directions.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 24 pages, 7 figures

arXiv:2404.08672 [pdf, other]

Taxonomy and Analysis of Sensitive User Queries in Generative AI Search

Authors: Hwiyeol Jo, Taiwoo Park, Nayoung Choi, Changbong Kim, Ohjoon Kwon, Donghyeon Jeon, Hyunwoo Lee, Eui-Hyeon Lee, Kyoungho Shin, Sun Suk Lim, Kyungmi Kim, Jihye Lee, Sun Kim

Abstract: Although there has been a growing interest among industries to integrate generative LLMs into their services, limited experiences and scarcity of resources acts as a barrier in launching and servicing large-scale LLM-based conversational services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on the sensitiveness of user queries. We propose a taxonomy for sensitive search queries, outline our approaches, and present a comprehensive analysis report on sensitive queries from actual users.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.07752 [pdf, ps, other]

Singular linear forms over global function fields

Authors: Gukyeong Bang, Taehyeong Kim, Seonhee Lim

Abstract: In this paper, we consider singular linear forms over global function fields of class number one and give an upper bound for the Hausdorff dimension of the set of singular linear forms by constructing an appropriate Margulis function over global function fields.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 32 pages

arXiv:2404.05726 [pdf, other]

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Authors: Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

Abstract: With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding. In this study, we mainly focus on designing an efficient and effective model for long-term video understanding. Instead of trying to process more frames simultaneously like most existing work, we propose to process videos in an online manner and store past video information in a memory bank. This allows our model to reference historical video content for long-term analysis without exceeding LLMs' context length constraints or GPU memory limits. Our memory bank can be seamlessly integrated into current multimodal LLMs in an off-the-shelf manner. We conduct extensive experiments on various video understanding tasks, such as long-video understanding, video question answering, and video captioning, and our model can achieve state-of-the-art performances across multiple datasets. Code available at https://boheumd.github.io/MA-LMM/.

Submitted 24 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024. Project Page https://boheumd.github.io/MA-LMM/

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00300 [pdf, other]

Enhancing Empathy in Virtual Reality: An Embodied Approach to Mindset Modulation

Authors: Seoyeon Bae, Yoon Kyung Lee, Jungcheol Lee, Jaeheon Kim, Haeseong Jeon, Seung-Hwan Lim, Byung-Cheol Kim, Sowon Hahn

Abstract: A growth mindset has shown promising outcomes for increasing empathy ability. However, stimulating a growth mindset in VR-based empathy interventions is under-explored. In the present study, we implemented prosocial VR content, Our Neighbor Hero, focusing on embodying a virtual character to modulate players' mindsets. The virtual body served as a stepping stone, enabling players to identify with the character and cultivate a growth mindset as they followed mission instructions. We considered several implementation factors to assist players in positioning within the VR experience, including positive feedback, content difficulty, background lighting, and multimodal feedback. We conducted an experiment to investigate the intervention's effectiveness in increasing empathy. Our findings revealed that the VR content and mindset training encouraged participants to improve their growth mindsets and empathic motives. This VR content was developed for college students to enhance their empathy and teamwork skills. It has the potential to improve collaboration in organizational and community environments.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 9 pages, 2 figures, 1 table

arXiv:2403.14255 [pdf, other]

ERD: A Framework for Improving LLM Reasoning for Cognitive Distortion Classification

Authors: Sehee Lim, Yejin Kim, Chi-Hyun Choi, Jy-yong Sohn, Byung-Hoon Kim

Abstract: Improving the accessibility of psychotherapy with the aid of Large Language Models (LLMs) has garnered significant attention in recent years. Recognizing cognitive distortions from the interviewee's utterances can be an essential part of psychotherapy, especially for cognitive behavioral therapy. In this paper, we propose ERD, which improves LLM-based cognitive distortion classification performance with the aid of additional modules for (1) extracting the parts related to cognitive distortion, and (2) debating the reasoning steps by multiple agents. Our experimental results on a public dataset show that ERD improves the multi-class F1 score as well as the binary specificity score. Regarding the latter score, it turns out that our method is effective in debiasing the baseline method, which has a high false positive rate, especially when the summary of the multi-agent debate is provided to LLMs.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.14212 [pdf, other]

doi 10.1063/5.0208517

CMOS-compatible photonic integrated circuits on thin-film ScAlN

Authors: Sihao Wang, Veerendra Dhyani, Sakthi Sanjeev Mohanraj, Xiaodong Shi, Binni Varghese, Wing Wai Chung, Ding Huang, Zhi Shiuh Lim, Qibin Zeng, Huajun Liu, Xianshu Luo, Victor Leong, Nanxi Li, Di Zhu

Abstract: Scandium aluminum nitride (ScAlN) has recently emerged as an attractive material for integrated photonics due to its favorable nonlinear optical properties and compatibility with CMOS fabrication. Despite the promising and versatile material properties, it is still an outstanding challenge to realize low-loss photonic circuits on thin-film ScAlN-on-insulator wafers. Here, we present a systematic study on the material quality of sputtered thin-film ScAlN produced in a CMOS-compatible 200 mm line, and an optimized fabrication process to yield 400 nm thick, fully etched waveguides. With surface polishing and annealing, we achieve micro-ring resonators with an intrinsic quality factor as high as $1.47\times 10^5$, corresponding to a propagation loss of 2.4 dBでしべる/cm. These results serve as a critical step towards developing future large-scale, low-loss photonic integrated circuits based on ScAlN.

Submitted 11 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Journal ref: APL Photon. 9, 066109 (2024)

arXiv:2403.13731 [pdf, other]

Emotion Recognition Using Transformers with Masked Learning

Authors: Seongjae Min, Junseok Yang, Sangjun Lim, Junyong Lee, Sangwon Lee, Sejoon Lim

Abstract: In recent years, deep learning has achieved innovative advancements in various fields, including the analysis of human emotions and behaviors. Initiatives such as the Affective Behavior Analysis in-the-wild (ABAW) competition have been particularly instrumental in driving research in this area by providing diverse and challenging datasets that enable precise evaluation of complex emotional states. This study leverages the Vision Transformer (ViT) and Transformer models to focus on the estimation of Valence-Arousal (VA), which signifies the positivity and intensity of emotions, recognition of various facial expressions, and detection of Action Units (AUえーゆー) representing fundamental muscle movements. This approach transcends traditional Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) based methods, proposing a new Transformer-based framework that maximizes the understanding of temporal and spatial features. The core contributions of this research include the introduction of a learning technique through random frame masking and the application of Focal loss adapted for imbalanced data, enhancing the accuracy and applicability of emotion and behavior analysis in real-world settings. This approach is expected to contribute to the advancement of emotional computing and deep learning methodologies.

Submitted 23 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12449 [pdf, other]

Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter

Authors: Seunghyeon Lim, Youngjae Yoo, Jun Ki Lee, Byoung-Tak Zhang

Abstract: In this paper, we propose a novel method for plane clustering specialized in cluttered scenes using an RGB-D camera and validate its effectiveness through robot grasping experiments. Unlike existing methods, which focus on large-scale indoor structures, our approach, Multi-Object RANSAC, emphasizes cluttered environments that contain a wide range of objects with different scales. It enhances plane segmentation by generating subplanes in a Deep Plane Clustering (DPC) module, which are then merged with the final planes by post-processing. DPC rearranges the point cloud by voting layers to make subplane clusters, trained in a self-supervised manner using pseudo-labels generated from RANSAC. Multi-Object RANSAC demonstrates superior plane instance segmentation performance over other recent RANSAC applications. We conducted an experiment on robot suction-based grasping, comparing our method with a vision-based grasping network and RANSAC applications. The results from this real-world scenario showed its remarkable performance surpassing the baseline methods, highlighting its potential for advanced scene understanding and manipulation.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 7 pages, 6 figures

arXiv:2403.10492 [pdf, other]

Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning

Authors: Dongmin Park, Zhaofang Qian, Guangxing Han, Ser-Nam Lim

Abstract: Mitigating hallucinations of Large Vision Language Models (LVLMs) is crucial to enhance their reliability for general-purpose assistants. This paper shows that such hallucinations of LVLMs can be significantly exacerbated by preceding user-system dialogues. To precisely measure this, we first present an evaluation benchmark by extending popular multi-modal benchmark datasets with prepended hallucinatory dialogues powered by our novel Adversarial Question Generator (AQG), which can automatically generate image-related yet adversarial dialogues by adopting adversarial attacks on LVLMs. On our benchmark, the zero-shot performance of state-of-the-art LVLMs drops significantly for both the VQA and Captioning tasks. Next, we further reveal this hallucination is mainly due to the prediction bias toward preceding dialogues rather than visual content. To reduce this bias, we propose Adversarial Instruction Tuning (AIT) that robustly fine-tunes LVLMs against hallucinatory dialogues. Extensive experiments show our proposed approach successfully reduces dialogue hallucination while maintaining performance.

Submitted 25 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.09926 [pdf, other]

The Next Generation Virgo Cluster Survey (NGVS). XXVII. The Size and Structure of Globular Cluster Systems and their Connection to Dark Matter Halos

Authors: Sungsoon Lim, Eric W. Peng, Patrick Côté, Laura Ferrarese, Joel C. Roediger, Chengze Liu, Chelsea Spengler, Elisabeth Sola, Pierre-Alain Duc, Laura V. Sales, John P. Blakeslee, Jean-Charles Cuillandre, Patrick R. Durrell, Eric Emsellem, Stephen D. J. Gwyn, Ariane Lançon, Francine R. Marleau, J. Christopher Mihos, Oliver Müller, Thomas H. Puzia, Rubén Sánchez-Janssen

Abstract: We study the size and structure of globular cluster (GC) systems of 118 early-type galaxies from the NGVS, MATLAS, and ACSVCS surveys. Fitting Sérsic profiles, we investigate the relationship between effective radii of GC systems ($R_{e, \rm gc}$) and galaxy properties. GC systems are 2--4 times more extended than host galaxies across the entire stellar mass range of our sample ($10^{8.3} < M_* < 10^{11.6}~M_{\odot}$). The relationship between $R_{e, \rm gc}$ and galaxy stellar mass exhibits a characteristic "knee" at a stellar mass of $M_p \simeq 10^{10.8}$, similar to the galaxy $R_e$--stellar mass relationship. We present a new characterization of the traditional blue and red GC color sub-populations, describing them with respect to host galaxy $(g'-i')$ color ($Δでるた_{gi}$): GCs with similar colors to their hosts have a "red" $Δでるた_{gi}$, and those significantly bluer GCs have a "blue" $Δでるた_{gi}$. The GC populations with red $Δでるた_{gi}$, even in dwarf galaxies, are twice as extended as the stars, suggesting that formation or survival mechanisms favor the outer regions. We find a tight correlation between $R_{e, \rm gc}$ and the total number of GCs, with intrinsic scatter $\lesssim 0.1$ dex spanning two and three orders of magnitude in size and number, respectively. This holds for both red and blue subpopulations, albeit with different slopes. Assuming that $N_{GC, Total}$ correlates with $M_{200}$, we find that the red GC systems have effective radii of roughly 1--5\% $R_{\rm 200}$, while the blue GC systems in massive galaxies can have sizes as large as $\sim$10\% $R_{\rm 200}$. Environmental dependence on $R_{e, \rm gc}$ is also found, with lower density environments exhibiting more extended GC systems at fixed mass.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 28 pages, 18 Figures, 3 tables, accepted for publication in ApJ

arXiv:2403.07198 [pdf, other]

Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions

Authors: Lan Wang, Vishnu Boddeti, Sernam Lim

Abstract: We introduce a novel text-to-pose video editing method, ReimaginedAct. While existing video editing tasks are limited to changes in attributes, backgrounds, and styles, our method aims to predict open-ended human action changes in video. Moreover, our method can accept not only direct instructional text prompts but also `what if' questions to predict possible action changes. ReimaginedAct comprises video understanding, reasoning, and editing modules. First, an LLM is utilized to obtain a plausible answer for the instruction or question, which is then used for (1) prompting Grounded-SAM to produce bounding boxes of relevant individuals and (2) retrieving a set of pose videos that we have collected for editing human actions. The retrieved pose videos and the detected individuals are then utilized to alter the poses extracted from the original video. We also employ a timestep blending module to ensure the edited video retains its original content except where modifications are necessary. To facilitate research in text-to-pose video editing, we introduce a new evaluation dataset, WhatifVideo-1.0. This dataset includes videos of different scenarios spanning a range of difficulty levels, along with questions and text prompts. Experimental results demonstrate that existing video editing methods struggle with human action editing, while our approach can achieve effective action editing and even imaginary editing from counterfactual questions.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06394 [pdf, other]

FSViewFusion: Few-Shots View Generation of Novel Objects

Authors: Rukhshanda Hussain, Hui Xian Grace Lim, Borchun Chen, Mubarak Shah, Ser Nam Lim

Abstract: Novel view synthesis has observed tremendous developments since the arrival of NeRFs. However, NeRF models overfit on a single scene, lacking generalization to out-of-distribution objects. Recently, diffusion models have exhibited remarkable performance on introducing generalization in view synthesis. Inspired by these advancements, we explore the capabilities of a pretrained stable diffusion model for view synthesis without explicit 3D priors. Specifically, we base our method on a personalized text-to-image model, Dreambooth, given its strong ability to adapt to specific novel objects with a few shots. Our research reveals two interesting findings. First, we observe that Dreambooth can learn the high-level concept of a view, compared to arguably more complex strategies which involve finetuning diffusion models on large amounts of multi-view data. Second, we establish that the concept of a view can be disentangled and transferred to a novel object irrespective of the identity of the original object from which the views are learnt. Motivated by this, we introduce a learning strategy, FSViewFusion, which inherits a specific view through only one image sample of a single scene, and transfers the knowledge to a novel object, learnt from few shots, using low-rank adapters. Through extensive experiments we demonstrate that our method, albeit simple, is efficient in generating reliable view samples for in-the-wild images. Code and models will be released.

Submitted 12 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.04981 [pdf, other]

Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate

Authors: Zijian Zhao, Sola Woo, Khandker Akif Aabrar, Sharadindu Gopal Kirtania, Zhouhang Jiang, Shan Deng, Yi Xiao, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Steven Soss, Sven Beyer, Rajiv Joshi, Scott Meninger, Mohamed Mohamed, Kijoon Kim, Jongho Woo, Suhwan Lim, Kwangsoo Kim, Wanki Kim, Daewon Ha, Vijaykrishnan Narayanan, Suman Datta, Shimeng Yu, Kai Ni

Abstract: In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-${V}_{TH}$ (LVT) state; ii) combined simulations and experimental demonstrations of dual-port design verify the disturb-free operation in a NAND string, overcoming a key challenge in single-port designs; iii) the proposed design can be incorporated in a highly scaled vertical NAND FeFET string and the pass gate can be incorporated into the existing 3D NAND with the negligible overhead of the pass gate interconnection through a global bottom pass gate contact in the substrate.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 29 pages, 7 figures

arXiv:2403.00960 [pdf]

On the origin of topotactic reduction effect for superconductivity in infinite-layer nickelates

Authors: Shengwei Zeng, Chi Sin Tang, Zhaoyang Luo, Lin Er Chow, Zhi Shiuh Lim, Saurav Prakash, Ping Yang, Caozheng Diao, Xiaojiang Yu, Zhenxiang Xing, Rong Ji, Xinmao Yin, Changjian Li, X. Renshaw Wang, Qian He, Mark B. H. Breese, A. Ariando, Huajun Liu

Abstract: Topotactic reduction utilizing metal hydrides as reagents emerges as an effective approach to achieve exceptionally low oxidation states of metal ions and unconventional coordination networks. This method opens avenues to the development of entirely new functional materials, with one notable example being the infinite-layer nickelate superconductors. However, the reduction effect on the atomic reconstruction and electronic structures, which are crucial for superconductivity, remains largely unresolved. We design two sets of control Nd$_{0.8}$Sr$_{0.2}$NiO$_2$ thin films and implement secondary ion mass spectroscopy to highlight the absence of reduction-induced hydrogen intercalation. X-ray absorption spectroscopy shows a significant linear dichroism with dominant Ni 3d$_{x^2-y^2}$ orbitals on superconducting samples, indicating a Ni single-band nature of infinite-layer nickelates. Consistent with the superconducting $T_c$, the Ni 3d orbital asymmetry manifests a dome-like reduction duration dependence. Our results unveil the critical role of reduction in modulating the Ni-3d orbital polarization and its impact on the superconducting properties.

Submitted 1 March, 2024; originally announced March 2024.

Comments: Main: 21 pages, 4 figures - Supplementary: 7 pages, 5 figures

arXiv:2402.18573 [pdf, other]

UniMODE: Unified Monocular 3D Object Detection

Authors: Zhuoling Li, Xiaogang Xu, SerNam Lim, Hengshuang Zhao

Abstract: Realizing unified monocular 3D object detection, including both indoor and outdoor scenes, holds great importance in applications like robot navigation. However, involving various scenarios of data to train models poses challenges due to their significantly different characteristics, e.g., diverse geometry properties and heterogeneous domain distributions. To address these challenges, we build a detector based on the bird's-eye-view (BEV) detection paradigm, where the explicit feature projection is beneficial to addressing the geometry learning ambiguity when employing multiple scenarios of data to train detectors. Then, we split the classical BEV detection architecture into two stages and propose an uneven BEV grid design to handle the convergence instability caused by the aforementioned challenges. Moreover, we develop a sparse BEV feature projection strategy to reduce computational cost and a unified domain alignment method to handle heterogeneous domains. Combining these techniques, a unified detector UniMODE is derived, which surpasses the previous state-of-the-art on the challenging Omni3D dataset (a large-scale dataset including both indoor and outdoor scenes) by 4.9% AP_3D, revealing the first successful generalization of a BEV detector to unified 3D object detection.

Submitted 9 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: This paper has been accepted for publication in CVPR2024

arXiv:2402.17819 [pdf, other]

The FLAMINGO simulation view of cluster progenitors observed in the epoch of reionization with JWST

Authors: Seunghwan Lim, Sandro Tacchella, Joop Schaye, Matthieu Schaller, Jakob M. Helton, Roi Kugel, Roberto Maiolino

Abstract: Motivated by the recent JWST discovery of galaxy overdensities during the Epoch of Reionization, we examine the physical properties of high-$z$ protoclusters and their evolution using the FLAMINGO simulation suite. We investigate the impact of the apertures used to define protoclusters, because the heterogeneous apertures used in the literature have limited our understanding of the population. Our results are insensitive to the uncertainties of the subgrid models at a given resolution, whereas further investigation into the dependence on numerical resolution is needed. When considering galaxies more massive than $M_\ast\,{\simeq}\,10^8\,{\rm M_\odot}$, the FLAMINGO simulations predict a dominant contribution from progenitors similar to those of the Coma cluster to the cosmic star-formation rate density during the reionization epoch. Our results indicate the onset of suppression of star formation in the protocluster environments as early as $z\,{\simeq}\,5$. The galaxy number density profiles are similar to NFW at $z\,{\lesssim}\,1$ while showing a steeper slope at earlier times before the formation of the core. Different from most previous simulations, the predicted star-formation history for individual protoclusters is in good agreement with observations. We demonstrate that, depending on the aperture, the integrated physical properties including the total (dark matter and baryonic) mass can be biased by a factor of 2 to 5 at $z\,{=}\,5.5$--$7$, and by an order of magnitude at $z\,{\lesssim}\,4$. This correction suffices to remove the ${\simeq}\,3\,σしぐま$ tensions with the number density of structures found in recent JWST observations.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 21 pages, 15 figures, comments are welcome

arXiv:2402.13562 [pdf, other]

Analysis of Multi-Source Language Training in Cross-Lingual Transfer

Authors: Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim

Abstract: The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there still exists ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT, that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT, a technique we term Multi-Source Language Training (MSLT), leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically prove their effectiveness.

Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted to ACL 2024

arXiv:2402.13560 [pdf, other]

Design and characterization of individual addressing optics based on multi-channel acousto-optic modulator for $^{171}$Yb$^+$ qubits

Authors: Sungjoo Lim, Seunghyun Baek, Jacob Whitlow, Marissa D'Onofrio, Tianyi Chen, Samuel Phiri, Stephen Crain, Kenneth R. Brown, Jungsang Kim, Junki Kim

Abstract: We present the design and characterization of individual addressing optics based on a multi-channel acousto-optic modulator (AOM) for trapped ytterbium-171 ions. The design parameters of the individual addressing system were determined based on the tradeoff between the expected crosstalk and the required numerical aperture of the projection objective lens. The target beam diameter and separation were 1.90 $μみゅー$m and 4.28 $μみゅー$m, respectively. The individual beams shaped by the projection optics were characterized by an imaging sensor and a field probe ion. The resulting effective beam diameters and separations were approximately 2.34--2.36 $μみゅー$m and 4.31 $μみゅー$m, respectively, owing to residual aberration.

Submitted 30 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 14 pages, 5 figures

arXiv:2402.12033 [pdf, other]

Constraining the stellar populations of ultra-diffuse galaxies in the MATLAS survey using spectral energy distribution fitting

Authors: Maria Luisa Buzzo, Duncan A. Forbes, Thomas H. Jarrett, Francine R. Marleau, Pierre-Alain Duc, Jean P. Brodie, Aaron J. Romanowsky, Jonah S. Gannon, Steven R. Janssens, Joel Pfeffer, Anna Ferré-Mateu, Lydia Haacke, Warrick J. Couch, Sungsoon Lim, Rubén Sánchez-Janssen

Abstract: We use spectral energy distribution (SED) fitting to place constraints on the stellar populations of 59 ultra-diffuse galaxies (UDGs) in the low-to-moderate density fields of the MATLAS survey. We use the routine PROSPECTOR, coupled with archival data in the optical from DECaLS, and near- and mid-infrared imaging from WISE, to recover the stellar masses, ages, metallicities and star formation timescales of the UDGs. We find that a subsample of the UDGs lies within the scatter of the mass-metallicity relation (MZR) for local classical dwarfs. However, another subsample is more metal-poor, being consistent with the evolving MZR at high-redshift. We investigate UDG positioning trends in the mass-metallicity plane as a function of surface brightness, effective radius, axis ratio, local volume density, mass-weighted age, star formation timescale, globular cluster (GC) counts and GC specific frequency. We find that our sample of UDGs can be separated into two main classes. Class A: Comprised of UDGs with lower stellar masses, prolonged star formation histories (SFHs), more elongated, inhabiting less dense environments, hosting fewer GCs, younger, consistent with the classical dwarf MZR, and fainter. Class B: UDGs with higher stellar masses, rapid SFHs, rounder, inhabiting the densest of our probed environments, hosting on average the most numerous GC systems, older, consistent with the high-redshift MZR (i.e., consistent with early-quenching), and brighter. The combination of these properties suggests that UDGs of Class A are consistent with a `puffed-up dwarf' formation scenario, while UDGs of Class B seem to be better explained by `failed galaxy' scenarios.

Submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted for publication in MNRAS. 19 pages (+6 of appendices), 10 figures, 5 tables

Showing 1–50 of 867 results for author: Lim, S