Search | arXiv e-print repository

Unexpected changes in the band structure within AFM1 state of CeBi

Authors: Yevhen Kushnirenko, Brinda Kuthanazhi, Benjamin Schrunk, Evan O'Leary, Andrew Eaton, Robert-Jan Slager, Junyeong Ahn, Lin-Lin Wang, Paul C. Canfield, Adam Kaminski

Abstract: We perform angle-resolved photoemission spectroscopy (ARPES) measurements in conjunction with density functional theory (DFT) calculations to investigate the evolution of the electronic structure of CeBi upon a series of antiferromagnetic (AFM) transitions. We find evidence for a new AFM transition in addition to the two previously known from transport studies. We demonstrate the development of an additional Dirac state in the (+-+-) ordered phase and a transformation of unconventional surface-state pairs in the (++--) ordered phase. This revises the phase diagram of this intriguing material, where there are now three distinct AFM states below TN in zero magnetic field instead of two, as was previously thought.

Submitted 12 September, 2024; originally announced September 2024.

Comments: 16 pages, 4 figures

arXiv:2409.02846 [pdf, other]

MaDis-Stereo: Enhanced Stereo Matching via Distilled Masked Image Modeling

Authors: Jihye Ahn, Hyesong Choi, Soomin Kim, Dongbo Min

Abstract: In stereo matching, CNNs have traditionally served as the predominant architectures. Although Transformer-based stereo models have been studied recently, their performance still lags behind CNN-based stereo models due to the inherent data scarcity issue in the stereo matching task. In this paper, we propose a Masked Image Modeling Distilled Stereo matching model, termed MaDis-Stereo, that enhances locality inductive bias by leveraging Masked Image Modeling (MIM) in training a Transformer-based stereo model. Given randomly masked stereo images as inputs, our method attempts to conduct both image reconstruction and depth prediction tasks. While this strategy is beneficial to resolving the data scarcity issue, the dual task of reconstructing masked tokens and subsequently performing stereo matching poses significant challenges, particularly in terms of training stability. To address this, we propose to use an auxiliary network (teacher), updated via Exponential Moving Average (EMA), along with the original stereo model (student), where teacher predictions serve as pseudo supervisory signals to effectively distill knowledge into the student model. State-of-the-art performance is achieved with the proposed method on several stereo matching benchmarks such as ETH3D and KITTI 2015. Additionally, to demonstrate that our model effectively leverages locality inductive bias, we provide the attention distance measurement.

Submitted 4 September, 2024; originally announced September 2024.
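
The teacher-student scheme mentioned in the MaDis-Stereo abstract above relies on an exponential moving average (EMA) of the student's weights. The abstract does not give the update rule or the decay value, so the following is only a minimal sketch of the standard EMA form, teacher = decay * teacher + (1 - decay) * student. The function name `ema_update`, the dictionary-of-floats weight representation, and the decay value are illustrative assumptions, not the authors' implementation.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights as an exponential moving average.

    Assumption: weights are plain floats keyed by parameter name.
    A real model would apply the same rule to every tensor.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy run: the teacher drifts toward a fixed student weight.
teacher = {"w": 0.0}
student = {"w": 1.0}
for _ in range(3):
    teacher = ema_update(teacher, student, decay=0.5)

print(teacher["w"])  # 0.875
```

A decay close to 1 makes the teacher change slowly, which is the usual reason such a teacher is used as a stable source of pseudo-labels.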

arXiv:2409.02545 [pdf, other]

UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching

Authors: Soomin Kim, Hyesong Choi, Jihye Ahn, Dongbo Min

Abstract: Unlike other vision tasks where Transformer-based approaches are becoming increasingly common, stereo depth estimation is still dominated by convolution-based approaches. This is mainly due to the limited availability of real-world ground truth for stereo matching, which is a limiting factor in improving the performance of Transformer-based stereo approaches. In this paper, we propose UniTT-Stereo, a method to maximize the potential of Transformer-based stereo architectures by unifying self-supervised learning used for pre-training with a stereo matching framework based on supervised learning. To be specific, we explore the effectiveness of reconstructing features of masked portions in an input image and at the same time predicting corresponding points in another image from the perspective of locality inductive bias, which is crucial in training models with limited training data. Moreover, to address these challenging tasks of reconstruction-and-prediction, we present a new strategy to vary the masking ratio when training the stereo model with stereo-tailored losses. State-of-the-art performance of UniTT-Stereo is validated on various benchmarks such as ETH3D, KITTI 2012, and KITTI 2015 datasets. Lastly, to investigate the advantages of the proposed approach, we provide a frequency analysis of feature maps and the analysis of locality inductive bias based on attention maps.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.01141 [pdf, other]

Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching

Authors: Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn

Abstract: Large language models (LLMs) have emerged due to their capability to generate high-quality content across diverse contexts. To reduce their explosively increasing demands for computing resources, a mixture of experts (MoE) has emerged. The MoE layer enables exploiting a huge number of parameters with less computation. Applying state-of-the-art continuous batching increases throughput; however, it leads to frequent DRAM access in the MoE and attention layers. We observe that conventional computing devices have limitations when processing the MoE and attention layers, which dominate the total execution time and exhibit low arithmetic intensity (Op/B). Processing MoE layers only with devices targeting low-Op/B such as processing-in-memory (PIM) architectures is challenging due to the fluctuating Op/B in the MoE layer caused by continuous batching. To address these challenges, we propose Duplex, which comprises xPU tailored for high-Op/B and Logic-PIM to effectively perform low-Op/B operation within a single device. Duplex selects the most suitable processor based on the Op/B of each layer within LLMs. As the Op/B of the MoE layer is at least 1 and that of the attention layer has a value of 4-8 for grouped query attention, prior PIM architectures, which place processing units inside DRAM dies and only target extremely low-Op/B (under one) operations, are not efficient. Based on recent trends, Logic-PIM adds more through-silicon vias (TSVs) to enable high-bandwidth communication between the DRAM die and the logic die and places powerful processing units on the logic die, which is best suited for handling low-Op/B operations ranging from a few to a few dozen. To maximally utilize the xPU and Logic-PIM, we propose expert and attention co-processing.

Submitted 2 September, 2024; originally announced September 2024.

Comments: 15 pages, 16 figures, accepted at MICRO 2024

arXiv:2408.16225 [pdf, ps, other]

Global boundedness and blow-up in a repulsive chemotaxis-consumption system in higher dimensions

Authors: Jaewook Ahn, Kyungkeun Kang, Dongkwang Kim

Abstract: This paper investigates the repulsive chemotaxis-consumption model \begin{align*} \partial_t u &= \nabla \cdot (D(u) \nabla u) + \nabla \cdot (u \nabla v), \\ 0 &= \Delta v - uv \end{align*} in an $n$-dimensional ball, $n \ge 3$, where the diffusion coefficient $D$ is an appropriate extension of the function $0\le\xi\mapsto(1+\xi)^{m-1}$ for some $m>0$. Under the boundary conditions \begin{equation*} \nu\cdot (D(u) \nabla u + u \nabla v) = 0 \quad\text{ and }\quad v = M>0,\end{equation*} we first demonstrate that for $m > 1$, or $m = 1$ with $0 < M < 2/(n-2)$, the system admits globally defined classical solutions that are uniformly bounded in time for any choice of sufficiently smooth radial initial data. This result is further extended to the case $0<m<1$ when $M$ is chosen to be sufficiently small, depending on the initial conditions. In contrast, it is shown that for $0 < m < \frac{2}{n}$, the system exhibits blow-up behavior for sufficiently large $M$.

Submitted 28 August, 2024; originally announced August 2024.

MSC Class: 35B44; 35K51; 92C17

arXiv:2408.10156 [pdf, other]

Stacking Polymorphism of PtSe$_{2}$: Its Implication to Layer-dependent Metal-insulator Transitions

Authors: Jeonghwan Ahn, Iuegyun Hong, Gwangyoung Lee, Hyeondeok Shin, Anouar Benali, Yongkyung Kwon, Jaron T. Krogel

Abstract: Using diffusion Monte Carlo (DMC) and density functional theory (DFT) calculations, we examined the structural stability and interlayer binding properties of PtSe$_2$, a representative transition metal dichalcogenide (TMD) with strong interlayer interaction. Our DMC results for the bilayer revealed that AA and AB-r stacking modes are nearly degenerate, highlighting the significant role of interlayer hybridization in offsetting the energy cost due to larger interlayer separations in the AB-r mode. Additionally, our DMC-benchmarked DFT studies with the r$^2$SCAN+rVV10 functional demonstrated pronounced stacking polymorphism in few-layer PtSe$_2$, suggesting the potential for stacking faults and the formation of grain boundaries between different stacking domains which could develop metallic electronic structures. Thus this polymorphism, along with selenium vacancies, influences a layer-dependent metal-insulator transition observed in few-layer PtSe$_2$. Our findings emphasize the importance of both van der Waals interactions and interlayer hybridization in determining the phase stability and electronic properties of TMDs, advancing our understanding of their fundamental properties and refining theoretical models for practical applications in nanoelectronic devices.

Submitted 19 August, 2024; originally announced August 2024.

Comments: 20 pages, 6 figures

arXiv:2408.09696 [pdf, other]

Optimal Replenishment Strategy for Satellite Constellation with Dual Supply Modes

Authors: Jaewoo Kim, Jaemyung Ahn, Taehyun Sung

Abstract: This paper proposes a novel inventory management model for the replenishment strategy of a satellite mega-constellation incorporating dual supply modes: normal and auxiliary. The proposed framework employs an indirect channel for normal supply, wherein spare satellites are initially injected into a parking orbit before transferring to the target orbital plane via propulsion systems and orbital perturbations. Conversely, the auxiliary supply mode utilizes a direct channel, injecting spare satellites immediately into their designated orbital planes. The inventory management model is constructed using parametric replenishment policies: $\left(s,Q\right)$ and $\left(R_1,R_2,Q_1,Q_2\right)$ with a time window. Following this model, two optimization problems are addressed to construct the supply chain for satellite mega-constellation replenishment and to evaluate its performance. These are decision-making contexts from the perspectives of constellation operators and launch service providers. The case study showcases the practical applicability of the proposed model and optimization problems, yielding valuable insights for stakeholders in the spaceborne industry.

Submitted 8 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

Comments: 41 pages, 10 figures
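
The $\left(s,Q\right)$ policy named in the abstract above is a classic reorder-point rule: when the inventory position (stock on hand plus stock on order) falls to $s$ or below, order a fixed quantity $Q$. The sketch below simulates only that textbook rule. The fixed lead time, the lost-sales treatment of unmet demand, the function name `simulate_sq`, and all numbers are illustrative assumptions, and none of the paper's orbital dynamics or dual-mode logic is modelled.

```python
def simulate_sq(s, Q, initial_stock, failures, lead_time):
    """Simulate a basic (s, Q) policy and return spares on hand per period.

    Assumptions (hypothetical, for illustration only): deterministic
    lead time, unmet demand is lost, at most one order placed per period.
    """
    on_hand = initial_stock
    pipeline = []  # outstanding orders as (arrival_period, quantity)
    history = []
    for t, demand in enumerate(failures):
        # Receive any orders scheduled to arrive in this period.
        on_hand += sum(q for arrival, q in pipeline if arrival == t)
        pipeline = [(arrival, q) for arrival, q in pipeline if arrival != t]
        # Replace failed satellites from the spares on hand.
        on_hand = max(on_hand - demand, 0)
        # Reorder when on-hand plus on-order stock is at or below s.
        position = on_hand + sum(q for _, q in pipeline)
        if position <= s:
            pipeline.append((t + lead_time, Q))
        history.append(on_hand)
    return history


print(simulate_sq(s=2, Q=4, initial_stock=5,
                  failures=[1, 2, 1, 0, 3, 1], lead_time=2))
# [4, 2, 1, 5, 2, 1]
```

In this toy run the order triggered in period 1 arrives in period 3, which is why the stock recovers from 1 to 5.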

arXiv:2408.04208 [pdf, other]

Visibility Analysis of the Sun as Viewed from Multiple Spacecraft at the Sun-Earth Lagrange Points

Authors: Jinsung Lee, Sung-Hong Park, Arik Posner, Kyung-Suk Cho, Jaemyung Ahn

Abstract: Beyond the Sun-Earth line, spacecraft equipped with various solar telescopes are intended to be deployed at several different vantage points in the heliosphere to carry out coordinated, multi-view observations of the Sun and its dynamic activities. In this context, we investigate solar visibility by imaging instruments onboard the spacecraft orbiting the Sun-Earth Lagrange points L1, L4 and L5, respectively. An optimal arrival time for vertical periodic orbits stationed at L4 and L5 is determined based on geometric considerations that ensure maximum visibility of solar poles or higher latitudes per year. For a different set of orbits around the three Lagrange points (L1, L4 and L5), we calculate the visibility of the solar surface (i.e., observation days per year) as a function of the solar latitude. We also analyze where the solar limb viewed from one of the three Sun-Earth Lagrange points under consideration is projected onto the solar surface visible to the other two. This analysis particularly aims at determining the feasibility of studying solar eruptions, such as flares and coronal mass ejections, with coordinated observations of off-limb erupting coronal structures and their on-disk magnetic footpoints. In addition, visibility analysis of a feature (such as sunspots) on the solar surface is made for multiple spacecraft in various types of orbits with different inclinations to quantify the improvement in continuous tracking of the target feature for studying its long-term evolution from emergence through growth to decay. A comprehensive comparison of observations from single (L1), double (L1 and L4) and multi-space missions (L1, L4 and L5) is carried out through our solar visibility analysis, and this may help us to design future space missions for constructing multiple solar observatories at the Sun-Earth Lagrange points.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2407.18505 [pdf, other]

VoxSim: A perceptual voice similarity dataset

Authors: Junseok Ahn, Youkyum Kim, Yeunju Choi, Doyeop Kwak, Ji-Hoon Kim, Seongkyu Mun, Joon Son Chung

Abstract: This paper introduces VoxSim, a dataset of perceptual voice similarity ratings. Recent efforts to automate the assessment of speech synthesis technologies have primarily focused on predicting mean opinion score of naturalness, leaving speaker voice similarity relatively unexplored due to a lack of extensive training data. To address this, we generate about 41k utterance pairs from the VoxCeleb dataset, a widely utilised speech dataset for speaker recognition, and collect nearly 70k speaker similarity scores through a listening test. VoxSim offers a valuable resource for the development and benchmarking of speaker similarity prediction models. We provide baseline results of speaker similarity prediction models on the VoxSim test set and further demonstrate that the model trained on our dataset generalises to the out-of-domain VCC2018 dataset.

Submitted 26 July, 2024; originally announced July 2024.

Comments: INTERSPEECH 2024. The dataset is available from https://mm.kaist.ac.kr/projects/voxsim/

arXiv:2407.16141 [pdf, other]

Stock-driven Household Attention

Authors: Hie Joo Ahn, Shihan Xie

Abstract: We investigate the effects of stockholding on households' attention to the macroeconomy. Households' attentiveness is measured by their accuracy of inflation expectations and perceptions. Relative to non-stockholders, stockholders produce more accurate inflation forecasts and backcasts, disagree less about future inflation, and adjust their outlook more responsively to news, suggesting that stock-market participation raises households' attention. Frequent changes in stock prices incentivize stockholders to closely monitor financial markets for optimal trading, given the low cost of acquiring information. Consequently, paying attention to the macroeconomy helps hedge the risks associated with holding stocks. Therefore, attention heterogeneity driven by stockholdings can be a channel through which the distributional consequences of monetary policy are created.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.13055 [pdf, other]

Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs

Authors: Jongmin Kim, Wonseok Choi, Jung Ho Ahn

Abstract: Fully homomorphic encryption (FHE) is a cryptographic technology capable of resolving security and privacy problems in cloud computing by encrypting data in use. However, FHE introduces tremendous computational overhead for processing encrypted data, causing FHE workloads to become 2-6 orders of magnitude slower than their unencrypted counterparts. To mitigate the overhead, we propose Cheddar, an FHE library for CUDA GPUs, which demonstrates significantly faster performance compared to prior GPU implementations. We develop optimized functionalities at various implementation levels ranging from efficient low-level primitives to streamlined high-level operational sequences. In particular, we improve major FHE operations, including number-theoretic transform and base conversion, based on efficient kernel designs using a small word size of 32 bits. By these means, Cheddar demonstrates 2.9 to 25.6 times higher performance for representative FHE workloads compared to prior GPU implementations.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 12 pages, 5 figures

arXiv:2407.11368 [pdf]

Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach

Authors: Sojung Lucia Kim, Taehong Jang, Joonmo Ahn

Abstract: This study aims to compare three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context LLM learning, and (3) a proposed inter-methodological approach: a statistical machine translation method using sentence piece tokens derived from a unified set of source-target corpus. The proposed approach achieves a BLEU score of 36.71, surpassing the scores of SOLAR-10.7B context learning and the best existing Seq2Seq model. Further analysis and discussion are presented.

Submitted 16 July, 2024; originally announced July 2024.

Comments: ACL2024 submitted

arXiv:2407.11017 [pdf, other]

Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

Authors: Jihyun Janice Ahn, Ryo Kamoi, Lu Cheng, Rui Zhang, Wenpeng Yin

Abstract: Mainstream LLM research has primarily focused on enhancing their generative capabilities. However, even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input, despite no substantial change in content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLMs' discriminative capability to reduce this generative uncertainty, aiding in identifying the correct answers. Specifically, we propose and analyze three discriminative prompts: direct, inverse, and hybrid, to explore the potential of both closed-source and open-source LLMs in self-improving their generative performance on two benchmark datasets. Our insights reveal which discriminative prompt is most promising and when to use it. To our knowledge, this is the first work to systematically analyze LLMs' discriminative capacity to address generative uncertainty.

Submitted 26 June, 2024; originally announced July 2024.

Comments: 4 pages, 3 tables

arXiv:2407.10558 [pdf, other]

ConTEXTure: Consistent Multiview Images to Texture

Authors: Jaehoon Ahn, Sumin Cho, Harim Jung, Kibeom Hong, Seonghoon Ban, Moon-Ryul Jung

Abstract: We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure builds upon the TEXTure network, which uses text prompts for six viewpoints (e.g., 'Napoleon, front view', 'Napoleon, left view', etc.). However, TEXTure often generates images for non-front viewpoints that do not accurately represent those viewpoints. To address this issue, we employ Zero123++, which generates multiple view-consistent images for the six specified viewpoints simultaneously, conditioned on the initial front-view image and the depth maps of the mesh for the six viewpoints. By utilizing these view-consistent images, ConTEXTure learns the texture atlas from all viewpoint images concurrently, unlike previous methods that do so sequentially. This approach ensures that the rendered images from various viewpoints, including back, side, bottom, and top, are free from viewpoint irregularities.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 11 pages, 7 figures

arXiv:2407.05883 [pdf, other]

A coarse Erdős-Pósa theorem

Authors: Jungho Ahn, J. Pascal Gollin, Tony Huynh, O-joung Kwon

Abstract: An \emph{induced packing} of cycles in a graph is a set of vertex-disjoint cycles with no edges between them. We generalise the classic Erdős-Pósa theorem to induced packings of cycles. More specifically, we show that there exists a function ${f(k) = \mathcal{O}(k \log k)}$ such that for every positive integer ${k}$, every graph $G$ contains either an induced packing of $k$ cycles or a set $X$ of at most~${f(k)}$ vertices such that the closed neighbourhood of $X$ intersects all cycles in $G$. Our proof is constructive and yields a polynomial-time algorithm finding either the induced packing of cycles or the set $X$. Furthermore, we show that for every positive integer ${d}$, if a graph $G$ does not contain two cycles at distance more than $d$, then $G$ contains sets $X_1,X_2\subseteq V(G)$ with $\lvert {X_1} \rvert \leq12(d+1)$ and $\lvert {X_2} \rvert \leq12$ such that after removing the ball of radius $2d$ around $X_1$ or the ball of radius $3d$ around $X_2$, the resulting graphs are forests. As a corollary, we prove that every graph with no $K_{1,t}$ induced subgraph and no induced packing of $k$ cycles has tree-independence number at most $\mathcal{O}(tk\log k)$, and one can construct a corresponding tree-decomposition in polynomial time. This resolves a special case of a conjecture of Dallard et al. (arXiv:2402.11222), and implies that on such graphs, many NP-hard problems, such as Maximum Weight Independent Set, Maximum Weight Induced Matching, Graph Homomorphism, and Minimum Weight Feedback Vertex Set, are solvable in polynomial time. On the other hand, we show that the class of all graphs with no $K_{1,3}$ induced subgraph and no two cycles at distance more than $2$ has unbounded tree-independence number.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 27 pages, 3 figures

MSC Class: 05C38; 05C70; 05C85

arXiv:2407.02026 [pdf, other]

Programming higher-order interactions of Rydberg atoms

Authors: Andrew Byun, Seokho Jeong, Jaewook Ahn

Abstract: Higher-order interactions in spin-based Hamiltonians are crucial in addressing numerous fundamentally significant physical problems. In this work, Rydberg-atom graph gadgets are introduced to effectively program $K$-th order interactions within a Rydberg atom system. This approach facilitates the determination of the ground states of an Ising-type Hamiltonian, encoded to solve higher-order unconstrained optimization problems. A favorable scaling behavior, $O(N^K)$, is expected in terms of the number of atoms required for $N$-vertex hypergraph optimization problems.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 10 pages, 5 figures

arXiv:2407.00965 [pdf, other]

Measurement of the integrated luminosity of data samples collected during 2019-2022 by the Belle II experiment

Authors: The Belle II Collaboration, I. Adachi, L. Aggarwal, H. Ahmed, J. K. Ahn, H. Aihara, N. Akopov, A. Aloisio, N. Althubiti, N. Anh Ky, D. M. Asner, H. Atmacan, T. Aushev, V. Aushev, M. Aversano, R. Ayad, V. Babu, H. Bae, S. Bahinipati, P. Bambade, Sw. Banerjee, M. Barrett, J. Baudot, A. Baur, A. Beaubien, et al. (382 additional authors not shown)

Abstract: A series of data samples was collected with the Belle II detector at the SuperKEKB collider from March 2019 to June 2022. We determine the integrated luminosities of these data samples using three distinct methodologies involving Bhabha ($e^+e^- \to e^+e^-(n\gamma)$), digamma ($e^+e^- \to \gamma\gamma(n\gamma)$), and dimuon ($e^+e^- \to \mu^+ \mu^- (n\gamma)$) events. The total integrated luminosity obtained with Bhabha, digamma, and dimuon events is (426.52 $\pm$ 0.03 $\pm$ 2.48)~fb$^{-1}$, (427.32 $\pm$ 0.03 $\pm$ 2.56)~fb$^{-1}$, and (424.84 $\pm$ 0.04 $\pm$ 3.88)~fb$^{-1}$, where the first uncertainties are statistical and the second are systematic. The resulting total integrated luminosity obtained from the combination of the three methods is (426.88 $\pm$ 1.93)~fb$^{-1}$.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 12 pages, 3 figures

Report number: Belle II Preprint 2024-019; KEK Preprint 2024-16

arXiv:2406.18750 [pdf, ps, other]

Stationary states of a chemotaxis consumption system with singular sensitivity and inhomogeneous boundary conditions

Authors: Jaewook Ahn, Johannes Lankeit

Abstract: For given total mass $m>0$ we show unique solvability of the stationary chemotaxis-consumption model \[ \begin{cases} 0= \Delta u - \chi\nabla \cdot (\frac{u}{v} \nabla v) \\ 0= \Delta v - uv \\ \int_\Omega u = m \end{cases} \] under no-flux-Dirichlet boundary conditions in bounded smooth domains $\Omega\subset \mathbb{R}^2$ and $\Omega=B_R(0)\subset \mathbb{R}^d$, $d\ge 3$.

Submitted 26 June, 2024; originally announced June 2024.

MSC Class: 35J25; 92C17; 35Q92

arXiv:2406.15965 [pdf, other]

doi 10.1103/PhysRevD.110.032021

Search for charmed baryons in the $Λらむだ_c^+ηいーた$ system and measurement of the branching fractions of $Λらむだ_c(2880)^+$ and $Λらむだ_c(2940)^+$ decaying to $Λらむだ_c^+ηいーた$ and $pD^0$ relative to $Σしぐま_c(2455)πぱい$

Authors: Belle Collaboration, S. X. Li, C. P. Shen, I. Adachi, J. K. Ahn, H. Aihara, D. M. Asner, H. Atmacan, T. Aushev, R. Ayad, Sw. Banerjee, K. Belous, J. Bennett, M. Bessner, T. Bilka, D. Biswas, D. Bodrov, A. Bozek, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola, M. -C. Chang, B. G. Cheon, et al. (103 additional authors not shown)

Abstract: We search for excited charmed baryons in the $Λらむだ_c^+ηいーた$ system using a data sample corresponding to an integrated luminosity of 980 $\rm fb^{-1}$. The data were collected by the Belle detector at the KEKB $e^{+}$$e^{-}$ asymmetric-energy collider. No significant signals are found in the $Λらむだ_c^+ηいーた$ mass spectrum, including the known $Λらむだ_c(2880)^+$ and $Λらむだ_c(2940)^+$. Clear $Λらむだ_c(2880)^+$ and $Λらむだ_c(2940)^+$ signals are observed in the $pD^0$ mass spectrum. We set upper limits at 90\% credibility level on ratios of branching fractions of $Λらむだ_c(2880)^+$ and $Λらむだ_c(2940)^+$ decaying to $Λらむだ_c^+ηいーた$ relative to $Σしぐま_c(2455)πぱい$ of $<0.13$ for the $Λらむだ_c(2880)^+$ and $<1.11$ for the $Λらむだ_c(2940)^+$. We measure ratios of branching fractions of $Λらむだ_c(2880)^+$ and $Λらむだ_c(2940)^+$ decaying to $pD^0$ relative to $Σしぐま_c(2455)πぱい$ of $0.75 \pm 0.03(\text{stat.}) \pm 0.07(\text{syst.})$ for the $Λらむだ_c(2880)^+$ and $3.59 \pm 0.21(\text{stat.}) \pm 0.56(\text{syst.})$ for the $Λらむだ_c(2940)^+$.

Submitted 28 July, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

Comments: 10 pages, 4 figures, accepted for publication as a Regular Article in Physical Review D

Report number: Belle Preprint: 2024-06;KEK Preprint: 2024-15

Journal ref: Phys. Rev. D 110, 032021 (2024)

arXiv:2406.15709 [pdf, other]

I Experienced More than 10 DeFi Scams: On DeFi Users' Perception of Security Breaches and Countermeasures

Authors: Mingyi Liu, Jun Ho Huh, HyungSeok Han, Jaehyuk Lee, Jihae Ahn, Frank Li, Hyoungshick Kim, Taesoo Kim

Abstract: Decentralized Finance (DeFi) offers a whole new investment experience and has quickly emerged as an enticing alternative to Centralized Finance (CeFi). Rapidly growing market size and active users, however, have also made DeFi a lucrative target for scams and hacks, with 1.95 billion USD lost in 2023. Unfortunately, no prior research thoroughly investigates DeFi users' security risk awareness levels and the adequacy of their risk mitigation strategies. Based on a semi-structured interview study (N = 14) and a follow-up survey (N = 493), this paper investigates DeFi users' security perceptions and commonly adopted practices, and how those affected by previous scams or hacks (DeFi victims) respond and try to recover their losses. Our analysis shows that users often prefer DeFi over CeFi due to their decentralized nature and strong profitability. Despite being aware that DeFi, compared to CeFi, is prone to more severe attacks, users are willing to take those risks to explore new investment opportunities. Worryingly, most victims do not learn from previous experiences; unlike victims studied through traditional systems, DeFi victims tend to find new services, without revising their security practices, to recover their losses quickly. The abundance of various DeFi services and opportunities allows victims to continuously explore new financial opportunities, and this reality seems to cloud their security priorities.
Indeed, our results indicate that DeFi users' strong financial motivations outweigh their security concerns - much like those who are addicted to gambling. Our observations about victims' post-incident behaviors suggest that stronger control in the form of industry regulations would be necessary to protect DeFi users from future breaches.

Submitted 21 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 33rd USENIX Security Symposium, Philadelphia, PA, USA, Aug. 2024

arXiv:2406.12233 [pdf, other]

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

Authors: Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim

Abstract: Visual Speech Recognition (VSR) stands at the intersection of computer vision and speech recognition, aiming to interpret spoken content from visual cues. A prominent challenge in VSR is the presence of homophenes - visually similar lip gestures that represent different phonemes. Prior approaches have sought to distinguish fine-grained visemes by aligning visual and auditory semantics, but often fell short of full synchronization. To address this, we present SyncVSR, an end-to-end learning framework that leverages quantized audio for frame-level crossmodal supervision. By integrating a projection layer that synchronizes visual representation with acoustic data, our encoder learns to generate discrete audio tokens from a video sequence in a non-autoregressive manner. SyncVSR shows versatility across tasks, languages, and modalities at the cost of a forward pass. Our empirical evaluations show that it not only achieves state-of-the-art results but also reduces data usage by up to ninefold.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.05963 [pdf, other]

Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024

Authors: Jinwoo Ahn, Junhyeok Park, Min-Jun Kim, Kang-Hyeon Kim, So-Yeong Sohn, Yun-Ji Lee, Du-Seong Chang, Yu-Jung Heo, Eun-Sol Kim

Abstract: In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two main ideas. First, to utilize the reasoning ability of a large-scale language model (LLM), the given visual cues (images) are grounded in the text modality. For this purpose, we generate highly detailed text captions that describe the context of the image and use these captions as input for the LLM. Second, due to the nature of puzzle images, which often contain various geometric visual patterns, we utilize an object detection algorithm to ensure these patterns are not overlooked in the captioning process. We employed the SAM algorithm, which can detect various-size objects, to capture the visual features of these geometric patterns and used this information as input for the LLM. Under the puzzle split configuration, we achieved an option selection accuracy Oacc of 29.5 on the test set and a weighted option selection accuracy (WOSA) of 27.1 on the challenge set.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05602 [pdf, other]

Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Authors: Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

Abstract: It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and their sequencing, we uncover the nuanced ways these AI technologies encode biases across gender, race, geography, and region/culture. Our findings reveal the challenges and potential of prompt engineering in controlling biases, highlighting the critical need for ethical AI development promoting diversity and inclusivity. This work advances AI ethics by not only revealing the nuanced dynamics of bias in text-to-image generation models but also by offering a novel framework for future research in controlling bias. Our contributions - spanning comparative analyses, the strategic use of prompt modifiers, the exploration of prompt sequencing effects, and the introduction of a bias sensitivity taxonomy - lay the groundwork for the development of common metrics and standard analyses for evaluating whether and how future AI models exhibit and respond to requests to adjust for inherent biases.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.02753 [pdf, other]

Towards improved property prediction of two-dimensional (2D) materials using many-body Quantum Monte Carlo methods

Authors: Daniel Wines, Jeonghwan Ahn, Anouar Benali, Paul R. C. Kent, Jaron T. Krogel, Yongkyung Kwon, Lubos Mitas, Fernando A. Reboredo, Brenda Rubenstein, Kayahan Saritas, Hyeondeok Shin, Ivan Štich, Can Ataca

Abstract: The field of two-dimensional (2D) materials has grown dramatically in the last two decades. 2D materials can be utilized for a variety of next-generation optoelectronic, spintronic, clean energy, and quantum computation applications. These 2D structures, which are often exfoliated from layered van der Waals (vdW) materials, possess highly inhomogeneous electron densities and can possess short- and long-range electron correlations. The complexities of 2D materials make them challenging to study with standard mean-field electronic structure methods such as density functional theory (DFT), which relies on approximations for the unknown exchange-correlation functional. In order to overcome the limitations of DFT, highly accurate many-body electronic structure approaches such as Diffusion Monte Carlo (DMC) can be utilized. In the past decade, DMC has been used to calculate accurate magnetic, electronic, excitonic, and topological properties in addition to accurately capturing interlayer interactions and cohesion and adsorption energetics of 2D materials. This approach has been applied to 2D systems of wide interest including graphene, phosphorene, MoS$_2$, CrI$_3$, VSe$_2$, GaSe, GeSe, borophene, and several others. In this review article, we highlight some successful recent applications of DMC to 2D systems for improved property predictions beyond standard DFT.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.01963 [pdf]

Diamond molecular balance: Revolutionizing high-resolution mass spectrometry from MDa to TDa at room temperature

Authors: Donggeun Lee, Seung-Woo Jeon, Chang-Hwan Yi, Yang-Hee Kim, Yeeun Choi, Sang-Hun Lee, Jinwoong Cha, Seung-Bo Shim, Junho Suh, Il-Young Kim, Dongyeon Daniel Kang, Hojoong Jung, Cherlhyun Jeong, Jae-pyoung Ahn, Hee Chul Park, Sang-Wook Han, Chulki Kim

Abstract: The significance of mass spectrometry lies in its unparalleled ability to accurately identify and quantify molecules in complex samples, providing invaluable insights into molecular structures and interactions. Here, we leverage diamond nanostructures as highly sensitive mass sensors by utilizing a self-excitation mechanism under an electron beam in a conventional scanning electron microscope (SEM). The diamond molecular balance (DMB) exhibits an exceptional mass resolution of 0.36 MDa, based on its outstanding mechanical quality factor and frequency stability, along with an extensive dynamic range from MDa to TDa. This positions the DMB at the forefront of molecular balances operating at room temperature. Notably, the DMB demonstrates its ability to measure the mass of a single bacteriophage T4 by precisely locating the analyte on the device. These findings highlight the groundbreaking potential of the DMB as a revolutionary tool for mass spectrometry at room temperature.

Submitted 25 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 4 figures

arXiv:2405.18027 [pdf, other]

TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurately represent characters at specific time points, agents must avoid character hallucination, where they display knowledge that contradicts their characters' identities and historical timelines. We introduce TimeChara, a new benchmark designed to evaluate point-in-time character hallucination in role-playing LLMs. Comprising 10,895 instances generated through an automated pipeline, this benchmark reveals significant hallucination issues in current state-of-the-art LLMs (e.g., GPT-4o). To counter this challenge, we propose Narrative-Experts, a method that decomposes the reasoning steps and utilizes narrative experts to reduce point-in-time character hallucinations effectively. Still, our findings with TimeChara highlight the ongoing challenges of point-in-time character hallucination, calling for further study.

Submitted 28 May, 2024; originally announced May 2024.

Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

arXiv:2405.16303 [pdf, other]

Extended spin relaxation times of optically addressed telecom defects in silicon carbide

Authors: Jonghoon Ahn, Christina Wicker, Nolan Bitner, Michael T. Solomon, Benedikt Tissot, Guido Burkard, Alan M. Dibos, Jiefei Zhang, F. Joseph Heremans, David D. Awschalom

Abstract: Optically interfaced solid-state defects are promising candidates for quantum communication technologies. The ideal defect system would feature bright telecom emission, long-lived spin states, and a scalable material platform, simultaneously. Here, we employ one such system, vanadium (V4+) in silicon carbide (SiC), to establish a potential telecom spin-photon interface within a mature semiconductor host. This demonstration of efficient optical spin polarization and readout facilitates all optical measurements of temperature-dependent spin relaxation times (T1). With this technique, we lower the temperature from about 2K to 100 mK to observe a remarkable four-orders-of-magnitude increase in spin T1 from all measured sites, with site-specific values ranging from 57 ms to above 27 s. Furthermore, we identify the underlying relaxation mechanisms, which involve a two-phonon Orbach process, indicating the opportunity for strain-tuning to enable qubit operation at higher temperatures. These results position V4+ in SiC as a prime candidate for scalable quantum nodes in future quantum networks.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 11 pages, 6 figures

arXiv:2405.11390 [pdf, other]

Search for Two-Body $B$ Meson Decays to $Λらむだ^{0}$ and $Ωおめが^{(*)0}_{c}$

Authors: Belle Collaboration, V. Savinov, I. Adachi, J. K. Ahn, H. Aihara, D. M. Asner, H. Atmacan, R. Ayad, Sw. Banerjee, J. Bennett, M. Bessner, V. Bhardwaj, D. Biswas, A. Bobrov, D. Bodrov, J. Borah, M. Bračko, P. Branchini, T. E. Browder, A. Budano, D. Červenkov, M. -C. Chang, P. Chang, B. G. Cheon, K. Cho, et al. (124 additional authors not shown)

Abstract: We report the results of the first search for Standard Model and baryon-number-violating two-body decays of the neutral $B$ mesons to $Λらむだ^{0}$ and $Ωおめが^{(*)0}_c$ using 711~${\rm fb^{-1}}$ of data collected at the $Υうぷしろん(4S)$ resonance with the Belle detector at the KEKB asymmetric-energy $e^+ e^-$ collider. We observe no evidence of signal from any such decays and set 95\% confidence-level upper limits on the products of $B^0$ and $\bar{B}^0$ branching fractions for these two-body decays with $\mathcal{B}(Ωおめが_{c}^{0} \to πぱい^+ Ωおめが^-)$ in the range between 9.5~$\times 10^{-8}$ and 31.2~$\times 10^{-8}$.

Submitted 18 May, 2024; originally announced May 2024.

Comments: 6 pages, 2 figures, submitted to PRD(L)

Report number: Belle Preprint 2024-04, KEK Preprint 2024-5

arXiv:2405.10272 [pdf, other]

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Authors: Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung

Abstract: The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challenges of each task: (1) generating a range of head poses representative of real-world scenarios, and (2) ensuring voice consistency despite variations in facial motion for the same identity. To tackle these issues, we introduce a motion sampler based on conditional flow matching, which is capable of high-quality motion code generation in an efficient way. Moreover, we introduce a novel conditioning method for the TTS system, which utilises motion-removed features from the TFG model to yield uniform speech outputs. Our extensive experiments demonstrate that our method effectively creates natural-looking talking faces and speech that accurately match the input text. To our knowledge, this is the first effort to build a multimodal synthesis system that can generalise to unseen identities.

Submitted 16 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2405.02499 [pdf, other]

DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs.
Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.

Submitted 3 May, 2024; originally announced May 2024.

Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

arXiv:2404.14687 [pdf, other]

Pegasus-v1 Technical Report

Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, et al. (19 additional authors not shown)

Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers a balanced view of its current state and its future direction.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.12461 [pdf, other]

Exploring interlayer coupling in the twisted bilayer PtTe$_{2}$

Authors: Jeonghwan Ahn, Seoung-Hun Kang, Mina Yoon, Jaron T. Krogel

Abstract: We have investigated interlayer interactions in the bilayer PtTe$_{2}$ system, which influence the electronic energy bands near the Fermi levels. Our diffusion Monte Carlo (DMC) calculations for the high-symmetry bilayer stackings (AA, AB, AC) manifest distinct interlayer binding characteristics among the stacking modes by revealing significantly different interlayer separations depending on the stackings, which is critical to understanding the interlayer coupling of the twisted bilayers consisting of various local stacking arrangements. Furthermore, a comparison between the interlayer separations obtained from DMC and density functional theory (DFT) shows that meta-GGA-based vdW-DFT results agree with DMC for different layer stackings, including twisted bilayers, but only the ground-state AA stacking matches well with GGA-based DFT predictions. This underscores the importance of accurate exchange-correlation potentials even for capturing the stacking-dependent interlayer binding properties. We further show that the variability in DFT-predicted interlayer separations is responsible for the large discrepancy of band structures in the 21.79$^{\circ}$-twisted bilayer PtTe$_{2}$, affecting its classification as metallic or semiconducting. These results demonstrate the importance of obtaining a correct description of stacking-dependent interlayer coupling in modeling delicate bilayer systems at finite twists.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.03602 [pdf, other]

Evaluating LLMs at Detecting Errors in LLM Responses

Authors: Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

Abstract: With Large Language Models (LLMs) being widely used across various tasks, detecting errors in their responses is increasingly crucial. However, little research has been conducted on error detection of LLM responses. Collecting error annotations on LLM responses is challenging due to the subjective nature of many NLP tasks, and thus previous research focuses on tasks of little practical value (e.g., word sorting) or limited error types (e.g., faithfulness in summarization). This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs. ReaLMistake contains three challenging and meaningful tasks that introduce objectively assessable errors in four categories (reasoning correctness, instruction-following, context-faithfulness, and parameterized knowledge), eliciting naturally observed and diverse errors in responses of GPT-4 and Llama 2 70B annotated by experts. We use ReaLMistake to evaluate error detectors based on 12 LLMs. Our findings show: 1) Top LLMs like GPT-4 and Claude 3 detect errors made by LLMs at very low recall, and all LLM-based error detectors perform much worse than humans. 2) Explanations by LLM-based error detectors lack reliability. 3) LLMs-based error detection is sensitive to small changes in prompts but remains challenging to improve. 4) Popular approaches to improving LLMs, including self-consistency and majority vote, do not improve the error detection performance. Our benchmark and code are provided at https://github.com/psunlpgroup/ReaLMistake.

Submitted 27 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: COLM 2024, 46 pages, Benchmark and code: https://github.com/psunlpgroup/ReaLMistake

arXiv:2404.02155 [pdf, other]

Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields

Authors: Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich

Abstract: Scale-ambiguity in 3D scene dimensions leads to magnitude-ambiguity of volumetric densities in neural radiance fields, i.e., the densities double when scene size is halved, and vice versa. We call this property alpha invariance. For NeRFs to better maintain alpha invariance, we recommend 1) parameterizing both distance and volume densities in log space, and 2) a discretization-agnostic initialization strategy to guarantee high ray transmittance. We revisit a few popular radiance field models and find that these systems use various heuristics to deal with issues arising from scene scaling. We test their behaviors and show our recipe to be more robust.

Submitted 16 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: CVPR 2024. project page https://pals.ttic.edu/p/alpha-invariance
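
The scaling property described in the abstract above can be checked numerically. The sketch below is not the authors' code; it assumes the standard volume-rendering opacity of a ray segment with constant density, alpha = 1 - exp(-sigma * delta), and the helper names `alpha` and `rescale_scene` are hypothetical. It shows that halving all distances while doubling the density leaves the opacity unchanged.

```python
import math


def alpha(sigma: float, delta: float) -> float:
    """Opacity of a ray segment of length `delta` with constant density `sigma`.

    Assumes the standard volume-rendering form 1 - exp(-sigma * delta).
    """
    return 1.0 - math.exp(-sigma * delta)


def rescale_scene(sigma: float, delta: float, k: float) -> tuple[float, float]:
    """Scale the scene by factor k (hypothetical helper).

    Distances are multiplied by k and densities by 1/k, so the product
    sigma * delta, and therefore alpha, is preserved.
    """
    return sigma / k, delta * k


sigma, delta = 4.0, 0.25
original = alpha(sigma, delta)

# Halve the scene size: the density doubles, the segment length halves.
half_sigma, half_delta = rescale_scene(sigma, delta, 0.5)
halved = alpha(half_sigma, half_delta)

print(half_sigma, half_delta)
print(math.isclose(original, halved))
```

Because only the product of density and distance enters the opacity, any parameterization that keeps that product fixed under rescaling preserves the rendered result, which is the motivation the abstract gives for working with both quantities in log space.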

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.20109 [pdf, ps, other]

Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

Authors: Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

Abstract: Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence(AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework using adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random distillation network and counting-based strategies. In benchmark tests, Mol-AIR demonstrates superior performance over existing approaches in generating molecules with desired properties without any prior knowledge, including penalized LogP, QED, and celecoxib similarity. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.14963 [pdf, other]

Enabling Physical Localization of Uncooperative Cellular Devices

Authors: Taekkyung Oh, Sangwook Bae, Junho Ahn, Yonghwa Lee, Dinh-Tuan Hoang, Min Suk Kang, Nils Ole Tippenhauer, Yongdae Kim

Abstract: In cellular networks, it can become necessary for authorities to physically locate user devices for tracking criminals or illegal devices. While cellular operators can provide authorities with cell information the device is camping on, fine-grained localization is still required. Therefore, the authorized agents trace the device by monitoring its uplink signals. However, tracking the uplink signal source without its cooperation is challenging even for operators and authorities. Particularly, three challenges remain for fine-grained localization: i) localization works only if devices generate enough uplink traffic reliably over time, ii) the target device might generate its uplink traffic with significantly low power, and iii) cellular repeater may add too much noise to true uplink signals. While these challenges present practical hurdles for localization, they have been overlooked in prior works. In this work, we investigate the impact of these real-world challenges on cellular localization and propose an Uncooperative Multiangulation Attack (UMA) that addresses these challenges. UMA can 1) force a target device to transmit traffic continuously, 2) boost the target's signal strength to the maximum, and 3) uniquely distinguish traffic from the target and the repeaters. Notably, the UMA technique works without privilege on cellular operators or user devices, which makes it operate on any LTE network. Our evaluations show that UMA effectively resolves the challenges in real-world environments when devices are not cooperative for localization. Our approach exploits the current cellular design vulnerabilities, which we have responsibly disclosed to GSMA.

Submitted 25 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.05591 [pdf, other]

Data-Driven Ergonomic Risk Assessment of Complex Hand-intensive Manufacturing Processes

Authors: Anand Krishnan, Xingjian Yang, Utsav Seth, Jonathan M. Jeyachandran, Jonathan Y. Ahn, Richard Gardner, Samuel F. Pedigo, Adriana Blom-Schieber, Ashis G. Banerjee, Krithika Manohar

Abstract: Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders and rehabilitation surgeries. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic issues related to hand-intensive manufacturing processes. The system comprises a multi-modal sensor testbed to collect and synchronize operator upper body pose, hand pose and applied forces; a Biometric Assessment of Complete Hand (BACH) formulation to measure high-fidelity hand and finger risks; and industry-standard risk scores associated with upper body posture, RULA, and hand activity, HAL. Our findings demonstrate that BACH captures injurious activity with a higher granularity in comparison to the existing metrics. Machine learning models are also used to automate RULA and HAL scoring, and generalize well to unseen participants. Our assessment system, therefore, provides ergonomic interpretability of the manufacturing processes studied, and could be used to mitigate risks through minor workplace optimization and posture corrections.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 26 pages, 7 figures

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04340 [pdf, other]

Search for a pentaquark state decaying into $pJ/ψ$ in $Υ(1,2S)$ inclusive decays at Belle

Authors: Belle Collaboration, X. Dong, H. Y. Zhang, X. L. Wang, I. Adachi, J. K. Ahn, H. Aihara, S. Al Said, D. M. Asner, H. Atmacan, R. Ayad, S. Bahinipati, Sw. Banerjee, M. Bessner, V. Bhardwaj, D. Biswas, D. Bodrov, A. Bozek, M. Bračko, P. Branchini, T. E. Browder, A. Budano, M. Campajola, D. Červenkov, M. -C. Chang , et al. (139 additional authors not shown)

Abstract: Using the data samples of 102 million $Υ(1S)$ and 158 million $Υ(2S)$ events collected by the Belle detector, we search for a pentaquark state in the $pJ/ψ$ final state from $Υ(1,2S)$ inclusive decays. Here, the charge-conjugate $\bar{p}J/ψ$ is included. We observe clear $pJ/ψ$ production in $Υ(1,2S)$ decays and measure the branching fractions to be $\mathcal{B}[Υ(1S) \to pJ/ψ+ anything] = [4.27 \pm 0.16(stat.) \pm 0.20(syst.)] \times 10^{-5}$ and $\mathcal{B}[Υ(2S) \to pJ/ψ+ anything] = [3.59 \pm 0.14(stat.) \pm 0.16(syst.)] \times 10^{-5}$. We also measure the cross section of inclusive $pJ/ψ$ production in $e^+e^-$ annihilation to be $σ(e^+e^- \to pJ/ψ+ anything) = [57.5 \pm 2.1 (stat.) \pm 2.5(syst.)]$~fb at $\sqrt{s} = 10.52~\hbox{GeV}$ using an 89.5~fb$^{-1}$ continuum data sample. There is no significant $P_c(4312)^+$, $P_c(4440)^+$ or $P_c(4457)^+$ signal found in the $pJ/ψ$ final states in $Υ(1,2S)$ inclusive decays. We determine the upper limits of $\mathcal{B}[Υ(1,2S)\to P_c^{+} + anything] \cdot \mathcal{B}(P_c^{+}\to pJ/ψ)$ to be at the $10^{-6}$ level.

Submitted 11 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Report number: Belle Preprint 2024-02, KEK Preprint 2023-54

arXiv:2402.02648 [pdf, other]

Recursive Chain-of-Feedback Prevents Performance Degradation from Redundant Prompting

Authors: Jinwoo Ahn, Kyuseung Shin

Abstract: Large Language Models (LLMs) frequently struggle with complex reasoning tasks, failing to construct logically sound steps towards the solution. In response to this behavior, users often try prompting the LLMs repeatedly in hopes of reaching a better response. This paper studies such repetitive behavior and its effect by defining a novel setting, Chain-of-Feedback (CoF). The setting takes questions that require multi-step reasoning as an input. Upon response, we repetitively prompt meaningless feedback (e.g. 'make another attempt') requesting additional trials. Surprisingly, our preliminary results show that repeated meaningless feedback gradually decreases the quality of the responses, eventually leading to a larger deviation from the intended outcome. To alleviate these troubles, we propose a novel method, Recursive Chain-of-Feedback (R-CoF). Following the logic of recursion in computer science, R-CoF recursively revises the initially incorrect response by breaking down each incorrect reasoning step into smaller individual problems. Our preliminary results show that majority of questions that LLMs fail to respond correctly can be answered using R-CoF without any sample data outlining the logical process.

Submitted 1 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: Still Ongoing Work; 8 Pages; 2 Figures

arXiv:2402.02447 [pdf, other]

Breaking MLPerf Training: A Case Study on Optimizing BERT

Authors: Yongdeok Kim, Jaehyung Ahn, Myeongwoo Kim, Changin Choi, Heejae Kim, Narankhuu Tuvshinjargal, Seungwon Lee, Yanzi Zhang, Yuan Pei, Xiongzhan Linghu, Jingkun Ma, Lin Chen, Yuehua Dai, Sungjoo Yoo

Abstract: Speeding up the large-scale distributed training is challenging in that it requires improving various components of training including load balancing, communication, optimizers, etc. We present novel approaches for fast large-scale training of BERT model which individually ameliorates each component thereby leading to a new level of BERT training performance. Load balancing is imperative in distributed BERT training since its training datasets are characterized by samples with various lengths. Communication cost, which is proportional to the scale of distributed training, needs to be hidden by useful computation. In addition, the optimizers, e.g., ADAM, LAMB, etc., need to be carefully re-evaluated in the context of large-scale distributed training. We propose two new ideas, (1) local presorting based on dataset stratification for load balancing and (2) bucket-wise gradient clipping before allreduce which allows us to benefit from the overlap of gradient computation and synchronization as well as the fast training of gradient clipping before allreduce. We also re-evaluate existing optimizers via hyperparameter optimization and utilize ADAM, which also contributes to fast training via larger batches than existing methods. Our proposed methods, all combined, give the fastest MLPerf BERT training of 25.1 (22.3) seconds on 1,024 NVIDIA A100 GPUs, which is 1.33x (1.13x) and 1.57x faster than the other top two (one) submissions to MLPerf v1.1 (v2.0). Our implementation and evaluation results are available at MLPerf v1.1~v2.1.

Submitted 4 February, 2024; originally announced February 2024.

Comments: Total 15 pages (Appendix 3 pages)

arXiv:2402.00157 [pdf, other]

Large Language Models for Mathematical Reasoning: Progresses and Challenges

Authors: Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin

Abstract: Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence. In recent times, there has been a notable surge in the development of Large Language Models (LLMs) geared towards the automated resolution of mathematical problems. However, the landscape of mathematical problem types is vast and varied, with LLM-oriented techniques undergoing evaluation across diverse datasets and settings. This diversity makes it challenging to discern the true advancements and obstacles within this burgeoning field. This survey endeavors to address four pivotal dimensions: i) a comprehensive exploration of the various mathematical problems and their corresponding datasets that have been investigated; ii) an examination of the spectrum of LLM-oriented techniques that have been proposed for mathematical problem-solving; iii) an overview of factors and concerns affecting LLMs in solving math; and iv) an elucidation of the persisting challenges within this domain. To the best of our knowledge, this survey stands as one of the first extensive examinations of the landscape of LLMs in the realm of mathematics, providing a holistic perspective on the current state, accomplishments, and future challenges in this rapidly evolving field.

Submitted 5 April, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: EACL 2024 Student Research Workshop, 8 pages

arXiv:2401.02639 [pdf, ps, other]

Spectral integral variation of signed graphs

Authors: Jungho Ahn, Cheolwon Heo, Sunyo Moon

Abstract: We characterize when the spectral variation of the signed Laplacian matrices is integral after a new edge is added to a signed graph. As an application, for every fixed signed complete graph, we fully characterize the class of signed graphs to which one can recursively add new edges keeping spectral integral variation to make the signed complete graph.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 18 pages, 0 figure

MSC Class: 05C22; 05C50; 15A18

arXiv:2312.15288 [pdf, other]

Understanding normalization in contrastive representation learning and out-of-distribution detection

Authors: Tai Le-Gia, Jaehyun Ahn

Abstract: Contrastive representation learning has emerged as an outstanding approach for anomaly detection. In this work, we explore the $\ell_2$-norm of contrastive features and its applications in out-of-distribution detection. We propose a simple method based on contrastive learning, which incorporates out-of-distribution data by discriminating against normal samples in the contrastive layer space. Our approach can be applied flexibly as an outlier exposure (OE) approach, where the out-of-distribution data is a huge collective of random images, or as a fully self-supervised learning approach, where the out-of-distribution data is self-generated by applying distribution-shifting transformations. The ability to incorporate additional out-of-distribution samples enables a feasible solution for datasets where AD methods based on contrastive learning generally underperform, such as aerial images or microscopy images. Furthermore, the high-quality features learned through contrastive learning consistently enhance performance in OE scenarios, even when the available out-of-distribution dataset is not diverse enough. Our extensive experiments demonstrate the superiority of our proposed method under various scenarios, including unimodal and multimodal settings, with various image datasets.

Submitted 8 April, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.12488 [pdf, other]

Foreseeing Reconstruction Quality of Gradient Inversion: An Optimization Perspective

Authors: HyeongGwon Hong, Yooshin Cho, Hanbyel Cho, Jaesung Ahn, Junmo Kim

Abstract: Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our empirical investigation shows that the vulnerability ranking varies with the loss function used. Gradient norm, which is commonly used as a vulnerability proxy for gradient inversion attack, cannot explain this as it remains constant regardless of the loss function for gradient matching. In this paper, we propose a loss-aware vulnerability proxy (LAVP) for the first time. LAVP refers to either the maximum or minimum eigenvalue of the Hessian with respect to gradient matching loss at ground truth. This suggestion is based on our theoretical findings regarding the local optimization of the gradient inversion in proximity to the ground truth, which corresponds to the worst case attack scenario. We demonstrate the effectiveness of LAVP on various architectures and datasets, showing its consistent superiority over the gradient norm in capturing sample vulnerabilities. The performance of each proxy is measured in terms of Spearman's rank correlation with respect to several similarity scores. This work will contribute to enhancing FL security against any potential loss functions beyond L2 or cosine distance in the future.

Submitted 19 December, 2023; originally announced December 2023.

Comments: To appear in AAAI 2024
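
The two gradient-matching losses named in the abstract above can be written down in a few lines. This is a minimal sketch under stated assumptions, not the paper's implementation: gradients are treated as flat lists of floats, the L2 loss is taken as the squared Euclidean distance, and the function names `l2_distance` and `cosine_distance` are hypothetical. The example illustrates that the two losses can disagree: a gradient that is a rescaled copy of the target has zero cosine distance but a nonzero L2 distance.

```python
import math


def l2_distance(g: list[float], h: list[float]) -> float:
    """Squared Euclidean distance between two flattened gradients."""
    return sum((a - b) ** 2 for a, b in zip(g, h))


def cosine_distance(g: list[float], h: list[float]) -> float:
    """One minus the cosine similarity of two flattened gradients.

    Assumes neither gradient is the zero vector.
    """
    dot = sum(a * b for a, b in zip(g, h))
    norm_g = math.sqrt(sum(a * a for a in g))
    norm_h = math.sqrt(sum(b * b for b in h))
    return 1.0 - dot / (norm_g * norm_h)


# Illustrative values only: `candidate` points in the same direction as
# `target` but has twice the magnitude.
target = [1.0, 2.0, 2.0]
candidate = [2.0, 4.0, 4.0]

print(l2_distance(target, candidate))      # sensitive to magnitude
print(cosine_distance(target, candidate))  # sensitive to direction only
```

Since the cosine loss ignores magnitude while the L2 loss does not, the optimization landscape around the ground truth differs between them, which is consistent with the abstract's observation that vulnerability rankings depend on the loss used.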

arXiv:2312.11881 [pdf, other]

Punctuation restoration Model and Spacing Model for Korean Ancient Document

Authors: Taehong Jang, Joonmo Ahn, Sojung Lucia Kim

Abstract: In Korean ancient documents, there is no spacing or punctuation, and they are written in classical Chinese characters. This makes it challenging for modern individuals and translation models to accurately interpret and translate them. While China has models predicting punctuation and spacing, applying them directly to Korean texts is problematic due to data differences. Therefore, we developed the first models which predict punctuation and spacing for Korean historical texts and evaluated their performance. Our punctuation restoration model achieved an F1 score of 0.84, and Spacing model achieved a score of 0.96. It has the advantage of enabling inference on low-performance GPUs with less VRAM while maintaining quite high accuracy.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 5 Pages, 2 Figures

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.08703 [pdf, other]

doi 10.1103/PhysRevResearch.6.023241

A Rydberg-atom approach to the integer factorization problem

Authors: Juyoung Park, Seokho Jeong, Minhyuk Kim, Kangheun Kim, Andrew Byun, Louis Vignoli, Louis-Paul Henry, Loïc Henriet, Jaewook Ahn

Abstract: The task of factoring integers poses a significant challenge in modern cryptography, and quantum computing holds the potential to efficiently address this problem compared to classical algorithms. Thus, it is crucial to develop quantum computing algorithms to address this problem. This study introduces a quantum approach that utilizes Rydberg atoms to tackle the factorization problem. Experimental demonstrations are conducted for the factorization of small composite numbers such as $6 = 2 \times 3$, $15 = 3 \times 5$, and $35 = 5 \times 7$. This approach involves employing Rydberg-atom graphs to algorithmically program binary multiplication tables, yielding many-body ground states that represent superpositions of factoring solutions. Subsequently, these states are probed using quantum adiabatic computing. Limitations of this method are discussed, specifically addressing the scalability of current Rydberg quantum computing for the intricate computational problem.

Submitted 31 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 12 pages, 5 figures

arXiv:2312.08476 [pdf, other]

Enhanced Magnetization by Defect-Assisted Exciton Recombination in Atomically Thin CrCl$_3$

Authors: Xin-Yue Zhang, Thomas K. M. Graham, Hyeonhu Bae, Yu-Xuan Wang, Nazar Delegan, Jonghoon Ahn, Zhi-Cheng Wang, Jakub Regner, Kenji Watanabe, Takashi Taniguchi, Minkyung Jung, Zdeněk Sofer, Fazel Tafti, David D. Awschalom, F. Joseph Heremans, Binghai Yan, Brian B. Zhou

Abstract: Two dimensional (2D) semiconductors present unique opportunities to intertwine optical and magnetic functionalities and to tune these performances through defects and dopants. Here, we integrate exciton pumping into a quantum sensing protocol on nitrogen-vacancy centers in diamond to image the optically-induced transient stray fields in few-layer, antiferromagnetic CrCl$_3$. We discover that exciton recombination enhances the in-plane magnetization of the CrCl$_3$ layers, with a predominant effect in the surface monolayers. Concomitantly, time-resolved photoluminescence measurements reveal that nonradiative exciton recombination intensifies in atomically thin CrCl$_3$ with tightly localized, nearly dipole-forbidden excitons and amplified surface-to-volume ratio. Supported by experiments under controlled surface exposure and density functional theory calculations, we interpret the magnetically enhanced state to result from a defect-assisted Auger recombination that optically activates electron transfer between water vapor related surface impurities and the spin-polarized conduction band. Our work validates defect engineering as a route to enhance intrinsic magnetism in single magnetic layers and opens a novel experimental platform for studying optically-induced, transient magnetism in condensed matter systems.

Submitted 26 August, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: 11 pages, 8 figures

Showing 1–50 of 540 results for author: Ahn, J