Search | arXiv e-print repository

Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks

Authors: Jaewook Lee, Yoel Park, Seulki Lee

Abstract: In this paper, we introduce a memory-efficient CNN (convolutional neural network), which enables resource-constrained low-end embedded and IoT devices to perform on-device vision tasks, such as image classification and object detection, using extremely low memory, i.e., only 63 KB on ImageNet classification. Based on the bottleneck block of MobileNet, we propose three design principles that significantly curtail the peak memory usage of a CNN so that it fits within the kilobyte-scale memory of low-end devices. First, 'input segmentation' divides an input image into a set of patches, including a central patch that overlaps the others, reducing the size (and memory requirement) of a large input image. Second, 'patch tunneling' builds independent tunnel-like paths consisting of multiple bottleneck blocks per patch, running through the entire model from an input patch to the last layer of the network and keeping memory usage low throughout the whole network. Lastly, 'bottleneck reordering' rearranges the execution order of convolution operations inside the bottleneck block such that memory usage remains constant regardless of the number of convolution output channels. Experimental results show that the proposed network classifies ImageNet with extremely low memory (i.e., 63 KB) while achieving competitive top-1 accuracy (i.e., 61.58%).
To the best of our knowledge, the memory usage of the proposed network is far smaller than that of state-of-the-art memory-efficient networks, i.e., up to 89x and 3.1x smaller than MobileNet (5.6 MB) and MCUNet (196 KB), respectively.
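A minimal sketch of the 'input segmentation' idea follows. The abstract does not give the number, size, or overlap of the patches, so the layout here (four quadrant patches plus one same-sized central patch), the 224x224 input size, and the helper names `segment_input` and `crop` are all hypothetical. It also checks the memory ratios quoted above, assuming 1 MB = 1000 KB.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width)


def segment_input(height: int, width: int) -> List[Box]:
    """Hypothetical layout: four quadrant patches plus a same-sized
    central patch that overlaps all four quadrants."""
    ph, pw = height // 2, width // 2
    corners = [(0, 0), (0, pw), (ph, 0), (ph, pw)]
    boxes = [(top, left, ph, pw) for top, left in corners]
    # Central patch, centred on the image.
    boxes.append(((height - ph) // 2, (width - pw) // 2, ph, pw))
    return boxes


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut one patch out of a row-major 2D image."""
    top, left, h, w = box
    return [row[left:left + w] for row in image[top:top + h]]


if __name__ == "__main__":
    boxes = segment_input(224, 224)
    print(boxes)
    # Each patch holds a quarter of the input pixels, so processing one
    # patch at a time needs an input buffer roughly 4x smaller.
    print((224 * 224) / (boxes[0][2] * boxes[0][3]))  # 4.0
    # Ratios quoted above: 5.6 MB vs 63 KB and 196 KB vs 63 KB.
    print(round(5600 / 63), round(196 / 63, 1))  # 89 3.1
```

Patch tunneling would then run each patch through its own chain of bottleneck blocks; that part is not sketched because the abstract gives no layer details.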

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.03541

EXAONE 3.0 7.8B Instruction Tuned Language Model

Authors: LG AI Research: Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

Abstract: We introduce the EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovation. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly competitive real-world performance with instruction-following capability against other state-of-the-art open models of similar size. Our comparative analysis shows that EXAONE 3.0 excels particularly in Korean, while achieving compelling performance across general tasks and complex reasoning. With its strong real-world effectiveness and bilingual proficiency, we hope that EXAONE will continue to contribute to advancements in Expert AI. Our EXAONE 3.0 instruction-tuned model is available at https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

Submitted 8 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.03171

Elastic and Inelastic Electron Scattering Cross Sections of Trichlorofluoromethane

Authors: Mareike Dinger, Yeunsoo Park, Woon Yong Baek

Abstract: Differential elastic electron scattering cross sections of trichlorofluoromethane $\mathrm{(CCl_3F)}$ were measured for the first time for electron energies between 30 eV and 800 eV in the angular range of 20° to 150°. The experimental results were compared with calculations using the IAM-SCAR+I model. Satisfactory agreement between both data sets was found for electron energies above 200 eV within experimental uncertainties, whereas significant deviations of up to 100% were observed at electron energies below 60 eV. In addition to the measurements of differential elastic scattering cross sections, total inelastic scattering cross sections of $\mathrm{CCl_3F}$ were calculated using the spherical complex optical potential (SCOP) model. These calculations closely match experimental total ionization cross sections available in the literature for energies below 50 eV. The sum of the experimental total elastic and the theoretical total inelastic scattering cross sections agrees very well with the total electron scattering cross sections of $\mathrm{CCl_3F}$ measured by other groups across the entire energy range (30 eV to 800 eV), demonstrating the consistency among these three cross sections.

Submitted 6 August, 2024; originally announced August 2024.

Comments: 12 pages, 3 figures, submitted to Physical Review A

arXiv:2407.21035

Direct Unlearning Optimization for Robust and Safe Text-to-Image Models

Authors: Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee

Abstract: Recent advancements in text-to-image (T2I) models have greatly benefited from large-scale datasets, but they also pose significant risks due to the potential generation of unsafe content. To mitigate this issue, researchers have developed unlearning techniques to remove the model's ability to generate potentially harmful content. However, these methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images. In this paper, we propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models while preserving their performance on unrelated topics. DUO employs a preference optimization approach using curated paired image data, ensuring that the model learns to remove unsafe visual concepts while retaining unrelated features. Furthermore, we introduce an output-preserving regularization term to maintain the model's generative capabilities on safe content. Extensive experiments demonstrate that DUO can robustly defend against various state-of-the-art red teaming methods without significant performance degradation on unrelated topics, as measured by FID and CLIP scores. Our work contributes to the development of safer and more reliable T2I models, paving the way for their responsible deployment in both closed-source and open-source scenarios.
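The abstract names a preference-optimization objective plus an output-preserving regularizer but does not give either formula. As an illustration only, the sketch below uses the standard Direct Preference Optimization (DPO) pair loss, treating the safe image as "preferred" and the unsafe image as "dispreferred", and a hypothetical mean-squared preservation term. For diffusion models the log-likelihood terms are typically approximated (for example through denoising losses); here they are simply passed in as numbers. The weight `lam` and all function names are assumptions.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_safe: float, logp_unsafe: float,
             ref_logp_safe: float, ref_logp_unsafe: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair, with the safe image preferred and the unsafe
    image dispreferred. Equals -log(sigmoid(margin))."""
    margin = beta * ((logp_safe - ref_logp_safe)
                     - (logp_unsafe - ref_logp_unsafe))
    return softplus(-margin)


def preservation_penalty(outputs: Sequence[float],
                         ref_outputs: Sequence[float]) -> float:
    """Hypothetical output-preserving term: mean squared deviation from the
    frozen reference model's outputs on safe or unrelated inputs."""
    return sum((a - b) ** 2 for a, b in zip(outputs, ref_outputs)) / len(outputs)


def unlearning_loss(pair: Sequence[float], outputs: Sequence[float],
                    ref_outputs: Sequence[float],
                    beta: float = 0.1, lam: float = 1.0) -> float:
    """Preference term plus a weighted preservation term (lam is assumed)."""
    return dpo_loss(*pair, beta=beta) + lam * preservation_penalty(outputs, ref_outputs)
```

When the trained model still matches the reference model, the margin is zero and the preference loss is log 2; raising the safe image's likelihood relative to the unsafe one drives it lower.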

Submitted 17 July, 2024; originally announced July 2024.

Comments: Extended abstract accepted in GenLaw 2024 workshop @ ICML2024

arXiv:2407.16186

Automatic Environment Shaping is the Next Frontier in RL

Authors: Younghyo Park, Gabriel B. Margolis, Pulkit Agrawal

Abstract: Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It is our position that algorithmic improvements in policy optimization and other ideas should be directed toward resolving the primary bottleneck of shaping the training environment, i.e., designing observations, actions, rewards and simulation dynamics. Most practitioners tune not the RL algorithm but other environment parameters to obtain a desirable controller. We posit that scaling RL to diverse robotic tasks will only be achieved if the community focuses on automating environment shaping procedures.

Submitted 23 July, 2024; originally announced July 2024.

Comments: ICML 2024 Position Track; Website at https://auto-env-shaping.github.io/

arXiv:2407.13919

A Multi-Messenger Search for Exotic Field Emission with a Global Magnetometer Network

Authors: Sami S. Khamis, Ibrahim A. Sulai, Paul Hamilton, S. Afach, B. C. Buchler, D. Budker, N. L. Figueroa, R. Folman, D. Gavilán-Martín, M. Givon, Z. D. Grujić, H. Guo, M. P. Hedges, D. F. Jackson Kimball, D. Kim, E. Klinger, T. Kornack, A. Kryemadhi, N. Kukowski, G. Lukasiewicz, H. Masia-Roig, M. Padniuk, C. A. Palm, S. Y. Park, X. Peng, et al. (16 additional authors not shown)

Abstract: We present an analysis method to search for exotic low-mass field (ELF) bursts generated during high-energy astrophysical events such as supernovae, binary black hole or binary neutron star mergers, and fast radio bursts using the Global Network of Optical Magnetometers for Exotic physics searches (GNOME). In our model, the associated gravitational waves or electromagnetic signals herald the arrival of the ELF burst, which interacts via coupling to the spin of fermions in the magnetometers. This enables GNOME to serve as a tool for multi-messenger astronomy. The algorithm employs a model-agnostic excess-power method to identify network-wide candidate events, which are then subjected to a model-dependent generalized likelihood-ratio test to determine their statistical significance. We perform the first search with this technique on GNOME data coincident with the binary black hole merger S200311bg detected by LIGO/Virgo on 11 March 2020 and find no significant events. We place the first lab-based limits on combinations of ELF production and coupling parameters.
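To make the "excess-power" stage concrete, here is a greatly simplified sketch, not the GNOME pipeline: each station's data are assumed to be already whitened to zero mean and unit variance, power is summed over non-overlapping time windows and across stations, and windows above a threshold become candidates for the later likelihood-ratio test. Station sensitivity directions, timing offsets, and time-frequency tiling are deliberately ignored, and the function names are hypothetical.

```python
from typing import List, Sequence


def window_power(series: Sequence[float], start: int, length: int) -> float:
    """Sum of squared samples in one window; samples are assumed whitened
    to zero mean and unit variance."""
    return sum(x * x for x in series[start:start + length])


def network_candidates(stations: List[Sequence[float]], length: int,
                       threshold: float) -> List[int]:
    """Return start indices of non-overlapping windows whose network-summed
    power exceeds the threshold. Under independent Gaussian noise the sum
    follows a chi-squared law with len(stations) * length degrees of freedom."""
    n = min(len(s) for s in stations)
    hits = []
    for start in range(0, n - length + 1, length):
        total = sum(window_power(s, start, length) for s in stations)
        if total > threshold:
            hits.append(start)
    return hits
```

Because the statistic is model-agnostic (it only asks "is there more power than noise predicts?"), the threshold can be set from the chi-squared tail for a chosen false-alarm rate before any burst model is assumed.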

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.12401

Geometric Remove-and-Retrain (GOAR): Coordinate-Invariant eXplainable AI Assessment

Authors: Yong-Hyun Park, Junghoon Seo, Bomseok Park, Seongsu Lee, Junghyo Jo

Abstract: Identifying the relevant input features that have a critical influence on the output results is indispensable for the development of explainable artificial intelligence (XAI). Remove-and-Retrain (ROAR) is a widely accepted approach for assessing the importance of individual pixels by measuring changes in accuracy following their removal and subsequent retraining on the modified dataset. However, we uncover notable limitations in pixel-perturbation strategies. When viewed from a geometric perspective, we discover that these metrics fail to discriminate between feature attribution methods, thereby compromising the reliability of the evaluation. To address this challenge, we introduce an alternative feature-perturbation approach named Geometric Remove-and-Retrain (GOAR). Through a series of experiments with both synthetic and real datasets, we substantiate that GOAR transcends the limitations of pixel-centric metrics.
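The pixel-removal step of ROAR, as described above, can be sketched as follows: rank pixels by an attribution map, overwrite the top fraction with an uninformative value, and later retrain on the modified images and compare accuracy. Filling with the image mean is one common choice and an assumption here; the retraining loop and GOAR's geometric feature perturbation are not shown, since the abstract does not specify them.

```python
from typing import List, Sequence


def roar_mask(pixels: Sequence[float], attributions: Sequence[float],
              fraction: float) -> List[float]:
    """Replace the `fraction` most important pixels (by attribution) with
    the image mean; the result would be used to retrain and re-evaluate."""
    if len(pixels) != len(attributions):
        raise ValueError("pixels and attributions must have the same length")
    k = int(round(fraction * len(pixels)))
    fill = sum(pixels) / len(pixels)
    # Indices of the k highest-attribution pixels.
    top = sorted(range(len(pixels)), key=lambda i: attributions[i], reverse=True)[:k]
    out = list(pixels)
    for i in top:
        out[i] = fill
    return out
```

Repeating this for increasing `fraction` and plotting accuracy after retraining gives the ROAR curve; a faster accuracy drop is read as evidence that the attribution method found the truly important pixels.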

Submitted 17 July, 2024; originally announced July 2024.

Comments: Accepted in XAI in Action Workshop @ NeurIPS2023

arXiv:2407.12227

Development of MMC-based lithium molybdate cryogenic calorimeters for AMoRE-II

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, H. Bae, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, S. Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, et al. (84 additional authors not shown)

Abstract: The AMoRE collaboration searches for neutrinoless double beta decay of $^{100}$Mo using molybdate scintillating crystals via low-temperature thermal calorimetric detection. The early phases of the experiment, AMoRE-pilot and AMoRE-I, have demonstrated competitive discovery potential. Presently, the AMoRE-II experiment, featuring a large detector array with about 90 kg of $^{100}$Mo isotope, is under construction. This paper discusses the baseline design and characterization of the lithium molybdate cryogenic calorimeters to be used in the AMoRE-II detector modules. The results from prototype setups that incorporate new housing structures and two different crystal masses (316 g and 517-521 g), operated at 10 mK, show energy resolutions (FWHM) of 7.55-8.82 keV at the 2.615 MeV $^{208}$Tl $\gamma$ line, and effective light detection of 0.79-0.96 keV/MeV. The simultaneous heat and light detection enables clear separation of alpha particles with a discrimination power of 12.37-19.50 in the energy region around the $^6$Li(n, $\alpha$)$^3$H reaction with Q-value = 4.785 MeV. Promising detector performance was demonstrated at temperatures as high as 30 mK, which relaxes the temperature constraints for operating the large AMoRE-II array.
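The "discrimination power" quoted above is, in the scintillating-bolometer literature, commonly defined as the separation of the beta/gamma and alpha populations in units of their combined width. The abstract does not state which variable or definition AMoRE uses, so the sketch below assumes this common form applied to a generic per-event variable (for example a light-to-heat ratio); the function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def discrimination_power(beta_like: Sequence[float],
                         alpha_like: Sequence[float]) -> float:
    """DP = |mu_beta - mu_alpha| / sqrt(sigma_beta^2 + sigma_alpha^2),
    computed on any per-event discriminating variable. Each population
    needs at least two events for the sample standard deviation."""
    spread = math.sqrt(stdev(beta_like) ** 2 + stdev(alpha_like) ** 2)
    return abs(mean(beta_like) - mean(alpha_like)) / spread
```

Under this definition, values in the 12-20 range mean the two event bands sit many combined standard deviations apart, which is why the abstract describes the alpha separation as clear.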

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.10352

Signature of Orbital Driven Finite Momentum Pairing in a 3D Ising Superconductor

Authors: F. Z. Yang, H. D. Zhang, Saswata Mandal, F. Y. Meng, G. Fabbris, A. Said, P. Mercado Lozano, A. Rajapitamahuni, E. Vescovo, C. Nelson, S. Lin, Y. Park, E. M. Clements, T. Z. Ward, H. -N. Lee, H. C. Lei, C. X. Liu, H. Miao

Abstract: Finite momentum superconducting pairing states (FMPs), where Cooper pairs carry non-zero momentum, are believed to give rise to exotic physical phenomena including the pseudogap phase of cuprate high-Tc superconductors and Majorana fermions in topological superconductivity. FMPs can emerge in intertwined electronic liquids with strong spin-spin interactions or be induced by lifting the spin degeneracy under a magnetic field, as originally proposed by Fulde-Ferrell and Larkin-Ovchinnikov. In quantum materials with strong Ising-type spin-orbit coupling, such as the 2D transition metal dichalcogenides (TMDs), the spin degree of freedom is frozen, enabling novel orbital driven FMPs via the magnetoelectric effect. While evidence of orbital driven FMPs has been revealed in bilayer TMDs, their realization in 3D bulk materials remains an unresolved challenge. Here we report experimental signatures of FMP in a locally noncentrosymmetric bulk superconductor 4Hb-TaS2. Using hard X-ray diffraction and angle-resolved photoemission spectroscopy, we reveal an unusual 2D chiral charge density wave (CDW) and weak interlayer hopping in 4Hb-TaS2. Below the superconducting transition temperature, the upper critical field, Hc2, increases linearly with decreasing temperature and well exceeds the Pauli limit, thus establishing the dominant orbital pair-breaking mechanism.
Remarkably, we discover a field-induced superconductivity-to-superconductivity transition that breaks the continuous rotational symmetry of the s-wave uniform pairing in Bardeen-Cooper-Schrieffer theory down to six-fold rotational symmetry. Combined with a Ginzburg-Landau free energy analysis that incorporates the magnetoelectric effect, our observations provide strong evidence of orbital driven FMP in the 3D quantum heterostructure 4Hb-TaS2.
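The Pauli (Clogston-Chandrasekhar) limit mentioned above is often estimated with the weak-coupling BCS expression, in which the spin-polarization energy cost equals the condensation energy. The sketch below uses that textbook estimate with gap ratio 1.764 (an assumption; strong-coupling or g-factor corrections change the prefactor). The Tc of 4Hb-TaS2 is not given in this abstract, so any Tc passed in is a placeholder.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K (exact SI value)
MU_B = 9.2740100783e-24  # Bohr magneton, J/T


def pauli_limit_tesla(tc_kelvin: float, gap_ratio: float = 1.764) -> float:
    """Weak-coupling BCS Pauli field, mu0*H_P = Delta_0 / (sqrt(2) * mu_B)
    with Delta_0 = gap_ratio * k_B * T_c. Roughly 1.86 T per kelvin of T_c."""
    delta0 = gap_ratio * K_B * tc_kelvin
    return delta0 / (math.sqrt(2.0) * MU_B)


def exceeds_pauli_limit(hc2_tesla: float, tc_kelvin: float) -> bool:
    """True if a measured upper critical field lies above the Pauli limit."""
    return hc2_tesla > pauli_limit_tesla(tc_kelvin)
```

An Hc2 well above this estimate means spin-paramagnetic pair breaking cannot be what limits superconductivity, which is the step the abstract uses to argue that orbital pair breaking dominates.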

Submitted 28 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.09111

Inference Optimization of Foundation Models on AI Accelerators

Authors: Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis

Abstract: Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and the research community have witnessed a large number of new applications based on these foundation models. Such applications include question answering, customer service, image and video generation, and code completion, among others. However, as the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios. As a result, the demand for cost-effective and fast inference using AI accelerators is higher than ever. To this end, our tutorial offers a comprehensive discussion of complementary inference optimization techniques using AI accelerators. Beginning with an overview of basic Transformer architectures and deep learning system frameworks, we deep dive into system optimization techniques for fast and memory-efficient attention computations and discuss how they can be implemented efficiently on AI accelerators. Next, we describe architectural elements that are key for fast transformer inference. Finally, we examine various model compression and fast decoding strategies in the same context.
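One widely used idea behind "memory-efficient attention" is the online (streaming) softmax popularized by FlashAttention: keys and values are visited in blocks, and only a running maximum, a running denominator, and a running weighted sum are kept, so the full row of attention scores is never stored. The pure-Python sketch below, for a single query vector, illustrates that idea; it is not taken from the tutorial, and the block size `chunk` is arbitrary.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Reference: materialise every score, then softmax-weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / z
            for j in range(len(values[0]))]


def chunked_attention(q: Vector, keys: List[Vector], values: List[Vector],
                      chunk: int = 2) -> List[float]:
    """Online softmax: visit keys/values in blocks and keep only a running
    max, a running denominator and a running weighted sum."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), chunk):
        scores = [dot(q, k) * scale for k in keys[start:start + chunk]]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial sums to the new max (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values[start:start + chunk]):
            w = math.exp(s - new_max)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / denom for a in acc]
```

Both functions return the same vector up to floating-point rounding; the chunked version's working memory depends on the block size rather than the sequence length, which is the property accelerator kernels exploit.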

Submitted 12 July, 2024; originally announced July 2024.

Comments: Tutorial published at KDD 2024. Camera-ready version

arXiv:2407.06149

What Is Being Argued (WIBA)? An Application to Legislative Deliberation in the U.S. Congress

Authors: Arman Irani, Ju Yeon Park, Kevin Esterling, Michalis Faloutsos

Abstract: How can we utilize state-of-the-art NLP tools to better understand legislative deliberation? Committee hearings are a core feature of any legislature, and they offer an institutional setting that promotes the exchange of arguments and reasoning that directly impact and shape legislation. We apply What Is Being Argued (WIBA), an argument extraction and analysis framework that we previously developed, to U.S. Congressional committee hearings from 2005 to 2023 (109th to 117th Congresses). We then further expand WIBA by introducing new ways to quantify various dynamics of democratic deliberation. Specifically, these extensions present a variety of summary statistics capturing how deliberative or controversial a discourse was, as well as useful visualizations of the WIBA output that aid in analyzing arguments made during legislative deliberation. Our application reveals potential biases in the committee system and how political parties control the flow of information in 'hot topic' hearings.

Submitted 15 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: Paper for presentation at Polmeth 2024. Updated version with additional methodology sections and analysis results

arXiv:2407.05618

Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (83 additional authors not shown)

Abstract: AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0\nu\beta\beta$ decay and report a new lower limit on the half-life of $^{100}$Mo $0\nu\beta\beta$ decay of $T^{0\nu}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90% confidence level. The corresponding limit on the effective Majorana mass is $m_{\beta\beta}<$ (210-610) meV, using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.
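For a counting experiment, a half-life lower limit is usually obtained from the number of source nuclei times live time, the detection efficiency, and an upper limit on signal counts at the chosen confidence level. The sketch below implements that generic textbook relation, taking the isotope exposure (3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year above) as input. The efficiency and signal upper limit are not stated in the abstract, so they are left as caller-supplied placeholders, and this is not AMoRE's statistical procedure.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol (exact SI value)


def halflife_lower_limit(isotope_exposure_kg_yr: float, efficiency: float,
                         signal_upper_limit: float,
                         molar_mass_g_per_mol: float = 99.9) -> float:
    """Generic counting limit T_1/2 > ln2 * (N * t) * eps / S_up, where N * t
    (nuclei x years) comes from the isotope exposure and S_up is the upper
    limit on signal counts. 99.9 g/mol approximates the 100Mo molar mass."""
    nuclei_years = isotope_exposure_kg_yr * 1000.0 / molar_mass_g_per_mol * AVOGADRO
    return math.log(2) * nuclei_years * efficiency / signal_upper_limit
```

The limit grows linearly with exposure and efficiency and falls as the background-driven signal upper limit rises, which is why a low background rate near the Q-value matters as much as crystal mass.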

Submitted 8 July, 2024; originally announced July 2024.

Comments: 7 pages, 4 figures

arXiv:2407.05467

The infrastructure powering IBM's Gen AI model development

Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

Abstract: AI infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps, and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry.
Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings.

Submitted 7 July, 2024; originally announced July 2024.

Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

arXiv:2407.04597

Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection

Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeong Seok Kim, Juneho Yi

Abstract: In unsupervised anomaly detection (UAD) research, while state-of-the-art models have reached a saturation point with extensive studies on public benchmark datasets, they adopt large-scale tailor-made neural networks (NN) for detection performance or pursue unified models for various tasks. Towards edge computing, it is necessary to develop a computationally efficient and scalable solution that avoids large-scale complex NNs. Motivated by this, we aim to optimize UAD performance with minimal changes to NN settings. Thus, we revisit the reconstruction-by-inpainting approach and seek to improve it by analyzing its strengths and weaknesses. The strength of the SOTA methods is a single deterministic masking approach that addresses the challenges of random multiple masking, namely inference latency and output inconsistency. Nevertheless, the failure to provide a mask that completely covers anomalous regions remains a weakness. To mitigate this issue, we propose Feature Attenuation of Defective Representation (FADeR), which employs only two MLP layers that attenuate the feature information of anomaly reconstruction during decoding. By leveraging FADeR, features of unseen anomaly patterns are reconstructed into seen normal patterns, reducing false alarms. Experimental results demonstrate that FADeR achieves enhanced performance compared to similar-scale NNs. Furthermore, our approach yields scalable performance gains when integrated with other single deterministic masking methods in a plug-and-play manner.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 11 pages, 6 figures, 5 tables

arXiv:2407.04529

doi 10.1016/j.physletb.2024.138828

Spectroscopy of deeply bound orbitals in neutron-rich Ca isotopes

Authors: P. J. Li, J. Lee, P. Doornenbal, S. Chen, S. Wang, A. Obertelli, Y. Chazono, J. D. Holt, B. S. Hu, K. Ogata, Y. Utsuno, K. Yoshida, N. L. Achouri, H. Baba, F. Browne, D. Calvet, F. Château, N. Chiga, A. Corsi, M. L. Cortés, A. Delbart, J-M. Gheller, A. Giganon, A. Gillibert, C. Hilaire, et al. (63 additional authors not shown)

Abstract: The calcium isotopes are an ideal system to investigate the evolution of shell structure and magic numbers. Although the properties of surface nucleons in calcium have been well studied, probing the structure of deeply bound nucleons remains a challenge. Here, we report on the first measurement of unbound states in $^{53}$Ca and $^{55}$Ca, populated from $^{54,56}$Ca($p,pn$) reactions at a beam energy of around 216 MeV/nucleon at the RIKEN Radioactive Isotope Beam Factory. The resonance properties, partial cross sections, and momentum distributions of these unbound states were analyzed. Orbital angular momentum $l$ assignments were extracted from momentum distributions based on calculations using the distorted wave impulse approximation (DWIA) reaction model. The resonances at excitation energies of 5516(41) keV in $^{53}$Ca and 6000(250) keV in $^{55}$Ca indicate a significant $l=3$ component, providing the first experimental evidence for the $\nu 0f_{7/2}$ single-particle strength of unbound hole states in the neutron-rich Ca isotopes. The observed excitation energies and cross sections point towards extremely localized and well separated strength distributions, with some fragmentation for the $\nu 0f_{7/2}$ orbital in $^{55}$Ca. These results are in good agreement with predictions from shell-model calculations using the effective GXPF1Bs interaction and ab initio calculations, and diverge markedly from the experimental distributions in the nickel isotones at $Z=28$.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 13 pages, 7 figures

Journal ref: Phys. Lett. B 855 (2024) 138828

arXiv:2407.02472

ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions

Authors: Chan Young Park, Shuyue Stella Li, Hayoung Jung, Svitlana Volkova, Tanushree Mitra, David Jurgens, Yulia Tsvetkov

Abstract: This study introduces ValueScope, a framework leveraging language models to quantify social norms and values within online communities, grounded in social science perspectives on normative structures. We employ ValueScope to dissect and analyze linguistic and stylistic expressions across 13 Reddit communities categorized under gender, politics, science, and finance. Our analysis provides a quantitative foundation showing that even closely related communities exhibit remarkably diverse norms. This diversity supports existing theories and adds a new dimension, community preference, to understanding community interactions. ValueScope not only delineates differing social norms among communities but also effectively traces their evolution and the influence of significant external events such as the U.S. presidential elections and the emergence of new sub-communities. The framework thus highlights the pivotal role of social norms in shaping online interactions, presenting a substantial advance in both the theory and application of social norm studies in digital spaces.

Submitted 2 July, 2024; originally announced July 2024.

Comments: First three authors contributed equally. 33 pages. In submission

arXiv:2407.00859

Statistical inference on partially shape-constrained function-on-scalar linear regression models

Authors: Kyunghee Han, Yeonjoo Park, Soo-Young Kim

Abstract: We consider functional linear regression models where functional outcomes are associated with scalar predictors by coefficient functions with shape constraints, such as monotonicity and convexity, that apply to sub-domains of interest. To validate the partial shape constraints, we propose testing a composite hypothesis of linear functional constraints on regression coefficients. Our approach employs kernel- and spline-based methods within a unified inferential framework, evaluating the statistical significance of the hypothesis by measuring an $L^2$-distance between constrained and unconstrained model fits. In the theoretical study of large-sample analysis under mild conditions, we show that both methods achieve the standard rate of convergence observed in the nonparametric estimation literature. Through numerical experiments of finite-sample analysis, we demonstrate that the type I error rate stays at the specified significance level across various scenarios and that the power increases with sample size, confirming the consistency of the test procedure under both estimation methods. Our theoretical and numerical results provide researchers with the flexibility to choose a method based on computational preference. The practicality of partial shape-constrained inference is illustrated by two data applications: one involving clinical trials of NeuroBloc in type A-resistant cervical dystonia and the other involving the National Institute of Mental Health Schizophrenia Study.
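The test described above is built on an $L^2$-distance between the constrained and unconstrained fits. As a minimal illustration of that quantity only (not the authors' test statistic, which also requires a calibrated null distribution), the sketch below approximates $\|f - g\|_{L^2}$ on a grid with the trapezoidal rule, assuming both fitted coefficient functions are available as Python callables on a hypothetical domain $[0, 1]$.

```python
import math
from typing import Callable


def l2_distance(
    f: Callable[[float], float],
    g: Callable[[float], float],
    a: float = 0.0,
    b: float = 1.0,
    n: int = 1000,
) -> float:
    """Approximate the L2 distance between f and g on [a, b].

    Uses the composite trapezoidal rule with n equal sub-intervals to
    integrate (f(t) - g(t))**2, then takes the square root.
    """
    if n < 1:
        raise ValueError("n must be a positive number of sub-intervals")
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        t = a + i * h
        # Endpoints get half weight in the trapezoidal rule.
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (f(t) - g(t)) ** 2
    return math.sqrt(total * h)
```

For example, comparing $f(t) = t$ with the zero function gives approximately $\sqrt{1/3} \approx 0.577$; in the setting of the abstract, `f` and `g` would be the unconstrained and constrained estimates of one coefficient function.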

Submitted 30 June, 2024; originally announced July 2024.

Comments: 30 pages, 7 figures

arXiv:2406.19287

Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition

Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He, et al. (118 additional authors not shown)

Abstract: We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields (EGMF), the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of the highest experimentally allowed EGMF, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 8 pages, 3 figures, accepted for publication in PRL

arXiv:2406.19286

Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array

Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He, et al. (118 additional authors not shown)

Abstract: We use a new method to estimate the injected mass composition of ultrahigh-energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on comparing the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS, which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment, and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies, E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, which is the conservative lower limit for the source number density.
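To put the quoted lower limit on source density in perspective, a standard order-of-magnitude estimate of the typical distance between neighbouring sources is $n^{-1/3}$. This back-of-envelope arithmetic is an illustration added here, not a result stated in the abstract:

$$
d \sim n^{-1/3} = \left(2 \times 10^{-5}\ \mathrm{Mpc}^{-3}\right)^{-1/3} = \left(5 \times 10^{4}\right)^{1/3}\ \mathrm{Mpc} \approx 37\ \mathrm{Mpc}.
$$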

Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 18 pages, 11 figures, accepted for publication in PRD

arXiv:2406.15951

Modular Pluralism: Pluralistic Alignment via Multi-LLM Collaboration

Authors: Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, Yulia Tsvetkov

Abstract: While existing alignment paradigms have been integral in developing large language models (LLMs), LLMs often learn an averaged human preference and struggle to model diverse preferences across cultures, demographics, and communities. We propose Modular Pluralism, a modular framework based on multi-LLM collaboration for pluralistic alignment: it "plugs into" a base LLM a pool of smaller but specialized community LMs, where models collaborate in distinct modes to flexibly support three modes of pluralism: Overton, steerable, and distributional. Modular Pluralism is uniquely compatible with black-box LLMs and offers the modular control of adding new community LMs for previously underrepresented communities. We evaluate Modular Pluralism with six tasks and four datasets featuring questions/instructions with value-laden and perspective-informed responses. Extensive experiments demonstrate that Modular Pluralism advances the three pluralism objectives across six black-box and open-source LLMs. Further analysis reveals that LLMs are generally faithful to the inputs from smaller community LMs, allowing seamless patching by adding a new community LM to better cover previously underrepresented communities.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.15725

Self Training and Ensembling Frequency Dependent Networks with Coarse Prediction Pooling and Sound Event Bounding Boxes

Authors: Hyeonuk Nam, Deokki Min, Seungdeok Choi, Inhan Choi, Yong-Hwa Park

Abstract: To tackle the sound event detection (SED) task, we propose frequency dependent networks (FreDNets), which heavily leverage frequency-dependent methods. We apply frequency warping and FilterAugment, which are frequency-dependent data augmentation methods. The model architecture consists of 3 branches: an audio teacher-student transformer (ATST) branch, a BEATs branch, and a CNN branch including either partial dilated frequency dynamic convolution (PDFD) or squeeze-and-excitation (SE) with time-frame frequency-wise SE (tfwSE). To train on MAESTRO labels with coarse temporal resolution, we apply max pooling to the predictions for the MAESTRO dataset. Using the best ensemble model, we apply self-training to obtain pseudo labels from the DESED weak set, the DESED unlabeled set, and AudioSet. AudioSet labels are filtered to focus on high-confidence pseudo labels, and AudioSet pseudo labels are used to train on DESED labels only. We used change-detection-based sound event bounding boxes (cSEBBs) as post-processing for ensemble models on self-training and submission models.
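The report applies max pooling to predictions so that frame-level outputs can be compared with MAESTRO's coarse labels. A minimal sketch of that idea, assuming frame-level class probabilities stored as a `[frame][class]` nested list and a hypothetical pooling window of `pool` frames (the actual window used in the submission is not stated here):

```python
from typing import List


def coarse_max_pool(frame_probs: List[List[float]], pool: int) -> List[List[float]]:
    """Max-pool frame-level class probabilities over non-overlapping windows.

    frame_probs: per-frame probabilities, indexed as [frame][class].
    pool: number of consecutive frames merged into one coarse step.
    The last window may be shorter if the frame count is not a multiple of pool.
    """
    if pool < 1:
        raise ValueError("pool must be a positive number of frames")
    n_classes = len(frame_probs[0]) if frame_probs else 0
    pooled = []
    for start in range(0, len(frame_probs), pool):
        window = frame_probs[start:start + pool]
        # A class is considered active in a coarse step if any frame predicts it strongly.
        pooled.append([max(frame[c] for frame in window) for c in range(n_classes)])
    return pooled
```

For example, pooling five frames with `pool=2` yields three coarse steps, the last covering a single frame. Max (rather than mean) pooling keeps short, confident detections from being diluted by neighbouring frames.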

Submitted 22 June, 2024; originally announced June 2024.

Comments: DCASE 2024 Challenge Task 4 technical report

arXiv:2406.13856

Kishu: Time-Traveling for Computational Notebooks

Authors: Zhaoheng Li, Supawit Chockchowwat, Ribhav Sahu, Areet Sheth, Yongjoo Park

Abstract: Computational notebooks (e.g., Jupyter, Google Colab) are widely used by data scientists. A key feature of notebooks is the interactive computing model of iteratively executing cells (i.e., a set of statements) and observing the result (e.g., model or plot). Unfortunately, existing notebook systems do not offer time-traveling to past states: when the user executes a cell, the notebook session state consisting of user-defined variables can be irreversibly modified; for example, the user cannot 'un-drop' a dataframe column. This is because, unlike DBMSs, existing notebook systems do not keep track of the session state. Existing techniques for checkpointing and restoring session states, such as OS-level memory snapshots or application-level session dumps, are insufficient: checkpointing can incur prohibitive storage costs and may fail, while restoration can only be performed inefficiently, from scratch, by fully loading checkpoint files. In this paper, we introduce a new notebook system, Kishu, that offers time-traveling to and from arbitrary notebook states using an efficient and fault-tolerant incremental checkpoint and checkout mechanism. Kishu creates incremental checkpoints that are small and correctly preserve complex inter-variable dependencies at a novel Co-variable granularity. Then, to return to a previous state, Kishu accurately identifies the state difference between the current and target states to perform incremental checkout at sub-second latency with minimal data loading.
Kishu is compatible with 146 object classes from popular data science libraries (e.g., Ray, Spark, PyTorch), and reduces checkpoint size and checkout time by up to 4.55x and 9.02x, respectively, on a variety of notebooks.
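The incremental checkout idea (load only what differs between the current and target states) can be illustrated with a small sketch. It assumes a hypothetical representation in which each session state maps a variable name to a version id; Kishu itself works at a Co-variable granularity (groups of variables that share objects), which this simplified per-name sketch does not model.

```python
from typing import Dict, Set, Tuple


def checkout_plan(
    current: Dict[str, int], target: Dict[str, int]
) -> Tuple[Set[str], Set[str]]:
    """Compute a minimal incremental checkout from `current` to `target`.

    Both states map a variable name to a (hypothetical) version id.
    Returns (names to load from the checkpoint, names to delete).
    Variables whose version already matches the target are left untouched.
    """
    to_load = {name for name, version in target.items() if current.get(name) != version}
    to_delete = set(current) - set(target)
    return to_load, to_delete
```

For example, moving from `{"df": 3, "model": 1, "tmp": 2}` to `{"df": 2, "model": 1}` loads only `df` and deletes `tmp`, leaving `model` in memory, which is the kind of saving a full restore from scratch cannot provide.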

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13312

Pushing the Limit of Sound Event Detection with Multi-Dilated Frequency Dynamic Convolution

Authors: Hyeonuk Nam, Yong-Hwa Park

Abstract: Frequency dynamic convolution (FDY conv) has been a milestone in the sound event detection (SED) field, but it involves a substantial increase in model size due to multiple basis kernels. In this work, we propose partial frequency dynamic convolution (PFD conv), which concatenates static convolution output and dynamic FDY conv output in order to minimize the increase in model size while maintaining performance. Additionally, we propose multi-dilated frequency dynamic convolution (MDFD conv), which integrates multiple dilated frequency dynamic convolution (DFD conv) branches with different dilation size sets and a static branch within a single convolution module, achieving a 3.17% improvement in polyphonic sound detection score (PSDS) over FDY conv. The proposed methods, together with extensive ablation studies, further enhance the understanding and usability of FDY conv variants.

Submitted 7 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13280

Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach

Authors: Yu Min Park, Sheikh Salman Hassan, Yan Kyaw Tun, Eui-Nam Huh, Walid Saad, Choong Seon Hong

Abstract: Sixth-generation (6G) networks leverage simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) to overcome the limitations of traditional RISs. STAR-RISs offer 360-degree full-space coverage and optimized transmission and reflection for enhanced network performance and dynamic control of the indoor propagation environment. However, deploying STAR-RISs indoors presents challenges in interference mitigation, power consumption, and real-time configuration. In this work, a novel network architecture utilizing multiple access points (APs) and STAR-RISs is proposed for indoor communication. An optimization problem encompassing user assignment, AP beamforming, and STAR-RIS phase control for reflection and transmission is formulated. The inherent complexity of the formulated problem necessitates a decomposition approach for an efficient solution, in which different sub-problems are tackled with specialized techniques: a many-to-one matching algorithm is employed to assign users to appropriate APs, optimizing resource allocation. To facilitate efficient resource management, APs are grouped using a correlation-based K-means clustering algorithm. Multi-agent deep reinforcement learning (MADRL) is leveraged to optimize the control of the STAR-RISs. Within the proposed MADRL framework, a novel approach is introduced where each decision variable acts as an independent agent, enabling collaborative learning and decision-making.
Additionally, the proposed MADRL approach incorporates convex approximation (CA). This technique utilizes suboptimal solutions from successive convex approximation (SCA) to accelerate policy learning for the agents, thereby leading to faster environment adaptation and convergence. Simulations demonstrate significant network utility improvements compared to baseline approaches.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 37 pages, 11 figures, submitted to IEEE Transactions on Communications. arXiv admin note: text overlap with arXiv:2311.08708

arXiv:2406.13251

Freq-Mip-AA: Frequency Mip Representation for Anti-Aliasing Neural Radiance Fields

Authors: Youngin Park, Seungtae Nam, Cheul-hee Hahm, Eunbyung Park

Abstract: Neural Radiance Fields (NeRF) have shown remarkable success in representing 3D scenes and generating novel views. However, they often struggle with aliasing artifacts, especially when rendering images from camera distances that differ from those of the training views. To address the issue, Mip-NeRF proposed using volumetric frustums to render a pixel and introduced integrated positional encoding (IPE). While effective, this approach requires long training times due to its reliance on an MLP architecture. In this work, we propose a novel anti-aliasing technique that utilizes grid-based representations, which usually train significantly faster. In addition, inspired by the sampling theorem, we exploit a frequency-domain representation to handle the aliasing problem. The proposed method, FreqMipAA, utilizes scale-specific low-pass filtering (LPF) and learnable frequency masks. Scale-specific LPFs prevent aliasing and prioritize important image details, and learnable masks effectively remove problematic high-frequency elements while retaining essential information. By employing a scale-specific LPF and trainable masks, FreqMipAA can effectively eliminate aliasing while retaining important details. We validated the proposed technique by incorporating it into a widely used grid-based method. The experimental results show that FreqMipAA effectively resolves the aliasing issues and achieves state-of-the-art results on the multi-scale Blender dataset. Our code is available at https://github.com/yi0109/FreqMipAA .

Submitted 19 June, 2024; originally announced June 2024.

Comments: Accepted to ICIP 2024, 7 pages, 3 figures

arXiv:2406.13134

Mode Coupling and Breathing Oscillation in Partially Magnetized Cross-Field Plasmas

Authors: Jong Yoon Park, June Young Kim

Abstract: We report on investigations of mode coupling between rotating spokes during the onset of the breathing oscillation. Demonstrating the existence of nonlinear coupling between the sporadic spokes and the breathing oscillation, we suggest the oscillating azimuthal electric field as the energy source for additional ionization within the plasma. Our results indicate that intermittent three-wave coupling is a possible mechanism for triggering low-frequency breathing oscillations in partially magnetized cross-field plasmas.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12904

Meent: Differentiable Electromagnetic Simulator for Machine Learning

Authors: Yongha Kim, Anthony W. Jung, Sanmun Kim, Kevin Octavian, Doyoung Heo, Chaejin Park, Jeongmin Shin, Sunghyun Nam, Chanhyung Park, Juho Park, Sangjun Han, Jinmyoung Lee, Seolho Kim, Min Seok Jang, Chan Y. Park

Abstract: Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength-scale structures such as solar cells, semiconductor devices, image sensors, future displays, and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reaching real-world impact. Traditional algorithms for such tasks require iteratively refining parameters through simulations, which often yields sub-optimal results due to the high computational cost of both the algorithms and the EM simulations. Machine learning (ML) has emerged as a promising candidate to mitigate these challenges, and the optics research community has increasingly adopted ML algorithms to obtain results surpassing classical methods across various tasks. To foster a synergistic collaboration between the optics and ML communities, it is essential to have EM simulation software that is user-friendly for both research communities. To this end, we present Meent, an EM simulation package that employs rigorous coupled-wave analysis (RCWA). Developed in Python and equipped with automatic differentiation (AD) capabilities, Meent serves as a versatile platform for integrating ML into optics research and vice versa.
To demonstrate its utility as a research platform, we present three applications of Meent: 1) generating a dataset for training a neural operator, 2) serving as an environment for reinforcement learning of nanophotonic device optimization, and 3) providing a solution for inverse problems with gradient-based optimizers. These applications highlight Meent's potential to advance both EM simulation and ML methodologies. The code is available at https://github.com/kc-ml2/meent under the MIT license to promote the cross-pollination of ideas among academic researchers and industry practitioners.

Submitted 11 June, 2024; originally announced June 2024.

Comments: under review

arXiv:2406.11938

Tracking the perspectives of interacting language models

Authors: Hayden Helm, Brandon Duderstadt, Youngser Park, Carey E. Priebe

Abstract: Large language models (LLMs) are capable of producing high-quality information at unprecedented rates. As these models continue to entrench themselves in society, the content they produce will become increasingly pervasive in databases that are, in turn, incorporated into the pre-training data, fine-tuning data, retrieval data, etc. of other language models. In this paper, we formalize the idea of a communication network of LLMs and introduce a method for representing the perspective of individual models within a collection of LLMs. Given these tools, we systematically study information diffusion in the communication network of LLMs in various simulated settings.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09746

Complex zeros of Bessel function derivatives and associated orthogonal polynomials

Authors: Seok-Young Chung, Sujin Lee, Young Woong Park

Abstract: We introduce a sequence of orthogonal polynomials whose associated moments are the Rayleigh-type sums involving the zeros of the Bessel derivative $J_\nu'$ of order $\nu$. We also discuss fundamental properties of these polynomials, such as recurrence and orthogonality. Consequently, we obtain a formula for the Hankel determinant whose entries are the aforementioned Rayleigh-type sums. As an application, we complete the Hurwitz-type theorem for $J_\nu'$, which deals with the number of complex zeros of $J_\nu'$ depending on the range of $\nu$.
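For reference, the Hankel determinant of a moment sequence $(\mu_k)$ is the determinant of the matrix whose $(i, j)$ entry is $\mu_{i+j}$, and Rayleigh-type sums over the zeros $j'_{\nu,i}$ of $J_\nu'$ are commonly written as power sums of inverse squared zeros. The display below uses those standard textbook forms with notation assumed here; which sums serve as which moments, and any normalization, follow the paper and are not reproduced:

$$
H_n = \det\begin{pmatrix}
\mu_0 & \mu_1 & \cdots & \mu_{n-1} \\
\mu_1 & \mu_2 & \cdots & \mu_n \\
\vdots & \vdots & \ddots & \vdots \\
\mu_{n-1} & \mu_n & \cdots & \mu_{2n-2}
\end{pmatrix},
\qquad
\sigma_k(\nu) = \sum_{i \ge 1} \left(j'_{\nu,i}\right)^{-2k}.
$$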

Submitted 14 June, 2024; originally announced June 2024.

Comments: 33 pages, 4 figures

MSC Class: 30C15; 30B70; 33C10; 33C47

arXiv:2406.09698

Projected background and sensitivity of AMoRE-II

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (81 additional authors not shown)

Abstract: AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in a cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach a half-life sensitivity of up to $T^{0\nu\beta\beta}_{1/2} \sim 6 \times 10^{26}$ years, corresponding to an effective Majorana mass of 15-29 meV and covering the entire inverted mass hierarchy region. To achieve this, the background level of the experimental configurations and the possible background sources of gamma and beta events must be well understood. We have performed extensive Monte Carlo simulations using the GEANT4 toolkit for all experimental configurations with potential sources. We report an estimated background level that meets the AMoRE-II requirement of 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) in the region of interest (ROI), and show the projected half-life sensitivity based on the simulation study.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08612

Observation of Declination Dependence in the Cosmic Ray Energy Spectrum

Authors: The Telescope Array Collaboration, R. U. Abbasi, T. Abu-Zayyad, M. Allen, J. W. Belz, D. R. Bergman, I. Buckland, W. Campbell, B. G. Cheon, K. Endo, A. Fedynitch, T. Fujii, K. Fujisue, K. Fujita, M. Fukushima, G. Furlich, Z. Gerber, N. Globus, W. Hanlon, N. Hayashida, H. He, K. Hibino, R. Higuchi, D. Ikeda, T. Ishii, et al. (101 additional authors not shown)

Abstract: We report on an observation of a difference between the northern and southern skies in the ultrahigh energy cosmic ray energy spectrum with a significance of ${\sim}8\sigma$. We use measurements from the two largest experiments: the Telescope Array, observing the northern hemisphere, and the Pierre Auger Observatory, viewing the southern hemisphere. Since comparing two measurements from different observatories introduces the issue of possible systematic differences between detectors and analyses, we validate the methodology of the comparison by examining the region of the sky where the apertures of the two observatories overlap. Although the spectra differ in this region, we find that there is only a $1.8\sigma$ difference between the spectrum measurements when anisotropic regions are removed and a fiducial cut in the aperture is applied.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 8 pages, 6 figures

arXiv:2406.08070

CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models

Authors: Hyungjin Chung, Jeongsol Kim, Geon Yeong Park, Hyelin Nam, Jong Chul Ye

Abstract: Classifier-free guidance (CFG) is a fundamental tool in modern diffusion models for text-guided generation. Although effective, CFG has notable drawbacks. For instance, DDIM with CFG lacks invertibility, complicating image editing; furthermore, high guidance scales, essential for high-quality outputs, frequently result in issues like mode collapse. Contrary to the widespread belief that these are inherent limitations of diffusion models, this paper reveals that the problems actually stem from the off-manifold phenomenon associated with CFG, rather than the diffusion models themselves. More specifically, inspired by the recent advancements of diffusion model-based inverse problem solvers (DIS), we reformulate text-guidance as an inverse problem with a text-conditioned score matching loss, and develop CFG++, a novel approach that tackles the off-manifold challenges inherent in traditional CFG. CFG++ features a surprisingly simple fix to CFG, yet it offers significant improvements, including better sample quality for text-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc. Furthermore, CFG++ enables seamless interpolation between unconditional and conditional sampling at lower guidance scales, consistently outperforming traditional CFG at all scales. Experimental results confirm that our method significantly enhances performance in text-to-image generation, DDIM inversion, editing, and solving inverse problems, suggesting a wide-ranging impact and potential applications in various fields that utilize text guidance.
Project Page: https://cfgpp-diffusion.github.io/.
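The abstract calls CFG++ a "surprisingly simple fix to CFG" without spelling it out. The sketch below reflects our reading of the paper's deterministic DDIM variant and should be checked against the paper. Both samplers build the denoised estimate from the guided noise prediction. Standard CFG then re-noises with that same guided prediction, while the CFG++-style step re-noises with the unconditional prediction and uses a guidance scale lam assumed to lie in [0, 1]. The function names, the toy arrays, and the schedule values abar_t and abar_prev are hypothetical, and NumPy is assumed.

```python
import numpy as np

def ddim_cfg_step(x_t, eps_uncond, eps_cond, abar_t, abar_prev, w):
    """One deterministic DDIM step (eta = 0) with standard classifier-free guidance."""
    eps_hat = eps_uncond + w * (eps_cond - eps_uncond)  # guided noise estimate
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)  # denoised estimate
    # Standard CFG re-noises with the guided noise estimate.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_hat

def ddim_cfgpp_step(x_t, eps_uncond, eps_cond, abar_t, abar_prev, lam):
    """One DDIM step with the CFG++-style change (assumption: lam in [0, 1])."""
    eps_hat = eps_uncond + lam * (eps_cond - eps_uncond)
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
    # The CFG++-style step re-noises with the unconditional noise estimate instead.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_uncond

# Toy usage: random vectors stand in for a latent and the two network outputs.
rng = np.random.default_rng(0)
x_t, eps_u, eps_c = (rng.standard_normal(4) for _ in range(3))
x_prev_cfg = ddim_cfg_step(x_t, eps_u, eps_c, abar_t=0.5, abar_prev=0.7, w=0.6)
x_prev_cfgpp = ddim_cfgpp_step(x_t, eps_u, eps_c, abar_t=0.5, abar_prev=0.7, lam=0.6)
```

The two steps differ only in the re-noising term, so they coincide whenever guidance has no effect, for example when the conditional and unconditional predictions agree or the scale is zero.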

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.05341

Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

Authors: Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Junhyeok Lee, Yong-Hwa Park

Abstract: Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by a frequency-varying combination of basis kernels. However, FDY conv lacks an explicit means to diversify frequency-adaptive kernels, potentially limiting the performance. In addition, the size of basis kernels is limited, while time-frequency patterns span a larger spectro-temporal range. Therefore, we propose dilated frequency dynamic convolution (DFD conv), which diversifies and expands frequency-adaptive kernels by introducing different dilation sizes to basis kernels. Experiments showed advantages of varying dilation sizes along the frequency dimension, and analysis of attention weight variance proved that dilated basis kernels are effectively diversified. By adapting a class-wise median filter with an intersection-based F1 score, the proposed DFD-CRNN outperforms FDY-CRNN by 3.12% in terms of polyphonic sound detection score (PSDS).
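To show why dilation "expands" a basis kernel, the sketch below slides a 3-tap kernel along a single frequency axis at different dilation sizes. A kernel with k taps and dilation d covers d(k - 1) + 1 frequency bins while keeping only k weights. This is a generic dilated 1-D convolution in pure Python, not DFD conv itself, which according to the abstract also combines several basis kernels with frequency-varying weights. The function names are hypothetical.

```python
def effective_span(kernel_size, dilation):
    """Number of input bins covered by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1-D cross-correlation with a dilated kernel (no padding, stride 1)."""
    span = effective_span(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Each tap reads the input every `dilation` bins.
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out

# A toy "spectrum" over 7 frequency bins and a difference kernel.
bins = [0, 1, 2, 3, 4, 5, 6]
edge = [1, 0, -1]
narrow = dilated_conv1d(bins, edge, dilation=1)  # compares bins 2 apart
wide = dilated_conv1d(bins, edge, dilation=2)    # compares bins 4 apart
```

With the same three weights, dilation 2 compares bins four apart instead of two. In other words, larger dilations let a fixed-size basis kernel respond to patterns spread over a wider frequency range.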

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted to INTERSPEECH 2024

arXiv:2406.01079

Object Aware Egocentric Online Action Detection

Authors: Joungbin An, Yunsu Park, Hyolim Kang, Seon Joo Kim

Abstract: Advancements in egocentric video datasets like Ego4D, EPIC-Kitchens, and Ego-Exo4D have enriched the study of first-person human interactions, which is crucial for applications in augmented reality and assisted living. Despite these advancements, current Online Action Detection (OAD) methods, which efficiently detect actions in streaming videos, are predominantly designed for exocentric views and thus fail to capitalize on the unique perspectives inherent to egocentric videos. To address this gap, we introduce an Object-Aware Module that integrates egocentric-specific priors into existing OAD frameworks, enhancing first-person footage interpretation. Utilizing object-specific details and temporal dynamics, our module improves scene understanding in detecting actions. Validated extensively on the Epic-Kitchens 100 dataset, our work can be seamlessly integrated into existing models with minimal overhead and bring consistent performance enhancements, marking an important step forward in adapting action detection systems to egocentric video analysis.

Submitted 3 June, 2024; originally announced June 2024.

Comments: CVPR First Joint Egocentric Vision Workshop 2024

arXiv:2405.20738

Federated Random Forest for Partially Overlapping Clinical Data

Authors: Youngjun Park, Cord Eric Schmidt, Benedikt Marcel Batton, Anne-Christin Hauschild

Abstract: In the healthcare sector, awareness of data privacy and the corresponding data protection regulations, as well as heterogeneous and non-harmonized data, pose huge challenges to large-scale data analysis. Moreover, clinical data often involves partially overlapping features, as some observations may be missing for various reasons, such as differences in procedures, diagnostic tests, or other recorded patient history information across hospitals or institutes. To address the challenges posed by partially overlapping features and incomplete data in clinical datasets, a comprehensive approach is required. Particularly in the domain of medical data, federated random forests achieve promising outcomes whenever features align. However, for most standard algorithms, like random forest, it is essential that all data sets have identical parameters. Therefore, in this work the concept of federated random forest is adapted to a setting with partially overlapping features. Moreover, our research assesses the effectiveness of the newly developed federated random forest models for partially overlapping clinical data. For aggregating the federated, globally optimized model, only features available locally at each site can be used. We tackled two issues in federation: (i) the quantity of involved parties, (ii) the varying overlap of features. This evaluation was conducted across three clinical datasets.
Even in cases where only a subset of features overlaps, the federated random forest model consistently demonstrates superior performance compared to its local counterpart. This holds true across various scenarios, including datasets with imbalanced classes. Consequently, federated random forests for partially overlapped data offer a promising solution to transcend barriers in collaborative research and corporate cooperation.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.20597

Double-sided van der Waals epitaxy of topological insulators across an atomically thin membrane

Authors: Joon Young Park, Young Jae Shin, Jeacheol Shin, Jehyun Kim, Janghyun Jo, Hyobin Yoo, Danial Haei, Chohee Hyun, Jiyoung Yun, Robert M. Huber, Arijit Gupta, Kenji Watanabe, Takashi Taniguchi, Wan Kyu Park, Hyeon Suk Shin, Miyoung Kim, Dohun Kim, Gyu-Chul Yi, Philip Kim

Abstract: Atomically thin van der Waals (vdW) films provide a novel material platform for epitaxial growth of quantum heterostructures. However, unlike the remote epitaxial growth of three-dimensional bulk crystals, the growth of two-dimensional (2D) material heterostructures across atomic layers has been limited due to the weak vdW interaction. Here, we report the double-sided epitaxy of vdW layered materials through atomic membranes. We grow vdW topological insulators (TIs) Sb$_2$Te$_3$ and Bi$_2$Se$_3$ by molecular beam epitaxy on both surfaces of atomically thin graphene or hBN, which serve as suspended 2D vdW "$\textit{substrate}$" layers. Both homo- and hetero- double-sided vdW TI tunnel junctions are fabricated, with the atomically thin hBN acting as a crystal-momentum-conserving tunnelling barrier with abrupt and epitaxial interface. By performing field-angle dependent magneto-tunnelling spectroscopy on these devices, we reveal the energy-momentum-spin resonant tunnelling of massless Dirac electrons between helical Landau levels developed in the topological surface states at the interface.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 24 pages, 4 main figures, 7 extended data figures

arXiv:2405.19393

Supernova axions convert to gamma-rays in magnetic fields of progenitor stars

Authors: Claudio Andrea Manzari, Yujin Park, Benjamin R. Safdi, Inbar Savoray

Abstract: It has long been established that axions could have been produced through the Primakoff process within the nascent proto-neutron-star formed following the type II supernova SN1987A, escaped the star due to their weak interactions, and then converted to gamma-rays in the Galactic magnetic fields; the non-observation of a gamma-ray flash coincident with the neutrino burst leads to strong constraints on the axion-photon coupling for axion masses $m_a \lesssim 10^{-10}$ eV. In this work we use SN1987A to constrain, for the first time, higher mass axions, all the way to $m_a \sim 10^{-3}$ eV, by accounting for axion-photon conversion on the still-intact magnetic fields of the progenitor star. Moreover, we show that gamma-ray observations of the next Galactic supernova, leveraging the magnetic fields of the progenitor star, could detect quantum chromodynamics axions for masses above roughly $50$ $\mu$eV, depending on the supernova. We propose a new full-sky gamma-ray satellite constellation that we call the GALactic AXion Instrument for Supernova (GALAXIS) to search for such future signals along with related signals from extragalactic neutron star mergers.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 10+18 pages, 3+20 figures, video abstract at https://youtu.be/QiNonlkARps

arXiv:2405.17148

Direct view of gate-tunable miniband dispersion in graphene superlattices near the magic twist angle

Authors: Zhihao Jiang, Dongkyu Lee, Alfred J. H. Jones, Youngju Park, Kimberly Hsieh, Paulina Majchrzak, Chakradhar Sahoo, Thomas S. Nielsen, Kenji Watanabe, Takashi Taniguchi, Philip Hofmann, Jill A. Miwa, Yong P. Chen, Jeil Jung, Søren Ulstrup

Abstract: Superlattices from twisted graphene mono- and bi-layer systems give rise to on-demand quantum many-body states such as Mott insulators, unconventional superconductors and the fractional quantum Hall effect. These phenomena are observed in transport experiments when changing the filling of the low-energy electronic bands. Their origin is broadly ascribed to a combination of flat bands and strong Coulomb interactions, yet a comprehensive understanding is lacking. This is primarily because the relevant low-energy band structure is believed to strongly change in a non-trivial way as the electron filling is varied. Here we gain direct access to the filling-dependent low energy bands of twisted bilayer graphene (TBG) and twisted double bilayer graphene (TDBG) by applying micro-focused angle-resolved photoemission spectroscopy to in situ gated devices. Our findings for the two systems are in stark contrast: The doping dependent dispersion for TBG can be described in a simple model, combining a filling-dependent rigid band shift with a many-body related bandwidth change. In TDBG, on the other hand, we find a complex behaviour of the low-energy bands, combining non-monotonous bandwidth changes and tuneable gap openings. Our work establishes the extent of electric field tunability of the low energy electronic states in twisted graphene superlattices and can serve to underpin the theoretical understanding of the resulting phenomena.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 25 pages, 4 main figures and 7 supplementary figures

arXiv:2405.16658

Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation

Authors: Yeachan Park, Minseok Kim, Yeoneung Kim

Abstract: We propose novel methodologies aimed at accelerating the grokking phenomenon, which refers to the rapid increment of test accuracy after a long period of overfitting as reported in~\cite{power2022grokking}. Focusing on the grokking phenomenon that arises in learning arithmetic binary operations via the transformer model, we begin with a discussion on data augmentation in the case of commutative binary operations. To accelerate grokking further, we elucidate arithmetic operations through the lens of the Kolmogorov-Arnold (KA) representation theorem, revealing its correspondence to the transformer architecture: embedding, decoder block, and classifier. Observing the shared structure between KA representations associated with binary operations, we suggest various transfer learning mechanisms that expedite grokking. This interpretation is substantiated through a series of rigorous experiments. In addition, our approach is successful in learning two nonstandard arithmetic tasks: composition of operations and a system of equations. Furthermore, we reveal that the model is capable of learning arithmetic operations using a limited number of tokens under embedding transfer, which is supported by a set of experiments as well.
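The abstract's first step is data augmentation for commutative binary operations. Below is a minimal sketch of one way to do this for modular addition. The paper's exact split and tokenization are not given here, so the design choices are our own. Unordered pairs {a, b} are split first, so a pair and its swap never end up on opposite sides of the train/test boundary. Each training pair (a, b) then contributes a free, correctly labeled swapped example (b, a). The function names are hypothetical.

```python
import random

def split_unordered_pairs(p, train_fraction, seed=0):
    """Shuffle the unordered pairs {a, b} (a <= b) over Z_p and split them."""
    pairs = [(a, b) for a in range(p) for b in range(a, p)]
    random.Random(seed).shuffle(pairs)
    n_train = int(train_fraction * len(pairs))
    return pairs[:n_train], pairs[n_train:]

def to_examples(pairs, p, both_orders):
    """Turn pairs into (a, b, (a + b) mod p) examples, optionally adding (b, a)."""
    examples = []
    for a, b in pairs:
        examples.append((a, b, (a + b) % p))
        if both_orders and a != b:
            # Commutativity guarantees the swapped input has the same label.
            examples.append((b, a, (a + b) % p))
    return examples

p = 7
train_pairs, test_pairs = split_unordered_pairs(p, train_fraction=0.5)
train = to_examples(train_pairs, p, both_orders=False)
train_aug = to_examples(train_pairs, p, both_orders=True)
test = to_examples(test_pairs, p, both_orders=True)
```

Splitting on unordered pairs keeps the augmented training set disjoint from the test set, so any speed-up in grokking cannot come from leaked test examples.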

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.14150

jp-evalb: Robust Alignment-based PARSEVAL Measures

Authors: Jungyeul Park, Junrui Wang, Eunkyul Leah Jo, Angela Yoonseo Park

Abstract: We introduce an evaluation system designed to compute PARSEVAL measures, offering a viable alternative to \texttt{evalb} commonly used for constituency parsing evaluation. The widely used \texttt{evalb} script has traditionally been employed for evaluating the accuracy of constituency parsing results, albeit with the requirement for consistent tokenization and sentence boundaries. In contrast, our approach, named \texttt{jp-evalb}, is founded on an alignment method. This method aligns sentences and words when discrepancies arise. It aims to overcome several known issues associated with \texttt{evalb} by utilizing the `jointly preprocessed (JP)' alignment-based method. We introduce a more flexible and adaptive framework, ultimately contributing to a more accurate assessment of constituency parsing performance.
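For reference, the PARSEVAL measures themselves are labeled-bracket precision, recall, and F1 over (label, start, end) spans. The minimal sketch below computes them under the assumption that gold and test trees share identical tokenization. That is the evalb-style requirement the abstract describes, and jp-evalb's alignment step is what removes it; the sketch does not implement that alignment. It also ignores evalb's configurable options, such as label deletion and equivalence. The function name and example spans are hypothetical.

```python
from collections import Counter

def parseval(gold_brackets, test_brackets):
    """Labeled bracket precision, recall and F1 over (label, start, end) spans."""
    gold, test = Counter(gold_brackets), Counter(test_brackets)
    matched = sum((gold & test).values())  # multiset intersection of brackets
    precision = matched / sum(test.values()) if test else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

# Spans index word boundaries of a 5-token sentence; one test label is wrong.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
scores = parseval(gold, test)
```

A Counter is used instead of a set so that duplicate brackets, such as unary chains with the same label and span, are matched at most as many times as they occur in both trees.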

Submitted 22 May, 2024; originally announced May 2024.

Comments: To appear in The system demonstration track at NAACL-HLT 2024

arXiv:2405.11111

Euclidean mirrors and first-order changepoints in network time series

Authors: Tianyi Chen, Zachary Lubberts, Avanti Athreya, Youngser Park, Carey E. Priebe

Abstract: We describe a model for a network time series whose evolution is governed by an underlying stochastic process, known as the latent position process, in which network evolution can be represented in Euclidean space by a curve, called the Euclidean mirror. We define the notion of a first-order changepoint for a time series of networks, and construct a family of latent position process networks with underlying first-order changepoints. We prove that a spectral estimate of the associated Euclidean mirror localizes these changepoints, even when the graph distribution evolves continuously, but at a rate that changes. Simulated and real data examples on organoid networks show that this localization captures empirically significant shifts in network evolution.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.03637

Collage: Light-Weight Low-Precision Strategy for LLM Training

Authors: Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan

Abstract: Large-model training is plagued by intense compute cost and limited hardware memory. A practical solution is low-precision representation, but it is troubled by loss in numerical accuracy and unstable training, rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. We propose Collage, which utilizes multi-component float representation in low precision to accurately perform operations with numerical errors accounted for. To understand the impact of imprecision on training, we propose a simple and novel metric which tracks the lost information during training as well as differentiates various precision strategies. Our method works with commonly used low-precision formats such as half-precision ($16$-bit floating points) and can be naturally extended to work with even lower precision such as $8$-bit. Experimental results show that pre-training using Collage removes the requirement of using $32$-bit floating-point copies of the model and attains similar/better training performance compared to $(16, 32)$-bit mixed-precision strategy, with up to $3.7\times$ speedup and $\sim 15\%$ to $23\%$ less memory usage in practice.
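The abstract does not detail Collage's multi-component representation. To illustrate the underlying idea, the sketch below substitutes a different, textbook technique: Kahan compensated summation, with both the sum and the error term stored in NumPy's float16. The idea it shares with Collage is that a low-precision value paired with a low-precision error term can recover what rounding discards. It is not Collage's algorithm. The toy workload, which repeatedly adds 0.25 to 1024, is hypothetical. At that magnitude float16 values are spaced 1 apart, so each small "update" is lost under plain rounding, much like tiny weight updates in low-precision training.

```python
import numpy as np

def naive_sum_fp16(start, increment, steps):
    """Repeated addition where every intermediate result is rounded to float16."""
    total, inc = np.float16(start), np.float16(increment)
    for _ in range(steps):
        total = total + inc  # 1024 + 0.25 rounds back to 1024
    return total

def compensated_sum_fp16(start, increment, steps):
    """Kahan summation: a float16 sum plus a float16 compensation term."""
    total, comp, inc = np.float16(start), np.float16(0.0), np.float16(increment)
    for _ in range(steps):
        y = inc - comp           # re-inject the error lost in the previous step
        t = total + y            # rounded float16 addition
        comp = (t - total) - y   # rounding error of t (t minus the exact total + y)
        total = t
    return total, comp

naive = naive_sum_fp16(1024, 0.25, 100)
total, comp = compensated_sum_fp16(1024, 0.25, 100)
```

The naive loop never leaves 1024, while the compensated pair tracks the exact result of 1024 + 100 × 0.25 = 1049. This is the kind of silent information loss the abstract's metric is meant to track.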

Submitted 6 May, 2024; originally announced May 2024.

Comments: ICML 2024

arXiv:2405.03225

Consistent response prediction for multilayer networks on unknown manifolds

Authors: Aranyak Acharyya, Jesús Arroyo Relión, Michael Clayton, Marta Zlatic, Youngser Park, Carey E. Priebe

Abstract: Our paper deals with a collection of networks on a common set of nodes, where some of the networks are associated with responses. Assuming that the networks correspond to points on a one-dimensional manifold in a higher dimensional ambient space, we propose an algorithm to consistently predict the response at an unlabeled network. Our model involves a specific multiple random network model, namely the common subspace independent edge model, where the networks share a common invariant subspace, and the heterogeneity amongst the networks is captured by a set of low dimensional matrices. Our algorithm estimates these low dimensional matrices that capture the heterogeneity of the networks, learns the underlying manifold by isomap, and consistently predicts the response at an unlabeled network. We provide theoretical justifications for the use of our algorithm, validated by numerical simulations. Finally, we demonstrate the use of our algorithm on larval Drosophila connectome data.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.01813

doi 10.1145/3555041.3589674

Towards Building Autonomous Data Services on Azure

Authors: Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas Mueller, Kartheek Muthyala, Harsha Nagulapalli, et al. (13 additional authors not shown)

Abstract: Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost, is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. It also covers the future endeavors we plan to undertake and unresolved issues that still need attention.

Submitted 2 May, 2024; originally announced May 2024.

Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023

arXiv:2405.00828

WIBA: What Is Being Argued? A Comprehensive Approach to Argument Mining

Authors: Arman Irani, Ju Yeon Park, Kevin Esterling, Michalis Faloutsos

Abstract: We propose WIBA, a novel framework and suite of methods that enable the comprehensive understanding of "What Is Being Argued" across contexts. Our approach develops a comprehensive framework that detects: (a) the existence, (b) the topic, and (c) the stance of an argument, correctly accounting for the logical dependence among the three tasks. Our algorithm leverages the fine-tuning and prompt-engineering of Large Language Models. We evaluate our approach and show that it performs well in all three capabilities. First, we develop and release an Argument Detection model that can classify a piece of text as an argument with an F1 score between 79% and 86% on three different benchmark datasets. Second, we release a language model that can identify the topic being argued in a sentence, be it implicit or explicit, with an average similarity score of 71%, outperforming current naive methods by nearly 40%. Finally, we develop a method for Argument Stance Classification, and evaluate the capability of our approach, showing it achieves a classification F1 score between 71% and 78% across three diverse benchmark datasets. Our evaluation demonstrates that WIBA allows the comprehensive understanding of What Is Being Argued in large corpora across diverse contexts, which is of core interest to many applications in linguistics, communication, and social and computer science. To facilitate accessibility to the advancements outlined in this work, we release WIBA as a free open access platform (wiba.dev).

Submitted 1 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures, submitted to The 16th International Conference on Advances in Social Networks Analysis and Mining (ASONAM) '24

arXiv:2404.17206

Rytov Approximation of Vectorial Waves by Modifying Scattering Matrixes: Precise Reconstruction of Dielectric Tensor Tomography

Authors: ChulMin Oh, Herve Hugonnet, Juheon Lee, YongKeun Park

Abstract: Analyzing 3D anisotropic materials presents significant challenges, especially when assessing 3D orientations, material distributions, and anisotropies through scattered light, due to the inherently vectorial nature of light-matter interactions. In this study, we formulate a scattering theory based on the Rytov approximation, commonly employed in scalar wave tomography, tailored to accommodate vector waves by modifying the scattering matrix. Using this formulation, we investigate the intricate 3D structure of liquid crystals with multiple topological defects exploiting dielectric tensor tomography. By leveraging dielectric tensor tomography, we successfully visualize these topological defects in three dimensions, a task that conventional 2D imaging techniques fail to achieve.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.14687

Pegasus-v1 Technical Report

Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, et al. (19 additional authors not shown)

Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, in order to provide readers a balanced view of its current state and its future direction.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.08080

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

Authors: Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha

Abstract: Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-shot and in-context learning when combined with suitable task prompts. In this work, we couple ZO methods with variance reduction techniques to enhance stability and convergence for inference-based LM fine-tuning. We introduce Memory-Efficient Zeroth-Order Stochastic Variance-Reduced Gradient (MeZO-SVRG) and demonstrate its efficacy across multiple LM fine-tuning tasks, eliminating the reliance on task-specific prompts. Evaluated across a range of both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG outperforms MeZO with up to 20% increase in test accuracies in both full- and partial-parameter fine-tuning settings. MeZO-SVRG benefits from reduced computation time as it often surpasses MeZO's peak test accuracy with a $2\times$ reduction in GPU-hours. MeZO-SVRG significantly reduces the required memory footprint compared to first-order SGD, i.e. by $2\times$ for autoregressive models. Our experiments highlight that MeZO-SVRG's memory savings progressively improve compared to SGD with larger batch sizes.
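MeZO-SVRG builds on zeroth-order SGD, which estimates gradients from forward passes alone. The minimal sketch below shows the basic two-point (SPSA-style) estimator driving SGD on a toy quadratic loss, assuming NumPy. It deliberately omits the paper's variance-reduction term. It also omits MeZO's trick of regenerating the random direction from a seed instead of storing it. The function names, step size, and perturbation scale are hypothetical.

```python
import numpy as np

def spsa_gradient(loss, theta, eps, rng):
    """Two-point zeroth-order gradient estimate along one random direction."""
    z = rng.standard_normal(theta.shape)
    # Two forward evaluations, no backpropagation.
    projected = (loss(theta + eps * z) - loss(theta - eps * z)) / (2.0 * eps)
    return projected * z  # unbiased for the gradient when the loss is quadratic

def zo_sgd(loss, theta, lr, eps, steps, seed=0):
    """Plain ZO-SGD: step against the SPSA estimate at every iteration."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * spsa_gradient(loss, theta, eps, rng)
    return theta

def quadratic(t):
    return float(np.sum(t ** 2))

theta0 = np.ones(10)
theta_final = zo_sgd(quadratic, theta0, lr=0.01, eps=1e-3, steps=1000)
```

Each step costs two forward passes and stores no activations for backpropagation, which is the source of the memory savings. The estimator's variance grows with the number of parameters, and that variance is the weakness the abstract's variance-reduction coupling targets.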

Submitted 11 April, 2024; originally announced April 2024.

Comments: 29 pages, 25 tables, 9 figures

arXiv:2404.07335

Finite-temperature renormalization of Standard Model coupled with gravity, and its implications for cosmology

Authors: I. Y. Park

Abstract: Finite-temperature one-loop renormalization of the Standard Model, coupled with dynamic metrics, is conducted in this study. The entire analysis is coherently carried out by using the refined background field method, applied in the spirit of the Coleman-Weinberg technique. The general form of the propagator, introduced in our previous work to facilitate Feynman diagram computation in a general curved background, proves useful in the presence of time-dependent temperature. Its utilization allows for the analysis of a FLRW background to essentially reduce to that of a constant finite-T flat spacetime. Notably, this method also provides an effective tool for addressing one of the longstanding problems in QCD, the Linde problem. The implications of our findings for cosmology, particularly the cosmological constant problem and Hubble tension, are discussed.

Submitted 8 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 35 (32+3) pages, 5 figures, preliminary numerical results on Hubble tension added, clarifications added, minor changes

arXiv:2404.07308

Spatial Transfer Learning for Estimating PM2.5 in Data-poor Regions

Authors: Shrey Gupta, Yongbee Park, Jianzhao Bi, Suyash Gupta, Andreas Züfle, Avani Wildani, Yang Liu

Abstract: Air pollution, especially particulate matter 2.5 (PM2.5), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can be leveraged to solve this problem, as they use alternate data sources to gain knowledge (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and the target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature named Latent Dependency Factor (LDF) that captures spatial and semantic dependencies of both domains and is subsequently added to the feature spaces of the domains. We generate LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer learning models using LDF have a 19.34% improvement over the baselines. We additionally support our experiments with qualitative findings.

Submitted 22 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted for publication at ECML-PKDD 2024

Showing 1–50 of 1,027 results for author: Park, Y