
Is Flash Attention Stable?

Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

Abstract: Training large-scale machine learning models poses distinct system challenges, given both the size and complexity of today's workloads. Recently, many organizations training state-of-the-art Generative AI models have reported cases of instability during training, often taking the form of loss spikes. Numeric deviation has emerged as a potential cause of this training instability, although quantifying this is especially challenging given the costly nature of training runs. In this work, we develop a principled approach to understanding the effects of numeric deviation, and construct proxies to put observations into context when downstream effects are difficult to quantify. As a case study, we apply this framework to analyze the widely-adopted Flash Attention optimization. We find that Flash Attention sees roughly an order of magnitude more numeric deviation as compared to Baseline Attention at BF16 when measured during an isolated forward pass. We then use a data-driven analysis based on the Wasserstein Distance to provide upper bounds on how this numeric deviation impacts model weights during training, finding that the numerical deviation present in Flash Attention is 2-5 times less significant than low-precision training.
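As a rough illustration of what measuring numeric deviation during an isolated forward pass can look like, the sketch below compares a baseline attention and a tiled, online-softmax attention against a float64 reference. It is not the authors' framework: the function names, the tile size, and the max-absolute-difference metric are assumptions made for this example, the tiled routine is a simplified NumPy rendering of the tiling idea rather than the fused Flash Attention kernel, and float16 stands in for BF16 because NumPy has no native bfloat16 type.

```python
import math
import numpy as np

def baseline_attention(q, k, v):
    """Standard softmax attention computed in the dtype of the inputs."""
    scores = (q @ k.T) / math.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilize exp
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def tiled_attention(q, k, v, block=16):
    """Attention over key/value tiles with a running (online) softmax."""
    n, d = q.shape
    dtype = q.dtype
    out = np.zeros((n, v.shape[-1]), dtype=dtype)
    row_max = np.full((n, 1), -np.inf, dtype=dtype)
    row_sum = np.zeros((n, 1), dtype=dtype)
    for start in range(0, k.shape[0], block):
        kb = k[start:start + block]
        vb = v[start:start + block]
        scores = (q @ kb.T) / math.sqrt(d)
        new_max = np.maximum(row_max, scores.max(axis=-1, keepdims=True))
        correction = np.exp(row_max - new_max)  # rescale earlier partial sums
        p = np.exp(scores - new_max)
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum

def max_deviation(fn, q, k, v, low=np.float16):
    """Max absolute difference between a low-precision run and a float64 reference."""
    golden = baseline_attention(q.astype(np.float64), k.astype(np.float64), v.astype(np.float64))
    approx = fn(q.astype(low), k.astype(low), v.astype(low)).astype(np.float64)
    return float(np.max(np.abs(approx - golden)))

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 32))
k = rng.standard_normal((64, 32))
v = rng.standard_normal((64, 32))
print("baseline deviation:", max_deviation(baseline_attention, q, k, v))
print("tiled deviation:   ", max_deviation(tiled_attention, q, k, v))
```

In float64 the two routines agree to within rounding, so any gap that appears at low precision comes from the order of operations and the extra rescaling steps. The magnitudes printed by this toy setup should not be read as reproducing the figures reported in the abstract.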

Submitted 4 May, 2024; originally announced May 2024.

arXiv:2312.14385 [pdf, other]

Generative AI Beyond LLMs: System Implications of Multi-Modal Generation

Authors: Alicia Golden, Samuel Hsia, Fei Sun, Bilge Acun, Basil Hosmer, Yejin Lee, Zachary DeVito, Jeff Johnson, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

Abstract: As the development of large-scale Generative AI models evolves beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal information presents unique challenges to quality, performance, and efficiency. We present the first work towards understanding this new system design space for multi-modal text-to-image (TTI) and text-to-video (TTV) generation models. Current model architecture designs are bifurcated into 2 categories: Diffusion- and Transformer-based models. Our systematic performance characterization on a suite of eight representative TTI/TTV models shows that after state-of-the-art optimization techniques such as Flash Attention are applied, Convolution accounts for up to 44% of execution time for Diffusion-based TTI models, while Linear layers consume up to 49% of execution time for Transformer-based models. We additionally observe that Diffusion-based TTI models resemble the Prefill stage of LLM inference, and benefit from 1.1-2.5x greater speedup from Flash Attention than Transformer-based TTI models that resemble the Decode phase. Since optimizations designed for LLMs do not map directly onto TTI/TTV models, we must conduct a thorough characterization of these workloads to gain insights for new optimization opportunities. In doing so, we define sequence length in the context of TTI/TTV models and observe sequence length can vary up to 4x in Diffusion model inference. We additionally observe temporal aspects of TTV workloads pose unique system bottlenecks, with Temporal Attention accounting for over 60% of total Attention time. Overall, our in-depth system performance characterization is a critical first step towards designing efficient and deployable systems for emerging TTI/TTV workloads.

Submitted 5 May, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Published at 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2310.04799 [pdf, other]

Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages

Authors: Shih-Cheng Huang, Pin-Zu Li, Yu-Chi Hsu, Kuang-Ming Chen, Yu Tung Lin, Shih-Kai Hsiao, Richard Tzong-Han Tsai, Hung-yi Lee

Abstract: Recently, the development of open-source large language models (LLMs) has advanced rapidly. Nevertheless, due to data constraints, the capabilities of most open-source LLMs are primarily focused on English. To address this issue, we introduce the concept of $\textit{chat vector}$ to equip pre-trained language models with instruction following and human value alignment via simple model arithmetic. The chat vector is derived by subtracting the weights of a pre-trained base model (e.g. LLaMA2) from those of its corresponding chat model (e.g. LLaMA2-chat). By simply adding the chat vector to a continual pre-trained model's weights, we can endow the model with chat capabilities in new languages without the need for further training. Our empirical studies demonstrate the superior efficacy of the chat vector from three different aspects: instruction following, toxicity mitigation, and multi-turn dialogue. Moreover, to showcase the adaptability of our approach, we extend our experiments to encompass various languages, base models, and chat vectors. The results underscore the chat vector's simplicity, effectiveness, and wide applicability, making it a compelling solution for efficiently enabling conversational capabilities in pre-trained language models. Our code is available at https://github.com/aqweteddy/ChatVector.
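The model arithmetic described in this abstract can be sketched in a few lines. The example below is a toy illustration, not the authors' released code: the helper names `extract_chat_vector` and `apply_chat_vector` are hypothetical, and weights are represented as plain dictionaries of NumPy arrays that are assumed to share the same parameter names and shapes across the three models.

```python
import numpy as np

def extract_chat_vector(base_weights, chat_weights):
    """Chat vector = chat model weights minus base model weights, per parameter."""
    return {name: chat_weights[name] - base_weights[name] for name in base_weights}

def apply_chat_vector(target_weights, chat_vector):
    """Add the chat vector to a continually pre-trained model's weights."""
    return {name: target_weights[name] + chat_vector[name] for name in target_weights}

# Toy stand-ins for real checkpoints (hypothetical values).
base = {"w": np.array([1.0, 2.0])}        # pre-trained base model
chat = {"w": np.array([1.5, 1.0])}        # its chat-tuned counterpart
continual = {"w": np.array([3.0, 4.0])}   # base model further pre-trained on a new language

chat_vector = extract_chat_vector(base, chat)
merged = apply_chat_vector(continual, chat_vector)
print(merged["w"])
```

With real checkpoints the same subtraction and addition would be applied to every tensor in the state dictionary. Any handling of mismatched vocabularies or embedding sizes is outside what this sketch covers.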

Submitted 7 June, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: ACL 2024 camera-ready version

arXiv:2310.02784 [pdf, other]

MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

Authors: Samuel Hsia, Alicia Golden, Bilge Acun, Newsha Ardalani, Zachary DeVito, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

Abstract: Training and deploying large-scale machine learning models is time-consuming, requires significant distributed computing infrastructures, and incurs high operational costs. Our analysis, grounded in real-world large model training on datacenter-scale infrastructures, reveals that 14-32% of all GPU hours are spent on communication with no overlapping computation. To minimize this outstanding communication latency and other inherent at-scale inefficiencies, we introduce an agile performance modeling framework, MAD-Max. This framework is designed to optimize parallelization strategies and facilitate hardware-software co-design opportunities. Through the application of MAD-Max to a suite of real-world large-scale ML models on state-of-the-art GPU clusters, we showcase potential throughput enhancements of up to 2.24x for pre-training and up to 5.2x for inference scenarios, respectively.

Submitted 10 June, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: ISCA 2024

arXiv:2309.01383 [pdf, other]

LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data

Authors: Shun-Wen Hsiao, Cheng-Yuan Sun

Abstract: Recently, deception detection in videos of people has become an eye-catching technique that can serve many applications. AI models in this domain demonstrate high accuracy, but they tend to be non-interpretable black boxes. We introduce an attention-aware neural network addressing challenges inherent in video data and deception dynamics. This model, through its continuous assessment of visual, audio, and text features, pinpoints deceptive cues. We employ a multimodal fusion strategy that enhances accuracy; our approach yields a 92% accuracy rate on a real-life trial dataset. Most important of all, the model indicates the attention focus in the videos, providing valuable insights on deception cues. Hence, our method adeptly detects deceit and elucidates the underlying process. We further enriched our study with an experiment involving students answering questions either truthfully or deceitfully, resulting in a new dataset of 309 video clips, named ATSFace. Using this, we also introduced a calibration method, which is inspired by Low-Rank Adaptation (LoRA), to refine individual-based deception detection accuracy.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 10 pages, 9 figures

arXiv:2302.10872 [pdf, other]

MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation

Authors: Samuel Hsia, Udit Gupta, Bilge Acun, Newsha Ardalani, Pan Zhong, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

Abstract: Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user preferences over large bodies of contents. The reliance on a fixed embedding representation of embedding tables not only imposes significant memory capacity and bandwidth requirements but also limits the scope of compatible system solutions. This paper challenges the assumption of fixed embedding representations by showing how synergies between embedding representations and hardware platforms can lead to improvements in both algorithmic- and system performance. Based on our characterization of various embedding representations, we propose a hybrid embedding representation that achieves higher quality embeddings at the cost of increased memory and compute requirements. To address the system performance challenges of the hybrid representation, we propose MP-Rec -- a co-design technique that exploits heterogeneity and dynamic selection of embedding representations and underlying hardware platforms. On real system hardware, we demonstrate how matching custom accelerators, i.e., GPUs, TPUs, and IPUs, with compatible embedding representations can lead to 16.65x performance speedup. Additionally, in query-serving scenarios, MP-Rec achieves 2.49x and 3.76x higher correct prediction throughput and 0.19% and 0.22% better model quality on a CPU-GPU system for the Kaggle and Terabyte datasets, respectively.

Submitted 21 February, 2023; originally announced February 2023.

ACM Class: C.1; H.0

arXiv:2209.00263 [pdf, other]

Attack Tactic Identification by Transfer Learning of Language Model

Authors: Ling-Hsuan Lin, Shun-Wen Hsiao

Abstract: Cybersecurity has become a primary global concern with the rapid increase in security attacks and data breaches. Artificial intelligence is promising to help humans analyze and identify attacks. However, labeling millions of packets for supervised learning is never easy. This study aims to leverage a transfer learning technique that stores the knowledge gained from well-defined attack lifecycle documents and applies it to hundreds of thousands of unlabeled attacks (packets) for identifying their attack tactics. We anticipate that the knowledge of an attack is well-described in the documents, and that a cutting-edge transformer-based language model can embed the knowledge into a high-dimensional latent space. We then reuse the information from the language model to learn the attack tactics carried by packets, improving learning efficiency. We propose a system, PELAT, that fine-tunes a BERT model with 1,417 articles from the MITRE ATT&CK lifecycle framework to enhance its attack knowledge (including syntax used and semantic meanings embedded). PELAT then transfers its knowledge to perform semi-supervised learning for unlabeled packets to generate their tactic labels. Further, when a new attack packet arrives, the packet payload will be processed by the PELAT language model with a downstream classifier to predict its tactics. In this way, we can effectively reduce the burden of manually labeling big datasets. In a one-week honeypot attack dataset (227 thousand packets per day), PELAT achieves 99% precision, recall, and F1 on the testing dataset. PELAT can infer over 99% of tactics on two other testing datasets (while nearly 90% of tactics are identified).

Submitted 1 September, 2022; originally announced September 2022.

Comments: 13 pages, 7 figures, 6 tables

arXiv:2208.08817 [pdf, other]

Exploring Nanofibrous Networks with X-ray Photon Correlation Spectroscopy

Authors: Tomas Rosén, HongRui He, Ruifu Wang, Korneliya Gordeyeva, Ahmad Reza Motezakker, Andrei Fluerasu, L. Daniel Söderberg, Benjamin S. Hsiao

Abstract: Nanofibrous networks are the foundation and natural building strategy for all life forms on our planet. Apart from providing structural integrity to cells and tissues, they also provide a porous scaffold allowing transport of substances, where the resulting properties rely on the nanoscale network structure. Recently, there has been a great deal of interest in extracting and reassembling biobased nanofibers to create sustainable, advanced materials with applications ranging from high-performance textiles to artificial tissues. However, achieving structural control of the extracted nanofibers is challenging as it is strongly dependent on the extraction methods and source materials. Furthermore, the small nanofiber cross-sections and fast Brownian dynamics make them notoriously difficult to characterize in dispersions. In this work, we study the diffusive motion of spherical gold nanoparticles in semi-dilute networks of cellulose nanofibers (CNFs) using X-ray Photon Correlation Spectroscopy (XPCS). We find that the motion becomes increasingly subdiffusive with higher CNF concentration, where the dynamics can be decomposed into several superdiffusive relaxation modes in reciprocal space. Using simulations of confined Brownian dynamics in combination with simulated XPCS-experiments, we observe that the dynamic modes can be connected to pore sizes and inter-pore transport properties in the network. The demonstrated analytical strategy of combining experiments using tracer particles with a digital twin may be the key to understanding nanoscale properties of nanofibrous networks.

Submitted 18 August, 2022; originally announced August 2022.

Comments: 57 pages, 17 figures, supplementary material

arXiv:2208.05476 [pdf, other]

Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network

Authors: S. W. Hsiao, P. Y. Chu

Abstract: Malicious software (malware) causes much harm to our devices and lives. We are eager to understand malware behavior and the threats it poses. Most of the record files of malware are variable-length, text-based files with time stamps, such as event log data and dynamic analysis profiles. Using the time stamps, we can sort such data into sequence-based data for the following analysis. However, dealing with text-based sequences of variable length is difficult. In addition, unlike natural language text data, most sequential data in information security have specific properties and structure, such as loops, repeated calls, and noise. To deeply analyze the API call sequences with their structure, we use graphs to represent the sequences, which can further investigate the information and structure, such as the Markov model. Therefore, we design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze the API call sequences. Through AWGCN, we can obtain the sequence embeddings to analyze the behavior of the malware. Moreover, the classification experiment result shows that AWGCN outperforms other classifiers in the call-like datasets, and the embedding can further improve the classic model's performance.

Submitted 10 August, 2022; originally announced August 2022.

Comments: 13 pages

arXiv:2205.13778 [pdf, other]

doi 10.1063/5.0102393

Temporally-ultralong biphotons with a linewidth of 50 kHz

Authors: Yu-Sheng Wang, Kai-Bo Li, Chao-Feng Chang, Tan-Wen Lin, Jian-Qing Li, Shih-Si Hsiao, Jia-Mou Chen, Yi-Hua Lai, Ying-Cheng Chen, Yong-Fan Chen, Chih-Sung Chuu, Ite A. Yu

Abstract: We report the generation of biphotons, with a temporal full width at the half maximum (FWHM) of 13.4$\pm$0.3 $\mu$s and a spectral FWHM of 50$\pm$1 kHz, via the process of spontaneous four-wave mixing. The temporal width is the longest, and the spectral linewidth is the narrowest up to date. This is also the first biphoton result that obtains a linewidth below 100 kHz, reaching a new milestone. The very long biphoton wave packet has a signal-to-background ratio of 3.4, which violates the Cauchy-Schwarz inequality for classical light by 4.8 folds. Furthermore, we demonstrated a highly-tunable-linewidth biphoton source and showed that while the biphoton source's temporal and spectral width were controllably varied by about 24 folds, its generation rate only changed by less than 15%. A spectral brightness or generation rate per pump power per linewidth of 1.2$\times$10$^6$ pairs/(s$\cdot$mW$\cdot$MHz) was achieved at the temporal width of 13.4 $\mu$s. The above results were made possible by the low decoherence rate and high optical depth of the experimental system, as well as the nearly phase-mismatch-free scheme employed in the experiment. This work has demonstrated a high-efficiency ultranarrow-linewidth biphoton source, and has made a substantial advancement in the quantum technology utilizing heralded single photons.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 9 pages, 4 figures, 1 table

Journal ref: APL Photon. 7, 126102 (2022)

arXiv:2203.15242 [pdf, other]

doi 10.1103/PhysRevA.106.023709

Temporal profile of biphotons generated from a hot atomic vapor and spectrum of electromagnetically induced transparency

Authors: Shih-Si Hsiao, Wei-Kai Huang, Yi-Min Lin, Jia-Mou Chen, Chia-Yu Hsu, Ite A. Yu

Abstract: We systematically studied the temporal profile of biphotons, i.e., pairs of time-correlated single photons, generated from a hot atomic vapor via the spontaneous four-wave mixing process. The measured temporal width of biphoton wave packet or two-photon correlation function against the coupling power was varied from about 70 to 580 ns. We derived an analytical expression of the biphoton's spectral profile in the Doppler-broadened medium. The analytical expression reveals that the spectral profile is mainly determined by the effect of electromagnetically induced transparency (EIT), and behaves like a Lorentzian function with a linewidth approximately equal to the EIT linewidth. Consequently, the biphoton's temporal profile influenced by the Doppler broadening is an exponential-decay function, which was consistent with the experimental data. Employing a weak input probe field of classical light, we further measured the EIT spectra under the same experimental conditions as those in the biphoton measurements. The theoretical predictions of the biphoton wave packets calculated with the parameters determined by the classical-light EIT spectra are consistent with the experimental data. The consistency demonstrates that in the Doppler-broadened medium, the classical-light EIT spectrum is a good indicator for the biphoton's temporal profile. Besides, the measured biphoton's temporal widths well approximated to the predictions of the analytical formula based on the biphoton's EIT effect. This study provides an analytical way to quantitatively understand the biphoton's spectral and temporal profiles in the Doppler-broadened medium.

Submitted 5 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2110.06600 [pdf, other]

doi 10.1103/PhysRevResearch.4.023024

Increasing decoherence rate of Rydberg polaritons due to accumulating dark Rydberg atoms

Authors: Ko-Tang Chen, Bongjune Kim, Chia-Chen Su, Shih-Si Hsiao, Shou-Jou Huang, Wen-Te Liao, Ite A. Yu

Abstract: We experimentally observed an accumulative type of nonlinear attenuation and distortion of slow light, i.e., Rydberg polaritons, with the Rydberg state $|32D_{5/2}\rangle$ in the weak-interaction regime. The present effect of attenuation and distortion cannot be explained by considering only the dipole-dipole interaction (DDI) between Rydberg atoms in $|32D_{5/2}\rangle$. Our observation can be attributed to the atoms in the dark Rydberg states other than those in the bright Rydberg state, i.e., $|32D_{5/2}\rangle$, driven by the coupling field. The dark Rydberg states are all the possible states, in which the population decaying from $|32D_{5/2}\rangle$ accumulated over time, and they were not driven by the coupling field. Consequently, the DDI between the dark and bright Rydberg atoms increased the decoherence rate of the Rydberg polaritons. We performed three different experiments to verify the above hypothesis, to confirm the existence of the dark Rydberg states, and to measure the decay rate from the bright to dark Rydberg states. In the theoretical model, we included the decay process from the bright to dark Rydberg states and the DDI effect induced by both the bright and dark Rydberg atoms. All the experimental data of slow light taken at various probe Rabi frequencies were in good agreement with the theoretical predictions based on the model. This study pointed out an additional decoherence rate in the Rydberg-EIT effect, and provides a better understanding of the Rydberg-polariton system.

Submitted 9 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Journal ref: Phys. Rev. Research 4, 023024 (2022)

arXiv:2109.09062 [pdf, other]

doi 10.1103/PhysRevResearch.4.023132

Room-temperature biphoton source with a spectral brightness near the ultimate limit

Authors: Jia-Mou Chen, Chia-Yu Hsu, Wei-Kai Huang, Shih-Si Hsiao, Fu-Chen Huang, Yi-Hsin Chen, Chih-Sung Chuu, Ying-Cheng Chen, Yong-Fan Chen, Ite A. Yu

Abstract: The biphotons, generated from a hot atomic vapor via the process of spontaneous four-wave mixing (SFWM), have the following merits: stable and tunable frequencies as well as linewidth. Such merits are very useful in the applications of long-distance quantum communication. However, the hot-atom SFWM biphoton sources previously had far lower values of generation rate per linewidth, i.e., spectral brightness, as compared with the sources of biphotons generated by the spontaneous parametric down conversion (SPDC) process. Here, we report a hot-atom SFWM source of biphotons with a linewidth of 960 kHz and a generation rate of 3.7$\times$10$^5$ pairs/s. The high generation rate, together with the narrow linewidth, results in a spectral brightness of 3.8$\times$10$^5$ pairs/s/MHz, which is 17 times of the previous best result with atomic vapors and also better than all known results with all kinds of media. The all-copropagating scheme together with a large optical depth (OD) of the atomic vapor is the key improvement, enabling the achieved spectral brightness to be about one quarter of the ultimate limit. Furthermore, this biphoton source had a signal-to-background ratio (SBR) of 2.7, which violated the Cauchy-Schwarz inequality for classical light by about 3.6 folds. Although an increasing spectral brightness usually leads to a decreasing SBR, our systematic study indicates that both of the present spectral brightness and SBR can be enhanced by further increasing the OD. This work demonstrates a significant advancement and provides useful knowledge in the quantum technology using photons.
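The abstract defines spectral brightness as generation rate per linewidth, so the quoted figures can be cross-checked with one division. The snippet below is only a worked arithmetic check; the function name is hypothetical and the inputs are the rounded values given in the abstract.

```python
def spectral_brightness(rate_pairs_per_s, linewidth_mhz):
    """Generation rate per unit linewidth, in pairs/s/MHz."""
    return rate_pairs_per_s / linewidth_mhz

rate = 3.7e5           # pairs/s, as quoted in the abstract
linewidth_mhz = 0.960  # 960 kHz expressed in MHz

brightness = spectral_brightness(rate, linewidth_mhz)
print(f"{brightness:.3g} pairs/s/MHz")
```

The result is about 3.85 x 10^5 pairs/s/MHz, close to the quoted 3.8 x 10^5. The small gap presumably reflects rounding of the published rate and linewidth.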

Submitted 8 May, 2022; v1 submitted 19 September, 2021; originally announced September 2021.

Journal ref: Phys. Rev. Research 4 (2022), 023132

arXiv:2105.08820 [pdf, other]

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

Authors: Udit Gupta, Samuel Hsia, Jeff Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin S. Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks

Abstract: Deep learning recommendation systems must provide high quality, personalized content under strict tail-latency targets and high system loads. This paper presents RecPipe, a system to jointly optimize recommendation quality and inference performance. Central to RecPipe is decomposing recommendation models into multi-stage pipelines to maintain quality while reducing compute complexity and exposing distinct parallelism opportunities. RecPipe implements an inference scheduler to map multi-stage recommendation engines onto commodity, heterogeneous platforms (e.g., CPUs, GPUs). While the hardware-aware scheduling improves ranking efficiency, the commodity platforms suffer from many limitations requiring specialized hardware. Thus, we design RecPipeAccel (RPAccel), a custom accelerator that jointly optimizes quality, tail-latency, and system throughput. RPAccel is designed specifically to exploit the distinct design space opened via RecPipe. In particular, RPAccel processes queries in sub-batches to pipeline recommendation stages, implements dual static and dynamic embedding caches, a set of top-k filtering units, and a reconfigurable systolic array. Compared to prior-art and at iso-quality, we demonstrate that RPAccel improves latency and throughput by 3x and 6x.

Submitted 22 May, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

arXiv:2102.00075 [pdf, other]

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

Authors: Mark Wilkening, Udit Gupta, Samuel Hsia, Caroline Trippel, Carole-Jean Wu, David Brooks, Gu-Yeon Wei

Abstract: Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters requiring large memory capacities. Unfortunately, large and fast DRAM-based memories levy high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude larger capacity, but have worse read latency and bandwidth, degrading inference performance. RecSSD is a near data processing based SSD memory system customized for neural recommendation inference that reduces end-to-end model inference latency by 2X compared to using COTS SSDs across eight industry-representative models.

Submitted 29 January, 2021; originally announced February 2021.

arXiv:2012.04893 [pdf, other]

doi 10.1364/OE.415473

Generation of sub-MHz and spectrally-bright biphotons from hot atomic vapors with a phase mismatch-free scheme

Authors: Chia-Yu Hsu, Yu-Sheng Wang, Jia-Mou Chen, Fu-Chen Huang, Yi-Ting Ke, Emily Kay Huang, Weilun Hung, Kai-Lin Chao, Shih-Si Hsiao, Yi-Hsin Chen, Chih-Sung Chuu, Ying-Cheng Chen, Yong-Fan Chen, Ite A. Yu

Abstract: We utilized the all-copropagating scheme, which maintains the phase-match condition, in the spontaneous four-wave mixing (SFWM) process to generate biphotons from a hot atomic vapor. The scheme enables our biphotons not only to surpass those in the previous works of hot-atom SFWM, but also to compete with the biphotons that are generated by either the cold-atom SFWM or the cavity-assisted spontaneous parametric down conversion. The biphoton linewidth in this work is tunable over an order of magnitude. As we tuned the linewidth to 610 kHz, the maximum two-photon correlation function, $g_{s,as}^{(2)}$, of the biphotons is 42. This $g_{s,as}^{(2)}$ violates the Cauchy-Schwarz inequality for classical light by a factor of 440, and demonstrates that the biphotons have a high purity. The generation rate per linewidth of the 610-kHz biphoton source is 1,500 pairs/(s$\cdot$MHz), which is the best result of all the sub-MHz biphoton sources in the literature. By increasing the pump power 16-fold, we further enhanced the generation rate per linewidth to 2.3$\times$10$^4$ pairs/(s$\cdot$MHz), while the maximum $g_{s,as}^{(2)}$ became 6.7. In addition, we are able to tune the linewidth down to 290$\pm$20 kHz. This is the narrowest linewidth to date, among all the various kinds of single-mode biphotons.

Submitted 2 February, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

Journal ref: Optics Express. 29(3): 4632-4644 (2021)

arXiv:2010.05037 [pdf, other]

Cross-Stack Workload Characterization of Deep Recommendation Systems

Authors: Samuel Hsia, Udit Gupta, Mark Wilkening, Carole-Jean Wu, Gu-Yeon Wei, David Brooks

Abstract: Deep learning based recommendation systems form the backbone of most personalized cloud services. Though the computer architecture community has recently started to take notice of deep recommendation inference, the resulting solutions have taken wildly different approaches - ranging from near memory processing to at-scale optimizations. To better design future hardware systems for deep recommendation inference, we must first systematically examine and characterize the underlying systems-level impact of design decisions across the different levels of the execution stack. In this paper, we characterize eight industry-representative deep recommendation models at three different levels of the execution stack: algorithms and software, systems platforms, and hardware microarchitectures. Through this cross-stack characterization, we first show that system deployment choices (i.e., CPUs or GPUs, batch size granularity) can give us up to 15x speedup. To better understand the bottlenecks for further optimization, we look at both software operator usage breakdown and CPU frontend and backend microarchitectural inefficiencies. Finally, we model the correlation between key algorithmic model architecture features and hardware bottlenecks, revealing the absence of a single dominant algorithmic component behind each hardware bottleneck.

Submitted 10 October, 2020; originally announced October 2020.

Comments: Published in 2020 IEEE International Symposium on Workload Characterization (IISWC)

arXiv:2006.13526 [pdf, other]

doi 10.1038/s42005-021-00604-5

A Weakly-Interacting Many-Body System of Rydberg Polaritons Based on Electromagnetically Induced Transparency

Authors: Bongjune Kim, Ko-Tang Chen, Shih-Si Hsiao, Sheng-Yang Wang, Kai-Bo Li, Julius Ruseckas, Gediminas Juzeliunas, Teodora Kirova, Marcis Auzinsh, Ying-Cheng Chen, Yong-Fan Chen, Ite A. Yu

Abstract: We proposed utilizing a medium with a high optical depth (OD) and a Rydberg state of low principal quantum number, $n$, to create a weakly-interacting many-body system of Rydberg polaritons, based on the effect of electromagnetically induced transparency (EIT). We experimentally verified the mean field approach to weakly-interacting Rydberg polaritons, and observed the phase shift and attenuation induced by the dipole-dipole interaction (DDI). The DDI-induced phase shift or attenuation can be viewed as a consequence of the elastic or inelastic collisions among the Rydberg polaritons. Using a weakly-interacting system, we further observed that a larger DDI strength caused a width of the momentum distribution of Rydberg polaritons at the exit of the system to become notably smaller as compared with that at the entrance. In this study, we took $n=32$ and the atomic (or polariton) density of 5$\times10^{10}$ (or 2$\times10^{9}$) cm$^{-3}$. The observations demonstrate that the elastic collisions are sufficient to drive the thermalization process in this weakly-interacting many-body system. The combination of the $μ$s-long interaction time due to the high-OD EIT medium and the $μ$m$^2$-size collision cross section due to the DDI suggests a new and feasible platform for the Bose-Einstein condensation of the Rydberg polaritons.

Submitted 24 May, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

Journal ref: Communications Physics 4, 101 (2021)

arXiv:2006.09190 [pdf, other]

doi 10.1364/OE.401310

Mean field theory of weakly-interacting Rydberg polaritons in the EIT system based on the nearest-neighbor distribution

Authors: Shih-Si Hsiao, Ko-Tang Chen, Ite A. Yu

Abstract: The combination of high optical nonlinearity in the electromagnetically induced transparency (EIT) effect and strong electric dipole-dipole interaction (DDI) among the Rydberg-state atoms can lead to important applications in quantum information processing and many-body physics. One can utilize the Rydberg-EIT system in the strongly-interacting regime to mediate photon-photon interaction or qubit-qubit operation. One can also employ the Rydberg-EIT system in the weakly-interacting regime to study the Bose-Einstein condensation of Rydberg polaritons. Most of the present theoretical models dealt with the strongly-interacting cases. Here, we consider the weakly-interacting regime and develop a mean field model based on the nearest-neighbor distribution. Using the mean field model, we further derive the analytical formulas for the attenuation coefficient and phase shift of the output probe field. The predictions from the formulas are consistent with the experimental data in the weakly-interacting regime, verifying the validity of our model. As the DDI-induced phase shift and attenuation can be seen as the consequences of elastic and inelastic collisions among particles, this work provides a very useful tool for conceiving ideas relevant to the EIT system of weakly-interacting Rydberg polaritons, and for evaluating experimental feasibility.

Submitted 11 September, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Journal ref: Optics Express. 28(19): 28414-28429 (2020)

arXiv:2001.02772 [pdf, other]

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

Authors: Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu

Abstract: Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting significant compute demand of the cloud infrastructure. Thus, improving the execution efficiency of neural recommendation directly translates into infrastructure capacity saving. In this paper, we devise a novel end-to-end modeling infrastructure, DeepRecInfra, that adopts an algorithm and system co-design methodology to custom-design systems for recommendation use cases. Leveraging the insights from the recommendation characterization, a new dynamic scheduler, DeepRecSched, is proposed to maximize latency-bounded throughput by taking into account characteristics of inference query size and arrival patterns, recommendation model architectures, and underlying hardware systems. By doing so, system throughput is doubled across the eight industry-representative recommendation models. Finally, design, deployment, and evaluation in an at-scale production datacenter show over 30% latency reduction across a wide variety of recommendation models running on hundreds of machines.

Submitted 8 January, 2020; originally announced January 2020.

arXiv:1902.09845 [pdf, other]

doi 10.1103/PhysRevA.100.013815

Effect of laser frequency fluctuation on the decay rate of Rydberg coherence

Authors: Bongjune Kim, Ko-Tang Chen, Chia-Yu Hsu, Shih-Si Hsiao, Yu-Chih Tseng, Chin-Yuan Lee, Shih-Lun Liang, Yi-Hua Lai, Julius Ruseckas, Gediminas Juzeliunas, Ite A. Yu

Abstract: The effect of electromagnetically induced transparency (EIT) combined with Rydberg-state atoms provides high optical nonlinearity to efficiently mediate the photon-photon interaction. However, the decay rate of Rydberg coherence, i.e., the decoherence rate, plays an important role in optical nonlinear efficiency, and can be largely influenced by laser frequency fluctuation. In this work, we carried out a systematic study of the effect of laser frequency fluctuation on the decoherence rate. We derived an analytical formula that quantitatively describes the relationship between the decoherence rate and laser frequency fluctuation. The formula was experimentally verified by using the $Λ$-type EIT system of laser-cooled $^{87}$Rb atoms, in which one can either completely eliminate or controllably introduce the effect of laser frequency fluctuation. We also included the effect of Doppler shift caused by the atomic thermal motion in the formula, which can be negligible in the $Λ$-type EIT experiment but significant in the Rydberg-EIT experiment. Utilizing the atoms of 350 $μ$K, we studied the decoherence rate in the Rydberg-EIT system involving the state of $|32D_{5/2}\rangle$. The experimental data are consistent with the predictions from the formula. We were able to achieve a rather low decoherence rate of $2π\times$48 kHz at a moderate coupling Rabi frequency of $2π\times$4.3 MHz.

Submitted 13 June, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Journal ref: Phys. Rev. A 100, 013815 (2019)

arXiv:1705.01697 [pdf, other]

Virtual Machine Introspection Based Malware Behavior Profiling and Family Grouping

Authors: Shun-Wen Hsiao, Yeali S. Sun, Meng Chang Chen

Abstract: The proliferation of malware has been attributed to the alteration of a handful of original malware source codes. Malware altered from the same origin shares some intrinsic behaviors and forms a malware family. Identifying the family of a malware sample when it is first seen on the Internet can provide useful clues to mitigate the threat. In this paper, a malware profiler (VMP) is proposed to profile the execution behaviors of a malware by leveraging the virtual machine introspection (VMI) technique. The VMP inserts plug-ins inside the virtual machine monitor (VMM) to record the invoked API calls with their input parameters and return values as the profile of the malware. In this paper, a popular similarity measurement, the Jaccard distance, and a phylogenetic tree construction method are adopted to discover malware families. The studies of malware profiles show that the malware samples from a malware family are very similar to each other and distinct from other malware families as well as benign software. This paper also examines VMP against existing anti-malware detection engines and some well-known malware grouping methods to compare the goodness of their malware family constructions. A peer voting approach is proposed and the results show VMP is better than almost all of the compared anti-malware engines, and compatible with the fine tuned text-mining approach and high order N-gram approaches. We also establish a malware profiling website based on VMP for malware research.

Submitted 4 May, 2017; originally announced May 2017.

Comments: 13 pages, 9 figures, 5 tables

arXiv:1009.4134 [pdf, ps, other]

doi 10.1016/j.aim.2011.12.024

Supercharacters, symmetric functions in noncommuting variables, and related Hopf algebras

Authors: Marcelo Aguiar, Carlos Andre, Carolina Benedetti, Nantel Bergeron, Zhi Chen, Persi Diaconis, Anders Hendrickson, Samuel Hsiao, I. Martin Isaacs, Andrea Jedwab, Kenneth Johnson, Gizem Karaali, Aaron Lauve, Tung Le, Stephen Lewis, Huilan Li, Kay Magaard, Eric Marberg, Jean-Christophe Novelli, Amy Pang, Franco Saliola, Lenny Tevlin, Jean-Yves Thibon, Nathaniel Thiem, Vidya Venkateswaran, et al. (3 additional authors not shown)

Abstract: We identify two seemingly disparate structures: supercharacters, a useful way of doing Fourier analysis on the group of unipotent upper triangular matrices with coefficients in a finite field, and the ring of symmetric functions in noncommuting variables. Each is a Hopf algebra and the two are isomorphic as such. This allows developments in each to be transferred. The identification suggests a rich class of examples for the emerging field of combinatorial Hopf algebras.

Submitted 22 December, 2011; v1 submitted 21 September, 2010; originally announced September 2010.

Comments: To Appear in Advances in Mathematics (2012), 23 pages

MSC Class: 05E10

Journal ref: Advances in Mathematics 229 (2012) 2310--2337

arXiv:0910.5773 [pdf, ps, other]

Multigraded combinatorial Hopf algebras and refinements of odd and even subalgebras

Authors: Samuel K. Hsiao, Gizem Karaali

Abstract: We develop a theory of multigraded (i.e., $N^l$-graded) combinatorial Hopf algebras modeled on the theory of graded combinatorial Hopf algebras developed by Aguiar, Bergeron, and Sottile [Compos. Math. 142 (2006), 1--30]. In particular we introduce the notion of canonical $k$-odd and $k$-even subalgebras associated with any multigraded combinatorial Hopf algebra, extending simultaneously the work of Aguiar et al. and Ehrenborg. Among our results are specific categorical results for higher level quasisymmetric functions, several basis change formulas, and a generalization of the descents-to-peaks map.

Submitted 10 February, 2011; v1 submitted 30 October, 2009; originally announced October 2009.

Comments: 49 pages. To appear in the Journal of Algebraic Combinatorics

MSC Class: 05E99; 16W30; 06A07; 06A11

Journal ref: J. Algebr. Comb. 34 (2011), pages 451-506

arXiv:0710.2081 [pdf, ps, other]

A semigroup approach to wreath-product extensions of Solomon's descent algebras

Authors: Samuel K. Hsiao

Abstract: There is a well-known combinatorial definition, based on ordered set partitions, of the semigroup of faces of the braid arrangement. We generalize this definition to obtain a semigroup Sigma_n^G associated with G wr S_n, the wreath product of the symmetric group S_n with an arbitrary group G. Techniques of Bidigare and Brown are adapted to construct an anti-homomorphism from the S_n-invariant subalgebra of the semigroup algebra of Sigma_n^G into the group algebra of G wr S_n. The generalized descent algebras of Mantaci and Reutenauer are obtained as homomorphic images when G is abelian.

Submitted 15 October, 2007; v1 submitted 10 October, 2007; originally announced October 2007.

Comments: 6 pages, added a reference and updated the Introduction

MSC Class: 05E99; 16S34; 20M25

arXiv:0709.1477 [pdf, ps, other]

Random walks on quasisymmetric functions

Authors: Patricia Hersh, Samuel K. Hsiao

Abstract: Conditions are provided under which an endomorphism on quasisymmetric functions gives rise to a left random walk on the descent algebra which is also a lumping of a left random walk on permutations. Spectral results are also obtained. Several well-studied random walks are now realized this way: Stanley's QS-distribution results from endomorphisms given by evaluation maps, a-shuffles result from the a-th convolution power of the universal character, and the Tchebyshev operator of the second kind introduced recently by Ehrenborg and Readdy yields traditional riffle shuffles. A conjecture of Ehrenborg regarding the spectra for a family of random walks on ab-words is proven. A theorem of Stembridge from the theory of enriched P-partitions is also recovered as a special case.

Submitted 10 September, 2007; originally announced September 2007.

Comments: 25 pages

MSC Class: 05E99; 16W30; 60C05; 60J10

arXiv:0706.3486 [pdf, ps, other]

Peak Quasisymmetric Functions and Eulerian Enumeration

Authors: Louis J. Billera, Samuel K. Hsiao, Stephanie van Willigenburg

Abstract: Via duality of Hopf algebras, there is a direct association between peak quasisymmetric functions and enumeration of chains in Eulerian posets. We study this association explicitly, showing that the notion of $\mathbf{cd}$-index, long studied in the context of convex polytopes and Eulerian posets, arises as the dual basis to a natural basis of peak quasisymmetric functions introduced by Stembridge. Thus Eulerian posets having a nonnegative $\mathbf{cd}$-index (for example, face lattices of convex polytopes) correspond to peak quasisymmetric functions having a nonnegative representation in terms of this basis. We diagonalize the operator that associates the basis of descent sets for all quasisymmetric functions to that of peak sets for the algebra of peak functions, and study the $g$-polynomial for Eulerian posets as an algebra homomorphism.

Submitted 23 June, 2007; originally announced June 2007.

Comments: 23 pages; final version

MSC Class: 05E05; 05A15; 06A07; 16W30

Journal ref: Adv. Math. 176: 248--276 (2003)

arXiv:math/0610984 [pdf, ps, other]

Colored posets and colored quasisymmetric functions

Authors: Samuel K. Hsiao, T. Kyle Petersen

Abstract: The colored quasisymmetric functions, like the classic quasisymmetric functions, are known to form a Hopf algebra with a natural peak subalgebra. We show how these algebras arise as the image of the algebra of colored posets. To effect this approach we introduce colored analogs of $P$-partitions and enriched $P$-partitions. We also frame our results in terms of Aguiar, Bergeron, and Sottile's theory of combinatorial Hopf algebras and its colored analog.

Submitted 31 October, 2006; originally announced October 2006.

Comments: 31 pages, 5 figures

arXiv:math/0610976 [pdf, ps, other]

The Hopf algebras of type B quasisymmetric functions and peak functions

Authors: Samuel K. Hsiao, T. Kyle Petersen

Abstract: We show that with the appropriate choice of coproduct, the type B quasisymmetric functions form a Hopf algebra, and the recently introduced type B peak functions form a Hopf subalgebra.

Submitted 31 October, 2006; originally announced October 2006.

Comments: 9 pages

arXiv:math/0505576 [pdf, ps, other]

Enumeration in convex geometries and associated polytopal subdivisions of spheres

Authors: Louis J. Billera, Samuel K. Hsiao, J. Scott Provan

Abstract: We construct CW spheres from the lattices that arise as the closed sets of a convex closure, the meet-distributive lattices. These spheres are nearly polytopal, in the sense that their barycentric subdivisions are simplicial polytopes. The complete information on the numbers of faces and chains of faces in these spheres can be obtained from the defining lattices in a manner analogous to the relation between arrangements of hyperplanes and their underlying geometric intersection lattices.

Submitted 19 April, 2006; v1 submitted 26 May, 2005; originally announced May 2005.

Comments: 18 pages, 4 figures; incorporates suggestions by referees; minor revisions throughout; expanded discussion on chain enumeration in last section; to be published in Discrete and Computational Geometry

MSC Class: 05E99; 06A07; 52B05; 52B40 (primary); 06B99; 52C40 (secondary)

arXiv:math/0408053 [pdf, ps, other]

Canonical characters on quasi-symmetric functions and bivariate Catalan numbers

Authors: Marcelo Aguiar, Samuel K. Hsiao

Abstract: Every character on a graded connected Hopf algebra decomposes uniquely as a product of an even character and an odd character (Aguiar, Bergeron, and Sottile, math.CO/0310016). We obtain explicit formulas for the even and odd parts of the universal character on the Hopf algebra of quasi-symmetric functions. They can be described in terms of Legendre's beta function evaluated at half-integers, or in terms of bivariate Catalan numbers: $$ C(m,n)=\frac{(2m)!(2n)!}{m!(m+n)!n!}. $$ Properties of characters and of quasi-symmetric functions are then used to derive several interesting identities among bivariate Catalan numbers and in particular among Catalan numbers and central binomial coefficients.

Submitted 4 August, 2004; originally announced August 2004.

MSC Class: 05A15; 05E05; 16W30; 16W50

arXiv:nucl-th/0004007 [pdf, ps, other]

doi 10.1103/PhysRevC.61.068201

Pseudovector versus pseudoscalar coupling in kaon photoproduction - revisited

Authors: S. S. Hsiao, D. H. Lu, Shin Nan Yang

Abstract: The question of pseudovector versus pseudoscalar coupling schemes for the kaon-hyperon-nucleon interaction is re-examined for the reaction $γ p\to K^+ Λ$ in several isobaric models. These models typically include Born terms, $K^*$- and $K_1$-exchange in the t-channel, and a few different combinations of spin-1/2 baryon resonances in the $s$- and $u$-channels. The coupling constants are obtained by fitting to a large data set. We find that both pseudoscalar and pseudovector couplings can allow for a satisfactory description of the present database. The resulting coupling constants, $g_{KΛN}$ and $g_{KΣN}$, in the pseudovector coupling scheme are smaller than those predicted using flavor SU(3) symmetry, but consistent with the values obtained in a QCD sum rule calculation.

Submitted 5 April, 2000; originally announced April 2000.

Comments: 11 pages, 4 figures. to appear in Phys. Rev. C

Journal ref: Phys.Rev. C61 (2000) 068201

Showing 1–33 of 33 results for author: Hsia, S