Search | arXiv e-print repository

Showing 1–18 of 18 results for author: Tsai, P

Searching in archive cs.
  1. arXiv:2406.13868  [pdf, other]

    cs.LG cs.AI

    SDQ: Sparse Decomposed Quantization for LLM Inference

    Authors: Geonhwa Jeong, Po-An Tsai, Stephen W. Keckler, Tushar Krishna

    Abstract: Recently, large language models (LLMs) have shown surprising performance in task-specific workloads as well as general tasks with the given prompts. However, to achieve unprecedented performance, recent LLMs use billions to trillions of parameters, which hinder the wide adaptation of those models due to their extremely large compute and memory requirements. To resolve the issue, various model comp…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2403.07953  [pdf, other]

    cs.LG cs.AI cs.AR

    Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

    Authors: Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna

    Abstract: Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the growing computation need of modern DNNs. However, in practice, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparse hardware support recently, which provides limited flexibility and requires extra model fine-tun…

    Submitted 31 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  3. Quantitative Technology Forecasting: a Review of Trend Extrapolation Methods

    Authors: Peng-Hung Tsai, Daniel Berleant, Richard S. Segall, Hyacinthe Aboudja, Venkata Jaipal R. Batthula, Sheela Duggirala, Michael Howell

    Abstract: Quantitative technology forecasting uses quantitative methods to understand and project technological changes. It is a broad field encompassing many different techniques and has been applied to a vast range of technologies. A widely used approach in this field is trend extrapolation. Based on the publications available to us, there has been little or no attempt made to systematically review the em…

    Submitted 4 January, 2024; originally announced January 2024.

    Journal ref: International Journal of Innovation and Technology Management (2023), 20(4):2330002

  4. arXiv:2311.02324  [pdf, other]

    cs.CR

    Bounded and Unbiased Composite Differential Privacy

    Authors: Kai Zhang, Yanjun Zhang, Ruoxi Sun, Pei-Wei Tsai, Muneeb Ul Hassan, Xin Yuan, Minhui Xue, Jinjun Chen

    Abstract: The objective of differential privacy (DP) is to protect privacy by producing an output distribution that is indistinguishable between any two neighboring databases. However, traditional differentially private mechanisms tend to produce unbounded outputs in order to achieve maximum disturbance range, which is not always in line with real-world applications. Existing solutions attempt to address th…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Accepted at 45th IEEE Symposium on Security and Privacy (IEEE S&P)

  5. HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

    Authors: Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S. Emer

    Abstract: Due to complex interactions among various deep neural network (DNN) optimization techniques, modern DNNs can have weights and activations that are dense or sparse with diverse sparsity degrees. To offer a good trade-off between accuracy and hardware performance, an ideal DNN accelerator should have high flexibility to efficiently translate DNN sparsity into reductions in energy and/or latency with…

    Submitted 1 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to MICRO23

  6. arXiv:2303.13631  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    In-depth analysis of music structure as a text network

    Authors: Ping-Rui Tsai, Yen-Ting Chou, Nathan-Christopher Wang, Hui-Ling Chen, Hong-Yue Huang, Zih-Jia Luo, Tzay-Ming Hong

    Abstract: Music, enchanting and poetic, permeates every corner of human civilization. Although music is not unfamiliar to people, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental…

    Submitted 2 January, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: 7 pages, 8 figures

  7. arXiv:2210.03731  [pdf, other]

    cs.LG cs.DC

    Demystifying Map Space Exploration for NPUs

    Authors: Sheng-Chun Kao, Angshuman Parashar, Po-An Tsai, Tushar Krishna

    Abstract: Map Space Exploration is the problem of finding optimized mappings of a Deep Neural Network (DNN) model on an accelerator. It is known to be extremely computationally expensive, and there has been active research looking at both heuristics and learning-based methods to make the problem computationally tractable. However, while there are dozens of mappers out there (all empirically claiming to find…

    Submitted 7 October, 2022; originally announced October 2022.

  8. Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos

    Authors: Junbin Zhang, Pei-Hsuan Tsai, Meng-Hsun Tsai

    Abstract: Video action segmentation has been widely applied in many fields. Most previous studies employed video-based vision models for this purpose. However, they often rely on a large receptive field, LSTM or Transformer methods to capture long-term dependencies within videos, leading to significant computational resource requirements. To address this challenge, graph-based models were proposed. However,…

    Submitted 6 February, 2024; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: 13 pages, 3 figures, 9 tables. Published on Applied Intelligence

    MSC Class: 68T01; 68T30; 68T45 ACM Class: I.2.10; I.4.8; I.5

    Journal ref: Applied Intelligence (2024)

  9. arXiv:2205.05826  [pdf, other]

    cs.AR cs.CV cs.DC

    Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

    Authors: Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer

    Abstract: In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of systematic description and modeling support for these sparse tensor accelerators impedes hardware designers from efficient and effective design space exploration. T…

    Submitted 9 January, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Update website link, update UOP format description

  10. arXiv:2205.01252  [pdf, other]

    cs.AR

    SIMD$^2$: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM

    Authors: Yunan Zhang, Po-An Tsai, Hung-Wei Tseng

    Abstract: Matrix-multiplication units (MXUs) are now prevalent in every computing platform. The key attribute that makes MXUs so successful is the semiring structure, which allows tiling for both parallelism and data reuse. Nonetheless, matrix-multiplication is not the only algorithm with such attributes. We find that many algorithms share the same structure and differ in only the core operation; for exampl…

    Submitted 31 August, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

    Comments: To Appear in the 49th International Symposium on Computer Architecture (ISCA'22), June 18--22, 2022, New York, NY, USA

  11. arXiv:2201.09717  [pdf, other]

    cs.CV eess.IV

    Keeping Deep Lithography Simulators Updated: Global-Local Shape-Based Novelty Detection and Active Learning

    Authors: Hao-Chiang Shao, Hsing-Lei Ping, Kuo-shiuan Chen, Weng-Tai Su, Chia-Wen Lin, Shao-Yun Fang, Pin-Yian Tsai, Yan-Hsiu Liu

    Abstract: Learning-based pre-simulation (i.e., layout-to-fabrication) models have been proposed to predict the fabrication-induced shape deformation from an IC layout to its fabricated circuit. Such models are usually driven by pairwise learning, involving a training set of layout patterns and their reference shape images after fabrication. However, it is expensive and time-consuming to collect the referenc…

    Submitted 24 January, 2022; originally announced January 2022.

  12. An Efficient Probe-based Routing for Content-Centric Networking

    Authors: Pei-Hsuan Tsai, Junbin Zhang, Meng-Hsun Tsai

    Abstract: With the development of new technologies and applications, such as the Internet of Things, smart cities, 5G, and edge computing, traditional Internet Protocol-based (IP-based) networks have been exposed as having many problems. Information-Centric Networking (ICN), Named Data Networking (NDN), and Content-Centric Networking (CCN) are therefore proposed as an alternative for future networks. Howeve…

    Submitted 10 January, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: 16 pages, 9 figures, 3 tables

    MSC Class: 68-11 ACM Class: C.2.1; C.2.2

    Journal ref: Sensors, 2022; 22(1):341

  13. arXiv:2109.07419  [pdf, other]

    cs.AR cs.DC cs.LG

    Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

    Authors: Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna

    Abstract: To meet the extreme compute demands for deep learning across commercial and scientific applications, dataflow accelerators are becoming increasingly popular. While these "domain-specific" accelerators are not fully programmable like CPUs and GPUs, they retain varying levels of flexibility with respect to data orchestration, i.e., dataflow and tiling optimizations to enhance efficiency. There are s…

    Submitted 6 November, 2021; v1 submitted 15 September, 2021; originally announced September 2021.

    Comments: This paper is accepted to PACT 2021

  14. A Query-based Routing Table Update Mechanism for Content-Centric Network

    Authors: Pei-Hsuan Tsai, Yu-Lin Tseng, Jun-Bin Zhang, Meng-Hsun Tsai

    Abstract: Due to the popularity of network applications, such as multimedia, online shopping, Internet of Things (IoT), and 5G, the contents cached in the routers are frequently replaced in Content-Centric Networking (CCN). Generally, a cache miss causes numerous packets to be propagated to get the required content, which worsens network congestion and delays the response time of consumers. Many caching strategie…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: 6 pages, 14 figures, conference. ISBN:978-1-7281-9256-7

    ACM Class: C.2.2

    Journal ref: 2020 International Computer Symposium (ICS), 2020, pp. 266-271

  15. Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search

    Authors: Kartik Hegde, Po-An Tsai, Sitao Huang, Vikas Chandra, Angshuman Parashar, Christopher W. Fletcher

    Abstract: Modern day computing increasingly relies on specialization to satiate growing performance and efficiency requirements. A core challenge in designing such specialized hardware architectures is how to perform mapping space search, i.e., search for an optimal mapping from algorithm to hardware. Prior work shows that choosing an inefficient mapping can lead to multiplicative-factor efficiency overhead…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: Appears in the proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '21), April 19-23, 2021, Virtual, USA

  16. From IC Layout to Die Photo: A CNN-Based Data-Driven Approach

    Authors: Hao-Chiang Shao, Chao-Yi Peng, Jun-Rei Wu, Chia-Wen Lin, Shao-Yun Fang, Pin-Yen Tsai, Yan-Hsiu Liu

    Abstract: We propose a deep learning-based data-driven framework consisting of two convolutional neural networks: i) LithoNet that predicts the shape deformations on a circuit due to IC fabrication, and ii) OPCNet that suggests IC layout corrections to compensate for such shape deformations. By learning the shape correspondences between pairs of layout design patterns and their scanning electron microscope…

    Submitted 6 August, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: 14 pages, 16 figures

  17. A Trajectory Calculus for Qualitative Spatial Reasoning Using Answer Set Programming

    Authors: George Baryannis, Ilias Tachmazidis, Sotiris Batsakis, Grigoris Antoniou, Mario Alviano, Timos Sellis, Pei-Wei Tsai

    Abstract: Spatial information is often expressed using qualitative terms such as natural language expressions instead of coordinates; reasoning over such terms has several practical applications, such as bus routes planning. Representing and reasoning on trajectories is a specific case of qualitative spatial reasoning that focuses on moving objects and their paths. In this work, we propose two versions of a…

    Submitted 19 April, 2018; originally announced April 2018.

    Comments: Paper presented at the 34th International Conference on Logic Programming (ICLP 2018), Oxford, UK, July 14 to July 17, 2018, 20 pages, LaTeX, 16 figures

    Journal ref: Theory and Practice of Logic Programming 18 (2018) 355-371

  18. arXiv:1510.01823  [pdf, ps, other]

    cs.IT

    Multiple Configurations LT Codes

    Authors: Pei-Chuan Tsai, Chih-Ming Chen, Ying-ping Chen

    Abstract: This paper introduces a new scheme of LT codes, named multiple configurations. In multiple configurations LT codes (MC-LT codes), multiple sets of output symbols are simultaneously provided to receivers for recovering the source data. Each receiver, without the need to send information back to the sender, is capable of receiving the output symbols generated by some configuration chosen according t…

    Submitted 7 October, 2015; originally announced October 2015.

    Comments: 11 pages, 8 figures, 3 tables

    ACM Class: E.4; H.1.1; C.2.0