-
WHYPE: A Scale-Out Architecture with Wireless Over-the-Air Majority for Scalable In-memory Hyperdimensional Computing
Authors:
Robert Guirado,
Abbas Rahimi,
Geethan Karunaratne,
Eduard Alarcón,
Abu Sebastian,
Sergi Abadal
Abstract:
Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using long random vectors known as hypervectors. Among different hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) has shown promise as it is very efficient in performing matrix-vector multiplications, which are common in the HDC algebra. Although HDC architectures based on IMC already exist, how to scale them remains a key challenge due to collective communication patterns that these architectures require and that traditional chip-scale networks were not designed for. To cope with this difficulty, we propose a scale-out HDC architecture called WHYPE, which uses wireless in-package communication technology to interconnect a large number of physically distributed IMC cores that either encode hypervectors or perform multiple similarity searches in parallel. In this context, the key enabler of WHYPE is the opportunistic use of the wireless network as a medium for over-the-air computation. WHYPE implements an optimized source coding that allows receivers to calculate the bit-wise majority of multiple hypervectors (a useful operation in HDC) being transmitted concurrently over the wireless channel. By doing so, we achieve a joint broadcast distribution and computation with a performance and efficiency unattainable with wired interconnects, which in turn enables massive parallelization of the architecture. Through evaluations at the on-chip network and complete architecture levels, we demonstrate that WHYPE can bundle and distribute hypervectors faster and more efficiently than a hypothetical wired implementation, and that it scales well to tens of receivers. We show that the average error rate of the majority computation is low, such that it has negligible impact on the accuracy of HDC classification tasks.
Submitted 4 February, 2023;
originally announced March 2023.
-
Dynamic Modelling of Liquid Crystal-Based Metasurfaces and its Application to Reducing Reconfigurability Times
Authors:
Robert Guirado,
Gerardo Perez-Palomino,
Marta Ferreras,
Eduardo Carrasco,
Manuel Caño-García
Abstract:
This paper describes and validates for the first time the dynamic modelling of Liquid Crystal (LC)-based planar multi-resonant cells, as well as its use as a bias-signal synthesis tool to improve their reconfigurability time. The dynamic LC director equation is solved in the longitudinal direction through the finite element method, which provides the z- and time-dependent inhomogeneous permittivity tensor used in an electromagnetic simulator to evaluate the cells' behaviour. The proposed model has been experimentally validated using reflective cells for phase control (reflectarray) and measuring the transient phase, both in excitation and relaxation regimes. It is shown that only a small number of stratified layers is needed to model the material inhomogeneity, and that even a homogeneous effective tensor can be used in most cases, which allows a model simplification suitable for design procedures without losing accuracy. Consequently, a novel bias signal design tool is proposed to significantly reduce the transition times of LC cells, and hence, of electrically large antennas composed of them. These tools, similar to those used in optical displays, are experimentally validated for the first time at mm- and sub-mm wave frequencies in this work, obtaining an improvement of orders of magnitude.
Submitted 29 September, 2022;
originally announced September 2022.
-
Wireless On-Chip Communications for Scalable In-memory Hyperdimensional Computing
Authors:
Robert Guirado,
Abbas Rahimi,
Geethan Karunaratne,
Eduard Alarcón,
Abu Sebastian,
Sergi Abadal
Abstract:
Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using very long random vectors (aka hypervectors). Among different hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) systems have recently been shown to be one of the most energy-efficient options, because hypervector manipulations in the memory itself reduce data movement. Although implementations of HDC on single IMC cores have been made, their parallelization is still unresolved due to the communication challenges that these novel architectures impose and that traditional Networks-on-Chip and Networks-in-Package were not designed for. To cope with this difficulty, we propose the use of wireless on-chip communication technology in unique ways. We are particularly interested in physically distributing a large number of IMC cores performing similarity search across a chip, and maintaining the classification accuracy when each of them is queried with a slightly different version of a bundled hypervector. To achieve this, we introduce a novel over-the-air computing scheme that consists of defining different binary decision regions in the receivers so as to compute the logical majority operation (i.e., bundling, or superposition) required in HDC. It introduces moderate overheads of a single antenna and receiver per IMC core. By doing so, we achieve a joint broadcast distribution and computation with a performance and efficiency unattainable with wired interconnects, which in turn enables massive parallelization of the architecture. It is demonstrated that the proposed approach allows both bundling at least three hypervectors and scaling similarity search to 64 IMC cores seamlessly, while incurring an average bit error ratio of 0.01 without any impact on the accuracy of a generic HDC-based classifier working with 512-bit vectors.
Submitted 22 May, 2022;
originally announced May 2022.
-
Characterizing the Communication Requirements of GNN Accelerators: A Model-Based Approach
Authors:
Robert Guirado,
Akshay Jain,
Sergi Abadal,
Eduard Alarcón
Abstract:
Relational data present in real-world graph representations demand tools capable of studying them accurately. In this regard, the Graph Neural Network (GNN) is a powerful tool, and various GNN models have been developed over the past decade. Recently, there has been a significant push towards creating accelerators that speed up the inference and training process of GNNs. These accelerators, however, do not delve into the impact of their dataflows on the overall data movement and, hence, on the communication requirements. In this paper, we formulate analytical models that capture the amount of data movement in the most recent GNN accelerator frameworks. Specifically, the proposed models capture the dataflows and hardware setup of these accelerator designs and expose their scalability characteristics for a set of hardware, GNN model and input graph parameters. Additionally, the proposed approach provides a means for the comparative analysis of the vastly different GNN accelerators.
Submitted 18 March, 2021;
originally announced March 2021.
-
Understanding the Design-Space of Sparse/Dense Multiphase GNN dataflows on Spatial Accelerators
Authors:
Raveesh Garg,
Eric Qin,
Francisco Muñoz-Martínez,
Robert Guirado,
Akshay Jain,
Sergi Abadal,
José L. Abellán,
Manuel E. Acacio,
Eduard Alarcón,
Sivasankaran Rajamanickam,
Tushar Krishna
Abstract:
Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and memory characteristics that come from an interplay between dense and sparse phases of computations, the emergence of reconfigurable dataflow (aka spatial) accelerators offers promise for acceleration by mapping optimized dataflows (i.e., computation order and parallelism) for both phases. The goal of this work is to characterize and understand the design-space of dataflow choices for running GNNs on spatial accelerators in order for mappers or design-space exploration tools to optimize the dataflow based on the workload. Specifically, we propose a taxonomy to describe all possible choices for mapping the dense and sparse phases of GNN inference, spatially and temporally over a spatial accelerator, capturing both the intra-phase dataflow and the inter-phase (pipelined) dataflow. Using this taxonomy, we do deep-dives into the costs and benefits of several dataflows and perform case studies on the implications of hardware parameters for dataflows and the value of flexibility to support pipelined execution.
Submitted 6 March, 2022; v1 submitted 14 March, 2021;
originally announced March 2021.
-
Dataflow-Architecture Co-Design for 2.5D DNN Accelerators using Wireless Network-on-Package
Authors:
Robert Guirado,
Hyoukjun Kwon,
Sergi Abadal,
Eduard Alarcón,
Tushar Krishna
Abstract:
Deep neural network (DNN) models continue to grow in size and complexity, demanding higher computational power to enable real-time inference. To efficiently meet such computational demands, hardware accelerators are being developed and deployed across scales. This naturally requires an efficient scale-out mechanism for increasing compute density as required by the application. 2.5D integration over interposer has emerged as a promising solution, but as we show in this work, the limited interposer bandwidth and multiple hops in the Network-on-Package (NoP) can diminish the benefits of the approach. To cope with this challenge, we propose WIENNA, a wireless NoP-based 2.5D DNN accelerator. In WIENNA, the wireless NoP connects an array of DNN accelerator chiplets to the global buffer chiplet, providing high-bandwidth multicasting capabilities. Here, we also identify the dataflow style that most efficiently exploits the wireless NoP's high-bandwidth multicasting capability on each layer. With modest area and power overheads, WIENNA achieves 2.2X--5.1X higher throughput and 38.2% lower energy than an interposer-based NoP design.
Submitted 30 November, 2020;
originally announced November 2020.
-
Graphene-based Wireless Agile Interconnects for Massive Heterogeneous Multi-chip Processors
Authors:
Sergi Abadal,
Robert Guirado,
Hamidreza Taghvaee,
Akshay Jain,
Elana Pereira de Santana,
Peter Haring Bolívar,
Mohamed Saeed,
Renato Negra,
Zhenxing Wang,
Kun-Ta Wang,
Max C. Lemme,
Joshua Klein,
Marina Zapater,
Alexandre Levisse,
David Atienza,
Davide Rossi,
Francesco Conti,
Martino Dazzi,
Geethan Karunaratne,
Irem Boybat,
Abu Sebastian
Abstract:
The main design principles in computer architecture have recently shifted from a monolithic scaling-driven approach to the development of heterogeneous architectures that tightly co-integrate multiple specialized processor and memory chiplets. In such data-hungry multi-chip architectures, current Networks-in-Package (NiPs) may not be enough to cater to their heterogeneous and fast-changing communication demands. This position paper makes the case for wireless in-package nanonetworking as the enabler of efficient and versatile wired-wireless interconnect fabrics for massive heterogeneous processors. To that end, the use of graphene-based antennas and transceivers with unique frequency-beam reconfigurability in the terahertz band is proposed. The feasibility of such a nanonetworking vision and the main research challenges towards its realization are analyzed from the technological, communications, and computer architecture perspectives.
Submitted 21 September, 2023; v1 submitted 8 November, 2020;
originally announced November 2020.
-
Computing Graph Neural Networks: A Survey from Algorithms to Accelerators
Authors:
Sergi Abadal,
Akshay Jain,
Robert Guirado,
Jorge López-Alonso,
Eduard Alarcón
Abstract:
Graph Neural Networks (GNNs) have exploded onto the machine learning scene in recent years owing to their capability to model and learn from graph-structured data. Such an ability has strong implications in a wide variety of fields whose data is inherently relational, for which conventional neural networks do not perform well. Indeed, as recent reviews can attest, research in the area of GNNs has grown rapidly and has led to the development of a variety of GNN algorithm variants as well as to the exploration of groundbreaking applications in chemistry, neurology, electronics, or communication networks, among others. At the current stage of research, however, the efficient processing of GNNs is still an open challenge for several reasons. Besides their novelty, GNNs are hard to compute due to their dependence on the input graph, their combination of dense and very sparse operations, or the need to scale to huge graphs in some applications. In this context, this paper aims to make two main contributions. On the one hand, a review of the field of GNNs is presented from the perspective of computing. This includes a brief tutorial on the GNN fundamentals, an overview of the evolution of the field in the last decade, and a summary of operations carried out in the multiple phases of different GNN algorithm variants. On the other hand, an in-depth analysis of current software and hardware acceleration schemes is provided, from which a hardware-software, graph-aware, and communication-centric vision for GNN accelerators is distilled.
Submitted 23 July, 2021; v1 submitted 30 September, 2020;
originally announced October 2020.
-
Understanding the Impact of On-chip Communication on DNN Accelerator Performance
Authors:
Robert Guirado,
Hyoukjun Kwon,
Eduard Alarcón,
Sergi Abadal,
Tushar Krishna
Abstract:
Deep Neural Networks have flourished at an unprecedented pace in recent years. They have achieved outstanding accuracy in fields such as computer vision, natural language processing, medicine or economics. Specifically, Convolutional Neural Networks (CNNs) are particularly suited to object recognition or identification tasks. This, however, comes at a high computational cost, prompting the use of specialized GPU architectures or even ASICs to achieve high speeds and energy efficiency. ASIC accelerators streamline the execution of certain dataflows amenable to CNN computation that imply the constant movement of large amounts of data, thereby turning on-chip communication into a critical function within the accelerator. This paper studies the communication flows within CNN inference accelerators of edge devices, with the aim of justifying current and future decisions in the design of the on-chip networks that interconnect their processing elements. Leveraging this analysis, we then qualitatively discuss the potential impact of introducing the novel paradigm of wireless on-chip networks in this context.
Submitted 3 December, 2019;
originally announced December 2019.