Publications
75,000,000,000 streaming inserts/second using hierarchical hypersparse GraphBLAS matrices
Summary
The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates...
Large scale parallelization using file-based communications
Summary
In this paper, we present a novel file-based communication architecture using the local filesystem for large scale parallelization. This new approach eliminates the issues with filesystem overload and resource contention when using the central filesystem for large parallel jobs. The new approach incurs additional overhead due to inter-node...
Optimizing the visualization pipeline of a 3-D monitoring and management system
Summary
Monitoring and managing High Performance Computing (HPC) systems and environments generate an ever growing amount of data. Making sense of this data and generating a platform where the data can be visualized for system administrators and management to proactively identify system failures or understand the state of the system requires...
Streaming 1.9 billion hypersparse network updates per second with D4M
Summary
The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that are ideal for analyzing many types of network data. D4M relies on associative arrays which combine properties of spreadsheets...
Scaling big data platform for big data pipeline
Summary
Monitoring and managing High Performance Computing (HPC) systems and environments generate an ever growing amount of data. Making sense of this data and generating a platform where the data can be visualized for system administrators and management to proactively identify system failures or understand the state of the system requires...
A billion updates per second using 30,000 hierarchical in-memory D4M databases
Summary
Analyzing large scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for representing and analyzing streaming network data. The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in...
Hyperscaling internet graph analysis with D4M on the MIT SuperCloud
Summary
Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of...
Interactive supercomputing on 40,000 cores for machine learning and data analysis
Summary
Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as...
Measuring the impact of Spectre and Meltdown
Summary
The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we...
Simulation approach to sensor placement using Unity3D
Summary
3D game simulation engines have demonstrated utility in the areas of training, scientific analysis, and knowledge solicitation. This paper will make the case for the use of 3D game simulation engines in the field of sensor placement optimization. Our study used a series of parallel simulations in the Unity3D simulation...