-
Positivity properties of scattering amplitudes
Authors:
Johannes Henn,
Prashanth Raman
Abstract:
We investigate positivity properties in quantum field theory (QFT). We find that planar Feynman integrals in QFT, as well as many related quantities, satisfy an infinite number of positivity conditions: the functions, as well as all their signed derivatives, are non-negative in a specified kinematic region. Such functions are known as completely monotonic (CM) in the mathematics literature. A powerful way to certify complete monotonicity is via integral representations. Using this approach, we show that it also applies to non-planar integrals possessing a Euclidean region, to cosmological correlators, and to certain stringy integrals. Motivated by Positive Geometry, we investigate positivity properties in planar maximally supersymmetric Yang-Mills theory. We present evidence, based on known analytic multi-loop results, that the CM property extends to several physical quantities in this theory. These include the (suitably normalized) finite remainder function of the six-particle maximally-helicity-violating (MHV) amplitude, four-point scattering amplitudes on the Coulomb branch, four-point correlation functions, and the angle-dependent cusp anomalous dimension. However, our findings are not limited to supersymmetric theories: we show that the CM property holds for the QCD and QED cusp anomalous dimensions to three and four loops, respectively. We comment on open questions and on possible numerical applications of complete monotonicity.
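The CM condition, $(-1)^n f^{(n)}(x) \ge 0$ for all $n$, can be probed numerically. The following is a minimal sketch, assuming a function that can be evaluated exactly at rational points. It uses signed forward differences: for a CM function, $(-1)^n \Delta_h^n f(x) \ge 0$, because $\Delta_h^n f(x) = h^n f^{(n)}(\xi)$ for some $\xi \in (x, x+nh)$. The helper names are hypothetical, and this is not the authors' method. A finite check of this kind can only rule out complete monotonicity, never certify it, which is why integral representations are needed.

```python
from fractions import Fraction
from math import comb


def signed_forward_difference(f, x, h, n):
    """Return (-1)^n * Delta_h^n f(x), i.e. sum_k (-1)^k C(n, k) f(x + k h)."""
    return sum((-1) ** k * comb(n, k) * f(x + k * h) for k in range(n + 1))


def passes_cm_check(f, points, h, max_order):
    """Necessary condition for complete monotonicity on a finite grid.

    Checks (-1)^n Delta_h^n f(x) >= 0 for every x in `points` and every
    order n = 0..max_order. Passing does not prove f is CM; failing
    shows that it is not.
    """
    return all(
        signed_forward_difference(f, x, h, n) >= 0
        for x in points
        for n in range(max_order + 1)
    )


def inverse(x):
    """1/x, a standard example of a CM function on x > 0."""
    return 1 / x


# Exact rational arithmetic avoids cancellation in high-order differences.
grid = [Fraction(k, 2) for k in range(1, 6)]
print(passes_cm_check(inverse, grid, Fraction(1, 4), max_order=8))  # True
print(passes_cm_check(lambda x: x, grid, Fraction(1, 4), max_order=8))  # False
```

The increasing function $f(x) = x$ already fails at first order, since $f(x) - f(x+h) = -h < 0$.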
Submitted 8 July, 2024;
originally announced July 2024.
-
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Authors:
Haozheng Fan,
Hao Zhou,
Guangtai Huang,
Parameswaran Raman,
Xinwei Fu,
Gaurav Gupta,
Dhananjay Ram,
Yida Wang,
Jun Huan
Abstract:
Getting large language models (LLMs) to perform well on downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices, as well as a stable distributed training framework to accelerate training. The growing number of applications leveraging AI/ML has led to a scarcity of expensive conventional accelerators (such as GPUs), creating a need for alternative specialized accelerators that are scalable and cost-efficient. AWS Trainium is a second-generation machine learning accelerator that has been purpose-built for training large deep learning models. Its corresponding instance, Amazon EC2 trn1, is an alternative to GPU instances for LLM training. However, training LLMs with billions of parameters on trn1 is challenging due to its relatively nascent software ecosystem. In this paper, we showcase HLAT: a 7 billion parameter decoder-only LLM pre-trained using trn1 instances over 1.8 trillion tokens. The performance of HLAT is benchmarked against popular open-source baseline models, including LLaMA and OpenLLaMA, which were trained on NVIDIA GPUs and Google TPUs, respectively. On various evaluation tasks, we show that HLAT achieves model quality on par with the baselines. We also share best practices for using the Neuron Distributed Training Library (NDTL), a customized distributed training library for AWS Trainium, to achieve efficient training. Our work demonstrates that AWS Trainium, powered by NDTL, can successfully pre-train state-of-the-art LLMs with high performance and cost-effectiveness.
Submitted 16 April, 2024;
originally announced April 2024.
-
EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence
Authors:
Chung-Yiu Yau,
Hoi-To Wai,
Parameswaran Raman,
Soumajyoti Sarkar,
Mingyi Hong
Abstract:
A key challenge in contrastive learning is to generate negative samples from a large sample set to contrast with positive samples, in order to learn better encodings of the data. These negative samples often follow a softmax distribution that is dynamically updated during training. However, sampling from this distribution is non-trivial due to the high computational cost of computing the partition function. In this paper, we propose an Efficient Markov Chain Monte Carlo negative sampling method for Contrastive learning (EMC$^2$). We follow the global contrastive learning loss introduced in SogCLR, and propose EMC$^2$, which uses an adaptive Metropolis-Hastings subroutine to generate hardness-aware negative samples in an online fashion during optimization. We prove that EMC$^2$ finds an $\mathcal{O}(1/\sqrt{T})$-stationary point of the global contrastive loss in $T$ iterations. Compared to prior works, EMC$^2$ is the first algorithm that exhibits global convergence (to stationarity) regardless of the choice of batch size, while having low computation and memory cost. Numerical experiments validate that EMC$^2$ is effective with small-batch training and achieves comparable or better performance than baseline algorithms. We report results for pre-training image encoders on STL-10 and ImageNet-100.
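Metropolis-Hastings sidesteps the partition function because its acceptance test needs only a ratio of unnormalized probabilities. Below is a minimal sketch of a single, non-adaptive MH step targeting a softmax over fixed scores. It is our simplification, not EMC$^2$'s subroutine: the uniform proposal, the fixed scores, and the function names are assumptions. EMC$^2$'s sampler is adaptive, runs online while the encoder changes, and is tied to the SogCLR loss.

```python
import math
import random


def mh_step(scores, current, tau, rng):
    """One Metropolis-Hastings step targeting p(j) proportional to exp(scores[j] / tau).

    The proposal is uniform over all candidates, hence symmetric, so the
    acceptance ratio needs only a score difference: the softmax partition
    function is never computed.
    """
    proposal = rng.randrange(len(scores))
    log_ratio = (scores[proposal] - scores[current]) / tau
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current


def empirical_distribution(scores, tau, steps, seed=0):
    """Run one persistent chain and return the visit frequency of each candidate."""
    rng = random.Random(seed)
    state = 0
    counts = [0] * len(scores)
    for _ in range(steps):
        state = mh_step(scores, state, tau, rng)
        counts[state] += 1
    return [c / steps for c in counts]


scores = [0.0, 1.0, 2.0]  # hypothetical similarity scores of three negatives
freq = empirical_distribution(scores, tau=1.0, steps=100_000)
normalizer = sum(math.exp(s) for s in scores)  # computed here only to compare
target = [math.exp(s) / normalizer for s in scores]
print([round(f, 3) for f in freq], [round(t, 3) for t in target])
```

In a training loop, the chain state for each anchor would persist across iterations while the scores drift, which is what makes online sampling cheap.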
Submitted 16 April, 2024;
originally announced April 2024.
-
Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
Authors:
Tanmay Gautam,
Youngsuk Park,
Hao Zhou,
Parameswaran Raman,
Wooseok Ha
Abstract:
Fine-tuning language models (LMs) has demonstrated success in a wide array of downstream tasks. However, as LMs are scaled up, the memory requirements for backpropagation become prohibitively high. Zeroth-order (ZO) optimization methods can leverage memory-efficient forward passes to estimate gradients. More recently, MeZO, an adaptation of ZO-SGD, has been shown to consistently outperform zero-shot and in-context learning when combined with suitable task prompts. In this work, we couple ZO methods with variance reduction techniques to enhance stability and convergence for inference-based LM fine-tuning. We introduce Memory-Efficient Zeroth-Order Stochastic Variance-Reduced Gradient (MeZO-SVRG) and demonstrate its efficacy across multiple LM fine-tuning tasks, eliminating the reliance on task-specific prompts. Evaluated across a range of both masked and autoregressive LMs on benchmark GLUE tasks, MeZO-SVRG outperforms MeZO with up to a 20% increase in test accuracy in both full- and partial-parameter fine-tuning settings. MeZO-SVRG also reduces computation time: it often surpasses MeZO's peak test accuracy with a $2\times$ reduction in GPU-hours. MeZO-SVRG significantly reduces the required memory footprint compared to first-order SGD, e.g., by $2\times$ for autoregressive models. Our experiments highlight that MeZO-SVRG's memory savings relative to SGD grow with larger batch sizes.
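The two ingredients named above, forward-pass gradient estimates and variance reduction, can be combined as in the generic sketch below. It is not MeZO-SVRG's exact procedure. The central two-point estimator with Gaussian directions, the per-epoch anchor estimate averaged over `anchor_dirs` directions, the shared direction used for the control variate, the toy least-squares problem, and all hyperparameters are our assumptions. MeZO-style methods also regenerate perturbations from a random seed instead of storing them, which this sketch omits.

```python
import random


def zo_directional(f, x, z, eps):
    """Central finite-difference estimate of the derivative of f at x along z.

    Needs two forward evaluations of f and no backpropagation.
    """
    plus = [xi + eps * zi for xi, zi in zip(x, z)]
    minus = [xi - eps * zi for xi, zi in zip(x, z)]
    return (f(plus) - f(minus)) / (2 * eps)


def zo_svrg(losses, x0, lr, epochs, inner_steps, anchor_dirs=100, eps=1e-3, seed=0):
    """Minimize the average of `losses` with a ZO-SVRG-style estimator.

    Each epoch fixes an anchor point and estimates the full-batch gradient
    there by averaging `anchor_dirs` random directions. Inner steps sample
    one loss and use the same direction z at the current point and at the
    anchor, so their difference acts as a control variate.
    """
    rng = random.Random(seed)
    n, d = len(losses), len(x0)

    def full_loss(w):
        return sum(f(w) for f in losses) / n

    x = list(x0)
    for _ in range(epochs):
        anchor = list(x)
        mu = [0.0] * d  # ZO estimate of the full gradient at the anchor
        for _ in range(anchor_dirs):
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            c = zo_directional(full_loss, anchor, z, eps) / anchor_dirs
            mu = [m + c * zi for m, zi in zip(mu, z)]
        for _ in range(inner_steps):
            f_i = losses[rng.randrange(n)]
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            delta = zo_directional(f_i, x, z, eps) - zo_directional(f_i, anchor, z, eps)
            x = [xi - lr * (delta * zi + mi) for xi, zi, mi in zip(x, z, mu)]
    return x


# Hypothetical consistent least-squares problem whose minimizer is (1, -2).
rows, targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, -2.0, -1.0]
losses = [
    (lambda w, a=a, b=b: 0.5 * (a[0] * w[0] + a[1] * w[1] - b) ** 2)
    for a, b in zip(rows, targets)
]
x = zo_svrg(losses, [0.0, 0.0], lr=0.05, epochs=20, inner_steps=100)
print(x)
```

The control variate pays off near the optimum: as the iterate and the anchor approach each other, the sampled term `delta` shrinks, so the noise injected by the random direction shrinks with it.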
Submitted 11 April, 2024;
originally announced April 2024.
-
MADA: Meta-Adaptive Optimizers through hyper-gradient Descent
Authors:
Kaan Ozkara,
Can Karakus,
Parameswaran Raman,
Mingyi Hong,
Shoham Sabach,
Branislav Kveton,
Volkan Cevher
Abstract:
Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during training. The key idea in MADA is to parameterize the space of optimizers and dynamically search through it using hyper-gradient descent during training. We empirically compare MADA to other popular optimizers on vision and language tasks, and find that MADA consistently outperforms Adam and other popular optimizers, and is robust against sub-optimally tuned hyper-parameters. MADA achieves a greater validation performance improvement over Adam compared to other popular optimizers during GPT-2 training and fine-tuning. We also propose AVGrad, a modification of AMSGrad that replaces the maximum operator with averaging, which is more suitable for hyper-gradient optimization. Finally, we provide a convergence analysis to show that parameterized interpolations of optimizers can improve their error bounds (up to constants), hinting at an advantage for meta-optimizers.
Submitted 17 June, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate
Authors:
Ruichen Jiang,
Parameswaran Raman,
Shoham Sabach,
Aryan Mokhtari,
Mingyi Hong,
Volkan Cevher
Abstract:
Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, most existing subspace second-order methods select subspaces at random, which results in slower convergence rates that depend on the problem dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of $\mathcal{O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems. Here, $m$ is the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation is to perform the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions on the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show that our method converges faster than existing random subspace methods, especially for high-dimensional problems.
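The key step, a cubic regularized Newton update restricted to the Krylov subspace $\mathrm{span}\{g, Hg, \dots, H^{m-1}g\}$, can be sketched on a small explicit Hessian. This sketch rests on several assumptions: the problem is convex (positive semidefinite $H$), the regularization constant $M$ is known, and the reduced cubic subproblem is solved by bisection on $r = \|y\|$ in $(T + \tfrac{M r}{2} I)\,y = -c$. The bisection solver and all helper names are our choices. A practical method would access $H$ only through Hessian-vector products, and this sketch does not demonstrate the convergence rate.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, v):
    return [dot(row, v) for row in H]


def krylov_basis(H, g, m):
    """Orthonormal basis of span{g, Hg, ..., H^(m-1) g} via Arnoldi (modified Gram-Schmidt)."""
    basis, v = [], list(g)
    for _ in range(m):
        for q in basis:  # remove components along earlier basis vectors
            c = dot(q, v)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:  # the Krylov subspace is invariant under H; stop early
            break
        basis.append([vi / norm for vi in v])
        v = matvec(H, basis[-1])
    return basis


def solve(A, b):
    """Solve a small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:n])) / M[r][r]
    return x


def krylov_cubic_step(H, g, m, M_reg, iters=200):
    """Cubic regularized Newton step restricted to the m-dimensional Krylov subspace.

    Minimizes g.s + 0.5 s.H s + (M_reg / 6) ||s||^3 over s = sum_i y_i V[i],
    assuming H is positive semidefinite. Optimality in the subspace reads
    (T + (M_reg r / 2) I) y = -c with r = ||y||; r is found by bisection.
    """
    V = krylov_basis(H, g, m)
    k = len(V)
    c = [dot(q, g) for q in V]
    HV = [matvec(H, q) for q in V]
    T = [[dot(V[i], HV[j]) for j in range(k)] for i in range(k)]

    def y_of(r):
        shifted = [[T[i][j] + (M_reg * r / 2 if i == j else 0.0) for j in range(k)]
                   for i in range(k)]
        return solve(shifted, [-ci for ci in c])

    # ||y(r)|| - r is decreasing in r, and ||y(hi)|| <= hi at this upper bound.
    lo, hi = 0.0, math.sqrt(2 * math.sqrt(dot(c, c)) / M_reg)
    for _ in range(iters):
        mid = (lo + hi) / 2
        y = y_of(mid)
        if math.sqrt(dot(y, y)) > mid:
            lo = mid
        else:
            hi = mid
    y = y_of(hi)
    return [sum(y[i] * V[i][j] for i in range(k)) for j in range(len(g))]


# Hypothetical 4-dimensional convex quadratic: diagonal PSD Hessian and gradient.
H = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0], [0.0, 0.0, 0.0, 4.0]]
g = [1.0, 1.0, 1.0, 1.0]
s = krylov_cubic_step(H, g, m=2, M_reg=1.0)
model = dot(g, s) + 0.5 * dot(s, matvec(H, s)) + (1.0 / 6.0) * math.sqrt(dot(s, s)) ** 3
print(s, model)
```

With `m` equal to the full dimension, the step coincides with the full cubic regularized Newton step. With a small `m`, only `m` vectors of length $d$ are stored.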
Submitted 5 January, 2024;
originally announced January 2024.
-
Contractive error feedback for gradient compression
Authors:
Bingcong Li,
Shuai Zheng,
Parameswaran Raman,
Anshumali Shrivastava,
Georgios B. Giannakis
Abstract:
On-device memory concerns in distributed deep learning have become severe due to (i) the growth of model size in multi-GPU training, and (ii) the wide adoption of deep neural networks for federated learning on IoT devices, which have limited storage. In such settings, communication-efficient optimization methods are attractive alternatives, but they still struggle with memory issues. To tackle these challenges, we propose a communication-efficient method called contractive error feedback (ConEF). In contrast to SGD with error feedback (EFSGD), which manages memory inefficiently, ConEF strikes a balance between convergence and memory usage, and achieves communication efficiency by leveraging biased and all-reducible gradient compression. We empirically validate ConEF on various learning tasks, including image classification, language modeling, and machine translation, and observe that ConEF saves 80\%-90\% of the extra memory in EFSGD with almost no loss in test performance, while also achieving a 1.3x-5x speedup over SGD. We also demonstrate the feasibility and convergence of ConEF, clearing the theoretical barrier to integrating ConEF into popular memory-efficient frameworks such as ZeRO-3.
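Error feedback stores whatever the compressor discards and adds it back on the next step. The extra memory that ConEF targets is exactly this stored error. Below is a minimal sketch of one error-feedback step with a top-k compressor. Top-k, the function names, and the modeling of ConEF as storing only a compressed version of the error (via a second compressor) are our assumptions. All-reduce communication and ZeRO-3 integration are not modeled.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries (a biased, contractive compressor)."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def ef_step(params, grad, error, lr, compress, compress_error=None):
    """One SGD step with error feedback.

    The residual left behind by `compress` is stored and added back on the
    next step. With compress_error=None the full residual is kept
    (EFSGD-style); passing a second compressor stores only a compressed
    residual, which is our reading of how ConEF saves memory.
    """
    corrected = [lr * g + e for g, e in zip(grad, error)]
    update = compress(corrected)  # the vector that would be communicated
    residual = [c - u for c, u in zip(corrected, update)]
    new_error = residual if compress_error is None else compress_error(residual)
    new_params = [p - u for p, u in zip(params, update)]
    return new_params, new_error


params, error = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
params, error = ef_step(params, [0.5, -3.0, 1.0], error, lr=1.0,
                        compress=lambda v: top_k(v, 1))
print(params, error)  # [0.0, 3.0, 0.0] [0.5, 0.0, 1.0]
```

The two entries dropped in this step are not lost. They remain in `error` and re-enter the next update.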
Submitted 13 December, 2023;
originally announced December 2023.
-
Resonant Anti-Reflection Metasurface for Infrared Transmission Optics
Authors:
John Brewer,
Sachin Kulkarni,
Aaswath P. Raman
Abstract:
A fundamental capability for any transmissive optical component is anti-reflection, yet this capability is challenging to achieve cost-efficiently at longer infrared wavelengths. We demonstrate that Mie-resonant nanophotonic structures enhance transmission in silicon, allowing it to function as an effective optical material at long-wave infrared wavelengths. This approach enables a window optic with up to 40\% greater transmission than unpatterned Si of equal thickness. We show imaging comparisons with unpatterned silicon and off-the-shelf germanium optics, as well as basic broadband slanted-edge modulation transfer function (MTF) measurements. Overall, we demonstrate how Mie-resonant structures can improve optical transmission through window optics made of arbitrary lithographically patternable optical media, and highlight their possible use in imaging applications.
Submitted 8 June, 2023;
originally announced June 2023.
-
Simultaneous control of spectral and directional emissivity with gradient epsilon-near-zero InAs photonic structures
Authors:
Jae Seung Hwang,
Jin Xu,
Aaswath P. Raman
Abstract:
Controlling both the spectral bandwidth and the directional range of emitted thermal radiation is a fundamental challenge in modern photonics and materials research. Recent work has shown that materials with a spatial gradient in their epsilon-near-zero (ENZ) response can support broadband directionality in their emissivity, enabling high radiance toward specific angles of incidence. However, this capability has been limited spectrally and directionally by the availability of materials supporting phonon-polariton resonances at long-wave infrared wavelengths. Here, we design and experimentally demonstrate an approach using doped III-V semiconductors that can simultaneously tailor the spectral peak, bandwidth, and directionality of infrared emissivity. We epitaxially grow and characterize InAs-based gradient ENZ photonic structures that exhibit broadband directional emission, with spectral bandwidths and peak directions that vary with their doping concentration profile and thickness. Owing to its easy-to-fabricate geometry, we believe this approach provides a versatile photonic platform to dynamically control broadband spectral and directional emissivity for a range of emerging applications.
Submitted 16 December, 2022;
originally announced December 2022.
-
DeepAdjoint: An All-in-One Photonic Inverse Design Framework Integrating Data-Driven Machine Learning with Optimization Algorithms
Authors:
Christopher Yeung,
Benjamin Pham,
Ryan Tsai,
Katherine T. Fountaine,
Aaswath P. Raman
Abstract:
In recent years, hybrid design strategies combining machine learning (ML) with electromagnetic optimization algorithms have emerged as a new paradigm for the inverse design of photonic structures and devices. While a trained, data-driven neural network can rapidly identify solutions near the global optimum within a given dataset's design space, an iterative optimization algorithm can further refine the solution and overcome dataset limitations. Furthermore, such hybrid ML-optimization methodologies can reduce computational costs and expedite the discovery of novel electromagnetic components. However, existing hybrid ML-optimization methods have yet to optimize across both materials and geometries in a single integrated and user-friendly environment. In addition, due to the challenge of acquiring large datasets for ML, as well as the exponential growth in the number of isolated models being trained for photonics design, there is a need to standardize the ML-optimization workflow while making pre-trained models easily accessible. Motivated by these challenges, here we introduce DeepAdjoint, a general-purpose, open-source, and multi-objective "all-in-one" global photonics inverse design application framework that integrates pre-trained deep generative networks with state-of-the-art electromagnetic optimization algorithms such as the adjoint variables method. DeepAdjoint allows a designer to specify an arbitrary optical design target, then obtain a photonic structure that is robust to fabrication tolerances and possesses the desired optical properties, all within a single user-guided application interface. Our framework thus paves a path towards the systematic unification of ML and optimization algorithms for photonic inverse design.
Submitted 28 September, 2022;
originally announced September 2022.
-
Hybrid Supervised and Reinforcement Learning for the Design and Optimization of Nanophotonic Structures
Authors:
Christopher Yeung,
Benjamin Pham,
Zihan Zhang,
Katherine T. Fountaine,
Aaswath P. Raman
Abstract:
From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limited effectiveness for nanophotonic inverse design. Supervised machine learning approaches require large quantities of training data to produce high-performance models and, given the complexity of the design space, have difficulty generalizing beyond that data. Unsupervised and reinforcement learning-based approaches, on the other hand, can require very long training or optimization times. Here we demonstrate a hybrid supervised learning and reinforcement learning approach to the inverse design of nanophotonic structures, and show that this approach can reduce training data dependence, improve the generalizability of model predictions, and shorten exploratory training times by orders of magnitude. The presented strategy thus addresses a number of contemporary challenges in deep learning-based design, while opening the door to new design methodologies that leverage multiple classes of machine learning algorithms to produce more effective and practical solutions for photonic design.
Submitted 8 September, 2022;
originally announced September 2022.
-
Temporal coupled-mode theory for thermal emission from multiple arbitrarily coupled resonators
Authors:
Xin Huang,
Christopher Yeung,
Aaswath P. Raman
Abstract:
Controlling the spectral response of thermal emitters has become increasingly important for a range of energy and sensing applications. Conventional approaches to achieving arbitrary spectral selectivity in photonic systems have combined multiple resonantly emissive elements and used numerical optimization to obtain a range of spectral profiles, but a universal theoretical framework has been lacking. Here, we develop a temporal coupled-mode theory for thermal emission from multiple, arbitrarily coupled resonators. We validate our theory against numerical simulations of complex two- and three-dimensional nanophotonic thermal emitters, highlighting the anomalous thermal emission spectra that can emerge when multiple resonators with arbitrary properties couple to each other with varying strengths.
Submitted 8 September, 2022;
originally announced September 2022.
-
Celestial insights into the S-matrix bootstrap
Authors:
Sudip Ghosh,
Prashanth Raman,
Aninda Sinha
Abstract:
We consider 2-2 scattering in four spacetime dimensions in Celestial variables. Using the crossing symmetric dispersion relation (CSDR), we recast the Celestial amplitudes in terms of crossing symmetric partial waves. These partial waves have spurious singularities in the complex Celestial variable, which need to be removed in local theories. The locality constraints (null constraints) admit closed-form expressions, which lead to novel bounds on partial wave moments. These bounds allow us to quantify the degree of low spin dominance (LSD) for scalar theories. We study a new kind of positivity that seems to be present in a wide class of theories. We prove that this positivity arises only in theories with spin-0 dominance. The crossing symmetric partial waves with spurious singularities removed, dubbed Feynman blocks, have remarkable properties in the Celestial variable, namely typical realness in the sense of Geometric Function Theory (GFT). Using GFT techniques, we derive non-projective bounds on Wilson coefficients in terms of partial wave moments.
Submitted 28 July, 2022; v1 submitted 15 April, 2022;
originally announced April 2022.
-
Crossing Symmetric Spinning S-matrix Bootstrap: EFT bounds
Authors:
Subham Dutta Chowdhury,
Kausik Ghosh,
Parthiv Haldar,
Prashanth Raman,
Aninda Sinha
Abstract:
We develop crossing symmetric dispersion relations to describe 2-2 scattering of identical external particles carrying spin. This enables us to import techniques from Geometric Function Theory and study two-sided bounds on low-energy Wilson coefficients. We consider the scattering of photons and gravitons in weakly coupled effective field theories. We provide general expressions for the locality/null constraints. Considering the positivity of the absorptive part leads to an interesting connection with the recently conjectured weak low spin dominance. We also construct the crossing symmetric amplitudes and locality constraints for massive neutral Majorana fermions and for parity-violating photon and graviton theories. The techniques developed in this paper will be useful for numerical S-matrix bootstrap studies in the future.
Submitted 21 August, 2022; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Enhancing Adjoint Optimization-based Photonics Inverse Design with Explainable Machine Learning
Authors:
Christopher Yeung,
David Ho,
Benjamin Pham,
Katherine T. Fountaine,
Aaswath P. Raman
Abstract:
A fundamental challenge in the design of photonic devices, and electromagnetic structures more generally, is the optimization of their overall architecture to achieve a desired response. To this end, topology or shape optimizers based on the adjoint variables method have been widely adopted due to their high computational efficiency and ability to create complex freeform geometries. However, how such freeform structures function remains a black box. Moreover, unless a design space of high-performance devices is known in advance, such gradient-based optimizers can get trapped in local minima or saddle points, which limits the performance achievable through this inverse design process. To elucidate the relationships between device performance and nanoscale structuring while mitigating the effects of local minima trapping, we present an inverse design framework that combines adjoint optimization, automated machine learning (AutoML), and explainable artificial intelligence (XAI). Integrated with a numerical electromagnetics simulation method, our framework reveals structural contributions towards a figure-of-merit (FOM) of interest. Through an explanation-based reoptimization process, this information is then leveraged to minimize the FOM further than adjoint optimization alone, thus overcoming the optimization's local minima. We demonstrate our framework in the context of waveguide design and achieve increases in device performance of between 39% and 74% relative to state-of-the-art adjoint optimization-based inverse design across a range of telecom wavelengths. These results highlight machine learning strategies that can substantially extend and enhance the capabilities of a conventional, optimization-based inverse design algorithm while revealing deeper insights into the algorithm's designs.
Submitted 25 April, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
-
Accurately Quantifying Radiative Cooling Potentials: A Temperature-correction to the Transmittance-based approximation
Authors:
Jyotirmoy Mandal,
Xin Huang,
Aaswath P. Raman
Abstract:
Theoretical calculations of the cooling potential of radiative cooling materials are crucial for determining their cooling capability under different meteorological conditions and for evaluating their performance. These calculations require accurate models of long-wave infrared downwelling atmospheric irradiance. However, the transmittance-based cosine approximation, which is widely used to determine radiative cooling potentials, does not account for the cooling potential arising from heat loss to the colder reaches of the atmosphere itself. Here, we show that using this approximation can underestimate the cooling potential by more than 10% relative to MODTRAN 6 outputs. We propose a temperature correction to the transmittance-based approximation that accounts for heat loss to the cold upper atmosphere and significantly reduces this underestimation, while retaining the advantages of the original model. In light of the widespread and continued use of the transmittance-based model, our results highlight an important source of potential error and a means to correct for it.
Submitted 17 August, 2021;
originally announced August 2021.
-
QFT, EFT and GFT
Authors:
Prashanth Raman,
Aninda Sinha
Abstract:
We explore the correspondence between geometric function theory (GFT) and quantum field theory (QFT). The crossing symmetric dispersion relation provides the necessary tool to examine the connection between GFT, QFT, and effective field theories (EFTs), enabling us to connect with the crossing-symmetric EFT-hedron. Several existing mathematical bounds on the Taylor coefficients of typically real functions are summarized and shown to be highly useful for bounding Wilson coefficients in the context of 2-2 scattering. We prove that two-sided bounds on Wilson coefficients are guaranteed to exist quite generally for the fully crossing symmetric situation. Numerical implementation of the GFT constraints (Bieberbach-Rogosinski inequalities) is straightforward and allows a systematic exploration. We compare our findings obtained using GFT techniques with other results in the literature. We study both the three-channel and the two-channel crossing-symmetric cases, the latter having some crucial differences. We also consider bound state poles as well as massless poles in EFTs. Finally, we consider nonlinear constraints arising from the positivity of certain Toeplitz determinants, which occur in the trigonometric moment problem.
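The simplest coefficient bound for typically real functions can be seen directly from their integral representation. The sketch below assumes the classical representation of a normalized typically real function, $f(z) = z + \sum_{n\ge 2} a_n z^n$, as an average of $z/(1 - 2z\cos\theta + z^2)$ over a probability measure on $[0, \pi]$. Expanding gives $a_n = \int U_{n-1}(\cos\theta)\, d\mu(\theta)$ with Chebyshev polynomials of the second kind, and $|U_{n-1}(\cos\theta)| = |\sin n\theta / \sin\theta| \le n$ gives $|a_n| \le n$. The discrete measure and the helper names are hypothetical. The paper's constraints (the Bieberbach-Rogosinski inequalities and Toeplitz-determinant positivity) are not reproduced here.

```python
import math


def chebyshev_u(n, x):
    """Chebyshev polynomial of the second kind U_n(x) via its three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 2.0 * x
    for _ in range(n - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur


def typically_real_coefficients(weights, angles, n_max):
    """Taylor coefficients a_1..a_n_max of f(z) = sum_k w_k z / (1 - 2 z cos t_k + z^2).

    Uses the generating function z / (1 - 2xz + z^2) = sum_n U_{n-1}(x) z^n.
    Nonnegative weights summing to 1 give a normalized typically real
    function (a discrete instance of the integral representation).
    """
    return [
        sum(w * chebyshev_u(n - 1, math.cos(t)) for w, t in zip(weights, angles))
        for n in range(1, n_max + 1)
    ]


# Hypothetical discrete probability measure on [0, pi].
coeffs = typically_real_coefficients([0.5, 0.3, 0.2], [0.3, 1.2, 2.5], n_max=10)
print(all(abs(a) <= n for n, a in enumerate(coeffs, start=1)))  # True
```

A point mass at $\theta = 0$ gives the Koebe function $z/(1-z)^2$ with $a_n = n$, so the bound $|a_n| \le n$ is attained.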
Submitted 6 October, 2022; v1 submitted 14 July, 2021;
originally announced July 2021.
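The Toeplitz-determinant constraints mentioned in the entry above rest on a classical fact from the trigonometric moment problem. A real sequence $c_0, c_1, \dots$ (with $c_{-n} = c_n$) is the sequence of Fourier coefficients of a positive measure on the circle if and only if every Toeplitz matrix $T_{jk} = c_{|j-k|}$ is positive semidefinite. The Python sketch below computes the leading principal minors. By Sylvester's criterion, all of them being strictly positive certifies positive definiteness. The helper names are hypothetical, and the sketch does not reproduce how the paper maps Wilson coefficients to such moments.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (small dense matrices)."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def toeplitz_minors(c):
    """Leading principal minors of the symmetric Toeplitz matrices T[j][k] = c[|j - k|]."""
    return [
        det([[c[abs(j - k)] for k in range(size)] for j in range(size)])
        for size in range(1, len(c) + 1)
    ]


if __name__ == "__main__":
    print(toeplitz_minors([0.5 ** n for n in range(4)]))  # Poisson-kernel moments: all positive
    print(toeplitz_minors([1.0, 1.5]))                     # |c_1| > c_0: second minor is negative
```

For example, $c_n = r^{|n|}$ with $|r| < 1$ (the Fourier coefficients of the Poisson kernel) gives minors $(1-r^2)^{m-1}$, all positive. By contrast, $c = (1, 1.5)$ violates $|c_1| \le c_0$ and yields a negative $2\times 2$ minor.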
-
Nanostructured Plasmonic Metal Surfaces as Optical Components for Infrared Imaging and Sensing
Authors:
Jyotirmoy Mandal,
John Brewer,
Sagar Mandal,
Aaswath P. Raman
Abstract:
Thermal imaging and sensing technologies offer critical information about our thermally radiant world and, in recent years, have seen dramatic increases in usage for a range of applications. However, the cost and technical finesse of manufacturing infrared optical components remain a major barrier towards the democratization of these technologies. In this report, we present a solution-processed plasmonic reflective filter (PRF) as a scalable and inexpensive thermal infrared optic. The PRF selectively absorbs sunlight and specularly reflects thermal infrared (TIR) wavelengths with performance comparable to state-of-the-art TIR optics made of materials like germanium. Unlike traditional infrared optical components, however, the PRF can be conveniently fabricated using inexpensive materials and a dip-and-dry chemical synthesis technique and, crucially, has manufacturing costs that are orders of magnitude lower. We experimentally demonstrate the core optical functionality of the PRF, as well as its integration into infrared imaging and sensing systems without compromising their thermographic or radiometric capabilities. From a practical standpoint, the inexpensive and convenient fabricability of the PRF represents a significant advance towards making the benefits of thermal imaging and sensing systems more affordable and accessible. Scientifically, our work demonstrates a previously unexplored optical functionality and a new direction for versatile chemical synthesis in designing optical components.
Submitted 27 June, 2021;
originally announced June 2021.
-
Multi-scale photonic emissivity engineering for relativistic lightsail thermal regulation
Authors:
John Brewer,
Matthew F. Campbell,
Pawan Kumar,
Sachin Kulkarni,
Deep Jariwala,
Igor Bargatin,
Aaswath P. Raman
Abstract:
The Breakthrough Starshot Initiative aims to send a gram-scale probe to Proxima Centauri b using a laser-accelerated lightsail traveling at relativistic speeds. Thermal management is a key lightsail design objective because of the intense laser powers required, but it has generally been considered secondary to accelerative performance. Here, we demonstrate nanophotonic photonic crystal slab reflectors composed of 2H-phase molybdenum disulfide and crystalline silicon nitride, highlight the inverse relationship between the thermal-band extinction coefficient and the lightsail's maximum temperature, and examine the trade-off between the acceleration distance and realistic sail thermal limits, ultimately realizing a minimum thermally endurable acceleration distance of 16.3 Gm. We additionally demonstrate multi-scale photonic structures featuring thermal-wavelength-scale Mie-resonant geometries, and characterize their broadband, Mie-resonance-driven emissivity enhancement and acceleration distance reduction. Our results highlight new possibilities for simultaneously controlling optical and thermal response over broad wavelength ranges in ultralight nanophotonic structures.
Submitted 14 September, 2021; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Global Inverse Design Across Multiple Photonic Structure Classes Using Generative Deep Learning
Authors:
Christopher Yeung,
Ryan Tsai,
Benjamin Pham,
Brian King,
Yusaku Kawagoe,
David Ho,
Julia Liang,
Mark W. Knight,
Aaswath P. Raman
Abstract:
Understanding how nano- or micro-scale structures and material properties can be optimally configured to attain specific functionalities remains a fundamental challenge. Photonic metasurfaces, for instance, can be spectrally tuned through material choice and structural geometry to achieve unique optical responses. However, existing numerical design methods require prior identification of specific material-structure combinations, or device classes, as the starting point for optimization. As such, a unified solution that simultaneously optimizes across materials and geometries has yet to be realized. To overcome these challenges, we present a global deep learning-based inverse design framework, where a conditional deep convolutional generative adversarial network is trained on colored images encoded with a range of material and structural parameters, including refractive index, plasma frequency, and geometric design. We demonstrate that, in response to target absorption spectra, the network can identify an effective metasurface in terms of its class, materials properties, and overall shape. Furthermore, the model can arrive at multiple design variants with distinct materials and structures that present nearly identical absorption spectra. Our proposed framework is thus an important step towards global photonics and materials design strategies that can identify combinations of device categories, material properties, and geometric parameters which algorithmically deliver a sought functionality.
Submitted 19 July, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
-
Learning Structured Representations of Entity Names using Active Learning and Weak Supervision
Authors:
Kun Qian,
Poornima Chozhiyath Raman,
Yunyao Li,
Lucian Popa
Abstract:
Structured representations of entity names are useful for many entity-related tasks such as entity normalization and variant generation. Learning the implicit structured representations of entity names without context and external knowledge is particularly challenging. In this paper, we present a novel learning framework that combines active learning and weak supervision to solve this problem. Our experimental evaluation shows that this framework enables the learning of high-quality models from merely a dozen or so labeled examples.
Submitted 30 October, 2020;
originally announced November 2020.
-
Modeling and optimization of radiative cooling based thermoelectric generators
Authors:
Bin Zhao,
Gang Pei,
Aaswath P. Raman
Abstract:
The possibility of night-time power generation has recently stimulated interest in combining the radiative sky cooling mechanism with thermoelectric generators (TEGs). These passive, low-temperature-difference devices have been shown to generate electricity at night with no active heat input, instead using the ambient air itself as the heat source. Here, we optimize both the geometry and the operating conditions of radiative cooling driven thermoelectric (RC-TE) generators. By developing a combined thermal and electrical model, we determine the optimal operating conditions, including the maximum power point and the maximum efficiency point. Our results show that the optimal operating condition yields a larger power output than was previously expected. Moreover, we show that the maximum power density occurs when the area ratio between the cooler and the P- or N-type element reaches an optimal value. Finally, we perform a parametric study of environmental and structural parameters to improve the performance of the RC-TE device, including enhancing heat transfer between the hot surface and the ambient air, suppressing the cooling loss of the radiative cooler, and optimizing the geometry of individual thermocouples. Our work identifies how to maximize the output of RC-TE devices and provides comprehensive guidance on making use of this new passive power generation method.
Submitted 21 July, 2020;
originally announced August 2020.
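The entry above distinguishes the maximum power point from the maximum efficiency point. To make those two operating points concrete, here is a textbook thermoelectric sketch in Python. It assumes a fixed hot/cold temperature difference and temperature-independent properties, which the paper's combined thermal-electrical model does not. In an RC-TE device, the cold-side temperature itself depends on the radiative cooler's heat balance and on the current drawn, so the optima there will shift. All function and parameter names are hypothetical.

```python
import math


def teg_current(seebeck, r_internal, r_load, t_hot, t_cold):
    """Series-circuit current for a fixed temperature difference across the legs (A)."""
    return seebeck * (t_hot - t_cold) / (r_internal + r_load)


def teg_power(seebeck, r_internal, r_load, t_hot, t_cold):
    """Electrical power delivered to the load resistor (W)."""
    current = teg_current(seebeck, r_internal, r_load, t_hot, t_cold)
    return current ** 2 * r_load


def teg_efficiency(seebeck, r_internal, k_thermal, r_load, t_hot, t_cold):
    """Load power over hot-side heat input (conduction + Peltier - half the Joule heat)."""
    current = teg_current(seebeck, r_internal, r_load, t_hot, t_cold)
    q_hot = (k_thermal * (t_hot - t_cold)
             + seebeck * t_hot * current
             - 0.5 * current ** 2 * r_internal)
    return teg_power(seebeck, r_internal, r_load, t_hot, t_cold) / q_hot


def optimal_loads(seebeck, r_internal, k_thermal, t_hot, t_cold):
    """Textbook optima: R_L = R for maximum power, R_L = R*sqrt(1 + Z*T_mean) for maximum efficiency."""
    z = seebeck ** 2 / (r_internal * k_thermal)  # figure of merit Z (1/K)
    t_mean = 0.5 * (t_hot + t_cold)
    return r_internal, r_internal * math.sqrt(1.0 + z * t_mean)


if __name__ == "__main__":
    print(optimal_loads(0.05, 2.0, 0.375, 300.0, 298.0))  # approximately (2.0, 2.83)
```

With a lumped Seebeck coefficient of 0.05 V/K, 2 Ω internal resistance and 0.375 W/K thermal conductance (illustrative values, not taken from the paper), the efficiency optimum sits near 2.83 Ω. That is above the 2 Ω load that maximizes power, so the two operating points genuinely differ.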
-
Multiplexed Supercell Metasurface Design and Optimization with Tandem Residual Networks
Authors:
Christopher Yeung,
Ju-Ming Tsai,
Brian King,
Benjamin Pham,
David Ho,
Julia Liang,
Mark W. Knight,
Aaswath P. Raman
Abstract:
Complex nanophotonic structures hold the potential to deliver exquisitely tailored optical responses for a range of applications. Metal-insulator-metal (MIM) metasurfaces arranged in supercells, for instance, can be tailored by geometry and material choice to exhibit a variety of absorption properties and resonant wavelengths. With this flexibility, however, comes a vast space of design possibilities that classical design paradigms struggle to effectively navigate. To overcome this challenge, here we demonstrate a tandem residual network approach to efficiently generate multiplexed supercells through inverse design. By using a training dataset with several thousand full-wave electromagnetic simulations in a design space of over three trillion possible designs, the deep learning model can accurately generate a wide range of complex supercell designs given a spectral target. Beyond inverse design, the presented approach can also be used to explore the structure-property relationships of broadband absorption and emission in such supercell configurations. Thus, this study demonstrates the feasibility of high-dimensional supercell inverse design with deep neural networks that is applicable to complex nanophotonic structures composed of multiple subunit elements that may exhibit coupling.
Submitted 14 December, 2020; v1 submitted 2 August, 2020;
originally announced August 2020.
-
Radiative Cooling and Thermoregulation in the Earth's Glow
Authors:
Jyotirmoy Mandal,
Sagar Mandal,
John Brewer,
Arvind Ramachandran,
Aaswath Pattabhi Raman
Abstract:
Passive radiative cooling involves a net radiative heat loss to cold outer space through the atmospheric transmission windows. Due to its passive nature and net cooling effect, it is a promising alternative or complement to electrical cooling. For efficient radiative cooling of objects, an unimpeded view of the sky is ideal. However, the view of the sky is usually limited; for instance, the walls of buildings have >50% of their field of view subtended by the Earth. Moreover, objects on Earth become sources of heat under sunlight. Therefore, building walls with hot terrestrial objects in view experience reduced cooling, or even net heating, despite materials optimized for heat loss into the sky. We show that by using materials with selective long-wavelength infrared (LWIR) emittances, vertical building facades experience greater cooling than is achievable with broadband thermal emitters such as typical building envelopes. Intriguingly, this effect is pronounced in the summer and diminishes or even reverses during the winter, indicating a thermoregulation effect. The findings highlight a major opportunity to harness untapped energy savings in buildings.
Submitted 7 May, 2021; v1 submitted 21 June, 2020;
originally announced June 2020.
-
Stringy canonical forms and binary geometries from associahedra, cyclohedra and generalized permutohedra
Authors:
Song He,
Zhenjie Li,
Prashanth Raman,
Chi Zhang
Abstract:
Stringy canonical forms are a class of integrals that provide $α'$-deformations of the canonical form of any polytope. For generalized associahedra of finite-type cluster algebras, there exist completely rigid stringy integrals, whose configuration spaces are the so-called binary geometries and which, for classical types, are associated with (generalized) scattering of particles and strings. In this paper we propose a large class of rigid stringy canonical forms for another class of polytopes, generalized permutohedra, which also include associahedra and cyclohedra as special cases (type $A_n$ and $B_n$ generalized associahedra). Remarkably, we find that the configuration spaces of such integrals are also binary geometries, which were suspected to exist for generalized associahedra only. For any generalized permutohedron that can be written as a Minkowski sum of coordinate simplices, we show that its rigid stringy integral factorizes into products of lower integrals for massless poles at finite $α'$, and that the configuration space is binary, although the $u$ equations take a more general form than the "perfect" ones of the cluster cases. Moreover, we provide an infinite class of examples obtained by degenerations of type $A_n$ and $B_n$ integrals, which have perfect $u$ equations as well. Our results provide yet another family of generalizations of the usual string integral and moduli space, whose physical interpretations remain to be explored.
Submitted 28 November, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
DS-FACTO: Doubly Separable Factorization Machines
Authors:
Parameswaran Raman,
S. V. N. Vishwanathan
Abstract:
Factorization Machines (FM) are a powerful class of models that incorporate higher-order interactions among features to add more expressive power to linear models. They have been used successfully in several real-world tasks such as click prediction, ranking and recommender systems. Despite using a low-rank representation for the pairwise features, the memory overheads of using factorization machines on large-scale real-world datasets can be prohibitively high. For instance, on the Criteo tera dataset, assuming a modest $128$-dimensional latent representation and $10^{9}$ features, the memory requirement for the model is on the order of $1$ TB. In addition, the data itself occupies $2.1$ TB. Traditional single-machine algorithms for FM are not equipped to handle this scale; therefore, using a distributed algorithm to parallelize the computation across a cluster is inevitable. In this work, we propose DS-FACTO, a hybrid-parallel stochastic optimization algorithm that partitions both the data and the parameters of the factorization machine simultaneously. Our solution is fully decentralized and does not require the use of any parameter servers. We present empirical results to analyze the convergence behavior, predictive power and scalability of DS-FACTO.
Submitted 28 April, 2020;
originally announced April 2020.
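The low-rank pairwise term described in the entry above is what keeps a single factorization-machine prediction cheap. With latent vectors $v_i \in \mathbb{R}^k$, the standard identity $\sum_{i<j} \langle v_i, v_j\rangle x_i x_j = \tfrac12 \sum_{f=1}^{k} \big[(\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2\big]$ lets a prediction be computed in $O(kn)$ time. The Python sketch below checks that identity against the naive double sum and reproduces the memory estimate, assuming 8-byte floating-point parameters. It uses hypothetical helper names and dense lists rather than the sparse, partitioned storage DS-FACTO targets.

```python
import itertools


def fm_predict(x, w0, w, v):
    """Second-order FM prediction using the O(k*n) form of the pairwise term."""
    n, k = len(x), len(v[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))            # sum_i v_{i,f} x_i
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_{i,f}^2 x_i^2
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, v):
    """Direct O(k*n^2) evaluation of sum_{i<j} <v_i, v_j> x_i x_j, for checking."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
        for i, j in itertools.combinations(range(len(x)), 2)
    )
    return linear + pairwise


def latent_memory_bytes(num_features, latent_dim, bytes_per_value=8):
    """Storage for the n x k latent matrix alone (8 bytes per value assumed)."""
    return num_features * latent_dim * bytes_per_value


if __name__ == "__main__":
    print(latent_memory_bytes(10**9, 128) / 1e12, "TB")  # 1.024 TB, i.e. on the order of 1 TB
```

The latent matrix alone dominates: $10^9 \times 128 \times 8$ bytes is about $1.02 \times 10^{12}$ bytes, consistent with the abstract's "order of 1 TB" before counting the 2.1 TB of data.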
-
Broadband directional control of thermal emission
Authors:
Jin Xu,
Jyotirmoy Mandal,
Aaswath P. Raman
Abstract:
Controlling the directionality of emitted far-field thermal radiation is a fundamental challenge in contemporary photonics and materials research. While photonic strategies have enabled angular selectivity of thermal emission over narrow bandwidths, thermal radiation is inherently a broadband phenomenon. We currently lack the ability to constrain emitted thermal radiation to arbitrary angular ranges over broad bandwidths. Here, we introduce and experimentally realize gradient epsilon-near-zero (ENZ) material structures that enable broad-spectrum directional control of thermal emission by supporting leaky electromagnetic modes that couple to free space at fixed angles over a broad bandwidth. We demonstrate two emitter structures consisting of multiple semiconductor oxides in a photonic configuration that enable gradient ENZ behavior over long-wave infrared wavelengths. The structures exhibit high average emissivity in the p polarization (greater than 0.6 and 0.7, respectively) between 7.7 and 11.5 μm over an angular range of 70° to 85°, and between 10.0 and 14.3 μm over an angular range of 60° to 75°. Outside these angular ranges, the emissivity drops dramatically, to 0.4 at 50° and 40°, respectively. The structures' broadband thermal beaming capability enables strong radiative heat transfer only at particular angles, which is experimentally verified through direct measurements of thermal emission. By decoupling conventional limitations on angular and spectral response, our approach opens new possibilities for radiative heat transfer in applications such as thermal camouflage, solar heating, radiative cooling and waste heat recovery.
Submitted 3 December, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Elucidating the Behavior of Nanophotonic Structures Through Explainable Machine Learning Algorithms
Authors:
Christopher Yeung,
Ju-Ming Tsai,
Brian King,
Yusaku Kawagoe,
David Ho,
Aaswath P. Raman
Abstract:
A central challenge in the development of nanophotonic structures is identifying the optimal design for a target functionality, and understanding the physical mechanisms that enable the optimized device's capabilities. Previously investigated design methods for nanophotonic structures, including both conventional optimization approaches as well as nascent machine learning (ML) strategies, have made progress, yet they remain 'black boxes' that lack explanations for their predictions. Here we demonstrate that convolutional neural networks (CNN) trained to predict the electromagnetic response of classes of metal-dielectric-metal metamaterials, including complex freeform designs, can be explained to reveal deeper insights into the underlying physics of nanophotonic structures. Using an explainable AI (XAI) approach, we show that we can identify the importance of specific spatial regions of a nanophotonic structure for the presence or lack of an absorption peak. Our results highlight that ML strategies can be used for physics discovery, as well as design optimization, in optics and photonics.
Submitted 3 July, 2020; v1 submitted 12 March, 2020;
originally announced March 2020.
-
Optimization on the Surface of the (Hyper)-Sphere
Authors:
Parameswaran Raman,
Jiasen Yang
Abstract:
The Thomson problem is a classical problem in physics: how do $n$ charged particles distribute themselves on the surface of a sphere in $k$ dimensions? When $k=2$, i.e. a 2-sphere (a circle), the particles sit at equally spaced points, and this configuration can be computed analytically. However, for higher dimensions such as $k \ge 3$, i.e. the 3-sphere (the standard sphere), little is understood analytically. Finding the global minimum in these settings is particularly hard, since the optimization problem becomes increasingly computationally intensive for larger values of $k$ and $n$. In this work, we explore a wide variety of numerical optimization methods to solve the Thomson problem. In our empirical study, we find stochastic gradient based methods (SGD) to be a compelling choice for this problem, as they scale well with the number of points.
Submitted 13 September, 2019;
originally announced September 2019.
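The entry above favours stochastic gradient methods for the Thomson problem. A minimal Python sketch of one such method, stochastic projected gradient descent on the energy $E = \sum_{i<j} 1/\lVert x_i - x_j \rVert$, follows. Each step picks one particle and estimates its gradient from a random mini-batch of the other particles, rescaled so the estimate is unbiased. It then takes a step and renormalizes the particle back onto the unit sphere. Here `k` is the ambient dimension, matching the abstract's convention that `k=2` is the circle. The step size, step count and function names are illustrative assumptions, not the settings studied in the paper.

```python
import math
import random


def energy(points):
    """Thomson (Coulomb) energy: sum over pairs of 1 / distance."""
    return sum(
        1.0 / math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )


def normalize(p):
    """Project a point back onto the unit sphere."""
    r = math.sqrt(sum(c * c for c in p))
    return [c / r for c in p]


def thomson_sgd(n, k, steps=5000, lr=0.01, batch=None, seed=0):
    """Stochastic projected gradient descent for n unit charges on the sphere in R^k."""
    rng = random.Random(seed)
    pts = [normalize([rng.gauss(0.0, 1.0) for _ in range(k)]) for _ in range(n)]
    batch = n - 1 if batch is None else batch
    for _ in range(steps):
        i = rng.randrange(n)  # particle to update
        others = rng.sample([j for j in range(n) if j != i], batch)
        grad = [0.0] * k
        for j in others:
            diff = [a - b for a, b in zip(pts[i], pts[j])]
            r3 = math.dist(pts[i], pts[j]) ** 3
            for c in range(k):
                grad[c] -= diff[c] / r3  # gradient of 1/|x_i - x_j| with respect to x_i
        scale = (n - 1) / batch  # rescale the mini-batch sum to an unbiased estimate
        pts[i] = normalize([p - lr * scale * g for p, g in zip(pts[i], grad)])
    return pts


if __name__ == "__main__":
    pts = thomson_sgd(5, 2)
    print(energy(pts))  # should approach the regular-pentagon value
```

With `k=2` and the full batch, five particles should settle into a regular pentagon. Its energy can be checked against the closed form $\sum_{i<j} 1/\big(2\sin(\pi|i-j|/n)\big)$. Smaller batches trade per-step cost for gradient noise, which is what lets such methods scale with $n$.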
-
The positive geometry for $φ^{p}$ interactions
Authors:
Prashanth Raman
Abstract:
Starting from the seminal work of Arkani-Hamed et al. (arXiv:1711.09102), the "Amplituhedron program" was extended in arXiv:1811.05904 to the analysis of (planar) amplitudes in massless $φ^{4}$ theory. In this paper we show that the program can be further extended to include $φ^{p}$ ($p>4$) interactions. We show that tree-level planar amplitudes in these theories can be obtained from the geometry of polytopes called accordiohedra, which naturally sit inside kinematic space. As in the case of quartic interactions, the accordiohedron of a given dimension is not unique, and we show that a weighted sum of residues of the canonical forms on these polytopes can be used to compute scattering amplitudes. Finally, we provide a prescription to compute the weights and demonstrate how it works in various examples.
Submitted 7 June, 2019;
originally announced June 2019.
-
Stokes Polytopes : The positive geometry for $φ^{4}$ interactions
Authors:
Pinaki Banerjee,
Alok Laddha,
Prashanth Raman
Abstract:
In a remarkable recent work [arXiv:1711.09102] by Arkani-Hamed et al., the amplituhedron program was extended to the realm of non-supersymmetric scattering amplitudes. In particular, it was shown that for tree-level planar diagrams in massless $φ^{3}$ theory (and its close cousin, bi-adjoint $φ^{3}$ theory), a polytope known as the associahedron sits inside the kinematic space and is the amplituhedron for the theory. Precisely as in the case of the amplituhedron, it was shown that the scattering amplitude is nothing but a residue of the canonical form associated to the associahedron. Combinatorial and geometric properties of the associahedron naturally encode properties like locality and unitarity of (tree-level) scattering amplitudes. In this paper we attempt to extend this program to planar amplitudes in massless $φ^{4}$ theory. We show that tree-level planar amplitudes in this theory can be obtained from the geometry of objects known as Stokes polytopes, which sit naturally inside the kinematic space. As in the case of the associahedron, we show that residues of the canonical form on these Stokes polytopes can be used to compute scattering amplitudes for quartic interactions. However, unlike the associahedron, the Stokes polytope of a given dimension is not unique and, as we show, one must sum over all of them to obtain the complete scattering amplitude. Not all Stokes polytopes contribute equally, and we argue that the corresponding weights depend on purely combinatorial properties of the Stokes polytopes. As in the case of $φ^{3}$ theory, we show how factorization of Stokes polytopes implies unitarity and locality of the amplitudes.
Submitted 3 August, 2019; v1 submitted 14 November, 2018;
originally announced November 2018.
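Planar tree diagrams with $n$ external legs in $φ^{p}$ theory are in one-to-one correspondence with dissections of a convex $n$-gon into $p$-gons. For $p=3$ these are triangulations, i.e. vertices of the associahedron; for $p=4$ they are quadrangulations. With $k = (n-2)/(p-2)$ interaction vertices, their number is the Fuss-Catalan number $\frac{1}{(p-2)k+1}\binom{(p-1)k}{k}$. The short Python sketch below evaluates that count, i.e. the number of planar diagrams whose sum a weighted sum over Stokes polytopes (or accordiohedra) has to reproduce. It does not construct the polytopes or their weights, and the function name is hypothetical.

```python
from math import comb


def planar_phi_p_diagrams(n, p):
    """Number of planar tree diagrams with n external legs and only p-valent vertices.

    Equals the number of dissections of a convex n-gon into p-gons (a Fuss-Catalan number).
    Returns 0 when no such diagram exists, i.e. when (n - 2) is not a multiple of (p - 2).
    """
    if p < 3 or n < p or (n - 2) % (p - 2) != 0:
        return 0
    k = (n - 2) // (p - 2)  # number of interaction vertices
    return comb((p - 1) * k, k) // ((p - 2) * k + 1)


if __name__ == "__main__":
    print([planar_phi_p_diagrams(n, 4) for n in (4, 6, 8, 10)])  # [1, 3, 12, 55]
```

For example, it gives 14 planar cubic diagrams at six points (the vertices of the three-dimensional associahedron), and 3 and 12 planar quartic diagrams at six and eight points. Quartic amplitudes exist only for even $n$, which the divisibility check encodes.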
-
Scalar Blocks as Gravitational Wilson Networks
Authors:
Atanu Bhatta,
Prashanth Raman,
Nemani V. Suryanarayana
Abstract:
In this paper we further develop our prescription [arXiv:1602.02962] to holographically compute the conformal partial waves of CFT correlation functions using gravitational open Wilson network operators in the bulk. In particular, we demonstrate how to implement it to compute four-point scalar partial waves in general dimension. In the process we introduce the concept of OPE modules, which helps us simplify the computations. Our result for scalar partial waves is naturally given in terms of Gegenbauer polynomials. We also provide a simpler proof of a previously known recursion relation for the even-dimensional CFT partial waves, which naturally leads us to an odd-dimensional counterpart.
Submitted 14 June, 2018;
originally announced June 2018.
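The entry above states that the scalar partial waves come out naturally in terms of Gegenbauer polynomials $C_n^{(\alpha)}(x)$. For readers who want to evaluate them, the Python sketch below uses the standard three-term recurrence $m\,C_m^{(\alpha)}(x) = 2x(m+\alpha-1)\,C_{m-1}^{(\alpha)}(x) - (m+2\alpha-2)\,C_{m-2}^{(\alpha)}(x)$, with $C_0^{(\alpha)} = 1$ and $C_1^{(\alpha)} = 2\alpha x$. The abstract does not say which index $\alpha$ and which argument enter the paper's partial waves, so none is assumed here, and the function name is hypothetical.

```python
import math


def gegenbauer(n, alpha, x):
    """C_n^(alpha)(x) from the three-term recurrence, starting at C_0 = 1 and C_1 = 2*alpha*x."""
    if n == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * alpha * x
    for m in range(2, n + 1):
        c_prev, c = c, (2.0 * x * (m + alpha - 1) * c - (m + 2 * alpha - 2) * c_prev) / m
    return c


if __name__ == "__main__":
    theta = 0.7
    # alpha = 1 gives Chebyshev U_n: C_n^(1)(cos t) = sin((n + 1) t) / sin t
    print(gegenbauer(5, 1.0, math.cos(theta)), math.sin(6 * theta) / math.sin(theta))
```

As checks, $\alpha = 1/2$ gives the Legendre polynomials and $\alpha = 1$ gives the Chebyshev polynomials of the second kind. The recurrence is numerically stable for moderate $n$ on $[-1, 1]$.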
-
Extreme Stochastic Variational Inference: Distributed and Asynchronous
Authors:
Jiong Zhang,
Parameswaran Raman,
Shihao Ji,
Hsiang-Fu Yu,
S. V. N. Vishwanathan,
Inderjit S. Dhillon
Abstract:
Stochastic variational inference (SVI), the state-of-the-art algorithm for scaling variational inference to large datasets, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in the billions. In this paper, we propose extreme stochastic variational inference (ESVI), an asynchronous and lock-free algorithm to perform variational inference for mixture models on massive real-world datasets. ESVI overcomes the limitations of SVI by requiring that each processor access only a subset of the data and a subset of the parameters, thus providing data and model parallelism simultaneously. We demonstrate the effectiveness of ESVI by running Latent Dirichlet Allocation (LDA) on UMBC-3B, a dataset that has a vocabulary of 3 million and a token size of 3 billion. In our experiments, we found that ESVI not only outperforms VI and SVI in wall-clock time, but also achieves a better quality solution. In addition, we propose a strategy to speed up computation and save memory when fitting a large number of topics.
Submitted 3 August, 2018; v1 submitted 31 May, 2016;
originally announced May 2016.
-
DS-MLR: Exploiting Double Separability for Scaling up Distributed Multinomial Logistic Regression
Authors:
Parameswaran Raman,
Sriram Srinivasan,
Shin Matsushima,
Xinhua Zhang,
Hyokun Yun,
S. V. N. Vishwanathan
Abstract:
Scaling multinomial logistic regression to datasets with a very large number of data points and classes is challenging. This is primarily because one needs to compute the log-partition function on every data point, which makes distributing the computation hard. In this paper, we present DS-MLR, a distributed optimization method based on stochastic gradient descent, for scaling up multinomial logistic regression problems to massive-scale datasets without hitting any storage constraints on the data and model parameters. Our algorithm exploits double separability, an attractive property that allows us to achieve both data and model parallelism simultaneously. In addition, we introduce a non-blocking and asynchronous variant of our algorithm that avoids bulk synchronization. We demonstrate the versatility of DS-MLR in various data- and model-parallel scenarios through an extensive empirical study using several real-world datasets. In particular, we demonstrate the scalability of DS-MLR by solving an extreme multi-class classification problem on the Reddit dataset (159 GB data, 358 GB parameters) where, to the best of our knowledge, no other existing methods apply.
Submitted 3 August, 2018; v1 submitted 16 April, 2016;
originally announced April 2016.
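The entry above identifies the per-point log-partition function as the obstacle to distributing multinomial logistic regression. The Python sketch below makes that coupling visible: each call to `nll` touches every class weight vector. It also shows the identity $\log z = \min_{a>0}\,(a z - \log a - 1)$, attained at $a = 1/z$. That identity is one standard way to replace the coupled logarithm with terms linear in $\sum_k \exp(s_k)$, the kind of reformulation that can make such objectives separable. The abstract does not state whether DS-MLR uses this exact bound, so treat it as an assumption; the function names are hypothetical.

```python
import math


def log_partition(scores):
    """Numerically stable log(sum_k exp(s_k)) over all class scores s_k = <w_k, x>."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def nll(weights, x, y):
    """Multinomial logistic loss of one point; note that it touches every class vector w_k."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for w in weights]
    return log_partition(scores) - scores[y]


def log_upper_bound(z, a):
    """a*z - log(a) - 1, which is >= log(z) for a > 0 and equals it at a = 1/z."""
    return a * z - math.log(a) - 1.0


if __name__ == "__main__":
    print(log_partition([1000.0, 1000.0]))  # finite: 1000 + log(2), no overflow
    z = 5.0
    print(math.log(z), log_upper_bound(z, 1.0 / z), log_upper_bound(z, 0.3))
```

Subtracting the maximum score before exponentiating is what keeps the log-partition finite for large scores. The bound trades the logarithm for one extra scalar $a$ per data point.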
-
Holographic Conformal Partial Waves as Gravitational Open Wilson Networks
Authors:
Atanu Bhatta,
Prashanth Raman,
Nemani V. Suryanarayana
Abstract:
We propose a method to holographically compute the conformal partial waves in any decomposition of correlation functions of primary operators in conformal field theories using open Wilson network operators in the holographic gravitational dual. The Wilson operators are the gravitational ones where gravity is written as a gauge theory in the first-order Hilbert-Palatini formalism. We apply this method to compute the global conformal blocks and partial waves in 2d CFTs, reproducing many of the known results.
Submitted 9 February, 2016;
originally announced February 2016.
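For orientation, one of the known 2d results such a computation is expected to reproduce is the holomorphic global (sl(2)) conformal block for an exchanged primary of weight h_p in a four-point function of primaries with weights h_1, ..., h_4. It is quoted here in one common convention for reference, not taken from the paper; sign conventions for h_{12} and h_{34} differ between references, and the full block is the product with its antiholomorphic counterpart in \bar{z}.

```latex
g_{h_p}(z) = z^{h_p}\,{}_2F_1\!\left(h_p - h_{12},\; h_p + h_{34};\; 2h_p;\; z\right),
\qquad h_{ij} = h_i - h_j,
\qquad z = \frac{z_{12}\,z_{34}}{z_{13}\,z_{24}},
\qquad z_{ij} = z_i - z_j .
```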
-
Ranking via Robust Binary Classification and Parallel Parameter Estimation in Large-Scale Data
Authors:
Hyokun Yun,
Parameswaran Raman,
S. V. N. Vishwanathan
Abstract:
We propose RoBiRank, a ranking algorithm motivated by a close connection between evaluation metrics for learning to rank and loss functions for robust classification. The algorithm performs very competitively on standard benchmark datasets against other representative algorithms in the literature. Moreover, in large-scale problems where explicit feature vectors and scores are not given, our algorithm can be efficiently parallelized across a large number of machines; for a task that requires 386,133 x 49,824,519 pairwise interactions between items to be ranked, our algorithm finds solutions of dramatically higher quality than those found by a state-of-the-art competitor algorithm, given the same amount of wall-clock time for computation.
Submitted 21 August, 2014; v1 submitted 11 February, 2014;
originally announced February 2014.
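The abstract links ranking metrics to robust classification losses. Metrics such as DCG discount an item by roughly 1/log2(1 + rank), so a loss that grows only logarithmically in a smooth surrogate of the rank behaves like a robust, outlier-tolerant classification loss. The sketch below assumes a concrete form for that idea: a base-2 logistic loss summed over pairs as a soft rank, then the concave transform log2(1 + t). These choices and names are hypothetical; the exact loss in the paper may differ.

```python
import math
from typing import Sequence


def logistic_loss(t: float) -> float:
    """Base-2 logistic loss log2(1 + 2^(-t)): a convex surrogate for 1[t < 0].

    Large negative t (below about -1000) would overflow 2.0 ** (-t).
    """
    return math.log2(1.0 + 2.0 ** (-t))


def robust_transform(t: float) -> float:
    """Concave, slowly growing transform log2(1 + t) (assumed form)."""
    return math.log2(1.0 + t)


def ranking_loss(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """Hypothetical RoBiRank-style loss for a single query.

    For each item y, the pairwise logistic losses of y against every other
    item act as a smooth stand-in for y's rank; that soft rank is passed
    through the concave transform and weighted by y's relevance.
    """
    total = 0.0
    for y, (s_y, r_y) in enumerate(zip(scores, relevance)):
        soft_rank = sum(
            logistic_loss(s_y - s_other)
            for other, s_other in enumerate(scores)
            if other != y
        )
        total += r_y * robust_transform(soft_rank)
    return total


# The relevant item (index 0) costs less when it is scored above the others.
print(ranking_loss([3.0, 0.0, -1.0], [1.0, 0.0, 0.0]))
print(ranking_loss([-1.0, 0.0, 3.0], [1.0, 0.0, 0.0]))
```

Because of the logarithm, pushing a relevant item further down the list increases the loss only slowly, which limits the influence of hard or noisy items in the same way robust classification losses do.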
-
Power to the Points: Validating Data Memberships in Clusterings
Authors:
Parasaran Raman,
Suresh Venkatasubramanian
Abstract:
A clustering is an implicit assignment of labels to points, based on proximity to other points. It is these labels that are then used for downstream analysis (either focusing on individual clusters, or identifying representatives of clusters, and so on). Thus, in order to trust a clustering as a first step in exploratory data analysis, we must trust the labels assigned to individual data points. Without supervision, how can we validate this assignment? In this paper, we present a method to attach affinity scores to the implicit labels of individual points in a clustering. The affinity scores capture the confidence level of the cluster that claims to "own" the point. This method is very general: it can be used with clusterings derived from Euclidean data, kernelized data, or even data derived from information spaces. It smoothly incorporates importance functions on clusters, allowing us to weight different clusters differently. It is also efficient: assigning an affinity score to a point depends only polynomially on the number of clusters and is independent of the number of points in the data. The dimensionality of the underlying space appears only in preprocessing. We demonstrate the value of our approach with an experimental study that illustrates the use of these scores in different data analysis tasks, as well as the efficiency and flexibility of the method. We also demonstrate useful visualizations of these scores; these might prove useful within an interactive analytics framework.
Submitted 21 May, 2013;
originally announced May 2013.
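The abstract does not state the scoring rule itself. As a hypothetical stand-in that shares the properties it lists (a per-point confidence for the owning cluster, per-cluster importance weights, and a cost that depends on the number of clusters but not on the number of points), one could score a point by how far its owning cluster beats the nearest competitor under a weighted "power" distance ||p − c_j||² − w_j. The function names and the weighting scheme are assumptions, not the paper's method.

```python
import numpy as np


def power_distances(p: np.ndarray, centers: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Power distance ||p - c_j||^2 - w_j from point p to every cluster center c_j."""
    diffs = centers - p
    return np.einsum("ij,ij->i", diffs, diffs) - weights


def affinity(p, centers, weights, owner: int) -> float:
    """Hypothetical affinity of point p to the cluster `owner` that claims it.

    Positive: the owner is strictly closest in power distance, and the value
    is the margin over the best competing cluster. Zero: p lies on the
    boundary. Negative: another cluster has a stronger claim. A larger
    weight w_j makes cluster j more important. Cost is O(k * d) for k >= 2
    clusters in d dimensions, independent of the number of data points.
    """
    d = power_distances(
        np.asarray(p, dtype=float),
        np.asarray(centers, dtype=float),
        np.asarray(weights, dtype=float),
    )
    others = np.delete(d, owner)  # requires at least one competing cluster
    return float(others.min() - d[owner])


centers = np.array([[0.0, 0.0], [4.0, 0.0]])
print(affinity([0.0, 0.0], centers, [0.0, 0.0], owner=0))  # deep inside cluster 0
print(affinity([2.0, 0.0], centers, [0.0, 0.0], owner=0))  # on the boundary
```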
-
A Geometric Algorithm for Scalable Multiple Kernel Learning
Authors:
John Moeller,
Parasaran Raman,
Avishek Saha,
Suresh Venkatasubramanian
Abstract:
We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. To do so, we reinterpret the problem of learning kernel weights as searching for a kernel that maximizes the minimum (kernel) distance between two convex polytopes. This interpretation, combined with novel structural insights from our geometric formulation, allows us to reduce the MKL problem to a simple optimization routine that yields provable convergence as well as quality guarantees. As a result, our method scales efficiently to much larger data sets than most prior methods can handle. Empirical evaluation on eleven datasets shows that our method is significantly faster and even compares favorably with a uniform unweighted combination of kernels.
Submitted 15 March, 2014; v1 submitted 25 June, 2012;
originally announced June 2012.
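The inner quantity in this formulation is the distance between the convex hulls of the two classes in the kernel's feature space. A point in each hull is a convex combination of mapped points, so the squared distance is uᵀKu with u = [α; −β] and α, β on probability simplices. The sketch below computes it with Frank-Wolfe and then searches a grid over the weight of a two-kernel combination. The Frank-Wolfe solver and the grid search are our own illustrative choices, not the paper's routine.

```python
import numpy as np


def hull_distance_sq(K: np.ndarray, n_pos: int, iters: int = 200) -> float:
    """Squared kernel distance between the convex hulls of two point sets.

    K is the full Gram matrix with the n_pos positive points first. Minimizes
    u^T K u over u = [alpha; -beta], with alpha and beta on probability
    simplices, using Frank-Wolfe with the standard step size 2 / (t + 2).
    """
    n = K.shape[0]
    alpha = np.full(n_pos, 1.0 / n_pos)
    beta = np.full(n - n_pos, 1.0 / (n - n_pos))
    for t in range(iters):
        u = np.concatenate([alpha, -beta])
        g = K @ u  # half the gradient of u^T K u with respect to u
        # Best simplex vertices: minimize g over positives, and, since the
        # negatives enter as -beta, maximize g over negatives.
        s_alpha = np.zeros(n_pos)
        s_alpha[np.argmin(g[:n_pos])] = 1.0
        s_beta = np.zeros(n - n_pos)
        s_beta[np.argmax(g[n_pos:])] = 1.0
        step = 2.0 / (t + 2.0)
        alpha = (1.0 - step) * alpha + step * s_alpha
        beta = (1.0 - step) * beta + step * s_beta
    u = np.concatenate([alpha, -beta])
    return float(u @ K @ u)


def best_kernel_weight(K1: np.ndarray, K2: np.ndarray, n_pos: int, grid: int = 11):
    """Naive outer loop: pick mu in [0, 1] maximizing the hull distance
    under mu * K1 + (1 - mu) * K2. A grid search for illustration only."""
    mus = np.linspace(0.0, 1.0, grid)
    dists = [hull_distance_sq(m * K1 + (1.0 - m) * K2, n_pos) for m in mus]
    i = int(np.argmax(dists))
    return float(mus[i]), dists[i]


# 1-D toy data: positives {0, 1}, negatives {3, 4}, linear kernel.
x = np.array([0.0, 1.0, 3.0, 4.0])
K_lin = np.outer(x, x)
print(hull_distance_sq(K_lin, n_pos=2))  # closest hull points are 1 and 3
```

Keeping the kernel weights on a simplex matters: without that constraint, simply scaling a kernel up would inflate every distance.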
-
Generating a Diverse Set of High-Quality Clusterings
Authors:
Jeff M. Phillips,
Parasaran Raman,
Suresh Venkatasubramanian
Abstract:
We provide a new framework for generating multiple good-quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components: generating many high-quality partitions, and then grouping these partitions to obtain k representatives. The decomposition makes the approach extremely modular and allows us to optimize various criteria that control the choice of representative partitions.
Submitted 29 July, 2011;
originally announced August 2011.
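The second stage of the decomposition (grouping many partitions into k representatives) can be sketched as follows. We assume the partitions arrive as label arrays (for example, from restarts of some clustering algorithm), measure disagreement with 1 − Rand index, and pick representatives by greedy farthest-point selection. All three choices, and the function names, are illustrative assumptions; the paper's criteria may differ.

```python
import numpy as np


def rand_distance(p, q) -> float:
    """1 - Rand index: fraction of point pairs on which two labelings
    disagree about 'same cluster' vs 'different cluster'.
    Invariant to renaming cluster labels."""
    p, q = np.asarray(p), np.asarray(q)
    same_p = p[:, None] == p[None, :]
    same_q = q[:, None] == q[None, :]
    iu = np.triu_indices(len(p), k=1)  # each unordered pair once
    return float(np.mean(same_p[iu] != same_q[iu]))


def pick_representatives(partitions, k: int) -> list:
    """Greedy farthest-point (Gonzalez) selection of k representatives.

    Assumes the pool holds at least k mutually distinct partitions;
    otherwise an already chosen index may be picked again.
    """
    chosen = [0]
    dist = np.array([rand_distance(partitions[0], p) for p in partitions])
    while len(chosen) < k:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, [rand_distance(partitions[nxt], p) for p in partitions])
    return chosen


# The first two labelings describe the same partition with renamed labels.
pool = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(pick_representatives(pool, 2))
```

Because the distance ignores label names, relabeled copies of the same clustering collapse to distance zero and are never chosen as separate representatives.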
-
Spatially-Aware Comparison and Consensus for Clusterings
Authors:
Parasaran Raman,
Jeff M. Phillips,
Suresh Venkatasubramanian
Abstract:
This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure. This consensus procedure is implemented via a novel reduction to Euclidean clustering, and is both simple and efficient. All of our results apply to both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.
Submitted 31 January, 2011;
originally announced February 2011.
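The abstract represents each cluster as a combination of the Hilbert-space representations of its points. A minimal sketch of that idea, assuming a Gaussian kernel and uniform point weights (a mean embedding), computes the distance between two clusters entirely through kernel evaluations. The kernel, the bandwidth sigma, and the function names are assumptions. The paper's clustering-level metric and its consensus procedure are not reproduced here.

```python
import numpy as np


def gaussian_gram(A: np.ndarray, B: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def cluster_distance(A, B, sigma: float = 1.0) -> float:
    """Distance between the mean embeddings of clusters A and B, via
    ||mu_A - mu_B||^2 = mean K(A, A) + mean K(B, B) - 2 mean K(A, B).
    Soft clusterings could use weighted means instead of uniform ones."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    sq = (
        gaussian_gram(A, A, sigma).mean()
        + gaussian_gram(B, B, sigma).mean()
        - 2.0 * gaussian_gram(A, B, sigma).mean()
    )
    return float(np.sqrt(max(sq, 0.0)))  # clip tiny negative rounding errors


A = [[0.0, 0.0], [1.0, 0.0]]
near = [[1.0, 0.0], [2.0, 0.0]]
far = [[5.0, 0.0], [6.0, 0.0]]
print(cluster_distance(A, near), cluster_distance(A, far))
```

Unlike purely label-based comparisons, this distance is spatially aware: two clusters that share no points still count as close when they occupy nearby regions of the space.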