-
A New Algorithm for Computing Branch Number of Non-Singular Matrices over Finite Fields
Authors:
P. R. Mishra,
Yogesh Kumar,
Susanta Samanta,
Atul Gaur
Abstract:
The notion of branch numbers of a linear transformation is crucial for both linear and differential cryptanalysis. The number of non-zero elements in a state difference or linear mask directly correlates with the active S-Boxes. The differential or linear branch number indicates the minimum number of active S-Boxes in two consecutive rounds of an SPN cipher, specifically for differential or linear cryptanalysis, respectively. This paper presents a new algorithm for computing the branch number of non-singular matrices over finite fields. The algorithm is based on the existing classical method but demonstrates improved computational complexity compared to its predecessor. We conduct a comparative study of the proposed algorithm and the classical approach, providing an analytical estimation of the algorithm's complexity. Our analysis reveals that the computational complexity of our algorithm is the square root of that of the classical approach.
Submitted 11 May, 2024;
originally announced May 2024.
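For orientation, the differential branch number described in the abstract above can be computed by exhaustive search: take the minimum of wt(x) + wt(Mx) over all non-zero input vectors x. The sketch below illustrates that classical brute-force idea only; it is not the improved algorithm the paper proposes. The function names, the integer encoding of field elements, and the choice of reduction polynomial are all assumptions made for illustration.

```python
from itertools import product


def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m).

    Elements are integers below 2^m; poly is the reduction polynomial
    including its x^m term (for example 0b111 for x^2 + x + 1).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:  # degree reached m, so reduce
            a ^= poly
    return result


def mat_vec(matrix, vec, m, poly):
    """Return the matrix-vector product over GF(2^m)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, x in zip(row, vec):
            acc ^= gf_mul(coeff, x, m, poly)
        out.append(acc)
    return out


def weight(vec):
    """Number of non-zero coordinates (active S-Boxes)."""
    return sum(1 for x in vec if x != 0)


def differential_branch_number(matrix, m, poly):
    """Exhaustive search over all non-zero inputs; cost grows as 2^(m*n)."""
    n = len(matrix)
    best = n + 1  # upper bound for any non-singular n x n matrix
    for vec in product(range(1 << m), repeat=n):
        if not any(vec):
            continue
        best = min(best, weight(vec) + weight(mat_vec(matrix, vec, m, poly)))
    return best


if __name__ == "__main__":
    # Hypothetical 2 x 2 example over GF(2^2) with x^2 + x + 1.
    print(differential_branch_number([[1, 1], [1, 2]], 2, 0b111))
```

The search visits every non-zero vector, so it is only practical for small n and m; that cost is what motivates faster methods such as the one announced in the abstract.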
-
A Systematic Construction Approach for All $4\times 4$ Involutory MDS Matrices
Authors:
Yogesh Kumar,
P. R. Mishra,
Susanta Samanta,
Atul Gaur
Abstract:
Maximum distance separable (MDS) matrices play a crucial role not only in coding theory but also in the design of block ciphers and hash functions. Of particular interest are involutory MDS matrices, which facilitate the use of a single circuit for both encryption and decryption in hardware implementations. In this article, we present several characterizations of involutory MDS matrices of even order. Additionally, we introduce a new matrix form for obtaining all involutory MDS matrices of even order and compare it with other matrix forms available in the literature. We then propose a technique to systematically construct all $4 \times 4$ involutory MDS matrices over a finite field $\mathbb{F}_{2^m}$. This method significantly reduces the search space by focusing on involutory MDS class representative matrices, leading to the generation of all such matrices within a substantially smaller set compared to considering all $4 \times 4$ involutory matrices. Specifically, our approach involves searching for these representative matrices within a set of cardinality $(2^m-1)^5$. Through this method, we provide an explicit enumeration of the total number of $4 \times 4$ involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=3,4,\ldots,8$.
Submitted 17 June, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
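The two properties named in the abstract above can be tested directly on a candidate matrix: a square matrix is MDS when every square submatrix is non-singular, and involutory when it equals its own inverse. The sketch below checks both over GF(2^m) by brute force. It is not the paper's construction or search technique, and the helper names and the sample matrix are hypothetical.

```python
from itertools import combinations


def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); poly includes the x^m term."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return result


def det(matrix, m, poly):
    """Determinant by Laplace expansion; signs vanish in characteristic 2."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for j, coeff in enumerate(matrix[0]):
        if coeff == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total ^= gf_mul(coeff, det(minor, m, poly), m, poly)
    return total


def is_mds(matrix, m, poly):
    """True when every square submatrix has a non-zero determinant."""
    n = len(matrix)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[matrix[r][c] for c in cols] for r in rows]
                if det(sub, m, poly) == 0:
                    return False
    return True


def is_involutory(matrix, m, poly):
    """True when the matrix multiplied by itself is the identity."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            acc = 0
            for k in range(n):
                acc ^= gf_mul(matrix[i][k], matrix[k][j], m, poly)
            if acc != (1 if i == j else 0):
                return False
    return True


if __name__ == "__main__":
    # Hypothetical 2 x 2 example over GF(2^2) with x^2 + x + 1.
    sample = [[2, 1], [2, 2]]
    print(is_mds(sample, 2, 0b111), is_involutory(sample, 2, 0b111))
```

Laplace expansion keeps the code short and is adequate for the small orders discussed here; a larger search would want Gaussian elimination instead.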
-
Construction of all MDS and involutory MDS matrices
Authors:
Yogesh Kumar,
P. R. Mishra,
Susanta Samanta,
Kishan Chand Gupta,
Atul Gaur
Abstract:
In this paper, we propose two algorithms for a hybrid construction of all $n\times n$ MDS and involutory MDS matrices over a finite field $\mathbb{F}_{p^m}$, respectively. The proposed algorithms effectively narrow down the search space to identify $(n-1) \times (n-1)$ MDS matrices, facilitating the generation of all $n \times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. To the best of our knowledge, existing literature lacks methods for generating all $n\times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. In our approach, we introduce a representative matrix form for generating all $n\times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. The determination of these representative MDS matrices involves searching through all $(n-1)\times (n-1)$ MDS matrices over $\mathbb{F}_{p^m}$. Our contributions extend to proving that the count of all $3\times 3$ MDS matrices over $\mathbb{F}_{2^m}$ is precisely $(2^m-1)^5(2^m-2)(2^m-3)(2^{2m}-9\cdot 2^m+21)$. Furthermore, we explicitly provide the count of all $4\times 4$ MDS and involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=2, 3, 4$.
Submitted 13 August, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
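The closed-form count of $3\times 3$ MDS matrices quoted in the abstract above is easy to evaluate for small $m$. The snippet below simply substitutes $q = 2^m$ into that expression; the function name is an assumption, and the code evaluates the stated formula rather than reproducing the paper's proof or algorithms.

```python
def count_3x3_mds(m):
    """Evaluate (q-1)^5 (q-2)(q-3)(q^2 - 9q + 21) with q = 2^m."""
    q = 2 ** m
    return (q - 1) ** 5 * (q - 2) * (q - 3) * (q * q - 9 * q + 21)


if __name__ == "__main__":
    for m in range(1, 5):
        print(m, count_3x3_mds(m))
```

For $m = 1$ the factor $(q-2)$ vanishes, which agrees with the fact that a $3\times 3$ matrix over $\mathbb{F}_2$ with all entries non-zero has singular $2\times 2$ submatrices.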
-
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Authors:
Yogesh Kumar,
Pekka Marttinen
Abstract:
We introduce eCLIP, an enhanced version of the CLIP model that integrates expert annotations in the form of radiologist eye-gaze heatmaps. It tackles key challenges in contrastive multi-modal medical imaging analysis, notably data scarcity and the "modality gap" -- a significant disparity between image and text embeddings that diminishes the quality of representations and hampers cross-modal interoperability. eCLIP integrates a heatmap processor and leverages mixup augmentation to efficiently utilize the scarce expert annotations, thus boosting the model's learning effectiveness. eCLIP is designed to be generally applicable to any variant of CLIP without requiring any modifications of the core architecture. Through detailed evaluations across several tasks, including zero-shot inference, linear probing, cross-modal retrieval, and Retrieval Augmented Generation (RAG) of radiology reports using a frozen Large Language Model, eCLIP showcases consistent improvements in embedding quality. The outcomes reveal enhanced alignment and uniformity, affirming eCLIP's capability to harness high-quality annotations for enriched multi-modal analysis in the medical imaging domain.
Submitted 15 July, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Fair Interval Scheduling of Indivisible Chores
Authors:
Sarfaraz Equbal,
Rohit Gurjar,
Yatharth Kumar,
Swaprava Nath,
Rohit Vaish
Abstract:
We study the problem of fairly assigning a set of discrete tasks (or chores) among a set of agents with additive valuations. Each chore is associated with a start and finish time, and each agent can perform at most one chore at any given time. The goal is to find a fair and efficient schedule of the chores, where fairness pertains to satisfying envy-freeness up to one chore (EF1) and efficiency pertains to maximality (i.e., no unallocated chore can be feasibly assigned to any agent). Our main result is a polynomial-time algorithm for computing an EF1 and maximal schedule for two agents under monotone valuations when the conflict constraints constitute an arbitrary interval graph. The algorithm uses a coloring technique in interval graphs that may be of independent interest. For an arbitrary number of agents, we provide an algorithm for finding a fair schedule under identical dichotomous valuations when the constraints constitute a path graph. We also show that stronger fairness and efficiency properties, including envy-freeness up to any chore (EFX) along with maximality and EF1 along with Pareto optimality, cannot be achieved.
Submitted 6 February, 2024;
originally announced February 2024.
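The two conditions in the abstract above, interval feasibility and envy-freeness up to one chore (EF1), can be stated concretely. The sketch below assumes additive, non-negative costs (disutilities) and half-open time intervals; both are simplifying assumptions, and the function names are hypothetical. It only checks a given schedule and is not the paper's polynomial-time algorithm.

```python
def is_feasible(bundle, intervals):
    """True when no two chores in the bundle overlap in time.

    intervals[c] is a (start, finish) pair treated as half-open,
    so a chore may start exactly when another finishes.
    """
    chosen = sorted(intervals[c] for c in bundle)
    return all(prev[1] <= nxt[0] for prev, nxt in zip(chosen, chosen[1:]))


def is_ef1(allocation, costs):
    """Check EF1 for chores under additive costs.

    allocation[i] lists the chores of agent i; costs[i][c] is the cost
    agent i assigns to chore c. Agent i must not envy agent j after
    dropping the single most costly chore from its own bundle.
    """
    for i, own in enumerate(allocation):
        if not own:
            continue  # an agent with no chores envies nobody
        own_cost = sum(costs[i][c] for c in own)
        reduced = own_cost - max(costs[i][c] for c in own)
        for j, other in enumerate(allocation):
            if i == j:
                continue
            if reduced > sum(costs[i][c] for c in other):
                return False
    return True


if __name__ == "__main__":
    intervals = [(0, 2), (1, 3), (2, 4)]
    costs = [[3, 1, 2], [2, 2, 2]]
    allocation = [[0, 2], [1]]
    print(all(is_feasible(b, intervals) for b in allocation))
    print(is_ef1(allocation, costs))
```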
-
Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning
Authors:
Trilok Padhi,
Ugur Kursuncu,
Yaman Kumar,
Valerie L. Shalin,
Lane Peterson Fronczek
Abstract:
The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, Large Language Models (LLMs) and Large Vision Models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit commonsense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory.
Submitted 5 February, 2024;
originally announced February 2024.
-
CABINET: Content Relevance based Noise Reduction for Table Question Answering
Authors:
Sohan Patnaik,
Heril Changwal,
Milan Aggarwal,
Sumit Bhatia,
Yaman Kumar,
Balaji Krishnamurthy
Abstract:
Table understanding capability of Large Language Models (LLMs) has been extensively studied through the task of question-answering (QA) over tables. Typically, only a small part of the whole table is relevant to derive the answer for a given question. The irrelevant parts act as noise and are distracting information, resulting in sub-optimal performance due to the vulnerability of LLMs to noise. To mitigate this, we propose CABINET (Content RelevAnce-Based NoIse ReductioN for TablE QuesTion-Answering) - a framework to enable LLMs to focus on relevant tabular data by suppressing extraneous information. CABINET comprises an Unsupervised Relevance Scorer (URS), trained differentially with the QA LLM, that weighs the table content based on its relevance to the input question before feeding it to the question-answering LLM (QA LLM). To further aid the relevance scorer, CABINET employs a weakly supervised module that generates a parsing statement describing the criteria of rows and columns relevant to the question and highlights the content of corresponding table cells. CABINET significantly outperforms various tabular LLM baselines, as well as GPT3-based in-context learning methods, is more robust to noise, maintains outperformance on tables of varying sizes, and establishes new SoTA performance on WikiTQ, FeTaQA, and WikiSQL datasets. We release our code and datasets at https://github.com/Sohanpatnaik106/CABINET_QA.
Submitted 13 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Exploring Graph Neural Networks for Indian Legal Judgment Prediction
Authors:
Mann Khatri,
Mirza Yusuf,
Yaman Kumar,
Rajiv Ratn Shah,
Ponnurangam Kumaraguru
Abstract:
The burdensome impact of a skewed judges-to-cases ratio on the judicial system manifests in an overwhelming backlog of pending cases alongside an ongoing influx of new ones. To tackle this issue and expedite the judicial process, the proposition of an automated system capable of suggesting case outcomes based on factual evidence and precedent from past cases gains significance. This research paper centres on developing a graph neural network-based model to address the Legal Judgment Prediction (LJP) problem, recognizing the intrinsic graph structure of judicial cases and making it a binary node classification problem. We explored various embeddings as model features, while nodes such as time nodes and judicial acts were added and pruned to evaluate the model's performance. The study also considers the ethical dimension of fairness in these predictions, examining gender and name biases. A link prediction task is also conducted to assess the model's proficiency in anticipating connections between two specified nodes. By harnessing the capabilities of graph neural networks and incorporating fairness analyses, this research aims to contribute insights towards streamlining the adjudication process, enhancing judicial efficiency, and fostering a more equitable legal landscape, ultimately alleviating the strain imposed by mounting case backlogs. Our best-performing model, with XLNet pre-trained embeddings as its features, achieves a macro F1 score of 75% for the LJP task. For link prediction, the same set of features performs best, giving an ROC of more than 80%.
Submitted 19 October, 2023;
originally announced October 2023.
-
MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP
Authors:
Prajwal Ganugula,
Y S S S Santosh Kumar,
N K Sagar Reddy,
Prabhath Chellingi,
Avinash Thakur,
Neeraj Kasera,
C Shyam Anand
Abstract:
Style transfer driven by text prompts has paved a new path for creatively stylizing images without collecting an actual style image. Despite promising results, text-driven stylization gives the user no control over the stylization. A user who wants to create an artistic image requires fine control over the stylization of individual entities in the content image, which is not addressed by current state-of-the-art approaches. Diffusion style transfer methods suffer from the same issue because regional control over the stylized output is ineffective. To address this problem, we propose a new method, Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC), that can apply styles to different objects in the image based on the context extracted from the input prompt. Text-based segmentation and stylization modules, which are based on the vision transformer architecture, were used to segment and stylize the objects. Our method can extend to arbitrary objects and styles and produces high-quality images compared to current state-of-the-art methods. To our knowledge, this is the first attempt to perform text-guided arbitrary object-wise stylization. We demonstrate the effectiveness of our approach through qualitative and quantitative analysis, showing that it can generate visually appealing stylized images with enhanced control over stylization and the ability to generalize to unseen object classes.
Submitted 24 September, 2023;
originally announced September 2023.
-
CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing
Authors:
Mann Khatri,
Pritish Wadhwa,
Gitansh Satija,
Reshma Sheik,
Yaman Kumar,
Rajiv Ratn Shah,
Ponnurangam Kumaraguru
Abstract:
In legal document writing, one of the key elements is properly citing the case laws and other sources to substantiate claims and arguments. Understanding the legal domain and identifying appropriate citation context or cite-worthy sentences are challenging tasks that demand expensive manual annotation. The presence of jargon, language semantics, and high domain specificity makes legal language complex, making any associated legal task hard for automation. The current work focuses on the problem of citation-worthiness identification. It is designed as the initial step in today's citation recommendation systems to lighten the burden of extracting an adequate set of citation contexts. To accomplish this, we introduce a labeled dataset of 178M sentences for citation-worthiness detection in the legal domain from the Caselaw Access Project (CAP). The performance of various deep learning models was examined on this novel dataset. The domain-specific pre-trained model tends to outperform other models, with an 88% F1-score for the citation-worthiness detection task.
Submitted 3 May, 2023;
originally announced May 2023.
-
EJ-FAT Joint ESnet JLab FPGA Accelerated Transport Load Balancer
Authors:
Stacey Sheldon,
Yatish Kumar,
Michael Goodrich,
Graham Heyes
Abstract:
To increase the science rate for high data rates/volumes, Thomas Jefferson National Accelerator Facility (JLab) has partnered with Energy Sciences Network (ESnet) to define an edge to data center traffic shaping / steering transport capability featuring data event-aware network shaping and forwarding. The keystone of this ESnet JLab FPGA Accelerated Transport (EJFAT) is the joint development of a dynamic compute work Load Balancer (LB) of UDP streamed data. The LB is a suite consisting of a Field Programmable Gate Array (FPGA) executing the dynamically configurable, low fixed latency LB data plane featuring real-time packet redirection at high throughput, and a control plane running on the FPGA host computer that monitors network and compute farm telemetry in order to make dynamic decisions for destination compute host redirection / load balancing.
Submitted 28 March, 2023;
originally announced March 2023.
-
H-AES: Towards Automated Essay Scoring for Hindi
Authors:
Shubhankar Singh,
Anirudh Pupneja,
Shivaansh Mital,
Cheril Shah,
Manish Bawkar,
Lakshman Prasad Gupta,
Ajit Kumar,
Yaman Kumar,
Rushali Gupta,
Rajiv Ratn Shah
Abstract:
The use of Natural Language Processing (NLP) for Automated Essay Scoring (AES) has been well explored in the English language, with benchmark models exhibiting performance comparable to human scorers. However, AES in Hindi and other low-resource languages remains unexplored. In this study, we reproduce and compare state-of-the-art methods for AES in the Hindi domain. We employ classical feature-based Machine Learning (ML) and advanced end-to-end models, including LSTM Networks and Fine-Tuned Transformer Architecture, in our approach and derive results comparable to those in the English language domain. Hindi, being a low-resource language, lacks a dedicated essay-scoring corpus. We train and evaluate our models using translated English essays and empirically measure their performance on our own small-scale, real-world Hindi corpus. We follow this up with an in-depth analysis discussing the prompt-specific behavior of the different language models implemented.
Submitted 28 February, 2023;
originally announced February 2023.
-
Comparison and Evaluation of Methods for a Predict+Optimize Problem in Renewable Energy
Authors:
Christoph Bergmeir,
Frits de Nijs,
Abishek Sriramulu,
Mahdi Abolghasemi,
Richard Bean,
John Betts,
Quang Bui,
Nam Trong Dinh,
Nils Einecke,
Rasul Esmaeilbeigi,
Scott Ferraro,
Priya Galketiya,
Evgenii Genov,
Robert Glasgow,
Rakshitha Godahewa,
Yanfei Kang,
Steffen Limmer,
Luis Magdalena,
Pablo Montero-Manso,
Daniel Peralta,
Yogesh Pipada Sunil Kumar,
Alejandro Rosales-Pérez,
Julian Ruddick,
Akylas Stratigakos,
Peter Stuckey
, et al. (3 additional authors not shown)
Abstract:
Algorithms that involve both forecasting and optimization are at the core of solutions to many difficult real-world problems, such as in supply chains (inventory optimization), traffic, and in the transition towards carbon-free energy generation in battery/load/production scheduling in sustainable energy systems. Typically, in these scenarios we want to solve an optimization problem that depends on unknown future values, which therefore need to be forecast. As both forecasting and optimization are difficult problems in their own right, relatively little research has been done in this area. This paper presents the findings of the "IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling," held in 2021. We present a comparison and evaluation of the seven highest-ranked solutions in the competition, to provide researchers with a benchmark problem and to establish the state of the art for this benchmark, with the aim of fostering and facilitating research in this area. The competition used data from the Monash Microgrid, as well as weather data and energy market data. It then focused on two main challenges: forecasting renewable energy production and demand, and obtaining an optimal schedule for the activities (lectures) and on-site batteries that lead to the lowest cost of energy. The most accurate forecasts were obtained by gradient-boosted tree and random forest models, and optimization was mostly performed using mixed integer linear and quadratic programming. The winning method predicted different scenarios and optimized over all scenarios jointly using a sample average approximation method.
Submitted 20 December, 2022;
originally announced December 2022.
-
Optimal activity and battery scheduling algorithm using load and solar generation forecasts
Authors:
Yogesh Pipada Sunil Kumar,
Rui Yuan,
Nam Trong Dinh,
S. Ali Pourmousavi
Abstract:
Optimal scheduling of energy usage has attracted great attention in the power system community, where various methodologies have been proposed. However, in real-world applications, optimal scheduling problems require reliable energy forecasting, which is scarcely discussed as a joint solution to the scheduling problem. The 5th IEEE Computational Intelligence Society (IEEE-CIS) competition raised a practical problem of decreasing the electricity bill by scheduling building activities, where forecasting the solar energy generation and building consumption is a necessity. To solve this problem, we propose a technical sequence for tackling the solar PV and demand forecast and optimal scheduling problems, where solar generation prediction methods and an optimal university lectures scheduling algorithm are proposed.
Submitted 24 October, 2022;
originally announced October 2022.
-
Deep Convolutional Architectures for Extrapolative Forecast in Time-dependent Flow Problems
Authors:
Pratyush Bhatt,
Yash Kumar,
Azzeddine Soulaimani
Abstract:
Physical systems whose dynamics are governed by partial differential equations (PDEs) find applications in numerous fields, from engineering design to weather forecasting. The process of obtaining the solution from such PDEs may be computationally expensive for large-scale and parameterized problems. In this work, deep learning techniques developed especially for time-series forecasts, such as LSTM and TCN, or for spatial-feature extraction such as CNN, are employed to model the system dynamics for advection dominated problems. These models take as input a sequence of high-fidelity vector solutions for consecutive time-steps obtained from the PDEs and forecast the solutions for the subsequent time-steps using auto-regression; thereby reducing the computation time and power needed to obtain such high-fidelity solutions. The models are tested on numerical benchmarks (1D Burgers' equation and Stoker's dam break problem) to assess the long-term prediction accuracy, even outside the training domain (extrapolation). Non-intrusive reduced-order modelling techniques such as deep auto-encoder networks are utilized to compress the high-fidelity snapshots before feeding them as input to the forecasting models in order to reduce the complexity and the required computations in the online and offline stages. Deep ensembles are employed to perform uncertainty quantification of the forecasting models, which provides information about the variance of the predictions as a result of the epistemic uncertainties.
Submitted 17 September, 2022;
originally announced September 2022.
-
Transport Layer Networking
Authors:
Yatish Kumar,
Stacey Sheldon,
Dale Carder
Abstract:
In this paper we focus on the invention of new network forwarding behaviors between Layer 4 and Layer 7 in the OSI network model. Our design goal is to propose no changes to L3, the IP network layer, thus maintaining 100% compatibility with the existing internet. Small changes are made to L4, the transport layer, and a new design for a session (L5) is proposed. This new capability is intended to have minimal or no impact on the application layer, except for exposing the ability for L7 to select this new mode of data transfer or not. The invention of new networking technologies is frequently done in an academic setting; however, the design needs to be constrained by practical considerations for cost, operational feasibility, robustness and scale. Our goal is to improve the production data infrastructure for HEP 24/7 on a global scale.
Submitted 6 April, 2022;
originally announced April 2022.
-
Energy networks for state estimation with random sensors using sparse labels
Authors:
Yash Kumar,
Souvik Chakraborty
Abstract:
State estimation is required whenever we deal with high-dimensional dynamical systems, as the complete measurement is often unavailable. It is key to gaining insight, performing control or optimizing design tasks. Most deep learning-based approaches require high-resolution labels and work with fixed sensor locations, thus being restrictive in their scope. Also, performing proper orthogonal decomposition (POD) on sparse data is nontrivial. To tackle these problems, we propose a technique with an implicit optimization layer and a physics-based loss function that can learn from sparse labels. It works by minimizing the energy of the neural network prediction, enabling it to work with a varying number of sensors at different locations. Based on this technique we present two models for discrete and continuous prediction in space. We demonstrate the performance using two high-dimensional fluid problems, Burgers' equation and Flow Past Cylinder, for the discrete model and using the Allen-Cahn and convection-diffusion equations for the continuous model. We show the models are also robust to noise in measurements.
Submitted 12 March, 2022;
originally announced March 2022.
-
Deconfounded Representation Similarity for Comparison of Neural Networks
Authors:
Tianyu Cui,
Yogesh Kumar,
Pekka Marttinen,
Samuel Kaski
Abstract:
Similarity metrics such as representational similarity analysis (RSA) and centered kernel alignment (CKA) have been used to compare layer-wise representations between neural networks. However, these metrics are confounded by the population structure of data items in the input space, leading to spuriously high similarity for even completely random neural networks and inconsistent domain relations in transfer learning. We introduce a simple and generally applicable fix to adjust for the confounder with covariate adjustment regression, which retains the intuitive invariance properties of the original similarity measures. We show that deconfounding the similarity metrics increases the resolution of detecting semantically similar neural networks. Moreover, in real-world applications, deconfounding improves the consistency of representation similarities with domain similarities in transfer learning, and increases correlation with out-of-distribution accuracy.
Submitted 31 January, 2022;
originally announced February 2022.
-
ML-Based Analysis to Identify Speech Features Relevant in Predicting Alzheimer's Disease
Authors:
Yash Kumar,
Piyush Maheshwari,
Shreyansh Joshi,
Veeky Baths
Abstract:
Alzheimer's disease (AD) is a neurodegenerative disease that affects nearly 50 million individuals across the globe and is one of the leading causes of death globally. It is projected that by 2050, the number of people affected by the disease will more than double. Consequently, the growing advancements in technology raise the question: can technology be used to predict Alzheimer's for a better and earlier diagnosis? In this paper, we focus on this very problem. Specifically, we have trained both ML models and neural networks to predict and classify participants based on their speech patterns. We computed a number of linguistic variables using DementiaBank's Pitt Corpus, a database consisting of transcripts of interviews with subjects suffering from multiple neurodegenerative diseases. We then trained both binary and multiclass classifiers to distinguish AD from normal aging and other neurodegenerative diseases. We also worked on establishing the link between specific speech factors that can help determine the onset of AD. Confusion matrices and feature importance graphs have been plotted model-wise to compare the performances of our models. In both multiclass and binary classification, neural networks were found to outperform the other models, with testing accuracies of 76.44% and 92.05%, respectively. For feature importance, it was concluded that '%_PRESP' (present participle) and '%_3S' (3rd person present tense markers) were two of the most important speech features for our classifiers in predicting AD.
Submitted 25 October, 2021;
originally announced October 2021.
-
MINIMAL: Mining Models for Data Free Universal Adversarial Triggers
Authors:
Swapnil Parekh,
Yaman Singla Kumar,
Somesh Singh,
Changyou Chen,
Balaji Krishnamurthy,
Rajiv Ratn Shah
Abstract:
It is well known that natural language models are vulnerable to adversarial attacks, which are mostly input-specific in nature. Recently, it has been shown that there also exist input-agnostic attacks in NLP models, called universal adversarial triggers. However, existing methods to craft universal triggers are data intensive. They require large amounts of data samples to generate adversarial triggers, which are typically inaccessible to attackers. For instance, previous works take 3000 data samples per class for the SNLI dataset to generate adversarial triggers. In this paper, we present a novel data-free approach, MINIMAL, to mine input-agnostic adversarial triggers from models. Using the triggers produced with our data-free algorithm, we reduce the accuracy of Stanford Sentiment Treebank's positive class from 93.6% to 9.6%. Similarly, for the Stanford Natural Language Inference (SNLI) dataset, our single-word trigger reduces the accuracy of the entailment class from 90.95% to less than 0.6%. Despite being completely data-free, we obtain accuracy drops equivalent to those of data-dependent methods.
Submitted 25 September, 2021;
originally announced September 2021.
-
SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models
Authors:
Yogesh Kumar,
Alexander Ilin,
Henri Salo,
Sangita Kulathinal,
Maarit K. Leinonen,
Pekka Marttinen
Abstract:
Despite the proven effectiveness of Transformer neural networks across multiple domains, their performance with Electronic Health Records (EHR) can be nuanced. The unique, multidimensional sequential nature of EHR data can sometimes make even simple linear models with carefully engineered features more competitive. Thus, the advantages of Transformers, such as efficient transfer learning and improved scalability, are not always fully exploited in EHR applications. Addressing these challenges, we introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
In this work, we aim to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities. The challenge amplifies when dealing with divergent patient subgroups, like those with rare diseases, which are characterized by unique health trajectories and are typically smaller in size. To address this, we employ a self-supervised pretraining strategy, Generative Summary Pretraining (GSP), which predicts future summary statistics based on past health records of a patient. Our models are pretrained on a health registry of nearly one million patients, then fine-tuned for specific subgroup prediction tasks, showcasing the potential to handle the multifaceted nature of EHR data.
In evaluation, SANSformer consistently surpasses robust EHR baselines, with our GSP pretraining method notably amplifying model performance, particularly within smaller patient subgroups. Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
Submitted 10 November, 2023; v1 submitted 31 August, 2021;
originally announced August 2021.
-
GrADE: A graph based data-driven solver for time-dependent nonlinear partial differential equations
Authors:
Yash Kumar,
Souvik Chakraborty
Abstract:
The physical world is governed by the laws of physics, often represented in the form of nonlinear partial differential equations (PDEs). Unfortunately, the solution of PDEs is non-trivial and often involves significant computational time. With recent developments in the field of artificial intelligence and machine learning, the solution of PDEs using neural networks has emerged as a domain with huge potential. However, most of the developments in this field are based on either fully connected neural networks (FNN) or convolutional neural networks (CNN). While an FNN is computationally inefficient, as the number of network parameters can be potentially huge, a CNN necessitates a regular grid and a simpler domain. In this work, we propose a novel framework referred to as the Graph Attention Differential Equation (GrADE) for solving time-dependent nonlinear PDEs. The proposed approach couples an FNN, a graph neural network, and the recently developed Neural ODE framework. The primary idea is to use the graph neural network for modeling the spatial domain, and the Neural ODE for modeling the temporal domain. The attention mechanism identifies important inputs/features and assigns more weight to them; this enhances the performance of the proposed framework. The Neural ODE, on the other hand, results in constant memory cost and allows trading of numerical precision for speed. We also propose depth refinement as an effective technique for training the proposed architecture in less time with better accuracy. The effectiveness of the proposed framework is illustrated using 1D and 2D Burgers' equations. The results obtained illustrate the capability of the proposed framework in modeling PDEs and its scalability to larger domains without the need for retraining.
Submitted 24 August, 2021;
originally announced August 2021.
-
State estimation with limited sensors -- A deep learning based approach
Authors:
Yash Kumar,
Pranav Bahl,
Souvik Chakraborty
Abstract:
The importance of state estimation in fluid mechanics is well-established; it is required for accomplishing several tasks including design/optimization, active control, and future state prediction. A common tactic in this regard is to rely on reduced order models. Such approaches, in general, use measurement data from a single time instance. However, data available from sensors is often sequential, and ignoring it results in information loss. In this paper, we propose a novel deep learning based state estimation framework that learns from sequential data. The proposed model structure consists of a recurrent cell that passes information across time steps, enabling utilization of this information to recover the full state. We illustrate that utilizing sequential data allows for state recovery from only one or two sensors. For efficient recovery of the state, the proposed approach is coupled with an auto-encoder based reduced order model. We illustrate the performance of the proposed approach using two examples, and it is found to outperform other alternatives existing in the literature.
Submitted 26 August, 2021; v1 submitted 27 January, 2021;
originally announced January 2021.
-
Get It Scored Using AutoSAS -- An Automated System for Scoring Short Answers
Authors:
Yaman Kumar,
Swati Aggarwal,
Debanjan Mahata,
Rajiv Ratn Shah,
Ponnurangam Kumaraguru,
Roger Zimmermann
Abstract:
In the era of MOOCs, online exams are taken by millions of candidates, where scoring short answers is an integral part. It becomes intractable to evaluate them by human graders. Thus, a generic automated system capable of grading these responses should be designed and deployed. In this paper, we present a fast, scalable, and accurate approach towards automated Short Answer Scoring (SAS). We propose and explain the design and development of a system for SAS, namely AutoSAS. Given a question along with its graded samples, AutoSAS can learn to grade that prompt successfully. This paper further lays out the features, such as lexical diversity, Word2Vec, prompt, and content overlap, that play a pivotal role in building our proposed model. We also present a methodology for indicating the factors responsible for scoring an answer. The trained model is evaluated on an extensively used public dataset, namely Automated Student Assessment Prize Short Answer Scoring (ASAP-SAS). AutoSAS shows state-of-the-art performance and achieves better results by over 8% in some of the question prompts as measured by Quadratic Weighted Kappa (QWK), showing performance comparable to humans.
Submitted 21 December, 2020;
originally announced December 2020.
-
LIFI: Towards Linguistically Informed Frame Interpolation
Authors:
Aradhya Neeraj Mathur,
Devansh Batra,
Yaman Kumar,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
In this work, we explore a new problem of frame interpolation for speech videos. Such content today forms the major form of online communication. We try to solve this problem by using several deep learning video generation algorithms to generate the missing frames. We also provide examples where computer vision models, despite showing high performance on conventional non-linguistic metrics, fail to produce faithful interpolation of speech. With this motivation, we provide a new set of linguistically-informed metrics specifically targeted to the problem of speech video interpolation. We also release several datasets to test computer vision video generation models on their speech understanding.
Submitted 2 December, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Attaining Real-Time Super-Resolution for Microscopic Images Using GAN
Authors:
Vibhu Bhatia,
Yatender Kumar
Abstract:
In the last few years, several deep learning models, especially Generative Adversarial Networks, have received a lot of attention for the task of Single Image Super-Resolution (SISR). These methods focus on building an end-to-end framework, which produces a high-resolution (SR) image from a given low-resolution (LR) image in a single step to achieve state-of-the-art performance. This paper focuses on improving an existing deep-learning based method to perform Super-Resolution Microscopy in real-time using a standard GPU. For this, we first propose a tiling strategy, which takes advantage of the parallelism provided by a GPU to speed up the network training process. Further, we suggest simple changes to the architecture of the generator and the discriminator of SRGAN. Subsequently, we compare the quality and the running time for the outputs produced by our model, opening its applications in different areas like low-end benchtop and even mobile microscopy. Finally, we explore the possibility of the trained network to produce high-resolution (HR) outputs for different domains.
Submitted 9 October, 2020;
originally announced October 2020.
-
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems
Authors:
Anubha Kabra,
Mehar Bhatia,
Yaman Kumar,
Junyi Jessy Li,
Rajiv Ratn Shah
Abstract:
Automatic scoring engines have been used for scoring approximately fifteen million test-takers in just the last three years. This number is increasing further due to COVID-19 and the associated automation of education and testing. Despite such wide usage, the AI-based testing literature of these "intelligent" models is highly lacking. Most of the papers proposing new models rely only on quadratic weighted kappa (QWK) based agreement with human raters for showing model efficacy. However, this effectively ignores the highly multi-feature nature of essay scoring. Essay scoring depends on features like coherence, grammar, relevance, sufficiency, and vocabulary. To date, there has been no study testing Automated Essay Scoring (AES) systems holistically on all these features. With this motivation, we propose a model-agnostic adversarial evaluation scheme and associated metrics for AES systems to test their natural language understanding capabilities and overall robustness. We evaluate the current state-of-the-art AES models using the proposed scheme and report the results on five recent models. These models range from feature-engineering-based approaches to the latest deep learning algorithms. We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models. On the other hand, irrelevant content, on average, increases the scores, thus showing that the model evaluation strategy and rubrics should be reconsidered. We also ask 200 human raters to score both an original and an adversarial response to see if humans can detect differences between the two and whether they agree with the scores assigned by the automatic scorers.
Submitted 14 November, 2021; v1 submitted 13 July, 2020;
originally announced July 2020.
-
"Notic My Speech" -- Blending Speech Patterns With Multimedia
Authors:
Dhruva Sahrawat,
Yaman Kumar,
Shashwat Aggarwal,
Yifang Yin,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
Speech as a natural signal is composed of three parts: visemes (visual part of speech), phonemes (spoken part of speech), and language (the imposed structure). However, video as a medium for the delivery of speech and a multimedia construct has mostly ignored the cognitive aspects of speech delivery. For example, video applications like transcoding and compression have so far ignored how speech is delivered and heard. To close the gap between speech understanding and multimedia video applications, in this paper, we show the initial experiments by modelling the perception of visual speech and showing its use case on video compression. On the other hand, in the visual speech recognition domain, existing studies have mostly modeled it as a classification problem, while ignoring the correlations between views, phonemes, visemes, and speech perception. This results in solutions which are further away from how human perception works. To bridge this gap, we propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding. We conduct experiments on three public visual speech recognition datasets. The experimental results show that our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate. Moreover, we show that there is a strong correlation between our model's understanding of multi-view speech and human perception. This characteristic benefits downstream applications such as video compression and streaming, where a significant number of less important frames can be compressed or eliminated while maximally preserving human speech understanding with good user experience.
Submitted 12 June, 2020;
originally announced June 2020.
-
audino: A Modern Annotation Tool for Audio and Speech
Authors:
Manraj Singh Grover,
Pakhi Bamdev,
Ratin Kumar Brala,
Yaman Kumar,
Mika Hama,
Rajiv Ratn Shah
Abstract:
In this paper, we introduce a collaborative and modern annotation tool for audio and speech: audino. The tool allows annotators to define and describe temporal segmentation in audio files. These segments can be labelled and transcribed easily using a dynamically generated form. An admin can centrally control user roles and project assignment through the admin dashboard. The dashboard also enables describing labels and their values. The annotations can easily be exported in JSON format for further analysis. The tool allows audio data and their corresponding annotations to be uploaded and assigned to a user through a key-based API. The flexibility available in the annotation tool enables annotation for Speech Scoring, Voice Activity Detection (VAD), Speaker Diarisation, Speaker Identification, Speech Recognition, Emotion Recognition tasks and more. The MIT open source license allows it to be used for academic and commercial projects.
Submitted 28 November, 2021; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Multi-modal Automated Speech Scoring using Attention Fusion
Authors:
Manraj Singh Grover,
Yaman Kumar,
Sumit Sarin,
Payman Vafaee,
Mika Hama,
Rajiv Ratn Shah
Abstract:
In this study, we propose a novel multi-modal end-to-end neural approach for automated assessment of non-native English speakers' spontaneous speech using attention fusion. The pipeline employs Bi-directional Recurrent Convolutional Neural Networks and Bi-directional Long Short-Term Memory Neural Networks to encode acoustic and lexical cues from spectrograms and transcriptions, respectively. Attention fusion is performed on these learned predictive features to learn complex interactions between different modalities before final scoring. We compare our model with strong baselines and find combined attention to both lexical and acoustic cues significantly improves the overall performance of the system. Further, we present a qualitative and quantitative analysis of our model.
Submitted 28 November, 2021; v1 submitted 17 May, 2020;
originally announced May 2020.
-
Touchless Typing Using Head Movement-based Gestures
Authors:
Shivam Rustagi,
Aakash Garg,
Pranay Raj Anand,
Rajesh Kumar,
Yaman Kumar,
Rajiv Ratn Shah
Abstract:
In this paper, we propose a novel touchless typing interface that makes use of an on-screen QWERTY keyboard and a smartphone camera. The keyboard was divided into nine color-coded clusters. The user moved their head toward the clusters that contained the letters they wanted to type. A front-facing smartphone camera recorded the head movements. A bidirectional GRU-based model, which used pre-trained embeddings rich in head pose features, was employed to translate the recordings into cluster sequences. The model achieved an accuracy of 96.78% and 86.81% under intra- and inter-user scenarios, respectively, over a dataset of 2234 video sequences collected from 22 users.
Submitted 10 October, 2020; v1 submitted 24 January, 2020;
originally announced January 2020.
-
Integrating Automated Play in Level Co-Creation
Authors:
Andrew Hoyt,
Matthew Guzdial,
Yalini Kumar,
Gillian Smith,
Mark O. Riedl
Abstract:
In level co-creation, an AI and a human work together to create a video game level. One open challenge in level co-creation is how to empower human users to ensure particular qualities of the final level, such as challenge. There has been significant prior research into automated pathing and automated playtesting for video game levels, but not into how to incorporate these into tools. In this demonstration, we present an improvement of the Morai Maker mixed-initiative level editor for Super Mario Bros. that includes automated pathing and challenge approximation features.
Submitted 20 November, 2019;
originally announced November 2019.
-
Learning based Methods for Code Runtime Complexity Prediction
Authors:
Jagriti Sikka,
Kushal Satya,
Yaman Kumar,
Shagun Uppal,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
Predicting the runtime complexity of a piece of code is an arduous task. In fact, even for humans, it requires a subtle analysis and comprehensive knowledge of algorithms to predict time complexity with high fidelity, given any code. As per Turing's Halting problem proof, estimating code complexity is mathematically impossible. Nevertheless, an approximate solution to such a task can help developers get real-time feedback on the efficiency of their code. In this work, we model this problem as a machine learning task and check its feasibility with thorough analysis. Due to the lack of any open source dataset for this task, we propose our own annotated dataset CoRCoD: Code Runtime Complexity Dataset, extracted from online judges. We establish baselines using two different approaches, feature engineering and code embeddings, to achieve state-of-the-art results and compare their performances. Such solutions can be widely useful in potential applications like automatically grading coding assignments, IDE-integrated tools for static code analysis, and others.
Submitted 4 November, 2019;
originally announced November 2019.
-
Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings
Authors:
Dhruva Sahrawat,
Debanjan Mahata,
Mayank Kulkarni,
Haimin Zhang,
Rakesh Gosangi,
Amanda Stent,
Agniv Sharma,
Yaman Kumar,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings. We evaluate the proposed architecture using both contextualized and fixed word embedding models on three different benchmark datasets (Inspec, SemEval 2010, SemEval 2017) and compare with existing popular unsupervised and supervised techniques. Our results quantify the benefits of (a) using contextualized embeddings (e.g., BERT) over fixed word embeddings (e.g., GloVe); (b) using a BiLSTM-CRF architecture with contextualized word embeddings over fine-tuning the contextualized word embedding model directly; and (c) using genre-specific contextualized embeddings (SciBERT). Through error analysis, we also provide some insights into why particular models work better than others. Lastly, we present a case study where we analyze different self-attention layers of the two best models (BERT and SciBERT) to better understand the predictions made by each for the task of keyphrase extraction.
Submitted 19 October, 2019;
originally announced October 2019.
-
BHAAV- A Text Corpus for Emotion Analysis from Hindi Stories
Authors:
Yaman Kumar,
Debanjan Mahata,
Sagar Aggarwal,
Anmol Chugh,
Rajat Maheshwari,
Rajiv Ratn Shah
Abstract:
In this paper, we introduce the first and largest Hindi text corpus, named BHAAV, which means emotions in Hindi, for analyzing emotions that a writer expresses through his characters in a story, as perceived by a narrator/reader. The corpus consists of 20,304 sentences collected from 230 different short stories spanning 18 genres such as Inspirational and Mystery. Each sentence has been annotated into one of the five emotion categories (anger, joy, suspense, sad, and neutral) by three native Hindi speakers with at least ten years of formal education in Hindi. We also discuss challenges in the annotation of low-resource languages such as Hindi, and discuss the scope of the proposed corpus along with its possible uses. We also provide a detailed analysis of the dataset and train strong baseline classifiers, reporting their performances.
Submitted 9 October, 2019;
originally announced October 2019.
-
Enhancing Spectral Utilization by Maximizing the Reuse in LTE Network
Authors:
Yuva Kumar,
Vanlin Sathya,
Sreenath Ramanath
Abstract:
The need for increased spectral efficiency is key to improving the quality of experience for next-generation wireless applications such as online gaming and HD video. In our work, we consider an LTE Device-to-device (D2D) network where LTE UEs have primary access to the spectrum and D2D pairs have secondary access. To enhance spectral efficiency, the BS can offload the traffic by activating multiple D2D pairs within the serving cell. This ensures that the same radio resource will be reused across the primary LTE UEs and different D2D pairs. In this context, we propose to enable more D2D secondary users in the serving cell by utilizing neighboring BS spectrum to fairly co-exist with neighboring LTE primary users. We model the system and show via extensive simulations that the above configuration guarantees good throughput for the D2D pairs in the serving cell while ensuring that the primary LTE throughput demand is not compromised.
Submitted 19 June, 2019;
originally announced July 2019.
-
Lipper: Synthesizing Thy Speech using Multi-View Lipreading
Authors:
Yaman Kumar,
Rohit Jain,
Khwaja Mohd. Salik,
Rajiv Ratn Shah,
Yifang Yin,
Roger Zimmermann
Abstract:
Lipreading has many potential applications, such as in surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task, such as its dependence on a particular language and vocabulary mapping. Thus, in this paper we propose a multi-view lipreading-to-audio system, namely Lipper, which models it as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary and speaker-independent settings. Further, we compare the delay values of Lipper with other speechreading systems in order to show the real-time nature of the audio produced. We also perform a user study to understand how comprehensible the audio produced by Lipper is.
Submitted 28 June, 2019;
originally announced July 2019.
-
MobiVSR: A Visual Speech Recognition Solution for Mobile Devices
Authors:
Nilay Shrivastava,
Astitwa Saxena,
Yaman Kumar,
Rajiv Ratn Shah,
Debanjan Mahata,
Amanda Stent
Abstract:
Visual speech recognition (VSR) is the task of recognizing spoken language from video input only, without any audio. VSR has many applications as an assistive technology, especially if it could be deployed in mobile devices and embedded systems. The need for intensive computational resources and a large memory footprint are two of the major obstacles in developing neural network models for VSR in a resource-constrained environment. We propose a novel end-to-end deep neural network architecture for word-level VSR called MobiVSR with a design parameter that aids in balancing the model's accuracy and parameter count. We use depthwise-separable 3D convolution for the first time in the domain of VSR and show how it makes our model efficient. MobiVSR achieves an accuracy of 73% on the challenging Lip Reading in the Wild dataset with 6 times fewer parameters and a 20 times smaller memory footprint than the current state of the art. MobiVSR can also be compressed to 6 MB by applying post-training quantization.
Submitted 4 June, 2019; v1 submitted 10 May, 2019;
originally announced May 2019.
-
Suggestion Mining from Online Reviews using ULMFiT
Authors:
Sarthak Anand,
Debanjan Mahata,
Kartik Aggarwal,
Laiba Mehnaz,
Simra Shahid,
Haimin Zhang,
Yaman Kumar,
Rajiv Ratn Shah,
Karan Uppal
Abstract:
In this paper we present our approach and the system description for Sub Task A of SemEval 2019 Task 9: Suggestion Mining from Online Reviews and Forums. Given a sentence, the task asks to predict whether the sentence consists of a suggestion or not. Our model is based on Universal Language Model Fine-tuning for Text Classification. We apply various pre-processing techniques before training the language and the classification model. We further provide detailed analysis of the results obtained using the trained model. Our team ranked 10th out of 34 participants, achieving an F1 score of 0.7011. We publicly share our implementation at https://github.com/isarth/SemEval9_MIDAS
Submitted 19 April, 2019;
originally announced April 2019.
-
Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition
Authors:
Yaman Kumar,
Dhruva Sahrawat,
Shubham Maheshwari,
Debanjan Mahata,
Amanda Stent,
Yifang Yin,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine learning based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracies in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases. We also show that our models are language agnostic and therefore capable of seamlessly generating, using English training data, videos for a new language (Hindi). To the best of our knowledge, this is the first work to show empirical evidence of the use of GANs for generating training samples of unseen classes in the domain of VSR, hence facilitating zero-shot learning. We make the added videos for new classes publicly available along with our code.
Submitted 2 January, 2020; v1 submitted 29 January, 2019;
originally announced January 2019.
-
Kiki Kills: Identifying Dangerous Challenge Videos from Social Media
Authors:
Nupur Baghel,
Yaman Kumar,
Paavini Nanda,
Rajiv Ratn Shah,
Debanjan Mahata,
Roger Zimmermann
Abstract:
There has been an upsurge in the number of people participating in challenges made popular through social media channels. One example of such a challenge is the Kiki Challenge, in which people step out of their moving cars and dance to the tune of the song 'Kiki, Do you love me?'. Such an action makes the people taking the challenge prone to accidents and can also create a nuisance for others traveling on the road. In this work, we introduce the prevalence of such challenges in social media and show how the machine learning community can aid in preventing dangerous situations triggered by them by developing models that can distinguish between dangerous and non-dangerous challenge videos. Towards this objective, we release a new dataset, namely the MIDAS-KIKI dataset, consisting of manually annotated dangerous and non-dangerous Kiki challenge videos. Further, we train a deep learning model to identify dangerous and non-dangerous videos, and report our results.
Submitted 16 December, 2018; v1 submitted 2 December, 2018;
originally announced December 2018.
-
Mind Your Language: Abuse and Offense Detection for Code-Switched Languages
Authors:
Raghav Kapoor,
Yaman Kumar,
Kshitij Rajput,
Rajiv Ratn Shah,
Ponnurangam Kumaraguru,
Roger Zimmermann
Abstract:
In multilingual societies like the Indian subcontinent, the use of code-switched languages is very popular and convenient for users. In this paper, we study offense and abuse detection in the code-switched pair of Hindi and English (i.e. Hinglish), the pair that is the most spoken. The task is made difficult by the non-fixed grammar, vocabulary, semantics and spellings of the Hinglish language. We apply transfer learning and build an LSTM-based model for hate speech classification. This model surpasses the performance shown by the current best models to establish itself as the state of the art in the unexplored domain of Hinglish offensive text classification. We also release our model and the trained embeddings for research purposes.
Submitted 23 September, 2018;
originally announced September 2018.
-
IceBreaker: Solving Cold Start Problem for Video Recommendation Engines
Authors:
Yaman Kumar,
Agniv Sharma,
Abhigyan Khaund,
Akash Kumar,
Ponnurangam Kumaraguru,
Rajiv Ratn Shah
Abstract:
The Internet has brought about a tremendous increase in content of all forms, and video content constitutes the major backbone of the total content being published as well as watched. Thus it becomes imperative for video recommendation engines such as Hulu to look for novel and innovative ways to recommend newly added videos to their users. However, the problem with new videos is that they lack the metadata and user interaction needed to rate the videos for consumers. To this effect, this paper introduces the techniques we developed for the Content Based Video Relevance Prediction (CBVRP) Challenge hosted by Hulu for the ACM Multimedia Conference 2018. We employ different architectures on the CBVRP dataset to make use of the provided frame-level and video-level features and generate predictions of videos that are similar to other videos. We also implement several ensemble strategies to explore the complementarity between the two types of provided features. The obtained results are encouraging and will push the boundaries of research for multimedia-based video recommendation systems.
Submitted 16 August, 2018;
originally announced August 2018.
-
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Authors:
Yaman Kumar,
Mayank Aggarwal,
Pratham Nawal,
Shin'ichi Satoh,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
Speechreading or lipreading is the technique of understanding and getting phonetic features from a speaker's visual features such as movement of lips, face, teeth and tongue. It has a wide range of multimedia applications such as in surveillance, Internet telephony, and as an aid to a person with hearing impairments. However, most of the work in speechreading has been limited to text generation from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although we have multiple camera feeds for the speech of a user, we have failed to use these multiple video feeds to deal with the different poses. To this end, this paper presents the world's first ever multi-view speech reading and reconstruction system. This work extends the boundaries of multimedia research by putting forth a model which leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speech reading and reconstruction system. They further show the optimal placement of cameras which would lead to the maximum intelligibility of speech. Finally, the paper lays out various innovative applications for the proposed system, focusing on its potential impact not just in the security arena but in many other multimedia analytics problems.
Submitted 12 August, 2018; v1 submitted 2 July, 2018;
originally announced July 2018.
-
Lip Reading Using Convolutional Auto Encoders as Feature Extractor
Authors:
Dharin Parekh,
Ankitesh Gupta,
Shharrnam Chhatpar,
Anmol Yash Kumar,
Manasi Kulkarni
Abstract:
Visual recognition of speech using lip movement is called lip-reading. Recent developments in this nascent field use different neural networks as feature extractors, which serve as input to a model that can map the temporal relationship and classify. Though end-to-end sentence-level lip-reading is the current trend, we propose a new model which employs word-level classification and breaks the set benchmarks for standard datasets. In our model we use convolutional autoencoders as feature extractors, whose outputs are then fed to a Long short-term memory model. We tested our proposed model on BBC's LRW dataset, MIRACL-VC1 and the GRID dataset. It achieves a classification accuracy of 98% on MIRACL-VC1, compared with 93.4% for the set benchmark (Rekik et al., 2014). On BBC's LRW the proposed model performed better than the baseline model of convolutional neural networks and a Long short-term memory model (Garg et al., 2016). By showing the features learned by the models, we indicate how the proposed model works better than the baseline model. The same model can also be extended to end-to-end sentence-level classification.
Submitted 31 May, 2018;
originally announced May 2018.
-
Animal Classification System: A Block Based Approach
Authors:
Y H Sharath Kumar,
Manohar N,
H K Chethan
Abstract:
In this work, we propose a method for the classification of animals in images. Initially, a graph cut based method is used to perform segmentation in order to eliminate the background from the given image. The segmented animal images are partitioned into a number of blocks, and then the color texture moments are extracted from the different blocks. A probabilistic neural network and K-nearest neighbors are considered here for classification. To corroborate the efficacy of the proposed method, an experiment was conducted on our own data set of 25 classes of animals, which consisted of 4000 sample images. The experiment was conducted by picking images randomly from the database to study the effect on classification accuracy, and the results show that the K-nearest neighbors classifier achieves good performance.
Submitted 6 September, 2016;
originally announced September 2016.
-
Delaunay Triangulation on Skeleton of Flowers for Classification
Authors:
Y H Sharath Kumar,
N Vinay Kumar,
D S Guru
Abstract:
In this work, we propose a triangle-based approach to classify flower images. Initially, flowers are segmented using whorl based region merging segmentation. The skeleton of a flower is obtained from the segmented flower using a skeleton pruning method. The Delaunay triangulation is obtained from the endpoints and junction points detected on the skeleton. The length and angle features are extracted from the obtained Delaunay triangles and are then aggregated and represented as interval-valued data. A suitable classifier has been explored for the purpose of classification. To corroborate the efficacy of the proposed method, an experiment is conducted on our own data set of 30 classes of flowers, containing 3000 samples.
Submitted 6 September, 2016;
originally announced September 2016.
-
Framework for Application Mapping over Packet-Switched Network of FPGAs: Case Studies
Authors:
Vinay B. Y. Kumar,
Pinalkumar Engineer,
Mandar Datar,
Yatish Turakhia,
Saurabh Agarwal,
Sanket Diwale,
Sachin B. Patkar
Abstract:
The algorithm-to-hardware High-level synthesis (HLS) tools today are purported to produce hardware comparable in quality to handcrafted designs, particularly with user directive driven or domain-specific HLS. However, HLS tools are not readily equipped for cases where an application/algorithm needs to scale. We present a (work-in-progress) semi-automated framework to map applications over a packet-switched network of modules (single FPGA) and then to seamlessly partition such a network over multiple FPGAs over quasi-serial links. We illustrate the framework through three application case studies: LDPC Decoding, Particle Filter based Object Tracking, and Matrix Vector Multiplication over GF(2). Starting with high-level representations of each case application, we first express them in an intermediate message passing formulation, a model of communicating processing elements. Once the processing elements are identified, these are either handcrafted or realized using HLS. The rest of the flow is automated: the processing elements are plugged onto a configurable network-on-chip (CONNECT) topology of choice, followed by partitioning the 'on-chip' links to work seamlessly across chips/FPGAs.
Submitted 27 August, 2015;
originally announced August 2015.
-
A Probabilistic Collocation Method Based Statistical Gate Delay Model Considering Process Variations and Multiple Input Switching
Authors:
Y. Satish Kumar,
Jun Li,
Claudio Talarico,
Janet Wang
Abstract:
Since the advent of new nanotechnologies, the variability of gate delay due to process variations has become a major concern. This paper proposes a new gate delay model that includes the impact of both process variations and multiple input switching. The proposed model uses an orthogonal polynomial based probabilistic collocation method to construct an analytical delay equation from circuit timing performance. In the experimental results, our approach has less than 0.2% error on the mean delay of gates and less than 3% error on the standard deviation.
Submitted 25 October, 2007;
originally announced October 2007.
-
On Shift Sequences for Interleaved Construction of Sequence Sets with Low Correlation
Authors:
N Rajesh Pillai,
Yogesh Kumar
Abstract:
Construction of signal sets with low correlation property is of interest to designers of CDMA systems. One of the preferred ways of constructing such sets is the interleaved construction, which uses two sequences a and b with 2-level autocorrelation and a shift sequence e. The shift sequence has to satisfy certain conditions for the resulting signal set to have low correlation properties. This article shows that the conditions reported in the literature are too strong and gives a version which results in a larger number of shift sequences. An open problem on the existence of shift sequences for attaining an interleaved set with maximum correlation value bounded by v+2 is also taken up and solved.
Submitted 21 June, 2007; v1 submitted 4 October, 2006;
originally announced October 2006.