-
ReGrAt: Regularization in Graphs using Attention to handle class imbalance
Authors:
Neeraja Kirtane,
Jeshuren Chelladurai,
Balaraman Ravindran,
Ashish Tendulkar
Abstract:
Node classification is an important task in graph-based learning. Although much work has been done in this field, class imbalance has been neglected. Real-world data is rarely perfect and is often imbalanced in how classes are represented. Beyond text and images, data can also be represented as graphs, so addressing imbalance in graphs has become of paramount importance. In node classification, imbalance means that one class has fewer examples than the others. A popular way to address this is to change the data composition by resampling the data to balance the dataset. However, resampling can lead to loss of information or add noise to the dataset. Therefore, in this work, we address the problem implicitly by changing the model's loss. Specifically, we study how attention networks can help tackle imbalance. Moreover, we observe that using a regularizer that assigns larger weights to minority nodes helps mitigate this imbalance. We achieve state-of-the-art results, outperforming existing methods on several standard citation benchmark datasets.
Submitted 27 November, 2022;
originally announced November 2022.
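The ReGrAt abstract above does not give the exact form of its regularizer. Purely as a point of reference, the sketch below shows the simpler and widely used idea of reweighting a loss so that minority-class nodes count more: inverse-frequency class weights applied to cross-entropy. The helper names and the weighting formula are assumptions chosen for illustration; this is not the paper's method and involves no attention network.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Hypothetical helper: weight class c by n_total / (n_classes * n_c),
    so rarer (minority) classes receive larger weights."""
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {c: n_total / (n_classes * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Hypothetical helper: mean over nodes of -weights[y] * log(p[y]),
    where probs[i][c] is the predicted probability of class c for node i."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


# Toy imbalanced labels: six majority nodes (class 0), two minority nodes (class 1).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
weights = inverse_frequency_weights(labels)
# weights[0] == 8 / 12 (about 0.667), weights[1] == 8 / 4 == 2.0

# An uninformative model that predicts 0.5 for both classes on every node.
uniform = [[0.5, 0.5] for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)
```

With these weights, each minority node contributes three times as much to the loss as a majority node (2.0 versus 2/3), so errors on the minority class are penalized more heavily during training.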
-
Cohorting to isolate asymptomatic spreaders: An agent-based simulation study on the Mumbai Suburban Railway
Authors:
Alok Talekar,
Sharad Shriram,
Nidhin Vaidhiyan,
Gaurav Aggarwal,
Jiangzhuo Chen,
Srini Venkatramanan,
Lijing Wang,
Aniruddha Adiga,
Adam Sadilek,
Ashish Tendulkar,
Madhav Marathe,
Rajesh Sundaresan,
Milind Tambe
Abstract:
The Mumbai Suburban Railway, known as the locals, is a key part of the city's transit infrastructure and is crucial for resuming normal economic activity. To reduce disease transmission, policymakers can enforce reduced crowding and mandate the wearing of masks. Cohorting, that is, forming groups of travelers who always travel together, is an additional policy for reducing disease transmission on the locals without imposing severe restrictions. Cohorting allows us to: (i) form traveler bubbles, thereby decreasing the number of distinct interactions over time; (ii) potentially quarantine an entire cohort if a single case is detected, making contact tracing more efficient; and (iii) target cohorts for testing and early detection of symptomatic as well as asymptomatic cases. Studying the impact of cohorts with compartmental models is challenging because of the resulting representational complexity. Agent-based models provide a natural way to represent cohorts along with the cohort members' connections to the larger social network. This paper describes a novel multi-scale agent-based model to study the impact of cohorting strategies on COVID-19 dynamics in Mumbai. We achieve this by modeling the Mumbai urban region with a detailed agent-based model comprising 12.4 million agents. Individual cohorts and their inter-cohort interactions as they travel on the locals are modeled using local mean-field approximations. The resulting multi-scale model, in conjunction with a detailed disease transmission and intervention simulator, is used to assess various cohorting strategies. The results quantify the trade-off between cohort size and its impact on disease dynamics and well-being. They show that cohorts can provide significant benefit in terms of reduced transmission without significantly affecting ridership or economic and social activity.
Submitted 24 December, 2020; v1 submitted 23 December, 2020;
originally announced December 2020.
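Point (i) of the cohorting abstract above, that fixed traveler bubbles decrease the number of distinct interactions over time, can be illustrated with a toy count. The sketch below is a hypothetical illustration only: it is not the paper's multi-scale agent-based model or its mean-field approximation, it simulates no disease transmission, and the function name, group size, and day count are arbitrary assumptions.

```python
import random


def mean_distinct_contacts(n_travelers, group_size, n_days, cohorting, seed=0):
    """Toy model: every day travelers ride in groups of `group_size`.
    With cohorting the groups are fixed; without it they are reshuffled daily.
    Returns the mean number of distinct co-travelers per traveler."""
    rng = random.Random(seed)
    travelers = list(range(n_travelers))
    contacts = {t: set() for t in travelers}
    for _ in range(n_days):
        if not cohorting:
            rng.shuffle(travelers)  # random mixing: new groups every day
        # With cohorting the order never changes, so each cohort stays the same.
        for start in range(0, n_travelers, group_size):
            group = travelers[start:start + group_size]
            for a in group:
                contacts[a].update(b for b in group if b != a)
    return sum(len(s) for s in contacts.values()) / n_travelers


fixed = mean_distinct_contacts(100, 10, 30, cohorting=True)   # 9.0
mixed = mean_distinct_contacts(100, 10, 30, cohorting=False)
```

With fixed cohorts of 10, each traveler only ever meets the same 9 people, however many days pass; with daily reshuffling the count keeps growing, which is the intuition behind using cohorts to limit spread and to make contact tracing tractable.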
-
Accurate Demarcation of Protein Domain Linkers based on Structural Analysis of Linker Probable Region
Authors:
Vivekanand Samant,
Arvind Hulgeri,
Alfonso Valencia,
Ashish V. Tendulkar
Abstract:
In multi-domain proteins, the domains are connected by a flexible, unstructured region called a protein domain linker. Accurate demarcation of these linkers is key to understanding their biochemical and evolutionary attributes. This knowledge helps in designing suitable linkers for engineering stable multi-domain chimeric proteins. Here we propose a novel method for demarcating linkers given a three-dimensional protein structure and a domain definition. The proposed method is based on biological knowledge about the structural flexibility of linkers. We performed structural analysis on a linker probable region (LPR) around the domain boundary points of known SCOP domains. The LPR was described using a set of overlapping peptide fragments of fixed size. Each peptide fragment was then described by geometric invariants (GIs) and subjected to a clustering process in which the fragments corresponding to actual linkers emerge as outliers. We then identify the actual linkers by finding the longest continuous stretch of outlier fragments within the LPRs. The method was evaluated on a benchmark dataset of 51 continuous multi-domain proteins, where it achieves an F1 score of 0.745 (precision 0.83, recall 0.66). When applied to 725 continuous multi-domain proteins, the method identified novel linkers that had not been reported previously. It can be used in combination with supervised or sequence-based linker prediction methods for accurate linker demarcation.
Submitted 23 November, 2012;
originally announced November 2012.
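Two steps of the linker-demarcation abstract above are simple enough to sketch: splitting an LPR into overlapping fixed-size fragments, and picking the longest continuous stretch of outlier fragments. The abstract does not specify the fragment size, the geometric invariants, or the clustering method, so this sketch takes per-fragment outlier flags as a hypothetical input; the stride of 1, the helper names, and the fragment-to-residue mapping are assumptions, not the authors' implementation.

```python
def overlapping_fragments(region, size):
    """Split a region (e.g. a residue sequence) into all overlapping
    fixed-size windows with stride 1 (stride is an assumption)."""
    return [region[i:i + size] for i in range(len(region) - size + 1)]


def longest_outlier_stretch(is_outlier):
    """Return (first, last) fragment indices, inclusive, of the longest run
    of consecutive outlier flags; the earliest run wins ties. None if no outliers."""
    best = None
    run_start = None
    for i, flag in enumerate(is_outlier):
        if flag:
            if run_start is None:
                run_start = i
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
        else:
            run_start = None
    return best


def fragments_to_residues(stretch, size):
    """Map a fragment-index run to the residue span it covers, assuming
    fragment i spans residues i .. i + size - 1."""
    first, last = stretch
    return first, last + size - 1


fragments = overlapping_fragments("ABCDEFGHIJ", 4)  # 7 windows: "ABCD" ... "GHIJ"
# Hypothetical per-fragment outlier flags, standing in for the GI-based clustering step.
flags = [False, True, True, False, True, True, True]
stretch = longest_outlier_stretch(flags)       # (4, 6)
residues = fragments_to_residues(stretch, 4)   # (4, 9)
```

Scanning once and tracking the current run keeps the stretch search linear in the number of fragments; the choice to break ties in favor of the earliest run is arbitrary here.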