Search | arXiv e-print repository

Showing 1–50 of 62 results for author: Koutra, D

  1. arXiv:2406.16321

    cs.LG cs.AI

    Multimodal Graph Benchmark

    Authors: Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

    Abstract: Associating unstructured data with structured information is crucial for real-world tasks that require relevance search. However, existing graph learning benchmarks often overlook the rich semantic information associated with each node. To bridge this gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), the first comprehensive multi-modal graph benchmark that incorporates both textual and v…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: https://mm-graph-benchmark.github.io/

  2. arXiv:2406.13114

    cs.CL cs.AI

    Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

    Authors: Yuhang Zhou, Jing Zhu, Paiheng Xu, Xiaoyu Liu, Xiyao Wang, Danai Koutra, Wei Ai, Furong Huang

    Abstract: Large language models (LLMs) have significantly advanced various natural language processing tasks, but deploying them remains computationally expensive. Knowledge distillation (KD) is a promising solution, enabling the transfer of capabilities from larger teacher LLMs to more compact student models. In particular, sequence-level KD, which distills rationale-based reasoning processes instead of mer…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: preprint

  3. arXiv:2406.05109

    cs.LG

    Large Generative Graph Models

    Authors: Yu Wang, Ryan A. Rossi, Namyong Park, Huiyuan Chen, Nesreen K. Ahmed, Puja Trivedi, Franck Dernoncourt, Danai Koutra, Tyler Derr

    Abstract: Large Generative Models (LGMs) such as GPT, Stable Diffusion, Sora, and Suno are trained on huge amounts of text, images, videos, and audio that are extremely diverse and drawn from numerous domains. This training paradigm over diverse, well-curated data lies at the heart of generating creative and sensible content. However, all previous graph generative models (e.g., GraphRNN, MDVAE, MoFlow, GDS…

    Submitted 7 June, 2024; originally announced June 2024.

  4. arXiv:2406.04640

    cs.LG

    LinkGPT: Teaching Large Language Models To Predict Missing Links

    Authors: Zhongmou He, Jing Zhu, Shengyi Qian, Joyce Chai, Danai Koutra

    Abstract: Large Language Models (LLMs) have shown promising results on various language and vision tasks. Recently, there has been growing interest in applying LLMs to graph-based tasks, particularly on Text-Attributed Graphs (TAGs). However, most studies have focused on node classification, while the use of LLMs for link prediction (LP) remains understudied. In this work, we propose a new task on LLMs, whe…

    Submitted 7 June, 2024; originally announced June 2024.

  5. arXiv:2401.03350

    cs.LG stat.ML

    Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks

    Authors: Puja Trivedi, Mark Heimann, Rushil Anirudh, Danai Koutra, Jayaraman J. Thiagarajan

    Abstract: While graph neural networks (GNNs) are widely used for node and graph representation learning tasks, the reliability of GNN uncertainty estimates under distribution shifts remains relatively under-explored. Indeed, while post-hoc calibration strategies can be used to improve in-distribution calibration, they need not also improve calibration under distribution shift. However, techniques which prod…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: 33 pages; 10 figures. arXiv admin note: text overlap with arXiv:2309.10976

  6. arXiv:2312.15520

    cs.LG

    Graph Coarsening via Convolution Matching for Scalable Graph Neural Network Training

    Authors: Charles Dickens, Eddie Huang, Aishwarya Reganti, Jiong Zhu, Karthik Subbian, Danai Koutra

    Abstract: Graph summarization as a preprocessing step is an effective and complementary technique for scalable graph neural network (GNN) training. In this work, we propose the Coarsening Via Convolution Matching (CONVMATCH) algorithm and a highly scalable variant, A-CONVMATCH, for creating summarized graphs that preserve the output of graph convolution. We evaluate CONVMATCH on six real-world link predicti…

    Submitted 24 December, 2023; originally announced December 2023.

  7. arXiv:2311.17856

    cs.LG cs.SI

    Leveraging Graph Diffusion Models for Network Refinement Tasks

    Authors: Puja Trivedi, Ryan Rossi, David Arbour, Tong Yu, Franck Dernoncourt, Sungchul Kim, Nedim Lipka, Namyong Park, Nesreen K. Ahmed, Danai Koutra

    Abstract: Most real-world networks are noisy and incomplete samples from an unknown target distribution. Refining them by correcting corruptions or inferring unobserved regions typically improves downstream performance. Inspired by the impressive generative capabilities that have been used to correct corruptions in images, and the similarities between "in-painting" and filling in missing nodes and edges con…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Work in Progress. 21 pages, 7 figures

  8. arXiv:2309.13885

    cs.LG cs.AI cs.CL cs.CV cs.SI

    TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning

    Authors: Jing Zhu, Xiang Song, Vassilis N. Ioannidis, Danai Koutra, Christos Faloutsos

    Abstract: How can we enhance the node features acquired from Pretrained Models (PMs) to better suit downstream graph learning tasks? Graph Neural Networks (GNNs) have become the state-of-the-art approach for many high-impact, real-world graph applications. For feature-rich graphs, a prevalent practice involves utilizing a PM directly to generate features, without incorporating any domain adaptation techniqu…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: preprint, ongoing work

  9. arXiv:2309.10976

    cs.LG

    Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks

    Authors: Puja Trivedi, Mark Heimann, Rushil Anirudh, Danai Koutra, Jayaraman J. Thiagarajan

    Abstract: Safe deployment of graph neural networks (GNNs) under distribution shift requires models to provide accurate confidence indicators (CI). However, while it is well-known in computer vision that CI quality diminishes under distribution shift, this behavior remains understudied for GNNs. Hence, we begin with a case study on CI calibration under controlled structural and feature distribution shifts an…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 22 pages, 11 figures

  10. Interpretable Sparsification of Brain Graphs: Better Practices and Effective Designs for Graph Neural Networks

    Authors: Gaotang Li, Marlena Duda, Xiang Zhang, Danai Koutra, Yujun Yan

    Abstract: Brain graphs, which model the structural and functional relationships between brain regions, are crucial in neuroscientific and clinical applications involving graph classification. However, dense brain graphs pose computational challenges, including high runtime and memory usage, as well as limited interpretability. In this paper, we investigate effective designs in Graph Neural Networks (GNNs) to sparsif…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: To appear in Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 23)

  11. arXiv:2306.05557

    cs.SI cs.LG

    On Performance Discrepancies Across Local Homophily Levels in Graph Neural Networks

    Authors: Donald Loveland, Jiong Zhu, Mark Heimann, Benjamin Fish, Michael T. Schaub, Danai Koutra

    Abstract: Graph Neural Network (GNN) research has highlighted a relationship between high homophily (i.e., the tendency of nodes of the same class to connect) and strong predictive performance in node classification. However, recent work has found the relationship to be more nuanced, demonstrating that simple GNNs can learn in certain heterophilous settings. To resolve these conflicting findings and align c…

    Submitted 20 November, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: 30 pages

  12. arXiv:2306.00899

    cs.LG cs.IR cs.SI

    Pitfalls in Link Prediction with Graph Neural Networks: Understanding the Impact of Target-link Inclusion & Better Practices

    Authors: Jing Zhu, Yuhang Zhou, Vassilis N. Ioannidis, Shengyi Qian, Wei Ai, Xiang Song, Danai Koutra

    Abstract: While Graph Neural Networks (GNNs) are remarkably successful in a variety of high-impact applications, we demonstrate that, in link prediction, the common practices of including the edges being predicted in the graph at training and/or test time have an outsized impact on the performance of low-degree nodes. We theoretically and empirically investigate how these practices impact node-level performance acr…

    Submitted 17 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Extended version of our WSDM'24 paper. 8 pages, 2-page appendix

  13. arXiv:2305.15611

    cs.LG cs.AI

    Size Generalization of Graph Neural Networks on Biological Data: Insights and Practices from the Spectral Perspective

    Authors: Gaotang Li, Danai Koutra, Yujun Yan

    Abstract: We investigate size-induced distribution shifts in graphs and assess their impact on the ability of graph neural networks (GNNs) to generalize to larger graphs relative to the training data. Existing literature presents conflicting conclusions on GNNs' size generalizability, primarily due to disparities in application domains and underlying assumptions concerning size-induced distribution shifts.…

    Submitted 6 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 21 pages, including appendix

  14. arXiv:2305.09887

    cs.LG cs.DC

    Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

    Authors: Jiong Zhu, Aishwarya Reganti, Edward Huang, Charles Dickens, Nikhil Rao, Karthik Subbian, Danai Koutra

    Abstract: Distributed training of GNNs enables learning on massive graphs (e.g., social and e-commerce networks) that exceed the storage and computational capacity of a single machine. To reach performance comparable to centralized training, distributed frameworks focus on maximally recovering cross-instance node dependencies with either communication across instances or periodic fallback to centralized tra…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 14 pages, 3 figures

  15. arXiv:2303.13589

    cs.LG stat.ML

    On the Efficacy of Generalization Error Prediction Scoring Functions

    Authors: Puja Trivedi, Danai Koutra, Jayaraman J. Thiagarajan

    Abstract: Generalization error predictors (GEPs) aim to predict model performance on unseen distributions by deriving dataset-level error estimates from sample-level scores. However, GEPs often utilize disparate mechanisms (e.g., regressors, thresholding functions, calibration datasets) to derive such error estimates, which can obfuscate the benefits of a particular scoring function. Therefore, in thi…

    Submitted 29 May, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023. (Previous title: A Closer Look at Scoring Functions and Generalization Prediction.)

  16. arXiv:2303.13500

    cs.LG

    A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias

    Authors: Puja Trivedi, Danai Koutra, Jayaraman J. Thiagarajan

    Abstract: Advances in the expressivity of pretrained models have increased interest in the design of adaptation protocols which enable safe and effective transfer learning. Going beyond conventional linear probing (LP) and fine-tuning (FT) strategies, protocols that can effectively control feature distortion, i.e., the failure to update features orthogonal to the in-distribution, have been found to achieve…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted to ICLR 2023 as notable-25% (spotlight)

  17. arXiv:2208.10682

    cs.SI cs.IR cs.LG

    CAPER: Coarsen, Align, Project, Refine - A General Multilevel Framework for Network Alignment

    Authors: Jing Zhu, Danai Koutra, Mark Heimann

    Abstract: Network alignment, or the task of finding corresponding nodes in different networks, is an important problem formulation in many application domains. We propose CAPER, a multilevel alignment framework that Coarsens the input graphs, Aligns the coarsened graphs, Projects the alignment solution to finer levels and Refines the alignment solution. We show that CAPER can improve upon many different exi…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: CIKM 2022

  18. arXiv:2208.02810

    cs.LG

    Analyzing Data-Centric Properties for Graph Contrastive Learning

    Authors: Puja Trivedi, Ekdeep Singh Lubana, Mark Heimann, Danai Koutra, Jayaraman J. Thiagarajan

    Abstract: Recent analyses of self-supervised learning (SSL) find the following data-centric properties to be critical for learning good representations: invariance to task-irrelevant semantics, separability of classes in some latent space, and recoverability of labels from augmented samples. However, given their discrete, non-Euclidean nature, graph datasets and graph SSL methods are unlikely to satisfy the…

    Submitted 22 January, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

    Comments: Accepted to NeurIPS 2022

  19. arXiv:2207.12615

    cs.LG

    Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety

    Authors: Puja Trivedi, Danai Koutra, Jayaraman J. Thiagarajan

    Abstract: While directly fine-tuning (FT) large-scale, pretrained models on task-specific data is well-known to induce strong in-distribution task performance, recent works have demonstrated that different adaptation protocols, such as linear probing (LP) prior to FT, can improve out-of-distribution generalization. However, the design space of such adaptation protocols remains under-explored and the evaluat…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Principles of Distribution Shift (PODS) Workshop at ICML 2022, 4 pages, 2 figures

  20. arXiv:2207.04376

    cs.SI cs.CY cs.LG

    On Graph Neural Network Fairness in the Presence of Heterophilous Neighborhoods

    Authors: Donald Loveland, Jiong Zhu, Mark Heimann, Ben Fish, Michael T. Schaub, Danai Koutra

    Abstract: We study the task of node classification for graph neural networks (GNNs) and establish a connection between group fairness, as measured by statistical parity and equal opportunity, and local assortativity, i.e., the tendency of linked nodes to have similar attributes. Such assortativity is often induced by homophily, the tendency for nodes of similar properties to connect. Homophily can be common…

    Submitted 14 November, 2022; v1 submitted 10 July, 2022; originally announced July 2022.

    Comments: 6 pages, KDD 2022 DLG Workshop

  21. arXiv:2207.01189

    cs.LG cs.SI

    Learning node embeddings via summary graphs: a brief theoretical analysis

    Authors: Houquan Zhou, Shenghua Liu, Danai Koutra, Huawei Shen, Xueqi Cheng

    Abstract: Graph representation learning plays an important role in many graph mining applications, but learning embeddings of large-scale graphs remains a problem. Recent works try to improve scalability via graph summarization, i.e., they learn embeddings on a smaller summary graph and then restore the node embeddings of the original graph. However, all existing works depend on heuristic designs and lac…

    Submitted 4 July, 2022; originally announced July 2022.

  22. arXiv:2111.05410

    cs.LG cs.AI

    Leveraging the Graph Structure of Neural Network Training Dynamics

    Authors: Fatemeh Vahedian, Ruiyu Li, Puja Trivedi, Di Jin, Danai Koutra

    Abstract: Understanding the training dynamics of deep neural networks (DNNs) is important as it can lead to improved training efficiency and task performance. Recent works have demonstrated that a static graph representation of a DNN's wiring cannot capture how DNNs change over the course of training. Thus, in this work, we propose a compact, expressive temporal graph framework that effectively captures the dynami…

    Submitted 20 February, 2023; v1 submitted 9 November, 2021; originally announced November 2021.

  23. arXiv:2111.03220

    cs.LG

    Augmentations in Graph Contrastive Learning: Current Methodological Flaws & Towards Better Practices

    Authors: Puja Trivedi, Ekdeep Singh Lubana, Yujun Yan, Yaoqing Yang, Danai Koutra

    Abstract: Unsupervised graph representation learning is critical to a wide range of applications where labels may be scarce or expensive to procure. Contrastive learning (CL) is an increasingly popular paradigm for such settings and the state-of-the-art in unsupervised visual representation learning. Recent work attributes the success of visual CL to the use of task-relevant augmentations and large, diverse dat…

    Submitted 11 March, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

    Comments: 8 pages, 4 figures, Accepted to WebConf 2022

  24. arXiv:2110.14509

    cs.LG cs.DB

    Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation

    Authors: Di Jin, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Danai Koutra

    Abstract: Multi-source entity linkage focuses on integrating knowledge from multiple sources by linking the records that represent the same real-world entity. This is critical in high-impact applications such as data cleaning and user stitching. The state-of-the-art entity linkage pipelines mainly depend on supervised learning that requires large amounts of training data. However, collecting well-labeled…

    Submitted 27 October, 2021; originally announced October 2021.

  25. How does Heterophily Impact the Robustness of Graph Neural Networks? Theoretical Connections and Practical Implications

    Authors: Jiong Zhu, Junchen Jin, Donald Loveland, Michael T. Schaub, Danai Koutra

    Abstract: We bridge two research directions on graph neural networks (GNNs) by formalizing the relation between heterophily of node labels (i.e., connected nodes tend to have dissimilar labels) and the robustness of GNNs to adversarial attacks. Our theoretical and empirical analyses show that for homophilous graph data, impactful structural attacks always lead to reduced homophily, while for heterophilous…

    Submitted 22 July, 2022; v1 submitted 14 June, 2021; originally announced June 2021.

    Comments: KDD 2022 camera ready version + full appendix; 20 pages, 2 figures

  26. arXiv:2104.05837

    cs.CL cs.AI cs.LG

    Relational World Knowledge Representation in Contextual Language Models: A Review

    Authors: Tara Safavi, Danai Koutra

    Abstract: Relational knowledge bases (KBs) are commonly used to represent world knowledge in machines. However, while advantageous for their high degree of precision and interpretability, KBs are usually organized according to manually defined schemas, which limit their expressiveness and require significant human effort to engineer and maintain. In this review, we take a natural language processing perspe…

    Submitted 9 September, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021

  27. arXiv:2102.13582

    cs.SI cs.LG

    Node Proximity Is All You Need: Unified Structural and Positional Node and Graph Embedding

    Authors: Jing Zhu, Xingyu Lu, Mark Heimann, Danai Koutra

    Abstract: While most network embedding techniques model the relative positions of nodes in a network, recently there has been significant interest in structural embeddings that model node role equivalences, irrespective of their distances to any specific nodes. We present PhUSION, a proximity-based unified framework for computing structural and positional node embeddings, which leverages well-established me…

    Submitted 26 February, 2021; originally announced February 2021.

    Comments: SDM 2021

  28. A Hidden Challenge of Link Prediction: Which Pairs to Check?

    Authors: Caleb Belth, Alican Büyükçakır, Danai Koutra

    Abstract: The traditional setup of link prediction in networks assumes that a test set of node pairs, which is usually balanced, is available over which to predict the presence of links. However, in practice, there is no test set: the ground truth is not known, so the number of possible pairs to predict over is quadratic in the number of nodes in the graph. Moreover, because graphs are sparse, most of these…

    Submitted 15 February, 2021; originally announced February 2021.

  29. arXiv:2102.06462

    cs.LG

    Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks

    Authors: Yujun Yan, Milad Hashemi, Kevin Swersky, Yaoqing Yang, Danai Koutra

    Abstract: In node classification tasks, graph convolutional neural networks (GCNs) have demonstrated competitive performance over traditional methods on diverse graph data. However, it is known that the performance of GCNs degrades with an increasing number of layers (the oversmoothing problem), and recent studies have also shown that GCNs may perform worse in heterophilous graphs, where neighboring nodes tend to b…

    Submitted 28 November, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted to ICDM 2022, including 14-page supplement

  30. arXiv:2102.02805

    cs.LG

    How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation

    Authors: Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, Robert P. Dick

    Abstract: Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. This paper aims to better explain a widely used technique for avoiding catastrophic forgetting: quadratic regula…

    Submitted 12 August, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: Camera-ready for Conference on Lifelong Learning Agents (CoLLAs), 2022

  31. arXiv:2101.08808

    cs.SI

    Refining Network Alignment to Improve Matched Neighborhood Consistency

    Authors: Mark Heimann, Xiyuan Chen, Fatemeh Vahedian, Danai Koutra

    Abstract: Network alignment, or the task of finding meaningful correspondences between nodes in different graphs, is an important graph mining task with many scientific and industrial applications. An important principle for network alignment is matched neighborhood consistency (MNC): nodes that are close in one graph should be matched to nodes that are close in the other graph. We theoretically demons…

    Submitted 21 January, 2021; originally announced January 2021.

    Comments: SDM 2021 (long version of paper with supplementary material)

  32. arXiv:2101.05730

    cs.SI

    Towards Understanding and Evaluating Structural Node Embeddings

    Authors: Junchen Jin, Mark Heimann, Di Jin, Danai Koutra

    Abstract: While most network embedding techniques model the proximity between nodes in a network, recently there has been significant interest in structural embeddings that are based on node equivalences, a notion rooted in sociology: equivalences or positions are collections of nodes that have similar roles (i.e., similar functions, ties or interactions with nodes in other positions), irrespective of their…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: A shorter version of this paper was presented in the Mining and Learning with Graphs workshop at KDD 2020

  33. arXiv:2011.07497

    cs.AI cs.CL cs.LG

    NegatER: Unsupervised Discovery of Negatives in Commonsense Knowledge Bases

    Authors: Tara Safavi, Jing Zhu, Danai Koutra

    Abstract: Codifying commonsense knowledge in machines is a longstanding goal of artificial intelligence. Recently, much progress toward this goal has been made with automatic knowledge base (KB) construction techniques. However, such techniques focus primarily on the acquisition of positive (true) KB statements, even though negative (false) statements are often also important for discriminative reasoning ov…

    Submitted 9 September, 2021; v1 submitted 15 November, 2020; originally announced November 2020.

    Comments: EMNLP 2021

  34. arXiv:2009.13566

    cs.LG cs.SI stat.ML

    Graph Neural Networks with Heterophily

    Authors: Jiong Zhu, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Nesreen K. Ahmed, Danai Koutra

    Abstract: Graph Neural Networks (GNNs) have proven to be useful for many different practical applications. However, many existing GNN models have implicitly assumed homophily among the nodes connected in the graph, and therefore have largely overlooked the important setting of heterophily, where most connected nodes are from different classes. In this work, we propose a novel framework called CPGNN that gen…

    Submitted 14 June, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: Proceedings version of AAAI 2021 with appendix and additional typo fixes; 12 pages, 4 figures

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 35, 12 (May 2021), 11168-11176

  35. arXiv:2009.10017

    cs.LG cs.AI cs.SI stat.ML

    From Static to Dynamic Node Embeddings

    Authors: Di Jin, Sungchul Kim, Ryan A. Rossi, Danai Koutra

    Abstract: We introduce a general framework for leveraging graph stream data for temporal prediction-based applications. Our proposed framework includes novel methods for learning an appropriate graph time-series representation, modeling and weighting the temporal dependencies, and generalizing existing embedding methods for such data. While previous work on dynamic modeling and embedding has focused on repr…

    Submitted 21 September, 2020; originally announced September 2020.

  36. arXiv:2009.07810

    cs.CL cs.AI cs.IR cs.LG

    CoDEx: A Comprehensive Knowledge Graph Completion Benchmark

    Authors: Tara Safavi, Danai Koutra

    Abstract: We present CoDEx, a set of knowledge graph completion datasets extracted from Wikidata and Wikipedia that improve upon existing knowledge graph completion benchmarks in scope and level of difficulty. In terms of scope, CoDEx comprises three knowledge graphs varying in size and structure, multilingual descriptions of entities and relations, and tens of thousands of hard negative triples that are pl…

    Submitted 6 October, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: EMNLP 2020

  37. arXiv:2007.16208

    cs.SI cs.DB cs.IR cs.LG stat.ML

    G-CREWE: Graph CompREssion With Embedding for Network Alignment

    Authors: Kyle K. Qin, Flora D. Salim, Yongli Ren, Wei Shao, Mark Heimann, Danai Koutra

    Abstract: Network alignment is useful for multiple applications that require increasingly large graphs to be processed. Existing research approaches this as an optimization problem or computes the similarity based on node representations. However, the process of aligning every pair of nodes between relatively large networks is time-consuming and resource-intensive. In this paper, we propose a framework, cal…

    Submitted 30 July, 2020; originally announced July 2020.

    Comments: 10 pages, accepted at the 29th ACM International Conference on Information and Knowledge Management (CIKM 20)

  38. Mining Persistent Activity in Continually Evolving Networks

    Authors: Caleb Belth, Xinyi Zheng, Danai Koutra

    Abstract: Frequent pattern mining is a key area of study that gives insights into the structure and dynamics of evolving networks, such as social or road networks. However, not only does a network evolve, but often the way it evolves itself evolves. Thus, knowing, in addition to patterns' frequencies, for how long and how regularly they have occurred (i.e., their persistence) can add to our understa…

    Submitted 27 June, 2020; originally announced June 2020.

    Comments: 9 pages, plus 2 pages of supplementary material. Accepted at KDD 2020

  39. arXiv:2006.11468

    cs.LG stat.ML

    Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs

    Authors: Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, Danai Koutra

    Abstract: We investigate the representation power of graph neural networks in the semi-supervised node classification task under heterophily or low homophily, i.e., in networks where connected nodes may have different class labels and dissimilar features. Many popular GNNs fail to generalize to this setting, and are even outperformed by models that ignore the graph structure (e.g., multilayer perceptrons).…

    Submitted 23 October, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020; version with full appendix

  40. arXiv:2006.08084

    cs.LG cs.NE cs.PL stat.ML

    Neural Execution Engines: Learning to Execute Subroutines

    Authors: Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

    Abstract: A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms. This is evidenced by their inability to generalize to data distributions that are outside of their restricted training sets, namely larger inputs and unseen data. We study these generalization issues at the level of numeri…

    Submitted 22 October, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: Accepted at 34th Conference on Neural Information Processing Systems (NeurIPS 2020)

  41. CONE-Align: Consistent Network Alignment with Proximity-Preserving Node Embedding

    Authors: Xiyuan Chen, Mark Heimann, Fatemeh Vahedian, Danai Koutra

    Abstract: Network alignment, the process of finding correspondences between nodes in different graphs, has many scientific and industrial applications. Existing unsupervised network alignment methods find suboptimal alignments that break up node neighborhoods, i.e., do not preserve matched neighborhood consistency. To improve this, we propose CONE-Align, which models intra-network proximity with node embeddi…

    Submitted 17 August, 2020; v1 submitted 10 May, 2020; originally announced May 2020.

    Comments: In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM), 2020

  42. arXiv:2004.01168

    cs.AI cs.CL cs.LG

    Evaluating the Calibration of Knowledge Graph Embeddings for Trustworthy Link Prediction

    Authors: Tara Safavi, Danai Koutra, Edgar Meij

    Abstract: Little is known about the trustworthiness of predictions made by knowledge graph embedding (KGE) models. In this paper we take initial steps toward this direction by investigating the calibration of KGE models, or the extent to which they output confidence scores that reflect the expected correctness of predicted knowledge graph triples. We first conduct an evaluation under the standard closed-wor…

    Submitted 6 October, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: EMNLP 2020

  43. arXiv:2003.10412

    cs.AI cs.IR cs.LG

    What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

    Authors: Caleb Belth, Xinyi Zheng, Jilles Vreeken, Danai Koutra

    Abstract: Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this w…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: 10 pages, plus 2 pages of references. 5 figures. Accepted at The Web Conference 2020

  44. arXiv:2002.10010

    cs.CY

    Driving with Data in the Motor City: Mining and Modeling Vehicle Fleet Maintenance Data

    Authors: Josh Gardner, Jawad Mroueh, Natalia Jenuwine, Noah Weaverdyck, Samuel Krassenstein, Arya Farahi, Danai Koutra

    Abstract: The City of Detroit maintains an active fleet of over 2500 vehicles, spending an annual average of over $5 million on purchases and over $7.7 million on maintenance. Modeling patterns and trends in this data is of particular importance to a variety of stakeholders, particularly as Detroit emerges from Chapter 9 bankruptcy, but the structure in such data is complex, and the city lacks dedicated r…

    Submitted 21 September, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

  45. arXiv:1908.08572

    cs.SI cs.LG

    On Proximity and Structural Role-based Embeddings in Networks: Misconceptions, Techniques, and Applications

    Authors: Ryan A. Rossi, Di Jin, Sungchul Kim, Nesreen K. Ahmed, Danai Koutra, John Boaz Lee

    Abstract: Structural roles define sets of structurally similar nodes that are more similar to nodes inside the set than outside, whereas communities define sets of nodes with more connections inside the set than outside. Roles based on structural similarity and communities based on proximity are fundamentally different but important complementary notions. Recently, the notion of structural roles has become…

    Submitted 21 September, 2020; v1 submitted 22 August, 2019; originally announced August 2019.

    Journal ref: ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 14, No. 5, Article 63 (August 2020), 37 pages

  46. arXiv:1904.08572

    cs.SI cs.IR

    node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching

    Authors: Di Jin, Mark Heimann, Ryan Rossi, Danai Koutra

    Abstract: Identity stitching, the task of identifying and matching various online references (e.g., sessions over different devices and timespans) to the same user in real-world web services, is crucial for personalization and recommendations. However, traditional user stitching approaches, such as grouping or blocking, require quadratic pairwise comparisons between a massive number of user activities, thus…

    Submitted 19 September, 2019; v1 submitted 17 April, 2019; originally announced April 2019.

  47. arXiv:1811.04461

    cs.SI

    Latent Network Summarization: Bridging Network Embedding and Summarization

    Authors: Di Jin, Ryan Rossi, Danai Koutra, Eunyee Koh, Sungchul Kim, Anup Rao

    Abstract: Motivated by the computational and storage challenges that dense embeddings pose, we introduce the problem of latent network summarization that aims to learn a compact, latent representation of the graph structure with dimensionality that is independent of the input graph size (i.e., #nodes and #edges), while retaining the ability to derive node representations on the fly. We propose Multi-LENS, a…

    Submitted 20 June, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

  48. Career Transitions and Trajectories: A Case Study in Computing

    Authors: Tara Safavi, Maryam Davoodi, Danai Koutra

    Abstract: From artificial intelligence to network security to hardware design, it is well-known that computing research drives many important technological and societal advancements. However, less is known about the long-term career paths of the people behind these innovations. What do their careers reveal about the evolution of computing research? Which institutions were and are the most important in this…

    Submitted 24 May, 2018; v1 submitted 16 May, 2018; originally announced May 2018.

    Comments: To appear in KDD 2018

  49. arXiv:1805.01889

    cs.LG stat.ML

    t-PINE: Tensor-based Predictable and Interpretable Node Embeddings

    Authors: Saba A. Al-Sayouri, Ekta Gujral, Danai Koutra, Evangelos E. Papalexakis, Sarah S. Lam

    Abstract: Graph representations have increasingly grown in popularity during the last years. Existing representation learning approaches explicitly encode network structure. Despite their good performance in downstream processes (e.g., node classification, link prediction), there is still room for improvement in different aspects, like efficacy, visualization, and interpretability. In this paper, we propose…

    Submitted 3 May, 2018; originally announced May 2018.

  50. arXiv:1805.01509

    cs.LG cs.AI cs.SI stat.ML

    RECS: Robust Graph Embedding Using Connection Subgraphs

    Authors: Saba A. Al-Sayouri, Danai Koutra, Evangelos E. Papalexakis, Sarah S. Lam

    Abstract: The success of graph embeddings or node representation learning in a variety of downstream tasks, such as node classification, link prediction, and recommendation systems, has led to their popularity in recent years. Representation learning algorithms aim to preserve local and global network structure by identifying node neighborhood notions. However, many existing algorithms generate embeddings t…

    Submitted 5 September, 2018; v1 submitted 3 May, 2018; originally announced May 2018.