Analyzing Data-Centric Properties for Graph Contrastive Learning

Trivedi, Puja; Lubana, Ekdeep Singh; Heimann, Mark; Koutra, Danai; Thiagarajan, Jayaraman J.

arXiv:2208.02810 (cs)

[Submitted on 4 Aug 2022 (v1), last revised 23 Jan 2023 (this version, v3)]

Abstract: Recent analyses of self-supervised learning (SSL) find the following data-centric properties to be critical for learning good representations: invariance to task-irrelevant semantics, separability of classes in some latent space, and recoverability of labels from augmented samples. However, given their discrete, non-Euclidean nature, graph datasets and graph SSL methods are unlikely to satisfy these properties. This raises the question: how do graph SSL methods, such as contrastive learning (CL), work well? To systematically probe this question, we perform a generalization analysis for CL when using generic graph augmentations (GGAs), with a focus on data-centric properties. Our analysis yields formal insights into the limitations of GGAs and the necessity of task-relevant augmentations. As we empirically show, GGAs do not induce task-relevant invariances on common benchmark datasets, leading to only marginal gains over naive, untrained baselines. Our theory motivates a synthetic data generation process that enables control over task-relevant information and boasts pre-defined optimal augmentations. This flexible benchmark helps us identify yet unrecognized limitations in advanced augmentation techniques (e.g., automated methods). Overall, our work rigorously contextualizes, both empirically and theoretically, the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL.
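For readers unfamiliar with the setup the abstract refers to, the following is a minimal, illustrative sketch of contrastive learning with a generic graph augmentation. It is not the authors' implementation. It assumes random edge dropping as the generic augmentation and an InfoNCE-style loss, and every name in it (drop_edges, degree_embedding, info_nce) is hypothetical. The degree-based embedding is only a stand-in for a trained graph encoder.

```python
import math
import random


def drop_edges(edges, drop_prob, rng):
    """Generic augmentation (assumed): keep each edge with probability 1 - drop_prob."""
    return [edge for edge in edges if rng.random() >= drop_prob]


def degree_embedding(edges, num_nodes):
    """Hypothetical stand-in for an encoder: per-node degrees plus a constant
    bias term, so the vector is never all zeros."""
    degrees = [0.0] * num_nodes
    for u, v in edges:
        degrees[u] += 1.0
        degrees[v] += 1.0
    return degrees + [1.0]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for a single anchor: low when the anchor is closer
    to its positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


rng = random.Random(0)
triangle_with_tail = [(0, 1), (1, 2), (2, 0), (2, 3)]  # toy graph, 4 nodes
star = [(0, 1), (0, 2), (0, 3)]                         # a different toy graph

# Two augmented views of the same graph form the positive pair;
# an augmented view of another graph serves as the negative.
view_a = degree_embedding(drop_edges(triangle_with_tail, 0.2, rng), num_nodes=4)
view_b = degree_embedding(drop_edges(triangle_with_tail, 0.2, rng), num_nodes=4)
negative = degree_embedding(drop_edges(star, 0.2, rng), num_nodes=4)

loss = info_nce(view_a, view_b, [negative])
print(f"contrastive loss for one anchor: {loss:.4f}")
```

Note that drop_edges never consults the downstream task: it may remove exactly the edges that determine a graph's label. That is the kind of task-agnostic behavior the abstract contrasts with task-relevant augmentations.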

Comments:	Accepted to NeurIPS 2022
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2208.02810 [cs.LG]
	(or arXiv:2208.02810v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2208.02810

Submission history

From: Puja Trivedi
[v1] Thu, 4 Aug 2022 17:58:37 UTC (337 KB)
[v2] Thu, 27 Oct 2022 18:04:19 UTC (338 KB)
[v3] Mon, 23 Jan 2023 01:51:01 UTC (338 KB)