Physics-Inspired Geometric Pretraining for Molecule Representation
AAAI 2025 Tutorial

  • University of California, Berkeley

Abstract

Molecular representation pretraining is critical to many applications in drug and material discovery. Most existing work along this line pretrains on 2D molecular graphs, while the power of pretraining on 3D geometric structures has only recently been explored. In this tutorial, I will begin by introducing molecular geometric representation methods (group-invariant and group-equivariant representations) and self-supervised learning for pretraining. I will then combine these two topics into a comprehensive introduction to geometric pretraining for molecule representation, discussing the most recent works in detail (GraphMVP, 3D InfoMax, 3D-EMGP, Uni-Mol, GeoSSL, MoleculeSDE, and NeuralCrystal).

Keywords: Artificial intelligence, machine learning, deep learning, drug discovery, graph representation learning, geometric representation learning, geometric pretraining, self-supervised pretraining, group symmetry, invariance, E(3)-equivariance, SE(3)-equivariance.

Outline

  • Overview
  • Geometric Representation Learning
    • Invariant Geometric Modeling: SchNet, DimeNet, SphereNet, GemNet.
    • E(3)-Equivariant and SE(3)-Equivariant Geometric Modeling: TFN, EGNN, PaiNN.
  • Self-supervised Pretraining
    • Self-supervised Learning in Computer Vision: InfoNCE, SimCLR, BYOL, SimSiam.
  • Self-supervised Learning on Molecular Graphs: N-Gram Graph, AttrMask, ContextPred, InfoGraph, MolCLR, GraphSSL.
  • Geometric Self-supervised Pretraining
    • Self-supervised Learning Framework on Geometry: GeoSSL, GeoSSL-1L, 3D-EMGP, Uni-Mol.
    • Self-supervised Learning Framework on Topology and Geometry: GraphMVP, 3D InfoMax, MoleculeSDE, NeuralCrystal.
  • Conclusion
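As a small illustration of the invariance concept covered in the geometric representation part of the tutorial: pairwise interatomic distances (the basic feature used by invariant models such as SchNet) do not change when a molecule's 3D conformation is rotated or translated. The sketch below is a toy NumPy check, not code from any of the methods listed; the atom count and random data are illustrative assumptions.

```python
import numpy as np

def pairwise_distances(coords):
    """Pairwise Euclidean distances between atoms: an E(3)-invariant feature."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy 3D conformation of 5 "atoms" (illustrative random data).
rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 3))

# A random orthogonal matrix (via QR) plus a random translation:
# an arbitrary E(3) transformation of the conformation.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
transformed = coords @ q.T + rng.normal(size=3)

# The distance matrix is unchanged, so any model built on it is E(3)-invariant.
assert np.allclose(pairwise_distances(coords), pairwise_distances(transformed))
```

Equivariant models (TFN, EGNN, PaiNN) go further: their internal vector features rotate along with the input rather than staying fixed.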
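The contrastive objectives in the self-supervised pretraining part (InfoNCE and its descendants SimCLR, GraphMVP, 3D InfoMax) share one core loss: pull each anchor embedding toward its matched positive view and push it away from all other samples in the batch. Below is a minimal NumPy sketch of InfoNCE under the usual assumptions (cosine similarity, in-batch negatives, a temperature hyperparameter); it is a didactic simplification, not the implementation used by any specific paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `positives` is the positive view of row i of `anchors`;
    every other row in the batch serves as a negative."""
    # L2-normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Cross-entropy with the diagonal (matched pairs) as the target class.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

In GraphMVP-style pretraining, `anchors` would come from a 2D graph encoder and `positives` from a 3D geometric encoder of the same molecules; in SimCLR they are two augmented views of the same image.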