Multi-modal Learning in Chemistry and Biology
-- A Perspective from Topology, Geometry, and Textual Description
NeurIPS 2023 Tutorial
- ¹Mila - Québec AI Institute
- ²Université de Montréal
Multi-modal learning has revolutionized the computer vision (CV) and natural language processing (NLP) communities. At the same time, artificial intelligence (AI) for chemistry and biology has shown growing potential for solving challenging scientific problems. A recent trend is therefore to bring multi-modal learning into scientific fields. In this tutorial, we discuss recent multi-modal learning works in chemistry and biology from the perspective of topology, geometry, and textual description. We concentrate on small molecules and proteins, the two most fundamental building blocks in scientific problems. We start with the internal structures of chemicals, i.e., topology and geometry. We then discuss how textual data, as an external description, enables more versatile functionalities.
- Introduction to topology, geometry, and textual description
    - Topology in chemistry and biology data
    - Geometry and the Geom3D platform
    - Invariant modeling: SchNet, DimeNet, SphereNet, GemNet
    - Equivariant modeling using spherical frame basis: TFN, SE3-Transformer
    - Equivariant modeling using vector frame basis: EGNN, PaiNN
    - Textual descriptions: PubChem, PubMed, UniProt, and Gene Ontology
- Pretraining with topology
    - Self-supervised learning in CV: InfoNCE, SimCLR, BYOL, SimSiam
    - Self-supervised learning in chemistry and biology: N-Gram Graph, AttrMask, ContextPred, InfoGraph, MolCLR, GraphSSL
- Pretraining with geometry
    - Self-supervised learning frameworks on geometry: GeoSSL, GeoSSL-1L, 3D-EMGP, Uni-Mol
    - Self-supervised learning frameworks on topology and geometry: GraphMVP, 3D InfoMax, MoleculeSDE
- Pretraining with textual description
    - Text and molecules: MolT5, MoMu, MoleculeSTM
    - Text and proteins: ProGen, ProteinDT
    - New opportunities with ChatGPT: ChemCrow, ChatDrug
- Panel discussion