Multi-modal Foundation Model for Scientific Discovery: With Applications in Chemistry, Material, and Biology

AAAI 2025 Tutorial
8:30am - 12:30pm, 25th Feb
TH05, Room 117, Philadelphia Convention Center, Philadelphia, PA USA

¹University of California, Berkeley
²University of Oxford

Abstract

Foundation models have become popular for solving many vision-language tasks, yet their use for scientific tasks has only just begun to be explored. The opportunities are vast, owing to the broad capabilities of foundation models and the complexity and scale of real-world scientific problems. Combining the two reveals the considerable potential of foundation models for science. Challenges nonetheless remain. For instance, what are the limits of using foundation models to solve scientific problems? Which tasks are feasible and have measurable evaluation metrics? How should we use the heterogeneous multi-modal data found in scientific domains? This tutorial provides detailed answers to these questions.

Keywords: Artificial intelligence, machine learning, deep learning, large language model, scientific discovery, physics, chemistry, biology, material, drug discovery, material discovery, graph representation learning, multi-modal learning, multi-agent.

Outline

Part 1: Introduction

Achievements of AI for Science
What is AI for Science and why?
What is a foundation model for science?
Goals -- A Roadmap in FM4Science

Part 2: AI and Physics Foundation

Molecule and Geometry
Data Structure
Density Estimation & Generative Modeling
Pretraining

Part 3: FM for Chemistry and Material

Single-modal modeling

- Representation: Fingerprint, String, Neural Fingerprint, MPNN, SE(3)-equivariant Modeling
- Pretraining: N-Gram Graph, GraphMVP, MoleculeSDE, GeoSSL-DDM
- Downstream:
  - [PhysAI] DeepMD, NeuralMD
  - [GenAI] Character VAE, Grammar VAE, HierVAE, EDM, DiffCSP, MatterGen, CrystalLLM, FlowLLM, AssembleFlow

From Single-modal to Multi-modal modeling
- GraphCG
Multi-modal modeling
- [Early Exploration] KV-PLM, MolT5
- [Steering] MoleculeSTM, 3DToMolo, MoleculeSTM-3D, MOFFUSION
- [Reasoning & Planning] ChatDrug, Co-scientist, AI Co-scientist
Insights for Wet-lab

Part 4: FM for Biology

Single-modal modeling
- Representation: MSA Transformer, MaSIF, AlphaFold2
- Pretraining: ProteinTertiary, Foldseek, GearNet, CLEAN
- Downstream:
  - [PhysAI] AI2BMD
  - [GenAI] FrameDiff, FoldFlow, SurfGen, NucleusDiff
Multi-modal modeling
- ProGen, ProteinDT, Chroma, ESM3, ProteinDT-3D
Insights for Wet-lab

Part 5: Perspective and Conclusion