Multi-modal Foundation Model for Scientific Discovery:
With Applications in Chemistry, Material, and Biology
AAAI 2025 Tutorial
8:30am - 12:30pm, 25th Feb
TH05, Room 117, Philadelphia Convention Center, Philadelphia, PA USA

  • 1University of California, Berkeley
  • 2University of Oxford

Abstract

Foundation models have become popular for solving vision-language tasks, yet their use for scientific tasks has only just begun. The opportunities are vast, owing both to the broad capabilities of foundation models and to the scale and complexity of real-world scientific problems. Together, these point to the strong potential of foundation models for science. Challenges remain, however. For instance, what are the limits of using foundation models to solve scientific problems? Which tasks are feasible and have measurable evaluation metrics? How do we make use of the heterogeneous multi-modal data found in scientific domains? This tutorial provides detailed answers to these questions.

Keywords: Artificial intelligence, machine learning, deep learning, large language model, scientific discovery, physics, chemistry, biology, material, drug discovery, material discovery, graph representation learning, multi-modal learning, multi-agent.

Outline

Part 1: Introduction

  • Achievements of AI for Science
  • What is AI for Science and why?
  • What is a foundation model for science?
  • Goals -- A Roadmap in FM4Science

Part 2: AI and Physics Foundation

  • Molecule and Geometry
  • Data Structure
  • Density Estimation & Generative Modeling
  • Pretraining

Part 3: FM for Chemistry and Material

  • Single-modal modeling
    • Representation: Fingerprint, String, Neural Fingerprint, MPNN, SE(3)-equivariant Modeling
    • Pretraining: N-Gram Graph, GraphMVP, MoleculeSDE, GeoSSL-DDM
    • Downstream:
      • [PhysAI] DeepMD, NeuralMD
      • [GenAI] Character VAE, Grammar VAE, HierVAE, EDM, DiffCSP, MatterGen, CrystalLLM, FlowLLM, AssembleFlow
  • From Single-modal to Multi-modal modeling
    • GraphCG
  • Multi-modal modeling
    • [Early Exploration] KV-PLM, MolT5
    • [Steering] MoleculeSTM, 3DToMolo, MoleculeSTM-3D, MOFFUSION
    • [Reasoning & Planning] ChatDrug, Coscientist, AI Co-scientist
  • Insights for Wet-lab

Part 4: FM for Biology

  • Single-modal modeling
    • Representation: MSA Transformer, MaSIF, AlphaFold2
    • Pretraining: ProteinTertiary, Foldseek, GearNet, CLEAN
    • Downstream:
      • [PhysAI] AI2BMD
      • [GenAI] FrameDiff, FoldFlow, SurfGen, NucleusDiff
  • Multi-modal modeling
    • ProGen, ProteinDT, Chroma, ESM3, ProteinDT-3D
  • Insights for Wet-lab

Part 5: Perspective and Conclusion