MoleculeCLIP: Learning Transferable Molecule Multi-Modality Models via Natural Language

  • 1 Mila
  • 2 Université de Montréal
  • 3 Nvidia Research
  • 4 University of Illinois Urbana-Champaign
  • 5 California Institute of Technology
  • 6 Princeton University
  • 7 HEC Montréal
  • 8 CIFAR AI Chair
  • 9 Arizona State University


Artificial intelligence has attracted increasing interest in drug discovery. Existing work on molecular machine learning focuses mainly on chemical structures and overlooks the vast body of human-understandable domain knowledge, which limits its applicability to unseen drug design objectives and to the prediction of complex biological activities. To bring domain knowledge into molecular representation learning, we present MoleculeCLIP, a multi-modal model that learns jointly from molecular chemical structures and natural language text descriptions via a contrastive learning strategy. To train the model, we build a novel large-scale dataset, PubChemCLIP, with over 280K pairs of text and chemical structure. In addition to molecular property prediction benchmarks, we design two challenging zero-shot tasks, retrieval and language-guided editing, to highlight two crucial features of natural language: open vocabulary and compositionality. Through extensive experiments, we show that MoleculeCLIP not only reaches the best quantitative performance but also generalizes to novel biochemical concepts.

MoleculeCLIP pipeline

The MoleculeCLIP pipeline for pretraining and for the downstream tasks.
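
As a rough illustration of the contrastive pretraining step, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired molecule and text embeddings. This page does not specify the exact objective, so the loss form, the `temperature` value, and the function names are assumptions made for illustration, not the authors' implementation. The embeddings are plain Python lists standing in for the outputs of a molecule encoder and a text encoder.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def symmetric_contrastive_loss(mol_emb, text_emb, temperature=0.1):
    # Assumption: row i of mol_emb and row i of text_emb describe the same molecule.
    mol = [l2_normalize(v) for v in mol_emb]
    txt = [l2_normalize(v) for v in text_emb]
    n = len(mol)
    # logits[i][j] = cosine(molecule i, text j) / temperature
    logits = [
        [sum(a * b for a, b in zip(mol[i], txt[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Molecule-to-text direction: each molecule should pick out its own text.
    mol_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-molecule direction: the same computation on the transposed logits.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_mol = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (mol_to_text + text_to_mol)
```

With correctly paired embeddings the loss is close to zero, and it grows when the pairs are mismatched, which is the signal that pulls matching structure and text representations together.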

Downstream: Molecular Property Prediction

Results for molecular property prediction.

Downstream: Zero-shot Retrieval

Results for zero-shot retrieval.
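
Zero-shot retrieval in a shared embedding space can be sketched as a nearest-neighbor search: embed the text query, embed each candidate molecule, and rank candidates by cosine similarity. The example below is a minimal sketch under that assumption. The function names `cosine` and `retrieve` are hypothetical, and the input vectors stand in for encoder outputs that are not shown here.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_emb, candidate_embs, top_k=1):
    # Score every candidate against the query, then return the indices
    # of the top_k most similar candidates (ties broken by lower index).
    scored = [(cosine(query_emb, emb), idx) for idx, emb in enumerate(candidate_embs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]
```

Because no task-specific training is involved, the same routine works for any text description the text encoder can embed, which is how the open-vocabulary property shows up in this setting.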

Downstream: Zero-shot Language-guided Molecule Editing

Results for zero-shot language-guided molecule editing.