Structured Multi-task Learning for Molecular Property Prediction

  • ¹ Mila
  • ² Université de Montréal
  • ³ HEC Montréal
  • ⁴ CIFAR AI Chair

1. Motivation

Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, in contrast to other domains, its performance in drug discovery remains unsatisfactory because the amount of labeled data for each task is too limited, which calls for additional information to compensate for the data scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting, where a relation graph between tasks is available.

2. Dataset Generation

We first generate a dataset with an explicit task relation graph by making better use of domain knowledge. The high-level pipeline is as follows:

  • Extract molecule-task pairs from ChEMBL (focusing on bioassay tasks).
  • Extract the proteins associated with each bioassay task.
  • Extract protein-protein interactions from PPI data.
  • Aggregate the protein interactions into task relations.
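The last step of the pipeline can be sketched as follows. This is only an illustration: the aggregation rule used here (two tasks are related if they share a protein or if any of their proteins interact, with the number of such protein pairs as the edge weight) is an assumption, and the function name `build_task_relation_graph` and the toy identifiers (`assay_1`, `P1`, ...) are hypothetical placeholders rather than ChEMBL or PPI records.

```python
from itertools import combinations


def build_task_relation_graph(task_to_proteins, ppi_edges):
    """Aggregate protein-level interactions into task-level relations.

    Assumed rule (not taken from the source): the weight of a task pair is
    the number of protein pairs (p, q), with p from one task and q from the
    other, that are either identical or connected in the PPI edge list.
    Task pairs with weight 0 are left out of the graph.
    """
    # Store PPI edges as unordered pairs so lookup ignores direction.
    ppi = {frozenset(edge) for edge in ppi_edges}

    task_edges = {}
    for task_a, task_b in combinations(sorted(task_to_proteins), 2):
        weight = 0
        for p in task_to_proteins[task_a]:
            for q in task_to_proteins[task_b]:
                if p == q or frozenset((p, q)) in ppi:
                    weight += 1
        if weight > 0:
            task_edges[(task_a, task_b)] = weight
    return task_edges


# Toy input: three bioassay tasks and two protein-protein interactions.
task_to_proteins = {
    "assay_1": {"P1"},
    "assay_2": {"P2", "P3"},
    "assay_3": {"P4"},
}
ppi_edges = [("P1", "P2"), ("P3", "P1")]

task_graph = build_task_relation_graph(task_to_proteins, ppi_edges)
print(task_graph)
```

In this toy input only `assay_1` and `assay_2` end up connected, because `P4` takes part in no interaction.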

3. Method: SGNN-EBM

To make better use of this relation graph, we propose SGNN-EBM, a method that models the structure among tasks from two perspectives.

  • In the latent space, we model the task representations by applying a state graph neural network (SGNN) on the relation graph.
  • In the output space, we employ structured prediction with an energy-based model (EBM), which can be trained efficiently with noise-contrastive estimation (NCE).
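To make the output-space idea concrete, here is a minimal NumPy sketch of an energy function over a joint binary label vector and the corresponding NCE loss. It is not the SGNN-EBM parameterization: the unary scores `unary` and pairwise weights `pair_w` are fixed placeholder arrays (in a full model they would be produced by a network over the molecule and the task graph), the noise distribution is assumed to be a product of independent Bernoulli variables with probabilities `q`, and the model is assumed to be self-normalized, so the partition function is dropped.

```python
import numpy as np


def energy(y, unary, edges, pair_w):
    """Energy of a joint binary label vector y (lower = more compatible)."""
    s = 2.0 * np.asarray(y, dtype=float) - 1.0  # map {0, 1} to {-1, +1}
    e = -float(np.dot(unary, s))                # per-task (unary) terms
    for (i, j), w in zip(edges, pair_w):
        e -= w * s[i] * s[j]                    # related tasks prefer to agree
    return e


def log_noise_prob(y, q):
    """Log-probability of y under independent Bernoulli(q) noise."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * np.log(q) + (1.0 - y) * np.log(1.0 - q)))


def nce_loss(y_data, noise_samples, unary, edges, pair_w, q):
    """NCE loss for one observed label vector and k noise label vectors."""
    k = len(noise_samples)

    def logit(y):
        # log of (unnormalized model density) / (k * noise density)
        return -energy(y, unary, edges, pair_w) - log_noise_prob(y, q) - np.log(k)

    # -log sigmoid(logit) for the data sample
    loss = np.logaddexp(0.0, -logit(y_data))
    # -log(1 - sigmoid(logit)) for each noise sample
    for y in noise_samples:
        loss += np.logaddexp(0.0, logit(y))
    return float(loss)


# Toy setup: three tasks, with tasks 0 and 1 linked in the relation graph.
unary = np.array([1.0, 1.0, -1.0])
edges = [(0, 1)]
pair_w = [0.5]
q = np.full(3, 0.5)

rng = np.random.default_rng(0)
y_data = np.array([1, 1, 0])
noise_samples = (rng.random((4, 3)) < q).astype(int)  # 4 draws from the noise

loss = nce_loss(y_data, noise_samples, unary, edges, pair_w, q)
print(loss)
```

Because the loss only evaluates energies at the observed and sampled label vectors, it avoids summing over all 2^T joint label configurations, which is what makes this style of training tractable when the number of tasks T is large.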