Structured Multi-task Learning for Molecular Property Prediction

AISTATS 2022

¹Mila
²Université de Montréal
³HEC Montréal
⁴CIFAR AI Chair

1. Motivation

Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, unlike in other domains, its performance in drug discovery remains unsatisfactory because each task has too few labeled examples, so additional information is needed to compensate for the data scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting where a relation graph between tasks is available.

2. Dataset Generation

We first construct a dataset with an explicit task relation graph by making better use of domain knowledge. The high-level pipeline is as follows:

Extract molecule-task pairs from ChEMBL (focusing on bioassay tasks).
Extract the proteins associated with each bioassay task.
Extract protein interactions from protein-protein interaction (PPI) data.
Aggregate the protein interactions into task relations.
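
The final aggregation step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `build_task_relation_graph`, the toy assay and protein identifiers, the interaction scores, and the rule of taking the maximum PPI score over all protein pairs of two tasks are all assumptions made for illustration, since the aggregation rule is not specified here.

```python
from itertools import combinations


def build_task_relation_graph(task_to_proteins, ppi_scores, threshold=0.0):
    """Aggregate protein-level interactions into task-level edges.

    task_to_proteins: dict mapping a task id to the set of its proteins.
    ppi_scores: dict mapping frozenset({protein_a, protein_b}) to a score.
    Returns {(task_a, task_b): score}, keeping pairs whose best score
    reaches `threshold`. Max-aggregation is an assumed rule.
    """
    edges = {}
    for task_a, task_b in combinations(sorted(task_to_proteins), 2):
        best = None
        for p in task_to_proteins[task_a]:
            for q in task_to_proteins[task_b]:
                score = ppi_scores.get(frozenset((p, q)))
                if score is not None and (best is None or score > best):
                    best = score
        if best is not None and best >= threshold:
            edges[(task_a, task_b)] = best
    return edges


# Hypothetical toy data: three bioassay tasks and four proteins.
task_to_proteins = {
    "assay_1": {"P1"},
    "assay_2": {"P2", "P3"},
    "assay_3": {"P4"},
}
ppi_scores = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P1", "P3")): 0.4,
    frozenset(("P3", "P4")): 0.2,
}

relations = build_task_relation_graph(task_to_proteins, ppi_scores, threshold=0.3)
print(relations)
```

With this toy input, only the pair of tasks whose proteins interact above the threshold is kept as an edge. Other aggregation rules (for example, averaging scores or counting interacting pairs) would fit the same interface.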

3. Method: SGNN-EBM

To better utilize this relation graph, we propose a method called SGNN-EBM, which systematically investigates structured task modeling from two perspectives.

In the latent space, we model the task representations by applying a state graph neural network (SGNN) on the relation graph.
In the output space, we employ structured prediction with an energy-based model (EBM), which can be trained efficiently through noise-contrastive estimation (NCE).
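
To make the output-space idea concrete, the following Python sketch scores a joint binary label vector with an energy function defined over the task relation graph and evaluates a standard NCE objective against a factorized Bernoulli noise distribution. It is a toy illustration under stated assumptions, not the SGNN-EBM implementation: the potentials are fixed lookup tables rather than outputs of a graph neural network, the normalizing constant is assumed to be absorbed into the energy, and all names (`energy`, `nce_loss`, `noise_p`) and numbers are hypothetical.

```python
import math


def energy(labels, unary, pairwise, edges):
    """E(y) = -sum_i unary[i][y_i] - sum_{(i,j)} pairwise[(i,j)][y_i][y_j]."""
    total = sum(unary[i][y] for i, y in enumerate(labels))
    total += sum(pairwise[(i, j)][labels[i]][labels[j]] for i, j in edges)
    return -total


def log_noise_prob(labels, noise_p):
    """Log-probability under a factorized Bernoulli noise distribution."""
    return sum(math.log(p if y == 1 else 1.0 - p) for y, p in zip(labels, noise_p))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def nce_loss(data_labels, noise_samples, unary, pairwise, edges, noise_p):
    """Binary NCE loss for one data sample and k noise samples."""
    k = len(noise_samples)

    def logit(y):
        # log p_model(y) - log(k * q(y)), with log p_model(y) taken as -E(y).
        return -energy(y, unary, pairwise, edges) - math.log(k) - log_noise_prob(y, noise_p)

    loss = -log_sigmoid(logit(data_labels))                      # data classified as data
    loss -= sum(log_sigmoid(-logit(y)) for y in noise_samples)   # noise classified as noise
    return loss


# Hypothetical setup: three tasks on a chain-shaped relation graph.
edges = [(0, 1), (1, 2)]
unary = [[0.0, 1.0]] * 3                                  # each task prefers label 1
pairwise = {e: [[0.5, 0.0], [0.0, 0.5]] for e in edges}   # related tasks prefer to agree
noise_p = [0.5, 0.5, 0.5]
noise_samples = [[0, 1, 0], [1, 0, 0]]

loss_favored = nce_loss([1, 1, 1], noise_samples, unary, pairwise, edges, noise_p)
loss_disfavored = nce_loss([0, 0, 0], noise_samples, unary, pairwise, edges, noise_p)
print(loss_favored, loss_disfavored)
```

A label vector that the energy function favors yields a lower NCE loss than one it disfavors, given the same noise samples. In training, the potentials would be learned by minimizing this loss over observed label vectors, which avoids computing the partition function over all joint label assignments.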

Citation

@inproceedings{liu2022multi,
    title={Structured Multi-task Learning for Molecular Property Prediction},
    author={Liu, Shengchao and Qu, Meng and Zhang, Zuobai and Cai, Huiyu and Tang, Jian},
    booktitle={AISTATS},
    year={2022}
}