Structured Multi-task Learning for Molecular Property Prediction
Multi-task learning for molecular property prediction is becoming increasingly important in drug discovery. However, in contrast to other domains, the performance of multi-task learning in drug discovery is still unsatisfactory because the amount of labeled data for each task is too limited, which calls for additional information to compensate for the data scarcity. In this paper, we study multi-task learning for molecular property prediction in a novel setting, where a relation graph between tasks is available.
2. Dataset Generation
We first construct a dataset with an explicit task relation graph by making better use of domain knowledge. The high-level pipeline is as follows:
- Extract molecule-task pairs from ChEMBL (focusing on bioassay tasks).
- Extract the proteins associated with each bioassay task.
- Extract protein interactions from PPI (protein-protein interaction) data.
- Aggregate the protein interactions into task relations.
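The last aggregation step can be illustrated with a small sketch. It is not the actual dataset-generation script: the function name `build_task_relation_graph`, the input formats, and the aggregation rule (count the protein pairs across two tasks that are either the same protein or linked by a PPI edge) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_task_relation_graph(task_to_proteins, ppi_edges):
    """Aggregate protein-level interactions into task-level relations.

    task_to_proteins: dict mapping a task id to a set of protein ids (hypothetical format).
    ppi_edges: iterable of (protein_a, protein_b) pairs (hypothetical format).
    Returns {(task_a, task_b): weight}, where weight counts protein pairs
    (one protein from each task) that are identical or interact in the PPI data.
    This aggregation rule is an assumption, not necessarily the one used in the paper.
    """
    # Store PPI edges as unordered pairs so lookup ignores direction.
    ppi = {frozenset(edge) for edge in ppi_edges}
    relation = defaultdict(int)
    for task_a, task_b in combinations(sorted(task_to_proteins), 2):
        for p in task_to_proteins[task_a]:
            for q in task_to_proteins[task_b]:
                if p == q or frozenset((p, q)) in ppi:
                    relation[(task_a, task_b)] += 1
    # Task pairs with no supporting protein pair are simply absent.
    return dict(relation)


# Toy example with made-up task and protein ids.
toy_tasks = {
    "assay_1": {"P1", "P2"},
    "assay_2": {"P3"},
    "assay_3": {"P2", "P4"},
}
toy_ppi = [("P1", "P3"), ("P4", "P5")]
toy_graph = build_task_relation_graph(toy_tasks, toy_ppi)
```

In this toy example, `assay_1` and `assay_2` are linked through the P1-P3 interaction, `assay_1` and `assay_3` through the shared protein P2, and `assay_2` and `assay_3` remain unconnected.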
3. Method: SGNN-EBM
To make better use of this relation graph, we propose a method called SGNN-EBM, which systematically investigates structured task modeling from two perspectives:
- In the latent space, we model the task representations by applying a state graph neural network (SGNN) on the relation graph.
- In the output space, we employ structured prediction with an energy-based model (EBM), which can be efficiently trained through noise-contrastive estimation (NCE).
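To make the output-space idea concrete, here is a minimal NumPy sketch of an energy function over binary task labels on the relation graph, together with an NCE loss. It is only an illustration under stated assumptions: the energy tables `unary` and `pairwise` are plain arrays here (in a real model they would be produced by neural networks from the molecule and the task representations), the noise distribution is assumed to be uniform over label vectors, the model is treated as self-normalized, and all function names are hypothetical.

```python
import numpy as np


def energy(y, unary, pairwise, edges):
    """E(y) = sum_i unary[i, y_i] + sum_{e=(i,j)} pairwise[e, y_i, y_j].

    y: (T,) array of binary labels, one per task.
    unary: (T, 2) node energies; pairwise: (E, 2, 2) edge energies.
    edges: (E, 2) integer array of task index pairs from the relation graph.
    """
    y = np.asarray(y)
    node_term = unary[np.arange(len(y)), y].sum()
    i, j = edges[:, 0], edges[:, 1]
    edge_term = pairwise[np.arange(len(edges)), y[i], y[j]].sum()
    return node_term + edge_term


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -np.logaddexp(0.0, -z)


def nce_loss(y_data, y_noise, unary, pairwise, edges):
    """Binary-classification form of NCE: data samples vs. noise samples.

    Assumes a uniform noise distribution over {0, 1}^T and an unnormalized
    model density exp(-E(y)) used directly (self-normalization assumption).
    """
    num_tasks = unary.shape[0]
    log_q = -num_tasks * np.log(2.0)  # log-probability under uniform noise
    logit_data = np.array([-energy(y, unary, pairwise, edges) - log_q for y in y_data])
    logit_noise = np.array([-energy(y, unary, pairwise, edges) - log_q for y in y_noise])
    # log(1 - sigmoid(z)) equals log_sigmoid(-z).
    return -(log_sigmoid(logit_data).mean() + log_sigmoid(-logit_noise).mean())
```

Minimizing this loss pushes the energy of observed label vectors down relative to noise samples without ever computing the partition function over all 2^T label configurations, which is what makes NCE attractive for structured prediction over many tasks.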