Molecular Geometry Pretraining
with SE(3)-Invariant Denoising Distance Matching
In Submission

  • 1Mila
  • 2Université de Montréal
  • 3National Research Council Canada
  • 4University of Ottawa
  • 5HEC Montréal
  • 6CIFAR AI Chair


Pretraining molecular representations is critical in a variety of drug and material discovery applications because labeled molecules are scarce, yet most existing work focuses on pretraining on 2D molecular graphs. The power of pretraining on 3D geometric structures has been less explored, owing to the difficulty of finding a sufficient proxy task that lets pretraining effectively extract essential features from the geometric structures. Motivated by the dynamic nature of 3D molecules, where the continuous motion of a molecule in 3D Euclidean space forms a smooth potential energy surface, we propose a 3D coordinate denoising pretraining framework to model such an energy landscape. Leveraging an SE(3)-invariant score matching method, we propose SE(3)-DDM, in which the coordinate denoising proxy task reduces to denoising the pairwise atomic distances in a molecule. Our comprehensive experiments confirm the effectiveness and robustness of the proposed method. The source code for this paper will be released in the near future.

Problem Formulation: Molecular Geometry Pretraining

Molecules are not static but in continuous motion in 3D Euclidean space, forming a potential energy surface (PES). As shown in the above figure, it is desirable to study a molecule at a local minimum of the PES, called a conformer. However, such a stable-state conformer often comes with noise, for the following reasons.

  • First, statistical and systematic errors in conformation estimation are unavoidable.
  • Second, it is well acknowledged that a conformer vibrates around its local minimum on the PES.

These characteristics of molecular geometry motivate us to denoise the molecular coordinates around the local minima, mimicking the computation errors and conformation vibration within the corresponding local region. The goal of denoising is to learn molecular representations that are insensitive to such noise and effectively capture the energy surface around the local minima.

Denoising Coordinate Matching

The 3D geometric information, i.e., the atomic coordinates, is critical to molecular properties. Based on this, we propose a geometry perturbation that adds small noise to the atom coordinates. For notation, we define the original geometry graph and an augmented geometry graph as two views, denoted as \(g_1=(X_1, R_1)\) and \(g_2=(X_2, R_2)\) respectively. The augmented geometry graph can be seen as a coordinate perturbation of the original graph with the same atom types, i.e., \(X_2=X_1\) and \(R_2 = R_1 + \epsilon\), where \(\epsilon\) is drawn from a normal distribution.
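The perturbation above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `perturb_geometry`, the noise scale `sigma`, and the toy three-atom molecule are assumptions made for the example, not values taken from the paper.

```python
import numpy as np

def perturb_geometry(X1, R1, sigma=0.05, rng=None):
    """Build the augmented view g2 = (X2, R2) from g1 = (X1, R1).

    X1: (N,) atom types, R1: (N, 3) atom coordinates.
    sigma is a hypothetical noise scale; the text only says the noise is small.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(loc=0.0, scale=sigma, size=R1.shape)  # epsilon ~ N(0, sigma^2 I)
    X2 = X1.copy()   # atom types are unchanged: X2 = X1
    R2 = R1 + eps    # coordinates are perturbed: R2 = R1 + epsilon
    return X2, R2

# Toy water-like geometry (atomic numbers and coordinates are illustrative only).
X1 = np.array([8, 1, 1])
R1 = np.array([[0.00, 0.00, 0.00],
               [0.96, 0.00, 0.00],
               [-0.24, 0.93, 0.00]])
X2, R2 = perturb_geometry(X1, R1, sigma=0.05, rng=np.random.default_rng(0))
```

The two returned arrays form the second view \(g_2\); the first view \(g_1\) is left untouched.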
The two views defined above share common information. By maximizing the mutual information (MI) between them, we expect the learned representation to better capture the geometric information, to be insensitive to noise, and thus to generalize well to the target downstream tasks. To maximize the MI, we instead maximize the following lower bound over the two geometry views: \[ \begin{aligned} I(G_1; G_2) & = \mathbb{E}_{p(g_1,g_2)} \Big[ \log \frac{p(g_1,g_2)}{p(g_1) p(g_2)} \Big] \ge \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(g_1|g_2) + \log p(g_2|g_1) \Big] \triangleq \mathcal{L}_{\text{MI}}. \end{aligned} \]
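One way to see where this lower bound comes from is to split the MI symmetrically and separate out the entropy terms. The steps below are a sketch, not necessarily the authors' derivation, and the last step assumes that \(H(G_1) + H(G_2) \ge 0\). That assumption holds for discrete variables but is not guaranteed for differential entropies of continuous coordinates; in either case the entropy terms do not depend on the model being trained. \[ \begin{aligned} I(G_1; G_2) & = \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log \frac{p(g_1|g_2)}{p(g_1)} \Big] + \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log \frac{p(g_2|g_1)}{p(g_2)} \Big] \\ & = \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(g_1|g_2) + \log p(g_2|g_1) \Big] + \frac{1}{2} \Big( H(G_1) + H(G_2) \Big) \\ & \ge \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(g_1|g_2) + \log p(g_2|g_1) \Big]. \end{aligned} \]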
To optimize this lower bound, we use an energy-based model (EBM) for estimation. Adapted to MI maximization in our setting, the lower bound becomes: \[ \begin{aligned} \mathcal{L}_{\text{Coor-MI}} & = \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(R_1|g_2) \Big] + \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(R_2|g_1) \Big]\\ & = \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log \frac{\exp(f(R_1, g_2))}{A_{R_1|g_2}} \Big] + \frac{1}{2} \mathbb{E}_{p(g_2,g_1)} \Big[ \log \frac{\exp(f(R_2, g_1))}{A_{R_2|g_1}} \Big], \end{aligned} \] where \(f(\cdot)\) is the negative of the energy function, and \(A_{R_1|g_2}\) and \(A_{R_2|g_1}\) are the intractable partition functions. The first equality holds because the two views share the same atom types, so only the coordinates remain to be modeled. This objective can be read as denoising the atom coordinates of one view given the geometry of the other view.

Denoising Distance Matching

We then adopt an SE(3)-invariant denoising score matching method, which yields the following objective: \[ \begin{aligned} \mathcal{L}_{\text{EBM-SM}} = & \frac{1}{2L} \sum_{l=1}^L \sigma_l^\beta \mathbb{E}_{p_{\text{data}}(d_1|g_2)} \mathbb{E}_{q(\tilde d_1|d_1,g_2)} \Big[ \Big\| \frac{s_\theta(\tilde d_1, g_2)}{\sigma_l} - \frac{d_1 - \tilde d_1}{\sigma_l^2}\Big\|^2_2 \Big] \\ & + \frac{1}{2L} \sum_{l=1}^L \sigma_l^\beta \mathbb{E}_{p_{\text{data}}(d_2|g_1)} \mathbb{E}_{q(\tilde d_2|d_2,g_1)} \Big[\Big\|\frac{s_\theta(\tilde d_2, g_1)}{\sigma_l} - \frac{d_2 - \tilde d_2}{\sigma_l^2} \Big\|^2_2 \Big]. \end{aligned} \] Here \(d_1\) and \(d_2\) are the pairwise atomic distances of the two views, \(\tilde d_1\) and \(\tilde d_2\) are their perturbed counterparts, \(\sigma_1, \dots, \sigma_L\) are the \(L\) noise levels, and \(s_\theta\) is the learned score network. This transforms the coordinate-aware mutual information maximization into denoising distance matching, which is the final objective.
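To make the objective concrete, the sketch below checks numerically that pairwise distances do not change under rotation and translation, and evaluates one of the two symmetric terms of \(\mathcal{L}_{\text{EBM-SM}}\) with a single Monte Carlo sample per noise level. Everything named here is an assumption for illustration: the helper names, the default `beta=2.0`, the noise levels, the Gaussian form of \(q(\tilde d|d)\), and the placeholder `zero_score` standing in for a trained network \(s_\theta\).

```python
import numpy as np

def pairwise_distances(R):
    """Return the N*(N-1)/2 distances between all atom pairs of R (N x 3)."""
    diff = R[:, None, :] - R[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    rows, cols = np.triu_indices(R.shape[0], k=1)
    return dist[rows, cols]

def random_rigid_motion(rng):
    """Sample a proper rotation Q (det = +1) and a translation t."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]  # flip one column to turn a reflection into a rotation
    return Q, rng.normal(size=3)

def distance_denoising_term(score_fn, d, g_other, sigmas, beta=2.0, rng=None):
    """One of the two symmetric terms of the objective, one sample per noise level.

    score_fn(d_tilde, g_other) is a stand-in for s_theta; beta is an assumed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for sigma in sigmas:
        d_tilde = d + rng.normal(scale=sigma, size=d.shape)  # assumed Gaussian q(d_tilde | d)
        target = (d - d_tilde) / sigma**2
        residual = score_fn(d_tilde, g_other) / sigma - target
        total += sigma**beta * np.sum(residual**2)
    return total / (2 * len(sigmas))

rng = np.random.default_rng(0)
X = np.array([6, 6, 8, 1, 1])                       # illustrative atom types
R1 = rng.normal(size=(5, 3))                        # view 1 coordinates
R2 = R1 + 0.05 * rng.normal(size=R1.shape)          # view 2: perturbed coordinates

Q, t = random_rigid_motion(rng)
d1 = pairwise_distances(R1)
d1_moved = pairwise_distances(R1 @ Q.T + t)         # same molecule, rotated and shifted

zero_score = lambda d_tilde, g: np.zeros_like(d_tilde)  # untrained placeholder
loss = distance_denoising_term(zero_score, d1, (X, R2),
                               sigmas=[0.1, 0.5, 1.0], rng=rng)
```

Because `d1` and `d1_moved` agree up to floating-point error, any loss computed from distances inherits invariance to rotations and translations. The full objective would add the mirrored term, with the roles of the two views swapped.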