Molecular Geometry Pretraining
with SE(3)-Invariant Denoising Distance Matching
ICLR 2023

  • 1Mila
  • 2Université de Montréal
  • 3National Research Council Canada
  • 4University of Ottawa
  • 5HEC Montréal
  • 6CIFAR AI Chair

Problem Formulation: Molecular Geometry Pretraining

Molecules are not static: their atoms are in continuous motion in 3D Euclidean space, and the energy of each configuration defines a potential energy surface (PES). As shown in the figure above, it is desirable to study a molecule at a local minimum of the PES, called a conformer. However, such stable-state conformers often come with noise, for the following reasons.

  • First, statistical and systematic errors in conformation estimation are unavoidable.
  • Second, it is well acknowledged that a conformer vibrates around its local minimum on the PES.
These characteristics of molecular geometry motivate us to denoise the molecular coordinates around the local minima, mimicking the computational errors and conformational vibrations within the corresponding local region. The goal of denoising is to learn molecular representations that are robust to such noise and that effectively capture the energy surface around the local minima.

Denoising Coordinate Matching

The 3D geometric information, i.e., the atomic coordinates, is critical to molecular properties. Based on this, we propose a geometry perturbation that adds small noise to the atomic coordinates. For notation, we define the original geometry graph and an augmented geometry graph as two views, denoted \(g_1=(X_1, R_1)\) and \(g_2=(X_2, R_2)\) respectively, where \(X\) denotes the atom types and \(R\) the 3D coordinates. The augmented geometry graph is a coordinate perturbation of the original graph with the same atom types, i.e., \(X_2=X_1\) and \(R_2 = R_1 + \epsilon\), where \(\epsilon\) is drawn from a normal distribution.
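The two-view construction can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: the helper name `make_two_views`, the noise scale `sigma=0.3`, and the toy coordinates are all hypothetical, and \(\epsilon\) is assumed to be isotropic zero-mean Gaussian noise.

```python
import numpy as np


def make_two_views(atom_types, coords, sigma=0.3, rng=None):
    """Return (g1, g2) with g2 = (X_1, R_1 + eps), eps ~ N(0, sigma^2 I).

    atom_types: (N,) integer array, X_1 in the text
    coords:     (N, 3) float array, R_1 in the text
    sigma:      noise scale; 0.3 is an illustrative assumption
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(loc=0.0, scale=sigma, size=coords.shape)  # epsilon
    g1 = (atom_types, coords)
    g2 = (atom_types.copy(), coords + eps)  # X_2 = X_1, R_2 = R_1 + epsilon
    return g1, g2


# Toy 3-atom geometry with made-up coordinates (not real data).
X1 = np.array([8, 1, 1])  # atomic numbers, e.g. O, H, H
R1 = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
g1, g2 = make_two_views(X1, R1, sigma=0.3, rng=np.random.default_rng(0))
```

Only the coordinates differ between the views; the atom types are copied unchanged, matching \(X_2 = X_1\).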
The two views share common information. By maximizing the mutual information (MI) between them, we expect the learned representation to better capture geometric information, to be robust to noise, and thus to generalize well to downstream tasks. To maximize the MI, we instead maximize the following lower bound on the two geometry views: \[ \begin{aligned} \mathcal{L}_{\text{GeoSSL}} \triangleq \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(g_1|g_2) + \log p(g_2|g_1) \Big]. \end{aligned} \]
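One way to see the relation between the MI and this objective: since \(I(g_1;g_2) = H(g_1) + \mathbb{E}[\log p(g_1|g_2)] = H(g_2) + \mathbb{E}[\log p(g_2|g_1)]\), averaging gives \(I = \frac{1}{2}\big(H(g_1)+H(g_2)\big) + \mathcal{L}_{\text{GeoSSL}}\). The sketch below checks this identity numerically on a small random discrete distribution, where entropies are non-negative and the bound \(I \ge \mathcal{L}\) therefore holds. It is an illustration only, not the paper's derivation, and the function names are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def mi_and_bound(joint):
    """For a discrete joint table p(a, b), return (I(a; b), L, H(a), H(b)),
    where L = 0.5 * E[log p(a|b) + log p(b|a)] mirrors L_GeoSSL."""
    joint = joint / joint.sum()
    pa = joint.sum(axis=1, keepdims=True)  # p(a), shape (A, 1)
    pb = joint.sum(axis=0, keepdims=True)  # p(b), shape (1, B)
    mask = joint > 0  # zero-probability cells contribute nothing
    p = joint[mask]
    mi = np.sum(p * np.log(p / (pa * pb)[mask]))
    log_a_given_b = np.log(p / np.broadcast_to(pb, joint.shape)[mask])
    log_b_given_a = np.log(p / np.broadcast_to(pa, joint.shape)[mask])
    bound = 0.5 * np.sum(p * (log_a_given_b + log_b_given_a))
    return mi, bound, entropy(pa.ravel()), entropy(pb.ravel())


rng = np.random.default_rng(0)
mi, bound, h_a, h_b = mi_and_bound(rng.random((4, 5)))
# Identity: I = 0.5 * (H(a) + H(b)) + L, so I >= L whenever the entropies are >= 0.
```

For continuous coordinates, differential entropies can be negative, so this discrete check should be read as intuition for why the objective tracks the MI rather than as a proof for the geometric setting.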
To estimate these conditional likelihoods, we use an energy-based model (EBM). Adapted to MI maximization in our setting, the lower bound becomes: \[ \begin{aligned} \mathcal{L}_{\text{GeoSSL-EBM}} & = \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(R_1|g_2) \Big] + \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log p(R_2|g_1) \Big]\\ & = \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log \frac{\exp(f(R_1, g_2))}{A_{R_1|g_2}} \Big] + \frac{1}{2} \mathbb{E}_{p(g_1,g_2)} \Big[ \log \frac{\exp(f(R_2, g_1))}{A_{R_2|g_1}} \Big], \end{aligned} \] where \(f(\cdot)\) denotes the negative energy function, and \(A_{R_1|g_2}\) and \(A_{R_2|g_1}\) are the intractable partition functions. The first equality holds because the two views share the same atom types, so only the coordinates of each view need to be modeled given the other view. The objective can thus be interpreted as denoising the atom coordinates of one view given the geometry of the other.
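Although \(A_{R|g}\) is intractable for a general \(f\), it does not depend on \(R\), so \(\nabla_R \log p(R|g) = \nabla_R f(R, g)\). The toy check below illustrates this under a loud assumption: \(f\) is taken to be quadratic, \(f(R, g) = -\|R - \mu\|^2 / (2\sigma^2)\), so that the partition function has a closed form. In GeoSSL, \(f\) would instead be a learned network, and `mu` here is a hypothetical stand-in for a prediction made from the other view.

```python
import numpy as np


def f_quadratic(R, mu, sigma):
    """Hypothetical negative energy f(R, g) = -||R - mu||^2 / (2 sigma^2).

    mu stands in for a prediction computed from the conditioning view g.
    """
    return -np.sum((R - mu) ** 2) / (2.0 * sigma ** 2)


def log_partition(dim, sigma):
    """log A = log of the integral of exp(f) over R^dim (closed form here)."""
    return 0.5 * dim * np.log(2.0 * np.pi * sigma ** 2)


def log_likelihood(R, mu, sigma):
    """log p(R | g) = f(R, g) - log A_{R|g}."""
    return f_quadratic(R, mu, sigma) - log_partition(R.size, sigma)


def numerical_score(R, mu, sigma, h=1e-5):
    """Central finite-difference estimate of grad_R log p(R | g)."""
    grad = np.zeros_like(R)
    for idx in np.ndindex(R.shape):
        step = np.zeros_like(R)
        step[idx] = h
        grad[idx] = (log_likelihood(R + step, mu, sigma)
                     - log_likelihood(R - step, mu, sigma)) / (2.0 * h)
    return grad


rng = np.random.default_rng(1)
R = rng.normal(size=(4, 3))   # coordinates being scored
mu = rng.normal(size=(4, 3))  # hypothetical output of a network on the other view
score = numerical_score(R, mu, 0.5)
# log A does not depend on R, so the score equals grad_R f = -(R - mu) / sigma^2.
```

This is the property that lets a score-matching objective, which only needs gradients with respect to the coordinates or distances, avoid evaluating the partition function.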

Denoising Distance Matching

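Because the partition functions \(A_{R_1|g_2}\) and \(A_{R_2|g_1}\) are intractable, we then adopt an SE(3)-invariant denoising score matching method. Instead of the raw coordinates, it perturbs the pairwise atomic distances \(d_1\) and \(d_2\) of the two views, which are invariant to rotations and translations, with Gaussian noise at levels \(\sigma_1, \dots, \sigma_L\), and trains a score network \(s_\theta\) to match the score of each perturbation, weighting level \(l\) by \(\sigma_l^\beta\). This yields the following objective: \[ \begin{aligned} \mathcal{L}_{\text{GeoSSL-DDM}} = & \frac{1}{2L} \sum_{l=1}^L \sigma_l^\beta \mathbb{E}_{p_{\text{data}}(d_1|g_2)} \mathbb{E}_{q_{\sigma_l}(\tilde d_1|d_1,g_2)} \Big[ \Big\| \frac{s_\theta(\tilde d_1, g_2)}{\sigma_l} - \frac{d_1 - \tilde d_1}{\sigma_l^2}\Big\|^2_2 \Big] \\ & + \frac{1}{2L} \sum_{l=1}^L \sigma_l^\beta \mathbb{E}_{p_{\text{data}}(d_2|g_1)} \mathbb{E}_{q_{\sigma_l}(\tilde d_2|d_2,g_1)} \Big[\Big\|\frac{s_\theta(\tilde d_2, g_1)}{\sigma_l} - \frac{d_2 - \tilde d_2}{\sigma_l^2} \Big\|^2_2 \Big]. \end{aligned} \] This turns the coordinate-aware mutual information maximization into denoising distance matching, which serves as the final objective.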
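A minimal NumPy sketch of one term of \(\mathcal{L}_{\text{GeoSSL-DDM}}\) follows, estimated with a single noise sample per level. Several choices here are assumptions rather than details from the text: distances are taken over all atom pairs \(i<j\) (an implementation might restrict to neighbouring atoms), `score_fn` is a hypothetical callable standing in for \(s_\theta\), and `zero_score` is only a placeholder so that the code runs. The usage at the bottom also checks SE(3) invariance: rotating and translating the coordinates leaves the distances, and hence the loss, unchanged.

```python
import numpy as np


def pairwise_distances(R):
    """(N, 3) coordinates -> distances d_ij for all pairs i < j."""
    i, j = np.triu_indices(len(R), k=1)
    return np.linalg.norm(R[i] - R[j], axis=-1)


def ddm_term(d, cond, score_fn, sigmas, beta, rng):
    """One-sample Monte Carlo estimate of one term of L_GeoSSL-DDM.

    d:        clean distances of one view (e.g. d_1)
    cond:     the other view (e.g. g_2), passed through to score_fn
    score_fn: hypothetical score network s_theta(d_tilde, cond)
    sigmas:   noise levels sigma_1, ..., sigma_L
    beta:     exponent of the per-level weight sigma_l^beta
    """
    total = 0.0
    for sigma in sigmas:
        d_tilde = d + sigma * rng.normal(size=d.shape)  # d_tilde ~ N(d, sigma^2 I)
        target = (d - d_tilde) / sigma ** 2  # score of the Gaussian kernel
        pred = score_fn(d_tilde, cond) / sigma
        total += sigma ** beta * np.sum((pred - target) ** 2)
    return total / (2 * len(sigmas))


def random_rotation(rng):
    """Random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # make the QR factors unique
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one axis to turn a reflection into a rotation
    return q


def zero_score(d_tilde, cond):
    """Placeholder for s_theta; a real model would be a network on cond."""
    return np.zeros_like(d_tilde)


rng = np.random.default_rng(0)
R1 = rng.normal(size=(5, 3))
R1_moved = R1 @ random_rotation(rng).T + np.array([1.0, -2.0, 0.5])
sigmas, beta = [0.1, 0.5, 1.0], 2.0
loss = ddm_term(pairwise_distances(R1), None, zero_score, sigmas, beta,
                np.random.default_rng(42))
loss_moved = ddm_term(pairwise_distances(R1_moved), None, zero_score, sigmas,
                      beta, np.random.default_rng(42))
```

The full objective adds the mirrored term, i.e. `ddm_term(d_2, g_1, ...)` alongside `ddm_term(d_1, g_2, ...)`. Dividing the network output by \(\sigma_l\) follows the form of the equation above, so the network itself need not take the noise level as input.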

Citation