Commit e0d4ea9 ("anew"), committed by utksi on Jan 15, 2025 (1 parent: da5445f).

Showing 8 changed files with 224 additions and 167 deletions.
45 changes: 34 additions & 11 deletions _posts/2023-08-01-mace.md
---

### **Introduction**

MACE (Message Passing Atomic Cluster Expansion) is an equivariant message-passing neural network that uses higher-order messages to enhance the accuracy and efficiency of force fields in computational chemistry.

### **Node Representation**

Each node \(\large{i}\) is represented by:

\[
\large{\sigma_i^{(t)} = \left(r_i, z_i, h_i^{(t)}\right)}
\]
where \(r_i \in \mathbb{R}^3\) is the position, \(\large{z_i}\) is the chemical element, and \(\large{h_i^{(t)}}\) are the learnable features at layer \(\large{t}\).
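As a rough illustration (not the layout of any particular MACE implementation), the per-atom state \(\sigma_i^{(t)}\) can be held in a small container; the array shapes below are purely illustrative:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class NodeState:
    """State sigma_i^(t) of one atom: position, chemical element, learnable features."""
    position: np.ndarray   # r_i, shape (3,)
    element: int           # z_i, e.g. the atomic number
    features: np.ndarray   # h_i^(t); illustrative shape (n_channels, n_components)

atom = NodeState(position=np.zeros(3), element=6, features=np.zeros((4, 1)))
print(atom.element, atom.features.shape)
```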

### **Message Construction**

Messages are constructed hierarchically using a body order expansion:

\[
m_i^{(t)} = \sum_j u_1(\sigma_i^{(t)}, \sigma_j^{(t)}) + \sum_{j_1, j_2} u_2(\sigma_i^{(t)}, \sigma_{j_1}^{(t)}, \sigma_{j_2}^{(t)}) + \cdots + \sum_{j_1, \ldots, j_\nu} u_\nu(\sigma_i^{(t)}, \sigma_{j_1}^{(t)}, \ldots, \sigma_{j_\nu}^{(t)})
\]
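To make the summation structure concrete, here is a toy scalar version of the expansion in which the node states are plain numbers and \(u_1\), \(u_2\) are placeholder functions. It only illustrates how the two-body and three-body sums are organised, not MACE's actual tensor-valued messages:

```python
import numpy as np

def body_order_message(h_i, h_neighbors, u1, u2):
    """Toy scalar body-order expansion: a two-body sum plus a three-body sum."""
    m = sum(u1(h_i, h_j) for h_j in h_neighbors)                     # sum_j u_1(...)
    m += sum(u2(h_i, h_j1, h_j2)                                     # sum_{j1, j2} u_2(...)
             for h_j1 in h_neighbors for h_j2 in h_neighbors)
    return m

u1 = lambda a, b: a * b            # placeholder pair interaction
u2 = lambda a, b, c: a * b * c     # placeholder triplet interaction

neighbors = np.array([0.5, -1.0, 0.3])
print(body_order_message(1.0, neighbors, u1, u2))
```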

### **Two-body Message Construction**

For two-body interactions, the atomic basis \(A_i^{(t)}\), from which the message \(m_i^{(t)}\) is assembled, is obtained by pooling over the neighbourhood \(N(i)\):

\[
A_{i, k l_3 m_3}^{(t)} = \sum_{l_1 m_1, l_2 m_2} C_{l_1 m_1, l_2 m_2}^{l_3 m_3} \sum_{j \in N(i)} R_{k l_1 l_2 l_3}^{(t)}(r_{ij}) \, Y_{l_1}^{m_1}(\hat{r}_{ij}) \, W_{k k_2 l_2}^{(t)} h_{j, k_2 l_2 m_2}^{(t)}
\]

where \(\large{R}\) is a learnable radial basis, \(\large{Y}\) are spherical harmonics, \(\large{W}\) are learnable weights, and \(\large{C}\) are Clebsch-Gordan coefficients that ensure equivariance.
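Below is a heavily simplified numpy sketch of this pooling, restricted to \(l = 1\), a single Gaussian radial basis function, and an identity weight matrix; the Clebsch-Gordan contraction and channel bookkeeping of the full model are omitted:

```python
import numpy as np

def two_body_features(r, h, cutoff=5.0):
    """Toy pooled two-body features A_i (l = 1 only).

    r : (N, 3) atomic positions;  h : (N, k) invariant neighbour features.
    Returns A of shape (N, k, 3): radial basis * direction * (W h_j), summed over neighbours j.
    """
    N, k = h.shape
    W = np.eye(k)                      # stand-in for the learnable channel weights W^(t)
    A = np.zeros((N, k, 3))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = r[j] - r[i]
            dist = np.linalg.norm(d)
            if dist > cutoff:
                continue
            R = np.exp(-dist ** 2)     # stand-in for the learnable radial basis R^(t)
            Y = d / dist               # real l = 1 spherical harmonics are proportional to this
            A[i] += R * np.outer(W @ h[j], Y)
    return A

positions = np.random.default_rng(1).normal(size=(4, 3))
feats = np.ones((4, 2))
print(two_body_features(positions, feats).shape)   # (4, 2, 3)
```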

### **Higher-order Feature Construction**

Higher-order features are constructed using tensor products and symmetrization:

\[
\large{B_{i, \eta \nu k LM}^{(t)} = \sum_{lm} C_{LM \eta \nu, lm} \prod_{\xi=1}^\nu \sum_{k_\xi} w_{kk_\xi l_\xi}^{(t)} A_{i, k_\xi l_\xi m_\xi}^{(t)}}
\]

where \(\large{C}\) are generalized Clebsch-Gordan coefficients.
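Schematically, the \(\nu = 2\) case is a product of two copies of \(A_i\) contracted with a coefficient tensor. The sketch below uses a random dense stand-in for the generalized Clebsch-Gordan tensor purely to show the contraction pattern; the real coefficients are sparse and fixed by symmetry:

```python
import numpy as np

rng = np.random.default_rng(0)
n_k, n_m = 4, 3                           # channels and m-components at some fixed l (toy sizes)
A = rng.normal(size=(n_k, n_m))           # A_{i, k, lm} for a single atom i

n_LM = 3
C = rng.normal(size=(n_LM, n_m, n_m))     # dense stand-in for the generalized CG tensor C_{LM, m1 m2}

# nu = 2 correlation: B_{k, LM} = sum_{m1, m2} C_{LM, m1, m2} * A_{k, m1} * A_{k, m2}
B = np.einsum('qab,ka,kb->kq', C, A, A)
print(B.shape)                            # (4, 3)
```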

### **Message Passing**

The message passing updates the node features by aggregating messages:

\[
\large{h_i^{(t+1)} = U_{kL}^{(t)}(\sigma_i^{(t)}, m_i^{(t)}) = \sum_{k'} W_{kL, k'}^{(t)} m_{i, k' LM} + \sum_{k'} W_{z_i kL, k'}^{(t)} h_{i, k' LM}^{(t)}}
\]
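A minimal sketch of this update for one atom, treating \(W^{(t)}\) and \(W_{z_i}^{(t)}\) as plain channel-mixing matrices applied to toy feature arrays:

```python
import numpy as np

def update_features(h, m, W, W_z):
    """h_i^(t+1) = W m_i + W_{z_i} h_i^(t); W and W_z mix the k channels."""
    return W @ m + W_z @ h

k, n_M = 4, 3                          # channels and (LM)-components, toy sizes
rng = np.random.default_rng(2)
h = rng.normal(size=(k, n_M))          # current features h_i^(t)
m = rng.normal(size=(k, n_M))          # aggregated message m_i^(t)
print(update_features(h, m, rng.normal(size=(k, k)), rng.normal(size=(k, k))).shape)  # (4, 3)
```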

### **Readout Phase**

In the readout phase, invariant features are mapped to site energies:

\[
\large{E_i = \sum_{t=1}^{T} E_i^{(t)}}
\]
where:

\[
\large{E_i^{(t)} = R_t(h_i^{(t)}) = \sum_{k'} W_{\text{readout}, k'}^{(t)} h_{i, k' 00}^{(t)} \quad \text{for } t < T}
\]

\[
\large{E_i^{(T)} = \text{MLP}_{\text{readout}}^{(T)}(\{h_{i, k 00}^{(T)}\})}
\]
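A toy version of the readout, assuming the invariant (\(l = 0\)) features are already extracted per layer; the shapes and the one-hidden-layer MLP are illustrative only:

```python
import numpy as np

def site_energy(invariant_feats, w_linear, mlp):
    """E_i = sum_t E_i^(t): linear readouts for t < T, an MLP readout at t = T.

    invariant_feats : list of (k,) arrays, the l = 0 features h_{i,k00}^(t), one per layer.
    w_linear        : list of (k,) readout weight vectors for the layers t < T.
    """
    E = sum(w @ h for w, h in zip(w_linear, invariant_feats[:-1]))
    return E + mlp(invariant_feats[-1])

rng = np.random.default_rng(3)
feats = [rng.normal(size=4), rng.normal(size=4)]       # two layers (T = 2)
w_lin = [rng.normal(size=4)]                            # readout weights for t = 1
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=8)
mlp = lambda h: W2 @ np.tanh(W1 @ h)                    # toy one-hidden-layer readout MLP
print(site_energy(feats, w_lin, mlp))
```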

### **Equivariance**

The model is equivariant under rotations and reflections \(\large{Q \in O(3)}\):

\[
\large{h_i^{(t)}(Q \cdot (r_1, \ldots, r_N)) = D(Q) h_i^{(t)}(r_1, \ldots, r_N)}
\]

where \(\large{D(Q)}\) is a Wigner D-matrix. For feature \(\large{h_{i, k LM}^{(t)}}\), it transforms as:

\[
\large{h_{i, k LM}^{(t)}(Q \cdot (r_1, \ldots, r_N)) = \sum_{M'} D_L(Q)_{M'M} h_{i, k LM'}^{(t)}(r_1, \ldots, r_N)}
\]
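For \(L = 1\) features written as Cartesian vectors, \(D(Q)\) is just the rotation matrix \(Q\) itself, so equivariance can be checked numerically on a toy vector feature:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def l1_feature(positions):
    """Toy l = 1 (vector) feature of atom 0: radially weighted sum of bond directions."""
    d = positions[1:] - positions[0]
    dist = np.linalg.norm(d, axis=1, keepdims=True)
    return (np.exp(-dist ** 2) * d / dist).sum(axis=0)

pos = np.random.default_rng(4).normal(size=(5, 3))
Q = Rotation.random(random_state=0).as_matrix()          # a random rotation matrix

lhs = l1_feature(pos @ Q.T)       # feature computed on the rotated structure
rhs = Q @ l1_feature(pos)         # D(Q) applied to the original feature (D = Q for l = 1)
print(np.allclose(lhs, rhs))      # True
```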

## Properties and Computational Efficiency

1. **Body Order Expansion**:

- MACE constructs messages using higher body order expansions, enabling rich representations of atomic environments.

2. **Computational Efficiency**:

- The use of higher-order messages reduces the required number of message-passing layers to two, enhancing computational efficiency and scalability.

3. **Receptive Field**:

- MACE maintains a small receptive field by decoupling correlation order increase from the number of message-passing iterations, facilitating parallelization.

4. **State-of-the-Art Performance**:

- MACE achieves state-of-the-art accuracy on a range of benchmark datasets while remaining fast to train and evaluate.

For further details, refer to the [Batatia et al.](https://arxiv.org/abs/2206.07697) paper.
### 1. **Spherical Harmonics**

**Concept**:

- Spherical harmonics \(Y^L_M\) are functions defined on the surface of a sphere. They are used in many areas of physics, including quantum mechanics and electrodynamics, to describe the angular part of a system.

**Role in MACE**:

- Spherical harmonics are used to decompose the angular dependency of the atomic environment. This helps in capturing the rotational properties of the features in a systematic way.

**Mathematically**:

- The spherical harmonics \(Y^L_M(\theta, \phi)\) are given by:

\[
Y^L_M(\theta, \phi) = \sqrt{\frac{2L + 1}{4\pi}\,\frac{(L - M)!}{(L + M)!}}\; P^M_L(\cos\theta)\, e^{iM\phi}
\]

where \(P^M_L\) are the associated Legendre polynomials.
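For a quick numerical check, SciPy provides these functions directly (note its argument order, and that newer SciPy versions deprecate `sph_harm` in favour of `sph_harm_y`):

```python
import numpy as np
from scipy.special import sph_harm

L, M = 2, 1
theta, phi = 0.3, 1.1                 # azimuthal and polar angles, in radians
print(sph_harm(M, L, theta, phi))     # complex value of Y_2^1; SciPy's order is (M, L, azimuth, polar)

# crude orthonormality check on a grid: the integral of |Y_L^M|^2 over the sphere is close to 1
t = np.linspace(0, 2 * np.pi, 200)
p = np.linspace(0, np.pi, 200)
T, P = np.meshgrid(t, p)
integral = np.sum(np.abs(sph_harm(M, L, T, P)) ** 2 * np.sin(P)) * (t[1] - t[0]) * (p[1] - p[0])
print(round(integral, 3))             # close to 1.0
```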
### 2. **Clebsch-Gordan Coefficients**

**Concept**:

- Clebsch-Gordan coefficients are used in quantum mechanics to combine angular momenta. They arise in the coupling of two angular momentum states to form a new angular momentum state.

**Role in MACE**:

- In MACE, Clebsch-Gordan coefficients are used to combine features from different atoms while maintaining rotational equivariance. They ensure that the resulting features transform correctly under rotations, preserving the physical symmetry of the system.

**Mathematically**:

- When combining two angular momentum states \(\vert l_1, m_1\rangle\) and \(\vert l_2, m_2\rangle\), the resulting state \(\vert L, M\rangle\) is given by:

\[
|L, M\rangle = \sum_{m_1, m_2} C_{L, M}^{l_1, m_1; l_2, m_2} |l_1, m_1\rangle |l_2, m_2\rangle
\]

where \(C_{L, M}^{l_1, m_1; l_2, m_2}\) are the Clebsch-Gordan coefficients.
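SymPy can evaluate individual coefficients, which is handy for sanity checks:

```python
from sympy import S
from sympy.physics.quantum.cg import CG

# <1/2 1/2; 1/2 -1/2 | 0 0>: coupling two spin-1/2 states into the singlet
print(CG(S(1)/2, S(1)/2, S(1)/2, -S(1)/2, 0, 0).doit())   # sqrt(2)/2

# <1 1; 1 -1 | 2 0>: coupling two l = 1 states into L = 2, M = 0
print(CG(1, 1, 1, -1, 2, 0).doit())                        # sqrt(6)/6
```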

### 3. **\(O(3)\) Rotations**

**Concept**:

- The group \(O(3)\) consists of all rotations and reflections in three-dimensional space: the linear operations that preserve distances between points. It represents the rigid symmetries of a 3D system.

**Role in MACE**:

- Ensuring that the neural network respects \(O(3)\) symmetry is crucial for modeling physical systems accurately. MACE achieves this by using operations that are invariant or equivariant under these rotations and reflections.

**Mathematically**:

- An element of \(O(3)\) can be represented by a \(3 \times 3\) orthogonal matrix \(Q\) such that:

\[
Q^\top Q = Q Q^\top = I, \qquad \det Q = \pm 1
\]

where \(I\) is the identity matrix.
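A quick numerical check of both components of \(O(3)\):

```python
import numpy as np
from scipy.spatial.transform import Rotation

Q = Rotation.random(random_state=1).as_matrix()            # proper rotation, det = +1
print(np.allclose(Q.T @ Q, np.eye(3)), round(float(np.linalg.det(Q)), 6))

P = np.diag([1.0, 1.0, -1.0]) @ Q                           # compose with a reflection: det = -1
print(np.allclose(P.T @ P, np.eye(3)), round(float(np.linalg.det(P)), 6))
```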
### 4. **Wigner D-matrix**

**Concept**:

- The Wigner D-matrix \(D^L(Q)\) represents the action of a rotation \(Q\) on spherical harmonics. It provides a way to transform the components of a tensor under rotation.

**Role in MACE**:

- Wigner D-matrices are used to ensure that the feature vectors in the neural network transform correctly under rotations. This is essential for maintaining the rotational equivariance of the model.

**Mathematically**:

- For a rotation \(Q \in O(3)\) and a spherical harmonic of degree \(L\), the Wigner D-matrix \(D^L(Q)\) is a \((2L+1) \times (2L+1)\) matrix. If \(Y^L_M\) is a spherical harmonic, then under rotation \(Q\), it transforms as:

\[
Y^L_M(Q \cdot \mathbf{r}) = \sum_{M'=-L}^{L} D^L_{M'M}(Q) Y^L_{M'}(\mathbf{r})
\]
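SymPy's `sympy.physics.quantum.spin.Rotation` exposes Wigner D and small d elements; two convention-independent sanity checks on diagonal \(L = 1\) entries are sketched below:

```python
from sympy import symbols, cos, simplify
from sympy.physics.quantum.spin import Rotation

beta = symbols('beta', real=True)

# small Wigner d-matrix element d^1_{00}(beta) should reduce to cos(beta)
print(simplify(Rotation.d(1, 0, 0, beta).doit() - cos(beta)))                    # 0

# D^1_{11} with Euler angles (0, beta, 0) should reduce to (1 + cos(beta)) / 2
print(simplify(Rotation.D(1, 1, 1, 0, beta, 0).doit() - (1 + cos(beta)) / 2))    # 0
```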
28 changes: 14 additions & 14 deletions _posts/2024-04-18-kan.md
The motivation for KANs stems from the limitations of MLPs, such as fixed activation functions.
The Kolmogorov-Arnold representation theorem states:

\[
f(x) = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^n \varphi_{q,p}(x_p) \right)
\]

where \(\varphi_{q,p} : [0, 1] \to \mathbb{R}\) and \(\Phi_q : \mathbb{R} \to \mathbb{R}\).

### 3. KAN Architecture

KANs generalize the representation theorem to arbitrary depths and widths. Each weight parameter in KANs is replaced by a learnable 1D function (spline).

#### 3.1. Mathematical Formulation of KANs

Define a KAN layer with \(n_{\text{in}}\)-dimensional inputs and \(n_{\text{out}}\)-dimensional outputs as a matrix of 1D functions:

\[
\Phi = \{ \varphi_{q,p} \}, \quad p = 1, 2, \ldots, n_{\text{in}}, \quad q = 1, 2, \ldots, n_{\text{out}}
\]

The activation function \(\varphi_{l,j,i}\) on the edge between layer \(l\) and layer \(l+1\) is given by:

\[
\varphi_{l,j,i}(x) = w \big(b(x) + \text{spline}(x)\big)
\]

where \(b(x) = \text{silu}(x) = \frac{x}{1 + e^{-x}}\).

The output of each layer is computed as:

\[
x_{l+1, j} = \sum_{i=1}^{n_l} \varphi_{l,j,i}(x_{l,i})
\]

in matrix form:

\[
x_{l+1} = \Phi_l x_l
\]

where \(\Phi_l\) is the function matrix of layer \(l\).
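A compact numpy/SciPy sketch of one KAN layer, assuming cubic B-splines on a clamped knot vector with randomly initialised coefficients; the per-edge scale \(w\) and the initialisation are illustrative, not the paper's scheme:

```python
import numpy as np
from scipy.interpolate import BSpline

def silu(x):
    return x / (1.0 + np.exp(-x))

def make_edge_activation(rng, grid_size=5, k=3, xmin=-1.0, xmax=1.0):
    """One edge function phi(x) = w * (silu(x) + spline(x)) with toy random coefficients."""
    knots = np.concatenate([np.full(k, xmin),
                            np.linspace(xmin, xmax, grid_size + 1),
                            np.full(k, xmax)])              # clamped knot vector
    spline = BSpline(knots, rng.normal(scale=0.1, size=grid_size + k), k)
    w = 1.0                                                 # per-edge scale factor
    return lambda x: w * (silu(x) + spline(x))

def kan_layer(x, edge_fns):
    """x_{l+1, j} = sum_i phi_{j, i}(x_{l, i}) for an (n_out, n_in) grid of edge functions."""
    return np.array([sum(phi(xi) for phi, xi in zip(row, x)) for row in edge_fns])

rng = np.random.default_rng(0)
n_in, n_out = 3, 2
edges = [[make_edge_activation(rng) for _ in range(n_in)] for _ in range(n_out)]
print(kan_layer(np.array([0.1, -0.4, 0.7]), edges))         # two outputs
```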
KANs can approximate functions by decomposing high-dimensional problems into several one-dimensional functions.
Let \(f(x)\) be represented as:

\[
f = (\Phi_{L-1} \circ \Phi_{L-2} \circ \cdots \circ \Phi_1 \circ \Phi_0)x
\]

For each \(\Phi_{l,i,j}\), there exist \(k\)-th order B-spline functions \(\Phi_{l,i,j}^G\) such that:

\[
\| f - (\Phi_{L-1}^G \circ \Phi_{L-2}^G \circ \cdots \circ \Phi_1^G \circ \Phi_0^G)x \|_{C^m} \leq C G^{-k-1+m}
\]

where \(G\) is the grid size and \(C\) depends on \(f\) and its representation.
KANs can increase accuracy by refining the grid used in splines:

\[
\{c'_j\} = \arg\min_{\{c'_j\}} E_{x \sim p(x)} \left( \sum_{j=0}^{G_2+k-1} c'_j B'_j(x) - \sum_{i=0}^{G_1+k-1} c_i B_i(x) \right)^2
\]
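In practice this refit is a linear least-squares problem in the new coefficients. A sketch using SciPy's `make_lsq_spline`, with uniform samples standing in for \(x \sim p(x)\):

```python
import numpy as np
from scipy.interpolate import BSpline, make_lsq_spline

def clamped_knots(grid_size, k, xmin=-1.0, xmax=1.0):
    """Clamped knot vector with `grid_size` intervals on [xmin, xmax]."""
    return np.concatenate([np.full(k, xmin),
                           np.linspace(xmin, xmax, grid_size + 1),
                           np.full(k, xmax)])

k, G1, G2 = 3, 5, 10
rng = np.random.default_rng(0)
coarse = BSpline(clamped_knots(G1, k), rng.normal(size=G1 + k), k)   # spline on the coarse grid

# refit: choose fine-grid coefficients c'_j by least squares against the coarse spline
x = np.linspace(-1.0, 1.0, 400)             # uniform samples standing in for x ~ p(x)
fine = make_lsq_spline(x, coarse(x), clamped_knots(G2, k), k)
print(np.max(np.abs(fine(x) - coarse(x))))  # tiny: the finer grid can represent the coarse spline
```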

### 6. Simplification Techniques
KANs were shown to have better scaling laws than MLPs, achieving lower test loss.
#### Example Functions:

1. Bessel function: \(f(x) = J_0(20x)\)
2. High-dimensional function:

\[
f(x_1, \ldots, x_{100}) = \exp\left( \frac{1}{100} \sum_{i=1}^{100} \sin^2\left(\frac{\pi x_i}{2}\right) \right)
\]

KANs can achieve near-theoretical scaling exponents \(\alpha = 4\), outperforming MLPs in accuracy and parameter efficiency.
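For reference, the high-dimensional example above is straightforward to write down and evaluate:

```python
import numpy as np

def f_highdim(x):
    """f(x_1, ..., x_100) = exp( mean_i sin^2(pi x_i / 2) )."""
    return np.exp(np.mean(np.sin(np.pi * x / 2.0) ** 2, axis=-1))

X = np.random.default_rng(0).uniform(0.0, 1.0, size=(5, 100))   # five points in [0, 1]^100
print(f_highdim(X))
```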