anew
utksi committed Jan 15, 2025
1 parent 2ed91e7 commit d5bc517
Showing 8 changed files with 811 additions and 575 deletions.
12 changes: 6 additions & 6 deletions _config.yml
@@ -3,8 +3,8 @@
# -----------------------------------------------------------------------------

title: blank # the website title (if blank, full name will be used instead)
first_name: Utkarsh,
middle_name:
first_name: Utkarsh
middle_name:
last_name: Singh
contact_note: >
The best way to reach me is by email: utkarsh.singh@liu.se
@@ -87,8 +87,8 @@ bing_site_verification: # out your bing-site-verification ID (Bing Webmaster)
# Blog
# -----------------------------------------------------------------------------

blog_name: Utkarsh # blog_name will be displayed in your blog page
blog_description: a simple whitespace theme for academics
blog_name: Worklog # blog_name will be displayed in your blog page
blog_description: This is where thoughts and prayers are compiled
permalink: /blog/:year/:title/
lsi: false # produce an index for related posts

@@ -116,15 +116,15 @@ giscus:
lang: en

# Disqus comments (DEPRECATED)
disqus_shortname: al-folio # put your disqus shortname
disqus_shortname: utksi # put your disqus shortname
# https://help.disqus.com/en/articles/1717111-what-s-a-shortname

# External sources.
# If you have blog posts published on medium.com or other external sources,
# you can display them in your blog by adding a link to the RSS feed.
external_sources:
- name: medium.com
rss_url: https://medium.com/@al-folio/feed
rss_url: https://medium.com/@utksi/feed
- name: Google Blog
posts:
- url: https://blog.google/technology/ai/google-gemini-update-flash-ai-assistant-io-2024/
2 changes: 1 addition & 1 deletion _pages/about.md
@@ -2,7 +2,7 @@
layout: about
title: about
permalink: /
subtitle: <a href='#'>Affiliations</a>. Address. Contacts. Motto. Etc.
subtitle: <a href='#'>Ph.D. Student, Linköping University</a>.

profile:
align: right
138 changes: 67 additions & 71 deletions _posts/2023-08-01-mace.md
@@ -4,83 +4,84 @@ title: "MACE (Message Passing ACE)"
date: 2023-08-01 22:36
description: "A summary of Message Passing Atomic Cluster Expansion Graph Neural Networks"
tags: machine learning potential
categories: sample-posts
categories: worklog
giscus_comments: true
related_posts: false
---

### **Introduction**
MACE (Message Passing Atomic Cluster Expansion) is an equivariant message passing neural network that uses higher-order messages to enhance the accuracy and efficiency of force fields in computational chemistry.
MACE (Message Passing Atomic Cluster Expansion) is an equivariant message-passing neural network that uses higher-order messages to enhance the accuracy and efficiency of force fields in computational chemistry.

### **Node Representation**
Each node $\large{i}$ is represented by:
Each node \(\large{i}\) is represented by:

$$
\[
\large{\sigma_i^{(t)} = (r_i, z_i, h_i^{(t)})}
$$
\]

where $r_i \in \mathbb{R}^3$ is the position, $\large{z_i}$ is the chemical element, and $\large{h_i^{(t)}}$ are the learnable features at layer $\large{t}$.
where \(r_i \in \mathbb{R}^3\) is the position, \(\large{z_i}\) is the chemical element, and \(\large{h_i^{(t)}}\) are the learnable features at layer \(\large{t}\).
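
As a concrete picture, the per-node state is just a tuple of position, element, and features. A minimal sketch (names are illustrative, not from the MACE code):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class NodeState:
    """Toy container for the state sigma_i^(t) of one atom (illustrative only)."""
    r: np.ndarray   # position r_i in R^3
    z: int          # atomic number z_i (chemical element)
    h: np.ndarray   # learnable features h_i^(t)

# Example: a hydrogen atom at the origin with 16 scalar feature channels
sigma_i = NodeState(r=np.zeros(3), z=1, h=np.zeros(16))
```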

### **Message Construction**
Messages are constructed hierarchically using a body order expansion:

$$
\[
m_i^{(t)} = \sum_j u_1(\sigma_i^{(t)}, \sigma_j^{(t)}) + \sum_{j_1, j_2} u_2(\sigma_i^{(t)}, \sigma_{j_1}^{(t)}, \sigma_{j_2}^{(t)}) + \cdots + \sum_{j_1, \ldots, j_\nu} u_\nu(\sigma_i^{(t)}, \sigma_{j_1}^{(t)}, \ldots, \sigma_{j_\nu}^{(t)})
$$
\]

### **Two-body Message Construction**
For two-body interactions, the message $m_i^{(t)}$ is:
For two-body interactions, the message \(m_i^{(t)}\) is:

$$
\[
A_i^{(t)} = \sum_{j \in N(i)} R_{kl_1l_2l_3}^{(t)}(r_{ij}) Y_{l_1}^{m_1}(\hat{r}_{ij}) W_{kk_2l_2}^{(t)} h_{j,k_2l_2m_2}^{(t)}
$$
\]

where $\large{R}$ is a learnable radial basis, $\large{Y}$ are spherical harmonics, and $\large{W}$ are learnable weights. $\large{C}$ are Clebsch-Gordan coefficients ensuring equivariance.
where \(\large{R}\) is a learnable radial basis, \(\large{Y}\) are spherical harmonics, and \(\large{W}\) are learnable weights. \(\large{C}\) are Clebsch-Gordan coefficients ensuring equivariance.
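
A rough numerical sketch of this construction, with a fixed Gaussian radial basis and random matrices standing in for the learnable \(R\) and \(W\) (a toy illustration, not the actual MACE implementation):

```python
import numpy as np
from scipy.special import sph_harm

def two_body_features(r_i, neighbors_r, h_neighbors, n_radial=8, l_max=2, seed=0):
    """Toy A_i: sum over neighbors of R(r_ij) * Y_lm(r_ij_hat) * (W h_j).

    MACE's learnable radial basis R and weights W are replaced here by a fixed
    Gaussian basis and random matrices -- a sketch, not the real model.
    """
    rng = np.random.default_rng(seed)
    k_channels = h_neighbors.shape[1]
    W = rng.normal(size=(k_channels, k_channels))    # stand-in for W^(t)
    centers = np.linspace(0.5, 5.0, n_radial)        # Gaussian radial centers

    # A[k, l, m + l] accumulated over neighbors (complex because of Y_lm)
    A = np.zeros((k_channels, l_max + 1, 2 * l_max + 1), dtype=complex)
    for r_j, h_j in zip(neighbors_r, h_neighbors):
        d = r_j - r_i
        r = np.linalg.norm(d)
        theta = np.arctan2(d[1], d[0]) % (2 * np.pi)     # azimuthal angle
        phi = np.arccos(np.clip(d[2] / r, -1.0, 1.0))    # polar angle
        radial = np.exp(-(r - centers) ** 2).sum()       # toy scalar radial basis
        Wh = W @ h_j                                     # mixed neighbor features
        for l in range(l_max + 1):
            for m in range(-l, l + 1):
                A[:, l, m + l] += radial * sph_harm(m, l, theta, phi) * Wh
    return A

# Example: central atom at the origin with two neighbors carrying 4 channels
A_i = two_body_features(np.zeros(3),
                        np.array([[1.0, 0.0, 0.0], [0.0, 1.2, 0.3]]),
                        np.ones((2, 4)))
```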

### **Higher-order Feature Construction**
Higher-order features are constructed using tensor products and symmetrization:

$$
\[
\large{B_{i, \eta \nu k LM}^{(t)} = \sum_{lm} C_{LM \eta \nu, lm} \prod_{\xi=1}^\nu \sum_{k_\xi} w_{kk_\xi l_\xi}^{(t)} A_{i, k_\xi l_\xi m_\xi}^{(t)}}
$$
\]

where $\large{C}$ are generalized Clebsch-Gordan coefficients.
where \(\large{C}\) are generalized Clebsch-Gordan coefficients.
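
For instance, truncating at \(\nu = 2\) the general expression above reduces to a single Clebsch-Gordan contraction of two two-body features (written schematically, keeping the notation above):

\[
\large{B_{i, \eta_2 k LM}^{(t)} = \sum_{l_1 m_1, l_2 m_2} C_{LM \eta_2, l_1 m_1 l_2 m_2} \left( \sum_{k_1} w_{k k_1 l_1}^{(t)} A_{i, k_1 l_1 m_1}^{(t)} \right) \left( \sum_{k_2} w_{k k_2 l_2}^{(t)} A_{i, k_2 l_2 m_2}^{(t)} \right)}
\]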

### **Message Passing**
The message passing updates the node features by aggregating messages:

$$
\[
\large{h_i^{(t+1)} = U_{kL}^{(t)}(\sigma_i^{(t)}, m_i^{(t)}) = \sum_{k'} W_{kL, k'}^{(t)} m_{i, k' LM} + \sum_{k'} W_{z_i kL, k'}^{(t)} h_{i, k' LM}^{(t)}}
$$
\]

### **Readout Phase**
In the readout phase, invariant features are mapped to site energies:

$$
\[
\large{E_i = E_i^{(0)} + E_i^{(1)} + \cdots + E_i^{(T)}}
$$
\]

where:

$$
\[
\large{E_i^{(t)} = R_t(h_i^{(t)}) = \sum_{k'} W_{\text{readout}, k'}^{(t)} h_{i, k' 00}^{(t)} \quad \text{for } t < T}
$$
\]

$$
\[
\large{E_i^{(T)} = \text{MLP}_{\text{readout}}^{(t)}(\{h_{i, k 00}^{(t)}\})}
$$
\]
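
A minimal sketch of this readout, assuming toy NumPy arrays in place of the learned weights:

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def site_energy(invariant_feats, W_readout, mlp_weights):
    """Toy readout: linear maps for t < T, a small MLP for the last layer.

    `invariant_feats` lists the L = 0 feature vectors h_{i,k00}^{(t)} per layer;
    the weights are illustrative stand-ins, not MACE's actual parameters.
    """
    E_i = 0.0
    # Linear readout for all layers except the last
    for h_t, w_t in zip(invariant_feats[:-1], W_readout):
        E_i += float(w_t @ h_t)
    # One-hidden-layer MLP readout for the final layer
    W1, W2 = mlp_weights
    E_i += float(W2 @ silu(W1 @ invariant_feats[-1]))
    return E_i

# Example with T = 2 layers and 8 invariant channels per layer
rng = np.random.default_rng(0)
feats = [rng.normal(size=8) for _ in range(2)]
E = site_energy(feats,
                W_readout=[rng.normal(size=8)],
                mlp_weights=(rng.normal(size=(16, 8)), rng.normal(size=16)))
```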

### **Equivariance**
The model ensures equivariance under rotation $\large{Q \in O(3)}$ :
The model ensures equivariance under rotation \(\large{Q \in O(3)}\):

$$
\[
\large{h_i^{(t)}(Q \cdot (r_1, \ldots, r_N)) = D(Q) h_i^{(t)}(r_1, \ldots, r_N)}
$$
\]

where $\large{D(Q)}$ is a Wigner D-matrix. For feature $\large{h_{i, k LM}^{(t)}}$, it transforms as:
where \(\large{D(Q)}\) is a Wigner D-matrix. For feature \(\large{h_{i, k LM}^{(t)}}\), it transforms as:

$$
\[
\large{h_{i, k LM}^{(t)}(Q \cdot (r_1, \ldots, r_N)) = \sum_{M'} D_L(Q)_{M'M} h_{i, k LM'}^{(t)}(r_1, \ldots, r_N)}
$$
\]
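
In particular, the invariant (\(L = 0\)) outputs such as site energies must be unchanged when all atoms are rotated. A quick numerical illustration with a toy invariant descriptor (sorted pairwise distances), not the MACE features themselves:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def invariant_descriptor(positions):
    """Sorted interatomic distances: a simple O(3)-invariant, like L = 0 features."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(len(positions), k=1)
    return np.sort(dists[iu])

rng = np.random.default_rng(0)
pos = rng.normal(size=(5, 3))
Q = Rotation.random(random_state=0).as_matrix()   # a rotation in SO(3)

before = invariant_descriptor(pos)
after = invariant_descriptor(pos @ Q.T)           # rotate every atom by Q
assert np.allclose(before, after)                 # invariants are unchanged
```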

## Properties and Computational Efficiency

@@ -98,77 +99,72 @@

For further details, refer to [Batatia et al.](https://arxiv.org/abs/2206.07697).

---

## Necessary math to know:

## Necessary Math to Know

### 1. **Spherical Harmonics**

**Concept:**
- Spherical harmonics $Y^L_M$ are functions defined on the surface of a sphere. They are used in many areas of physics, including quantum mechanics and electrodynamics, to describe the angular part of a system.
**Concept**:
- Spherical harmonics \(Y^L_M\) are functions defined on the surface of a sphere. They are used in many areas of physics, including quantum mechanics and electrodynamics, to describe the angular part of a system.

**Role in MACE:**
**Role in MACE**:
- Spherical harmonics are used to decompose the angular dependency of the atomic environment. This helps in capturing the rotational properties of the features in a systematic way.

**Mathematically:**
- The spherical harmonics $Y^L_M(\theta, \phi)$ are given by:
**Mathematically**:
- The spherical harmonics \(Y^L_M(\theta, \phi)\) are given by:

$$
Y^L_M(\theta, \phi) = \sqrt{\frac{(2L+1)}{4\pi} \frac{(L-M)!}{(L+M)!}} P^M_L(\cos \theta) e^{iM\phi}
$$
\[
Y^L_M(\theta, \phi) = \sqrt{\frac{(2L+1)}{4\pi} \frac{(L-M)!}{(L+M)!}} P^M_L(\cos \theta) e^{iM\phi}
\]

where $P^M_L$ are the associated Legendre polynomials.
where \(P^M_L\) are the associated Legendre polynomials.
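
A quick SciPy sanity check of this formula for \(Y_1^0(\theta, \phi) = \sqrt{3/(4\pi)} \cos\theta\) (note that SciPy's `sph_harm` takes the azimuthal angle before the polar angle):

```python
import numpy as np
from scipy.special import sph_harm

theta_polar, phi_azimuth = 0.7, 1.3
y10 = sph_harm(0, 1, phi_azimuth, theta_polar)          # m=0, l=1
closed_form = np.sqrt(3.0 / (4.0 * np.pi)) * np.cos(theta_polar)
assert np.isclose(y10.real, closed_form) and np.isclose(y10.imag, 0.0)
```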

### 2. **Clebsch-Gordan Coefficients**

**Concept:**
**Concept**:
- Clebsch-Gordan coefficients are used in quantum mechanics to combine angular momenta. They arise in the coupling of two angular momentum states to form a new angular momentum state.

**Role in MACE:**
**Role in MACE**:
- In MACE, Clebsch-Gordan coefficients are used to combine features from different atoms while maintaining rotational invariance. They ensure that the resulting features transform correctly under rotations, preserving the physical symmetry of the system.

**Mathematically:**
- When combining two angular momentum states $\vert l_1, m_1\rangle$ and $\vert l_2, m_2\rangle$, the resulting state $\vert L, M\rangle$ is given by:
**Mathematically**:
- When combining two angular momentum states \(\vert l_1, m_1\rangle\) and \(\vert l_2, m_2\rangle\), the resulting state \(\vert L, M\rangle\) is given by:

$$
\[
|L, M\rangle = \sum_{m_1, m_2} C_{L, M}^{l_1, m_1; l_2, m_2} |l_1, m_1\rangle |l_2, m_2\rangle
$$
\]

where $C_{L, M}^{l_1, m_1; l_2, m_2}$ are the Clebsch-Gordan coefficients.
where \(C_{L, M}^{l_1, m_1; l_2, m_2}\) are the Clebsch-Gordan coefficients.
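
Individual coefficients are easy to check with SymPy (a toy check, unrelated to how MACE obtains them):

```python
from sympy.physics.quantum.cg import CG

# Coupling two |l=1, m=0> states into |L=2, M=0>: <1 0; 1 0 | 2 0> = sqrt(2/3)
coeff = CG(1, 0, 1, 0, 2, 0).doit()
assert abs(float(coeff) - (2.0 / 3.0) ** 0.5) < 1e-12
```
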
### 3. **$O(3)$ Rotations**
### 3. **\(O(3)\) Rotations**

**Concept:**
- The group $O(3)$ consists of all rotations and reflections in three-dimensional space. It represents the symmetries of a 3D system, including operations that preserve the distance between points.
**Concept**:
- The group \(O(3)\) consists of all rotations and reflections in three-dimensional space. It represents the symmetries of a 3D system, including operations that preserve the distance between points.

**Role in MACE:**
- Ensuring that the neural network respects $O(3)$ symmetry is crucial for modeling physical systems accurately. MACE achieves this by using operations that are invariant or equivariant under these rotations and reflections.
**Role in MACE**:
- Ensuring that the neural network respects \(O(3)\) symmetry is crucial for modeling physical systems accurately. MACE achieves this by using operations that are invariant or equivariant under these rotations and reflections.

**Mathematically:**
- A rotation in $O(3)$ can be represented by a 3x3 orthogonal matrix $Q$ such that:
**Mathematically**:
- A rotation in \(O(3)\) can be represented by a 3x3 orthogonal matrix \(Q\) such that:

$$
Q^T Q = I \quad \text{and} \quad \det(Q) = \pm 1
$$
\[
Q^T Q = I \quad \text{and} \quad \det(Q) = \pm 1
\]

where $I$ is the identity matrix.
where \(I\) is the identity matrix.
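
Both defining properties are easy to verify numerically for a randomly sampled orthogonal matrix (a quick SciPy check, separate from anything in MACE):

```python
import numpy as np
from scipy.stats import ortho_group

Q = ortho_group.rvs(3, random_state=0)          # random element of O(3)
assert np.allclose(Q.T @ Q, np.eye(3))          # Q^T Q = I
assert np.isclose(abs(np.linalg.det(Q)), 1.0)   # det(Q) = +/- 1
```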

### 4. **Wigner D-matrix**

**Concept:**
- The Wigner D-matrix $D^L(Q)$ represents the action of a rotation $Q$ on spherical harmonics. It provides a way to transform the components of a tensor under rotation.
**Concept**:
- The Wigner D-matrix \(D^L(Q)\) represents the action of a rotation \(Q\) on spherical harmonics. It provides a way to transform the components of a tensor under rotation.

**Role in MACE:**
**Role in MACE**:
- Wigner D-matrices are used to ensure that the feature vectors in the neural network transform correctly under rotations. This is essential for maintaining the rotational equivariance of the model.

**Mathematically:**
- For a rotation $Q \in O(3)$ and a spherical harmonic of degree $L$, the Wigner D-matrix $D^L(Q)$ is a $(2L+1) \times (2L+1)$ matrix. If $Y^L_M$ is a spherical harmonic, then under rotation $Q$, it transforms as:

$$
Y^L_M(Q \cdot \mathbf{r}) = \sum_{M'=-L}^{L} D^L_{M'M}(Q) Y^L_{M'}(\mathbf{r})
$$


**Mathematically**:
- For a rotation \(Q \in O(3)\) and a spherical harmonic of degree \(L\), the Wigner D-matrix \(D^L(Q)\) is a \((2L+1) \times (2L+1)\) matrix. If \(Y^L_M\) is a spherical harmonic, then under rotation \(Q\), it transforms as:

\[
Y^L_M(Q \cdot \mathbf{r}) = \sum_{M'=-L}^{L} D^L_{M'M}(Q) Y^L_{M'}(\mathbf{r})
\]
114 changes: 114 additions & 0 deletions _posts/2024-04-18-kan.md
@@ -0,0 +1,114 @@
---
layout: post
title: "[Stuff I read] Kolmogorov Arnold Networks"
date: 2024-04-18 22:31
description: "A very short summary of main ideas from Liu et al."
tags: neural-network
categories: journal-club
giscus_comments: true
related_posts: false
---

- Proposed by Max Tegmark's group. See [Liu et al.](https://arxiv.org/abs/2404.19756).
- A hot topic on Twitter and the subject of much debate.

The paper "KAN: Kolmogorov–Arnold Networks" proposes Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-Layer Perceptrons (MLPs). The core idea behind KANs is inspired by the Kolmogorov-Arnold representation theorem, which states that any multivariate continuous function can be represented as a sum of continuous functions of one variable. This section will summarize the technical details of the paper, focusing on the mathematical formulations.

### 1. Introduction

The motivation for KANs stems from the limitations of MLPs, such as fixed activation functions on nodes and linear weights. MLPs rely heavily on the universal approximation theorem, but their structure can be less efficient and interpretable. KANs, on the other hand, utilize learnable activation functions on edges and replace linear weights with univariate functions parametrized as splines.

### 2. Kolmogorov-Arnold Representation Theorem

The Kolmogorov-Arnold representation theorem states:

\[
f(x) = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^n \varphi_{q,p}(x_p) \right)
\]

where \(\varphi_{q,p} : [0, 1] \to \mathbb{R}\) and \(\Phi_q : \mathbb{R} \to \mathbb{R}\).
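
As a toy illustration of the flavor of this result (not the theorem itself): for positive inputs, multiplication is already a composition of univariate functions and addition,

\[
x_1 x_2 = \exp\big(\ln x_1 + \ln x_2\big),
\]

and KANs learn decompositions of this kind with splines in place of \(\exp\) and \(\ln\).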

### 3. KAN Architecture

KANs generalize the representation theorem to arbitrary depths and widths. Each weight parameter in KANs is replaced by a learnable 1D function (spline).

#### 3.1. Mathematical Formulation of KANs

Define a KAN layer with \(n_{\text{in}}\)-dimensional inputs and \(n_{\text{out}}\)-dimensional outputs as a matrix of 1D functions:

\[
\Phi = \{ \varphi_{q,p} \}, \quad p = 1, 2, \ldots, n_{\text{in}}, \quad q = 1, 2, \ldots, n_{\text{out}}
\]

The activation function on the edge \(\varphi_{l,j,i}\), connecting node \(i\) in layer \(l\) to node \(j\) in layer \(l+1\), is given by:

\[
\varphi_{l,j,i}(x) = w \big(b(x) + \text{spline}(x)\big)
\]

where \(b(x) = \text{silu}(x) = \frac{x}{1 + e^{-x}}\).

The output of each layer is computed as:

\[
x_{l+1, j} = \sum_{i=1}^{n_l} \varphi_{l,j,i}(x_{l,i})
\]

or, in matrix form:

\[
x_{l+1} = \Phi_l x_l
\]

where \(\Phi_l\) is the function matrix of layer \(l\).
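
A minimal NumPy/SciPy sketch of one such layer, assuming cubic B-splines and random parameters in place of the learned ones (the function and variable names are illustrative, not from the reference implementation):

```python
import numpy as np
from scipy.interpolate import BSpline

def silu(x):
    return x / (1.0 + np.exp(-x))

def kan_layer(x, coeffs, knots, w, k=3):
    """Toy forward pass of one KAN layer.

    coeffs[j, i] holds the spline coefficients of the edge activation
    phi_{l,j,i}; w[j, i] is the overall edge weight. All parameters here are
    random stand-ins for what a real KAN would learn.
    """
    n_out, n_in, _ = coeffs.shape
    y = np.zeros(n_out)
    for j in range(n_out):            # output node j
        for i in range(n_in):         # input node i
            spline = BSpline(knots, coeffs[j, i], k, extrapolate=True)
            y[j] += w[j, i] * (silu(x[i]) + spline(x[i]))  # phi(x) = w(b(x) + spline(x))
    return y

# Example: 3 inputs -> 2 outputs, cubic splines on a uniform grid over [-1, 1]
rng = np.random.default_rng(0)
k, grid = 3, 5
knots = np.concatenate([np.full(k, -1.0), np.linspace(-1, 1, grid + 1), np.full(k, 1.0)])
n_coef = len(knots) - k - 1
coeffs = rng.normal(size=(2, 3, n_coef))
w = rng.normal(size=(2, 3))
out = kan_layer(rng.uniform(-1, 1, size=3), coeffs, knots, w, k)
```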

### 4. Approximation Abilities and Scaling Laws

KANs can approximate functions by decomposing high-dimensional problems into several 1D problems, effectively avoiding the curse of dimensionality.

#### Theorem 2.1: Approximation Bound

Let \(f(x)\) be represented as:

\[
f = (\Phi_{L-1} \circ \Phi_{L-2} \circ \cdots \circ \Phi_1 \circ \Phi_0)x
\]

For each \(\Phi_{l,i,j}\), there exist \(k\)-th order B-spline functions \(\Phi_{l,i,j}^G\) such that:

\[
\| f - (\Phi_{L-1}^G \circ \Phi_{L-2}^G \circ \cdots \circ \Phi_1^G \circ \Phi_0^G)x \|_{C^m} \leq C G^{-k-1+m}
\]

where \(G\) is the grid size and \(C\) depends on \(f\) and its representation.

### 5. Grid Extension Technique

KANs can increase accuracy by refining the grid used in splines:

\[
\{c'_j\} = \arg\min_{\{c'_j\}} \mathbb{E}_{x \sim p(x)} \left( \sum_{j=0}^{G_2+k-1} c'_j B'_j(x) - \sum_{i=0}^{G_1+k-1} c_i B_i(x) \right)^2
\]
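
A small least-squares sketch of this refinement step using SciPy B-splines; the grids, sample distribution, and sizes below are assumptions for illustration:

```python
import numpy as np
from scipy.interpolate import BSpline

def design_matrix(x, knots, k):
    """Evaluate all B-spline basis functions at x; columns are basis functions."""
    n = len(knots) - k - 1
    return BSpline(knots, np.eye(n), k)(x)            # shape (len(x), n)

def extend_grid(x, c_coarse, knots_coarse, knots_fine, k=3):
    """Fit fine-grid coefficients c' so the fine spline matches the coarse one on x."""
    y = design_matrix(x, knots_coarse, k) @ c_coarse  # coarse spline values
    B_fine = design_matrix(x, knots_fine, k)
    c_fine, *_ = np.linalg.lstsq(B_fine, y, rcond=None)
    return c_fine

def uniform_knots(g, k=3, lo=-1.0, hi=1.0):
    return np.concatenate([np.full(k, lo), np.linspace(lo, hi, g + 1), np.full(k, hi)])

rng = np.random.default_rng(0)
k = 3
kc, kf = uniform_knots(5, k), uniform_knots(10, k)    # refine G_1 = 5 -> G_2 = 10
c_coarse = rng.normal(size=len(kc) - k - 1)
x = rng.uniform(-1, 1, size=200)                      # samples x ~ p(x)
c_fine = extend_grid(x, c_coarse, kc, kf, k)
```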

### 6. Simplification Techniques

KANs can be made more interpretable by sparsification, pruning, and symbolification. The \(L^1\) norm and entropy regularization can be used to sparsify the network.

### 7. Toy Examples and Empirical Results

KANs were shown to have better scaling laws than MLPs, achieving lower test losses with fewer parameters on various toy datasets and special functions.

#### Example Functions:

1. Bessel function: \(f(x) = J_0(20x)\)
2. High-dimensional function:

\[
f(x_1, \ldots, x_{100}) = \exp\left( \frac{1}{100} \sum_{i=1}^{100} \sin^2\left(\frac{\pi x_i}{2}\right) \right)
\]

KANs can achieve scaling exponents close to the theoretical value \(\alpha = 4\), outperforming MLPs in accuracy and parameter efficiency.
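
For reference, the two example targets above are straightforward to generate as toy data (the sampling ranges below are assumptions):

```python
import numpy as np
from scipy.special import j0

def f_bessel(x):
    """1D toy target: f(x) = J_0(20 x)."""
    return j0(20.0 * x)

def f_highdim(x):
    """100D toy target: exp(mean of sin^2(pi * x_i / 2)) along the last axis."""
    return np.exp(np.mean(np.sin(np.pi * x / 2.0) ** 2, axis=-1))

rng = np.random.default_rng(0)
y1 = f_bessel(rng.uniform(-1, 1, size=256))
y2 = f_highdim(rng.uniform(-1, 1, size=(256, 100)))
```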

### Conclusion

KANs provide a novel approach to neural network design, leveraging the Kolmogorov-Arnold representation theorem to achieve better performance and interpretability compared to traditional MLPs. The use of learnable activation functions on edges and splines allows for greater flexibility and efficiency in function approximation.