---
layout: post
title: "[Stuff I read] Kolmogorov-Arnold Networks"
date: 2024-04-18 22:31
description: "A very short summary of the main ideas from Liu et al."
tags: neural-network
categories: journal-club
giscus_comments: true
related_posts: false
---

- Proposed by Max Tegmark's group. See [Liu et al.](https://arxiv.org/abs/2404.19756)
- A hot topic on Twitter that has sparked a lot of debate.

The paper "KAN: Kolmogorov–Arnold Networks" proposes Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-Layer Perceptrons (MLPs). The core idea behind KANs is inspired by the Kolmogorov-Arnold representation theorem, which states that any multivariate continuous function can be represented as a sum of continuous functions of one variable. This section will summarize the technical details of the paper, focusing on the mathematical formulations. | ||
|
||
### 1. Introduction

The motivation for KANs stems from the limitations of MLPs, such as fixed activation functions on nodes and linear weight matrices. MLPs rely heavily on the universal approximation theorem, but their structure can be less efficient and less interpretable. KANs instead use learnable activation functions on edges and replace linear weights with univariate functions parametrized as splines.

### 2. Kolmogorov-Arnold Representation Theorem

The Kolmogorov-Arnold representation theorem states that any continuous \(f : [0,1]^n \to \mathbb{R}\) can be written as:

\[
f(x) = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^n \varphi_{q,p}(x_p) \right)
\]

where \(\varphi_{q,p} : [0, 1] \to \mathbb{R}\) and \(\Phi_q : \mathbb{R} \to \mathbb{R}\).

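As a quick illustration (mine, not from the paper), the product of two variables already has this form:

\[
x_1 x_2 = \frac{(x_1 + x_2)^2}{4} - \frac{(x_1 - x_2)^2}{4},
\]

which matches the theorem with \(\Phi_1(u) = u^2/4\), \(\Phi_2(u) = -u^2/4\), \(\varphi_{1,1}(x_1) = x_1\), \(\varphi_{1,2}(x_2) = x_2\), \(\varphi_{2,1}(x_1) = x_1\), \(\varphi_{2,2}(x_2) = -x_2\), and the remaining \(\Phi_q \equiv 0\).
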
### 3. KAN Architecture

KANs generalize the representation theorem to arbitrary depths and widths. Each weight parameter is replaced by a learnable 1D function (a spline).

#### 3.1. Mathematical Formulation of KANs

Define a KAN layer with \(n_{\text{in}}\)-dimensional inputs and \(n_{\text{out}}\)-dimensional outputs as a matrix of 1D functions:

\[
\Phi = \{ \varphi_{q,p} \}, \quad p = 1, 2, \ldots, n_{\text{in}}, \quad q = 1, 2, \ldots, n_{\text{out}}
\]

The activation function \(\varphi_{l,j,i}\) on the edge connecting node \(i\) of layer \(l\) to node \(j\) of layer \(l+1\) is given by:

\[
\varphi_{l,j,i}(x) = w \big(b(x) + \text{spline}(x)\big)
\]

where \(b(x) = \text{silu}(x) = \frac{x}{1 + e^{-x}}\).

The output of each layer is computed as:

\[
x_{l+1, j} = \sum_{i=1}^{n_l} \varphi_{l,j,i}(x_{l,i})
\]

In matrix form:

\[
x_{l+1} = \Phi_l x_l
\]

where \(\Phi_l\) is the function matrix of layer \(l\).

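To make these formulas concrete, here is a minimal NumPy/SciPy sketch of a KAN layer, written under my own assumptions rather than taken from the authors' code: each edge carries \(\varphi(x) = w(\text{silu}(x) + \text{spline}(x))\) with a clamped cubic B-spline on a fixed grid, and each output node sums its incoming edges.

```python
import numpy as np
from scipy.interpolate import BSpline

# Illustrative sketch of a KAN layer (not the authors' implementation):
# each of the n_out x n_in edges has its own activation
# phi(x) = w * (silu(x) + spline(x)); output j sums phi_{j,i}(x_i) over i.

def silu(x):
    return x / (1.0 + np.exp(-x))

def make_edge(k=3, grid_size=5, lo=-1.0, hi=1.0):
    """One edge activation with a clamped cubic B-spline on a fixed grid."""
    grid = np.linspace(lo, hi, grid_size + 1)
    knots = np.concatenate([np.full(k, lo), grid, np.full(k, hi)])
    coeffs = 0.1 * np.random.randn(grid_size + k)  # learnable in practice
    w = 1.0                                        # learnable in practice
    spline = BSpline(knots, coeffs, k, extrapolate=True)
    return lambda x: w * (silu(x) + spline(x))

def kan_layer(n_in, n_out):
    """A layer is an n_out x n_in matrix of univariate edge functions."""
    edges = [[make_edge() for _ in range(n_in)] for _ in range(n_out)]
    def forward(x):                                # x has shape (n_in,)
        return np.array([sum(edges[j][i](x[i]) for i in range(n_in))
                         for j in range(n_out)])
    return forward

# A tiny [2, 5, 1] KAN: compose two layers (x_{l+1} = Phi_l x_l).
layer0 = kan_layer(2, 5)
layer1 = kan_layer(5, 1)
x = np.array([0.3, -0.7])
print(layer1(layer0(x)))   # inputs outside [-1, 1] are simply extrapolated
```

In the actual method the spline coefficients and weights are trained by backpropagation, and the grid is updated from the observed activations rather than kept fixed as above.
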
### 4. Approximation Abilities and Scaling Laws

KANs approximate functions by decomposing a high-dimensional problem into several 1D problems, which can mitigate the curse of dimensionality for functions with this compositional structure.

#### Theorem 2.1: Approximation Bound

Let \(f(x)\) be represented as:

\[
f = (\Phi_{L-1} \circ \Phi_{L-2} \circ \cdots \circ \Phi_1 \circ \Phi_0)x
\]

For each \(\Phi_{l,i,j}\), there exist \(k\)-th order B-spline functions \(\Phi_{l,i,j}^G\) such that:

\[
\| f - (\Phi_{L-1}^G \circ \Phi_{L-2}^G \circ \cdots \circ \Phi_1^G \circ \Phi_0^G)x \|_{C^m} \leq C G^{-k-1+m}
\]

where \(G\) is the grid size and \(C\) depends on \(f\) and its representation.

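As a concrete instance, for the cubic splines used in the paper (\(k = 3\)) with the error measured in the plain sup norm (\(m = 0\)), the bound reads

\[
\| f - (\Phi_{L-1}^G \circ \cdots \circ \Phi_0^G)x \|_{C^0} \leq C G^{-4}.
\]

Since the number of spline parameters grows linearly with \(G\), the loss should fall roughly as the fourth power of the parameter count, which is the scaling exponent \(\alpha = 4\) quoted in Section 7 below.
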
### 5. Grid Extension Technique

KANs can increase accuracy by refining the grid used in the splines: the coefficients \(\{c'_j\}\) of the spline on a finer grid (size \(G_2\)) are fit by least squares to the existing spline on the coarse grid (size \(G_1\)):

\[
\{c'_j\} = \arg\min_{\{c'_j\}} \mathop{\mathbb{E}}_{x \sim p(x)} \left( \sum_{j=0}^{G_2+k-1} c'_j B'_j(x) - \sum_{i=0}^{G_1+k-1} c_i B_i(x) \right)^2
\]

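A rough sketch of what this fit looks like in practice (illustrative names and grids, not the paper's code): draw samples \(x \sim p(x)\), build the design matrix of fine-grid B-spline basis functions, and solve the least-squares problem for the new coefficients.

```python
import numpy as np
from scipy.interpolate import BSpline

def clamped_knots(grid_size, k, lo=-1.0, hi=1.0):
    grid = np.linspace(lo, hi, grid_size + 1)
    return np.concatenate([np.full(k, lo), grid, np.full(k, hi)])

k, G1, G2 = 3, 5, 10
t_coarse = clamped_knots(G1, k)
t_fine = clamped_knots(G2, k)

c_coarse = np.random.randn(G1 + k)           # existing (trained) coefficients
coarse = BSpline(t_coarse, c_coarse, k)

# Sample inputs and evaluate every fine-grid basis function on them.
x = np.random.uniform(-1.0, 1.0, size=2000)  # x ~ p(x)
n_fine = G2 + k
B = np.empty((x.size, n_fine))
for j in range(n_fine):
    e = np.zeros(n_fine)
    e[j] = 1.0
    B[:, j] = BSpline(t_fine, e, k)(x)       # j-th fine basis at the samples

# Least squares: minimize || B c_fine - coarse(x) ||^2.
c_fine, *_ = np.linalg.lstsq(B, coarse(x), rcond=None)
fine = BSpline(t_fine, c_fine, k)

print(np.max(np.abs(fine(x) - coarse(x))))   # ~0 for this nested refinement
```
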
### 6. Simplification Techniques

KANs can be made more interpretable through sparsification, pruning, and symbolification. An \(L_1\) penalty on the activation functions, combined with entropy regularization, is used to sparsify the network.

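A small sketch of this penalty, following the sparsification loss described in the paper as I understand it (variable names are mine): the \(L_1\) norm of a layer is the sum of its edges' mean absolute activations, and the entropy is computed over the normalized edge magnitudes.

```python
import numpy as np

# phi_abs[j, i] holds the mean |phi_{j,i}(x)| of edge (j, i) over a batch.

def layer_l1(phi_abs):
    """L1 norm of a KAN layer: sum of the edges' mean absolute activations."""
    return phi_abs.sum()

def layer_entropy(phi_abs, eps=1e-12):
    """Entropy of normalized edge magnitudes; low entropy = few active edges."""
    p = phi_abs / (phi_abs.sum() + eps)
    return -(p * np.log(p + eps)).sum()

def regularizer(phi_abs_per_layer, mu1=1.0, mu2=1.0):
    """Penalty added (scaled by lambda) to the prediction loss."""
    return sum(mu1 * layer_l1(a) + mu2 * layer_entropy(a)
               for a in phi_abs_per_layer)

# Example: a [2, 5, 1] KAN has a 5x2 and a 1x5 matrix of edge magnitudes.
phi_abs_per_layer = [np.abs(np.random.randn(5, 2)), np.abs(np.random.randn(1, 5))]
print(regularizer(phi_abs_per_layer))
```

After training with this penalty, edges whose \(L_1\) norm falls below a threshold are pruned, and the surviving 1D functions can be matched to symbolic formulas.
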
### 7. Toy Examples and Empirical Results

KANs were shown to have better scaling laws than MLPs, achieving lower test losses with fewer parameters on various toy datasets and special functions.

#### Example Functions

1. Bessel function: \(f(x) = J_0(20x)\)
2. High-dimensional function:

\[
f(x_1, \ldots, x_{100}) = \exp\left( \frac{1}{100} \sum_{i=1}^{100} \sin^2\left(\frac{\pi x_i}{2}\right) \right)
\]

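For reference, here are the two test functions written out in Python (a straightforward transcription; `scipy.special.j0` supplies the Bessel function):

```python
import numpy as np
from scipy.special import j0

def bessel_example(x):
    """f(x) = J_0(20 x)."""
    return j0(20.0 * x)

def high_dim_example(x):
    """f(x_1, ..., x_100) = exp(mean_i sin^2(pi x_i / 2)), x of shape (100,)."""
    return np.exp(np.mean(np.sin(np.pi * x / 2.0) ** 2))

print(bessel_example(0.1))                    # scalar input
print(high_dim_example(np.random.rand(100)))  # 100-dimensional input
```
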
KANs achieve scaling exponents close to the theoretical \(\alpha = 4\), outperforming MLPs in accuracy and parameter efficiency.

### Conclusion

KANs provide a novel approach to neural network design, leveraging the Kolmogorov-Arnold representation theorem to achieve better performance and interpretability than traditional MLPs. Learnable, spline-parametrized activation functions on edges allow for greater flexibility and efficiency in function approximation.