diff --git a/session_nmf/NMF_main.html b/session_nmf/NMF_main.html index 41694d3..5981b42 100644 --- a/session_nmf/NMF_main.html +++ b/session_nmf/NMF_main.html @@ -7516,14 +7516,14 @@
-

NMF - Non-Negative Matrix Factorization

by Sergiu Netotea, PhD, NBIS, Chalmers

+

NMF - Non-Negative Matrix Factorization

by Sergiu Netotea, NBIS, Chalmers

@@ -7559,29 +7559,46 @@

NMF - Non-Negative Matrix Facto

NMF - Non-Negative Matrix Factorization

+
+ + +
+
+ + @@ -7687,18 +7704,17 @@

NMF - in general contexts:

Solving NMF

  • Similar to ICA, PCA, MFA it can be classified as an unsupervised dimensionality reduction / clustering technique.
  • -
  • Uses the Frobenius norm, which is the matrix equivalent of the euclidean distance. But other cost functions are possible, that also include regularization. -$$ -\|X\|_F = \sqrt{\sum_i\sum_jx_{ij}^2}. -$$
  • +
  • As an optimization problem: $\min~\|X-WH\|_F,\ X \ge 0, W \ge 0, H \ge 0$
      +
    • With the squared Frobenius norm, the fit function is: $F = \sum_{u,i} (x_{ui} - w_u h_i^T)^2$, where $x_{ui} \approx w_u h_i^T = \sum_k w_{uk} h_{ki}$
    • +
    • This is non-convex optimization (no guarantee of finding the global minimum)!
    • +
    • The number of latent factors is a result of global fitting
    • +
    +
  • Many algorithms exist, such as iterated coordinate descent (the original solver) and hierarchical alternating least squares (HALS).
  • -
  • Main solver works iteratively via alternating non-negative least squares (ANLS) as: -$$\begin{align} - W_{t+1} &= W_t^T \frac{XH_t^T}{XH_tH_t^T} \\ - H_{t+1} &= H_t \frac{W_t^TX}{W^T_tW_tX}. -\end{align}$$
  • +
  • Main solver works iteratively via alternating non-negative least squares (ANLS)
  • Weak convergence: it is reinitialized several times to avoid local minima, and the best result is kept.
  • -
  • Optimal number of k components (RSS scores for example, Silhouettes scores etc)
  • +
  • Since NMF has multiple local minima, starting from different initial values for W and H can lead the algorithm to settle in different parts of the solution space. Some initializations might lead to poor local minima with higher approximation errors, while others might lead to better or more meaningful decompositions.
  • +
  • Optimal number of k components (hidden attributes) needs fitting (for example via RSS or silhouette scores)
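The restart strategy in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the solver used in the slides: it swaps ANLS for the simpler Lee and Seung multiplicative updates, uses plain Python lists so it runs without dependencies, and the names `nmf_once` and `nmf_restarts` are hypothetical helpers introduced here.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_sq(X, W, H):
    """Squared Frobenius norm of the residual X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def nmf_once(X, k, n_iter, rng, eps=1e-9):
    """One NMF run from a random non-negative start (multiplicative updates)."""
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H, frob_sq(X, W, H)


def nmf_restarts(X, k, n_restarts=5, n_iter=200, seed=0):
    """Run NMF from several random starts and keep the lowest-error result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        W, H, err = nmf_once(X, k, n_iter, rng)
        if best is None or err < best[2]:
            best = (W, H, err)
    return best


if __name__ == "__main__":
    # Small non-negative toy matrix (illustrative values only).
    X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 1.0, 1.5]]
    W, H, err = nmf_restarts(X, k=1)
    print("residual (squared Frobenius):", err)
```

Calling `nmf_restarts` for several candidate values of k and comparing the returned residuals is one simple way to see how the fit changes with the number of components.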

@@ -7711,18 +7727,24 @@

Solving NMF

Alternating non-negative least squares (ANLS)

@@ -7812,7 +7834,7 @@

Toy dataset
- + @@ -7841,7 +7864,10 @@

Toy dataset @@ -8070,7 +8177,7 @@

ANF - Affinity network fusion

@@ -8135,7 +8242,7 @@

@@ -8215,7 +8322,7 @@

The Kernel trick diff --git a/session_nmf/SNF_main.ipynb b/session_nmf/SNF_main.ipynb index 18aff8f..d78f459 100644 --- a/session_nmf/SNF_main.ipynb +++ b/session_nmf/SNF_main.ipynb @@ -8,15 +8,38 @@ } }, "source": [ - "# Similarity Network Fusion - SNF\n", + "# Network diffusion based methods in integrating 'omics data\n", "\n", "Sergiu Netotea, PhD, NBIS, Chalmers\n", "\n", - "- Similarity networks\n", - "- SNF method\n", + "\n", + "- Network fusion\n", + "- Similarity network fusion, explained in detail\n", + "- Applications" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Classification of graph based integration methods\n", + "\n", + "\n", + "__Network-Based Approaches:__\n", + " - Graph Construction, Multi-Modal Networks: Integrating multiple omics datasets into one comprehensive graph that allows for the analysis of cross-layer interactions.\n", + " - Node/Edge Weighting: Some methods apply weighting strategies to nodes and edges to emphasize biological relevance, which can assist in identifying key components within the network.\n", + "\n", + "__Algorithmic Methods:__\n", + "- Network Diffusion/Propagation: Methods that allow information (e.g., signals or perturbations) to propagate through the network to detect influential nodes or subnetworks. Examples: Similarity network fusion, Graph autoencoders\n", + "- Causal Inference: Algorithms that attempt to infer causality between nodes based on their interactions and the multi-omics data layers. Examples: Graphical models based on statistical learning, Recurrent Graph Learning\n", + "\n", + "__Machine Learning Approaches:__\n", + "- Integration with Network-Based Approaches: Machine learning models are sometimes used in tandem with network approaches to enhance predictive power and biological insight. These methods are characterized by the existence of a fitness function. 
Examples: Graph embedding, Graph Neural Networks\n", + "\n", + "Read more:\n", + "> Agamah FE, Bayjanov JR, Niehues A, Njoku KF, Skelton M, Mazandu GK, Ederveen THA, Mulder N, Chimusa ER, 't Hoen PAC. Computational approaches for network-based integrative multi-omics analysis. Front Mol Biosci. 2022 Nov 14;9:967205. doi: 10.3389/fmolb.2022.967205. PMID: 36452456; PMCID: PMC9703081." + ] + }, { "cell_type": "markdown", "metadata": { @@ -25,7 +48,7 @@ } }, "source": [ - "## Similarity networks\n", + "## Network based models in biology\n", "\n", "Network models are a very complex representation of data:\n", "- Power law sophistication: for every n vertices there are up to n(n-1) possible edges\n", @@ -78,7 +101,7 @@ } }, "source": [ - "![NF basics](./assests/nf_basics.png \"NF basics\")" + "![NF basics](img/nf_basics.png \"NF basics\")" ] }, { @@ -192,7 +215,7 @@ } }, "source": [ - "![similarity](./assests/similarity.png)" + "![similarity](img/similarity.png)" ] }, { @@ -348,23 +371,83 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### (latest trents) MoGCN, graph neural networks\n", + "### Paper study: \n", "\n", - "- Graph neural networks are a new hot topic in integrative omics.\n", - "- MoGCN, a multi-omics integration model based on graph convolutional network (GCN)\n", + "> MoGCN, a multi-omics integration model based on graph convolutional network (GCN)\n", " - https://github.com/Lifoof/MoGCN\n", - " - Li X, Ma J, Leng L, Han M, Li M, He F, Zhu Y. MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis. Front Genet. 2022 Feb 2;13:806842. doi: 10.3389/fgene.2022.806842. PMID: 35186034; PMCID: PMC8847688.\n", - " - cancer subtype classification and analysis. Genomics, transcriptomics and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from the Cancer Genome Atlas (TCGA). 
\n", - " - The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively\n", - " - Then the vector features and the PSN were input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. \n", - " \n" + " - Li X, Ma J, Leng L, Han M, Li M, He F, Zhu Y. MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis. Front Genet. 2022 Feb 2;13:806842. doi: 10.3389/fgene.2022.806842. PMID: 35186034; PMCID: PMC8847688\n", + "\n", + "- Cancer subtype classification and analysis. Genomics, transcriptomics and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from the Cancer Genome Atlas (TCGA). \n", + "- The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively\n", + "- Then the vector features and the PSN were input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Paper study:\n", + "\n", + "> Wang, C., Lue, W., Kaalia, R. et al. Network-based integration of multi-omics data for clinical outcome prediction in neuroblastoma. Sci Rep 12, 15425 (2022). 
https://doi.org/10.1038/s41598-022-19019-5\n", + "\n", + "- Aim: integrate multi-omics data (like gene expression and DNA methylation) for predicting clinical outcomes in neuroblastoma, a pediatric cancer.\n", + "- Using Patient Similarity Networks (PSNs) derived from omics features, they create networks where patients are nodes and edges represent their similarity based on omics data. They apply two methods for data fusion: at feature level and at network level.\n", + "- Their results show that network-level fusion generally outperforms feature-level fusion for integrating diverse omics datasets, while feature-level fusion is effective when combining different features within the same omics dataset.\n", + "\n", + "- Feature-level fusion: Combines features derived from each omics dataset into a single feature set by concatenating or averaging features like centrality and modularity from PSNs. For each omics dataset $m$, a Patient Similarity Network (PSN) is constructed. Let $x_m$ represent the feature vector of the $m$-th omics dataset for a subject. The feature-level fusion is performed as follows:\n", + " - Extract centrality and modularity features from each PSN.\n", + " - Compute the mean of centrality features and concatenate the modularity features from each omics dataset:\n", + "$$\n", + "x_{\\text{fused}} = \\frac{1}{M} \\sum_{m=1}^{M} x_m\n", + "$$\n", + "where $M$ is the total number of omics datasets.\n", + "\n", + "The fused feature vector $x_{\\text{fused}}$ is used as input to machine learning classifiers for clinical outcome prediction.\n", + "\n", + "\n", + "- Network-level fusion: PSNs from individual omics datasets are combined to form a single multi-omics PSN. 
The fusion is performed using the Similarity Network Fusion (SNF) algorithm, which combines the similarity matrices $A_m$ of individual PSNs:\n", + "$$\n", + "A_{\\text{fused}} = \\text{SNF}(A_1, A_2, \\ldots, A_M)\n", + "$$\n", + "The fused similarity matrix $A_{\\text{fused}}$ represents the multi-omics PSN, which is then used for downstream prediction tasks." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Paper study:\n", + "> Wang, J., Liao, N., Du, X. et al. A semi-supervised approach for the integration of multi-omics data based on transformer multi-head self-attention mechanism and graph convolutional networks. BMC Genomics 25, 86 (2024). https://doi.org/10.1186/s12864-024-09985-7\n", + "\n", + "\n", + "- Uses a semi-supervised learning framework for disease classification, combining transformer multi-head self-attention mechanisms with graph convolutional networks (GCNs) to extract meaningful relationships between samples. The model integrates labeled and unlabeled omics data, using the attention mechanism to capture dependencies across features and GCNs to capture graph-based relationships.\n", + "- Omics used: mRNA expression, microRNA expression, and DNA methylation. These data types are integrated to improve the prediction accuracy of disease classifications, such as Alzheimer's disease and breast cancer. \n", + "- Self-Attention Mechanism: Captures intra- and inter-modality feature dependencies.\n", + "- Graph Convolutional Networks (GCNs): Extract structural information from the multi-omics graph, enabling better representation of relationships between data points.\n", + "- Semi-Supervised Learning: Utilizes both labeled and unlabeled data to improve model training, mitigating the limitations of small labeled datasets often found in multi-omics studies." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "" + "" ] }, { @@ -483,7 +566,7 @@ } }, "source": [ - "" + "" ] }, { @@ -511,7 +594,7 @@ } }, "source": [ - "" + "" ] }, { @@ -539,7 +622,7 @@ } }, "source": [ - "" + "" ] }, { @@ -597,7 +680,7 @@ } }, "source": [ - "" + "" ] }, { @@ -674,7 +757,7 @@ } }, "source": [ - "" + "" ] }, { diff --git a/session_nmf/img/MoGCN.png b/session_nmf/img/MoGCN.png new file mode 100644 index 0000000..d69aa2f Binary files /dev/null and b/session_nmf/img/MoGCN.png differ diff --git a/session_nmf/img/NMF.png b/session_nmf/img/NMF.png new file mode 100644 index 0000000..a812788 Binary files /dev/null and b/session_nmf/img/NMF.png differ diff --git a/session_nmf/img/anf_alg.jpg b/session_nmf/img/anf_alg.jpg new file mode 100644 index 0000000..b545f14 Binary files /dev/null and b/session_nmf/img/anf_alg.jpg differ diff --git a/session_nmf/img/anf_nn.jpg b/session_nmf/img/anf_nn.jpg new file mode 100644 index 0000000..8aa2fdc Binary files /dev/null and b/session_nmf/img/anf_nn.jpg differ diff --git a/session_nmf/img/budget.jpg b/session_nmf/img/budget.jpg new file mode 100644 index 0000000..8d54d27 Binary files /dev/null and b/session_nmf/img/budget.jpg differ diff --git a/session_nmf/img/deep_mf.png b/session_nmf/img/deep_mf.png new file mode 100644 index 0000000..70d967a Binary files /dev/null and b/session_nmf/img/deep_mf.png differ diff --git a/session_nmf/img/item.png b/session_nmf/img/item.png new file mode 100644 index 0000000..fd47eab Binary files /dev/null and b/session_nmf/img/item.png differ diff --git a/session_nmf/img/jnmf_pharma.png b/session_nmf/img/jnmf_pharma.png new file mode 100644 index 0000000..c5a9cbd Binary files /dev/null and b/session_nmf/img/jnmf_pharma.png differ diff --git a/session_nmf/img/nf_basics.png b/session_nmf/img/nf_basics.png new file mode 100644 index 0000000..9bcfc1f Binary files /dev/null and b/session_nmf/img/nf_basics.png differ diff --git a/session_nmf/img/nmf_mowgli.png 
b/session_nmf/img/nmf_mowgli.png new file mode 100644 index 0000000..d1edbe6 Binary files /dev/null and b/session_nmf/img/nmf_mowgli.png differ diff --git a/session_nmf/img/nmf_multilayered.png b/session_nmf/img/nmf_multilayered.png new file mode 100644 index 0000000..36796b9 Binary files /dev/null and b/session_nmf/img/nmf_multilayered.png differ diff --git a/session_nmf/img/nmf_onelayer.png b/session_nmf/img/nmf_onelayer.png new file mode 100644 index 0000000..0710794 Binary files /dev/null and b/session_nmf/img/nmf_onelayer.png differ diff --git a/session_nmf/img/nmf_uinmf.png b/session_nmf/img/nmf_uinmf.png new file mode 100644 index 0000000..0bff182 Binary files /dev/null and b/session_nmf/img/nmf_uinmf.png differ diff --git a/session_nmf/img/segmentation.png b/session_nmf/img/segmentation.png new file mode 100644 index 0000000..71ea2aa Binary files /dev/null and b/session_nmf/img/segmentation.png differ diff --git a/session_nmf/img/similarity.png b/session_nmf/img/similarity.png new file mode 100644 index 0000000..c8d2044 Binary files /dev/null and b/session_nmf/img/similarity.png differ diff --git a/session_nmf/img/skf.png b/session_nmf/img/skf.png new file mode 100644 index 0000000..d3396d9 Binary files /dev/null and b/session_nmf/img/skf.png differ diff --git a/session_nmf/img/snf_mosegcn.png b/session_nmf/img/snf_mosegcn.png new file mode 100644 index 0000000..6bdae3a Binary files /dev/null and b/session_nmf/img/snf_mosegcn.png differ diff --git a/session_nmf/img/snf_psn_fusing.png b/session_nmf/img/snf_psn_fusing.png new file mode 100644 index 0000000..1592587 Binary files /dev/null and b/session_nmf/img/snf_psn_fusing.png differ diff --git a/session_nmf/img/user.png b/session_nmf/img/user.png new file mode 100644 index 0000000..4705181 Binary files /dev/null and b/session_nmf/img/user.png differ diff --git a/session_nmf/img/wsnf.png b/session_nmf/img/wsnf.png new file mode 100644 index 0000000..ff3c6bd Binary files /dev/null and 
b/session_nmf/img/wsnf.png differ diff --git a/session_nmf/img/wsnf_data.png b/session_nmf/img/wsnf_data.png new file mode 100644 index 0000000..1d9b543 Binary files /dev/null and b/session_nmf/img/wsnf_data.png differ