MSc-Project

This project formed part of my MSc dissertation and used K-means clustering to investigate the protein complex FACT in Saccharomyces cerevisiae (baker's yeast). The entire project was coded in Python (version 3.9), using Microsoft Visual Studio Code (v1.82) as the IDE.

Some of the major results are shared below; the full annotated code and a PDF of the report will be made available in due course. (Meanwhile, the draft code can be seen in a Jupyter notebook here.)

Data Analysis Pipeline

We used a range of libraries for data cleaning and modelling (numpy, pandas, scikit-learn), as well as for data visualisation (matplotlib, seaborn, etc.).
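As a rough illustration of how these libraries fit together, the sketch below cleans a gene-level table with pandas and clusters it with a scikit-learn pipeline. It is a minimal sketch, not the project's actual code: it assumes each row is a gene and each column a numeric measurement, and the helper name `cluster_genes`, the column names and the random demo data are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_genes(df: pd.DataFrame, k: int = 3, seed: int = 0) -> pd.Series:
    """Cluster the rows (genes) of a numeric DataFrame; return one label per gene.

    Hypothetical helper: assumes rows are genes and columns are numeric measurements.
    """
    # Basic cleaning: drop any gene with at least one missing measurement.
    clean = df.dropna()
    # Standardise each column so no single measurement dominates the
    # Euclidean distances that K-means relies on, then fit K-means.
    model = make_pipeline(
        StandardScaler(),
        KMeans(n_clusters=k, n_init=10, random_state=seed),
    )
    labels = model.fit_predict(clean.to_numpy())
    return pd.Series(labels, index=clean.index, name="cluster")


if __name__ == "__main__":
    # Placeholder random data standing in for real measurements.
    rng = np.random.default_rng(0)
    demo = pd.DataFrame(
        rng.normal(size=(30, 4)),
        index=[f"gene_{i}" for i in range(30)],
        columns=["cond_a", "cond_b", "cond_c", "cond_d"],  # hypothetical names
    )
    demo.iloc[0, 1] = np.nan  # this gene is dropped during cleaning
    print(cluster_genes(demo).value_counts())
```

Scaling before clustering is a common choice because K-means is distance-based; whether the original analysis scaled its features is not stated here.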

K-means Clustering Results

We relied on inertia and silhouette scores to select k = 3 for the K-means analysis. The resulting gene clusters are plotted below.
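To show how inertia and silhouette scores can guide the choice of k, here is a minimal sketch using scikit-learn's `KMeans.inertia_` and `silhouette_score`. The candidate range of k, the random seed and the "highest mean silhouette wins" rule are assumptions made for illustration, not the exact procedure used in the dissertation. The demo uses synthetic blobs, not the project's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def score_k_values(X: np.ndarray, k_values=range(2, 9), seed: int = 0) -> dict:
    """Return {k: (inertia, mean silhouette)} for each candidate k.

    The silhouette is only defined for 2 <= k <= n_samples - 1, so k starts at 2.
    """
    scores = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_: within-cluster sum of squared distances to the centroids.
        # It generally falls as k grows, so one looks for an "elbow".
        # silhouette_score: mean silhouette in [-1, 1]; higher is better.
        scores[k] = (km.inertia_, silhouette_score(X, km.labels_))
    return scores


def best_k_by_silhouette(scores: dict) -> int:
    """Pick the k with the highest mean silhouette (one simple, assumed rule)."""
    return max(scores, key=lambda k: scores[k][1])


if __name__ == "__main__":
    # Synthetic data: three well-separated 2-D blobs (illustration only).
    X, _ = make_blobs(
        n_samples=300,
        centers=[[0, 0], [10, 0], [0, 10]],
        cluster_std=0.5,
        random_state=0,
    )
    scores = score_k_values(X)
    for k, (inertia, sil) in scores.items():
        print(f"k={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")
    print("best k by silhouette:", best_k_by_silhouette(scores))
```

In practice the two scores are read together: the inertia curve flags where adding clusters stops paying off, while the silhouette checks that the resulting clusters are well separated.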

HarmanKhera/MSc-Project