forked from sawsimeon/Anti-sickling
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPCA.Rmd
67 lines (56 loc) · 1.97 KB
/
PCA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
---
title: "Origin of anti-sickling activity via QSAR modelling"
author: "Chuleeporn Phanus-umporn, Saw Simeon, Watshara Shoombuatong and Chanin Nantasenamat"
date: "July 7, 2016"
output: pdf_document
---
## Principal Component Analysis
```{r, error = FALSE, message = FALSE, warning=FALSE}
library(readxl)
library(caret)
library(dplyr)
library(ggplot2)
library(cowplot)
df <- read.csv("data.csv")
Activity <- df$Activity
descriptors <- df[, 2:ncol(df)]
data <- descriptors[, which(!apply(descriptors, 2, sd) == 0)]
raw <- cor(data)
df <- data
pca <- prcomp(df, retx=TRUE,scale.=TRUE, center = TRUE)
scores <- pca$x[,1:5]
loadings <- pca$rotation[,1:5]
```
### Explained variance, PC scores and loadings
```{r, error = FALSE, message=FALSE, warning=FALSE}
summary(pca)
scores
loadings
```
## PCA plot with clusters
```{r, error=FALSE, message=FALSE, warning=FALSE}
km <- kmeans(scores, center=5, nstart=5)
ggdata <- data.frame(scores, Cluster=km$cluster)
### paper numbering
library(grid)
set.seed(23)
x <- ggplot(ggdata, aes(x = PC1, y = PC2, colour = Cluster)) +
geom_point(aes(fill=factor(Cluster)), size=5, shape=20, pch = 21, alpha = 0.8) +
ggtitle(" ") +
stat_ellipse(aes(fill=factor(Cluster)), colour = "black",
geom="polygon", level=0.95, alpha=0.2) +
guides(color=guide_legend("Cluster"),fill=guide_legend("Cluster")) +
#geom_text(aes(label=compoundnumber), size=7, hjust=0.5, vjust= 1.5, alpha=0.45) +
theme(
legend.position=("none"),
#plot.title = element_text(size=20, face="bold", colour="black", vjust = 2, hjust=-0.07),
panel.border = element_rect(linetype = "solid", colour = "black", fill = NA, size = 1),
axis.text.y = element_text(size = 15),
axis.ticks.length = unit(0.3, "cm"),
axis.text.x = element_text(size = 15),
legend.title=element_blank(),
axis.title.x = element_text(color="black", size=20),
axis.title.y = element_text(color="black", size=20)) +
coord_cartesian(ylim = c(-15, 15), xlim = c(-15, 15))
x
```