-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path02-Basics.Rmd
139 lines (110 loc) · 10.6 KB
/
02-Basics.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
# Basics
In this chapter you will see some short introduction to **R** environment. The basic information will be covered with more concern in following chapters.
```{r echo = FALSE, eval = TRUE}
operators <- read.delim("/media/michal/DATA/ROwe/BeginnersGuideToGalaxy/helperFiles/operators", header=FALSE, stringsAsFactors=FALSE)
names(operators) <- c('sign', 'type', 'action')
```
## Getting started
### Help
There are just few things you really need to remember and follow when you want to start using **R**. First, there is very good *help* build in. To access it, you use `?` sign with name of function: e.g.: `?t.test`. After executing command, in your window with *Help* pane, a page dedicated to this function will pop up. You will get information on syntax, options to use with this function and in most cases some code examples. However, with single question mark you are telling **R** to only look into functions from packages that are *currently loaded* and have this *precise name*. If you want to tell **R** to look for proper function in *all packages* or you are not sure what the exact name is you can use double question mark e.g. `??mutate`. In effect, in the same pane and window as previously you will get a list of results that match your query. Finally, using *Packages* pane, you can click on one of the packages names, to display all of the functions within it. Than, by clicking on the name of functions you are interested in, you will be taken to proper page with description.
### Internet is a great source of information
Anytime you feel lost or need help that is beyond the scope of manuals, just ask Google. For instance you can use this query: *how to make density plot in R*. Thanks to huge community you will find a lot of answers. The most reliable ones can be found on *StackOverflow*, *StatsExchange* and *RBloggers*. If you don't know if there is a package to perform particular task, also ask uncle Google. For instance, if you want to use random numbers from Dirichlet distribution, you can use this query: *dirichlet distribution r*.
### More on internet sources
A good practice, when you want to learn programming language, is to read what other people do and how they code. In the beginning it might be a bit overwhelming or confusing to read all the stuff. However, reading others work will get you used to syntax and workflow, and will give you great basics to invent your own code. Hopefully, you don't need to spend hours searching for some interesting blogs. There is great blog aggregator [R weekly](https://rweekly.org), that gathers in one place the best posts, pod-casts, etc. on **R**, every week.
## Syntax
### Common operators
In general **R** syntax resembles syntax of standard math functions $f(x, z) = a * x + b * z$. So in **R** the syntax to call this function would be... `f(x, z)`, which would evaluate (behind scenes) expression `a * x + b * z`. You can think about functions in this language with general pattern `function_name(argument_1, argument_2, ..., argument_n)`. Next thing you should know before you start working is that there are three main *signs* used in **R's** syntax. First two are assignment symbols: `<-` and `=`; for convention we use them in different cases. **You need to use `=` ONLY for function arguments**. Third one is `#`. It is a symbol used for comments. Everything following this symbol to the end of code line will not be executed. There are also other signs (or symbols) which are building blocks of language, however their use is very precisely defined and reserved for certain events. Below you find a table with reference for most common operators used. You will faster grasp it while you write your own code, than by reading about it. Thus, I suggest we go deeper into variable types in **R** language.
```{r tab-operators, echo = FALSE}
knitr::kable(
operators, caption = 'Common operators in R',
booktabs = TRUE, longtable = TRUE
)
```
### Variables
Concept of variable is crucial for programming. In **R** variables can contain many things: vectors, data frames, results of statistical analyses etc. Each of variables have some characteristic properties. They are defined by *class* of the variable. Thanks to *class* attribute, **R** knows, how to deal with variable -- what is the internal structure and what operations can be performed over variable. Data can be stored in variables in different manners. To assign something to variable we use `<-` operator, which tells **R** to store right side of arrow under name on the left side of arrow. The simplest variable is *vector*, which can be of *class*: *character*, *integer*, *numeric* or *logical*. For instance:
```{r echo = T, eval = T}
characterVector <- c('a', 'b', 'c')
class(characterVector)
integerVector <- c(1L, 2L, 3L)
class(integerVector)
numericVector <- c(2.5, 3.5, 4.5)
class(numericVector)
logicalVector <- c(TRUE, FALSE)
class(logicalVector)
```
For more complex data, we have four basic classes: *vectors*, *lists*, *data frames* and *matrices*. *Matrix* is similar to *data frame*. The most obvious difference is that *matrix* contains only one *class* of variables (usually *numeric* or *integer*), while *data frame* can store *numeric*, as well as *characters* and *factors* (for now, you can assume that *factor* class is used to store categorical variables) in separate columns. Also *matrices* are used when programmers want to achieve great speed in mathematical computation. *Data frames* are resembling tables from popular spreadsheet software. Lets look:
```{r echo = T, eval = T}
matrixVariable <- matrix(c(1:10), nrow = 2)
matrixVariable
class(matrixVariable)
dfVariable <- data.frame(x1 = 1:5, x2 = 6:10)
dfVariable
class(dfVariable)
```
*Lists* are... lists of variables. Each *list* element can be of different *class* and length. To grasp the idea of *lists* it will be best to present it with example:
```{r echo = T, eval = T, tidy = T}
listVariable <- list(x1 = c('a', 'b'), x2 = 1:4, x3 = matrix(c(1:6), nrow = 2))
listVariable
class(listVariable)
class(listVariable$x1)
class(listVariable$x2)
class(listVariable$x3)
```
There are plenty of other *classes*, e.g. for *time variables*, however mentioned above are the basic ones you will deal mostly. Also because they are so often used, you should learn how to recognize their structure at a glance. Later on, I will present you how (and when) each of this variables types can be used in work.
#### Naming Variables
First of all, all names are *case-sensitive*, which means that **R** recognize variables named `RVariable`, `rVariable` and `Rvariable` as three different objects. Second thing to remember is that variable name *have to* start with a letter and may contain only letters, numbers and symbols: . (dot) and _ (underscore). There are also some *good practices* in naming variables (after [Hadley Wickham Style guide](http://adv-r.had.com.nz/Style.html)):
* use lowercase to names variables (and functions)
* use nouns to name variables (and verbs for functions)
* try to be precise when naming
* try to be concise when naming
* use underscore _ to separate words (snake_case) e.g. `first_variable`
* *some other guidelines suggest using camel cases e.g.* `firstVariable`
And the golden rule should be - whatever guideline you follow -- be consequent!
## Math operations
In **R** we use standard math operators `+ - * /` to perform addition, subtraction, multiplication and division. Symbol `^` indicates that we want to use power, and `sqrt` to make square root. OK, so whats the name of a function to get n^th^ root? Probably you remember from math lessons that $\sqrt [n] {x} = x^\frac{1}{n}$, thus you can just write `x^(1/n)`. To change order of operation (which are following mathematics rules) use brackets `()`. Other important mathematical functions are `%%` for modulo, and `%/%` for integer division.
```{r eval = T, echo = T}
5 + 2
11 - 3
(4+7)/9*2
14 %/% 3 + 1
8^(1/3) + 10%%6
```
To calculate logarithms there is *build in* function `log()`. It uses as a base Euler's number by default, however you can override it i.e. `log(10, base = 10)`. You can calculate exponential function using `exp()` function.
There are also trigonometric functions in **R**: `sin()`, `cos()`, `tan()`, `asin()`, `acos()`, `atan()`. Angles are used/expressed in radians. To transform values from degrees to radians multiply by `pi` and divide by 180. To transform values from radians to degrees multiply by 180 and divide by `pi`. By the way, `pi` is a constant in **R**, meaning that its value is build in the language (similar as Euler number is `exp(1)`).
```{r eval = T, echo = T}
someArc <- 90*pi/180
sin(someArc)
atanValue <- atan(0.89)
atanValue*180/pi
```
## Logics
Logical expression are often used in programming. They compare left side with right side arguments of statement. The result of those comparison might be `TRUE` or `FALSE` (in many other languages those are called *Boolean* values) which belong to *class Logical*. In Table \@ref(tab:tab-operators) you will find list of most common logical operators used to build statements. Here is a small *cheatsheet tables*:
```{r echo = F, eval = T}
logicResults <- c(NA, FALSE, TRUE)
names(logicResults) <- as.character(logicResults)
andTable <- outer(logicResults, logicResults, '&')
orTable <- outer(logicResults, logicResults, '|')
```
```{r tab-logics, echo = FALSE, result = 'asis'}
knitr::kable(andTable, caption = 'AND Table',
booktabs = TRUE, longtable = TRUE
)
knitr::kable(orTable, caption = 'OR Table',
booktabs = TRUE, longtable = TRUE
)
##check kableExtra to format tables
```
Below you can see them in action:
```{r eval = T, echo = T}
5 >= 1
10%%2 == 0
!FALSE
5L | 11.1 <= 6
```
## Functions
When writing code we generally want to perform some actions on our variables. There is a lot of *build in functions* in base **R** distribution, and a whole Galaxy of *functions* provided by community. Function can be literally any action performed on variables. For instance, there are some build in statistical functions like `t.test()` or `chisq.test()`. Other function can be use to draw some charts and plots, e.g. `plot()`. It's easy to recognize function, since its structure is *name* followed by parentheses *()*. Inside parentheses user provides arguments and options to function. Lets see how it works with one of *build in* functions:
```{r echo = T, eval = T}
tTestResult <- t.test(numericVector, integerVector)
print(tTestResult)
```
We used `t.test()` function, with two arguments: `numericVector` and `integerVector`. In second function call, we *ordered* **R** to print out the results of statistical test stored in variable `tTestResult` -- which is the argument of this function. I'll show you how to write your own function in chapter \@ref(funcs).