You can use R as a calculator. Try typing the following command into the console:
3+5
## [1] 8
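Note that '## [1] 8' is the output R returns for the command '3+5'. The '[1]' marks the first value of the result. The '##' prefix only marks output in this document and does not appear in the console.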
R also supports the other standard arithmetic operators. Try these in the console:
3^2
## [1] 9
9^(.5)
## [1] 3
Note that the "^" operator raises a number to a power. We used it to find both the square of 3 and the square root of 9, because raising a number to the power 0.5 gives its square root.
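Because "^" accepts any exponent, it also covers other powers and roots. The short sketch below uses arbitrary numbers chosen for illustration. It also shows why parentheses around a fractional exponent matter: "^" is evaluated before "/".

```r
2^10        # 2 raised to the 10th power: [1] 1024
27^(1/3)    # an exponent of 1/3 gives the cube root of 27, which prints as 3
27^1/3      # no parentheses: ^ runs before /, so this is (27^1)/3, i.e. 9
```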
3*5
## [1] 15
3/5
## [1] 0.6
A function is a set of instructions that performs a specific task. In R, you call a function by writing its name followed by a pair of parentheses. The values you place inside the parentheses are called arguments. At the beginning of this module, you practiced basic arithmetic in R. R also has built-in functions that perform these tasks. For example:
sum(3,5)
## [1] 8
In this example, sum() is a function that takes numbers as arguments and returns their total. When a function has more than one argument, the arguments are separated by commas. Another built-in function is sqrt(), which returns the square root of a number. Unlike sum(), sqrt() accepts exactly one argument. For example:
sqrt(9)
## [1] 3
This matches the answer we got earlier from 9^(.5): both commands return 3.
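As a quick check of the two functions above, here is a minimal sketch with arbitrary numbers: sum() accepts as many comma-separated arguments as you give it, and sqrt() agrees with the "^(.5)" form used earlier.

```r
sum(3, 5, 10)   # three arguments, separated by commas: [1] 18
sum(2.5, 0.5)   # decimals work too: [1] 3
sqrt(16)        # [1] 4
16^(.5)         # the exponent form gives the same answer: [1] 4
```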
However, if we pass sqrt() more than one argument, R returns an error, which is a message telling us that it could not carry out the command.
sqrt(9, 16) # This will generate an error
## Error: 2 arguments passed to 'sqrt' which requires 1
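If you do want the square roots of several numbers, one option is to combine them into a single argument with c(), a base R function not otherwise covered in this section. Because sqrt() works element by element, it then returns one root per number:

```r
# c() bundles 9 and 16 into one vector, so sqrt() receives a single argument
sqrt(c(9, 16))   # [1] 3 4
```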
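Consider an experiment in which four measurements and the species were recorded for each iris in a sample of 150 irises. The first six irises are shown below.
There are two important features of the experiment: cases (the individual irises) and variables (the attributes recorded for each iris).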
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
How do we organize this data?
We organize data frames in canonical data form. There are two important features of canonical data form:
- Each row represents one case.
- Each column represents one variable.
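For example, take a look at the first row in the data frame above. It represents the first iris in the sample, which has the following attributes (measurements in centimeters):
- Sepal.Length: 5.1
- Sepal.Width: 3.5
- Petal.Length: 1.4
- Petal.Width: 0.2
- Species: setosa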
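In R you can pull out a single case or a single variable directly. The sketch below uses square brackets and "$", which are not introduced elsewhere in this section, on the built-in iris data frame.

```r
# Row 1 is the first case: one iris with all five of its variables
iris[1, ]
# A column is one variable; $ selects it by name, giving one value per case
length(iris$Sepal.Length)   # [1] 150
```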
There are built-in functions in R that help us explore the structures and basic features of our data frames.
The "dim()" function returns the dimensions of a data frame: the number of cases (rows) followed by the number of variables (columns).
dim(iris)
## [1] 150 5
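dim() always lists rows first and columns second. If you need just one of the two numbers, base R also provides nrow() and ncol(), sketched here:

```r
nrow(iris)     # number of cases (rows): [1] 150
ncol(iris)     # number of variables (columns): [1] 5
dim(iris)[1]   # the first entry of dim() matches nrow(): [1] 150
```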
The "names()" function returns the names of the variables in a data frame.
names(iris)
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
## [5] "Species"
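Because names() returns an ordinary character vector, you can also use it to check whether a variable exists before working with it. In this small sketch, "Color" is a hypothetical variable name chosen because iris does not contain it.

```r
length(names(iris))         # one name per variable: [1] 5
"Species" %in% names(iris)  # is there a variable called Species? [1] TRUE
"Color" %in% names(iris)    # hypothetical name, not in iris: [1] FALSE
```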
The "head()" function returns the first cases of a data frame. By default, it shows the first six.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
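Six rows is only the default. head() takes an optional argument, n, for the number of rows to show, and its companion tail() shows the last rows instead. For example:

```r
head(iris, n = 3)   # first three cases only
tail(iris, n = 2)   # last two cases (rows 149 and 150)
```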
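The "summary()" function returns basic summary statistics for each variable in a data frame. For numeric variables, it reports the minimum, first quartile, median, mean, third quartile, and maximum. For a categorical variable such as Species, it reports the number of cases in each category.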
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.30 Min. :2.00 Min. :1.00 Min. :0.1
## 1st Qu.:5.10 1st Qu.:2.80 1st Qu.:1.60 1st Qu.:0.3
## Median :5.80 Median :3.00 Median :4.35 Median :1.3
## Mean :5.84 Mean :3.06 Mean :3.76 Mean :1.2
## 3rd Qu.:6.40 3rd Qu.:3.30 3rd Qu.:5.10 3rd Qu.:1.8
## Max. :7.90 Max. :4.40 Max. :6.90 Max. :2.5
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
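summary() also works on a single variable, which is handy when a data frame has many columns. For a categorical variable such as Species, the table() function gives the same counts shown in the last column above. A brief sketch:

```r
summary(iris$Sepal.Length)  # minimum, quartiles, median, mean, and maximum for one variable
table(iris$Species)         # number of cases per species: 50 of each
```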