Basic Tutorial in R
R is a simple language to understand and to use. This tutorial walks through its basics step by step.
What is R?
R is a programming language developed by Ross Ihaka and Robert Gentleman, who decided to call their creation, simply, R. It is commonly used in data analytics and scientific research.
It is one of the most popular languages among data analysts, who use it to collect, clean and transform data in order to make decisions and predictions.
Basics of R
We will learn the following topics in this tutorial:-
- help() function
- print() function
- Comments
- Variables
  - Vectors
  - Lists
  - Matrices
  - Arrays
  - Factors
  - Data Frames
- Deleting Variables (using rm() function)
- Operations
  - Arithmetic Operations
  - Relational Operations
  - Logical Operations
  - Assignment Operations
  - Miscellaneous Operations
- Decision Making
- Loops
- Packages
- Importing Dataset
- Graphs
  - Pie chart
  - Bar plot
  - Histogram
  - Scatter plot
  - Box plot
- Functions
  - Predefined Functions
    - Numeric Functions
    - Statistical Functions
We will work in the RStudio workspace. It has a console window where you can run commands directly, a plots window to view different types of graphs, a help section to search for R topics, and several other panes.
Now let us learn about each of these topics.
help() function- This is a useful function and should be understood first. It shows the documentation for any R topic you want to look up. Its command is:
help(topic you want to search)
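As a small illustration, the lines below look up the documentation for mean(); the choice of mean() is only an example topic.

```r
# Open the documentation page for the mean() function
help(mean)

# The question mark is a shorthand for the same lookup
?mean
```

In RStudio the page opens in the help section; in a plain R console it is shown as text.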
print() function- You can use the print() function directly to print the required content.
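A minimal sketch of print() with a string and with a variable (the variable name x is just an example):

```r
# Print a piece of text
print("Hello, R!")

# Print the value stored in a variable
x <- 42
print(x)
```

At the console, typing the name of a variable on its own line prints it as well.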
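Comments- You can put comments in your code using the # symbol. R has no separate syntax for multi-line comments, so start every line of a longer comment with #. Wrapping text in single or double quotes is sometimes used as a workaround, but it creates a string rather than a real comment.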
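The following short sketch shows both a single-line comment and a longer comment spread over several lines (the variable names are illustrative):

```r
# This is a single-line comment
x <- 5  # a comment can also follow code on the same line

# A longer note is written as several
# consecutive lines, each starting with "#".
y <- x * 2
```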
Variables- In programming languages we need variables to store information. A variable is a name for a reserved memory space that holds a value.
A variable can hold different types of R objects:-
- Vectors
- Lists
- Matrices
- Arrays
- Factors
- Data Frames
We also have different data types to store different kinds of values:-
- Logical
- Numeric
- Character (string)
- Complex
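As a quick check of these data types, the sketch below stores one value of each kind and asks R for its class. The variable names are made up for the example.

```r
is_ready <- TRUE      # logical
height   <- 1.75      # numeric
language <- "R"       # character (string)
z        <- 3 + 2i    # complex

# class() reports the data type of each value
class(is_ready)
class(height)
class(language)
class(z)
```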
Vectors- We use the c() function to assign values to vectors.
We can also access a particular element of a vector by putting its position in square brackets.
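A minimal sketch of creating vectors and picking out elements; the values are arbitrary sample data.

```r
# Create vectors with c()
fruits <- c("apple", "banana", "cherry")
scores <- c(10, 20, 30, 40)

# Access elements by position (positions start at 1)
scores[2]         # second element: 20
scores[c(1, 3)]   # first and third elements: 10 30
fruits[-1]        # everything except the first element
```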
Lists- A list can have different types of elements such as vectors, strings and many others.
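For instance, a list can be sketched as follows, mixing a string, a numeric vector and a logical value (the element names are hypothetical):

```r
# A list holding elements of different types
item <- list(name = "widget", sizes = c(1, 2, 3), in_stock = TRUE)

item$name         # access by name: "widget"
item[["sizes"]]   # access by name with double brackets: 1 2 3
item[[3]]         # access by position: TRUE
```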
Matrices- A matrix is a 2-D rectangular data set. It can be created by passing a vector to the matrix() function.
You can set the number of rows and columns using the “nrow” and “ncol” arguments respectively.
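A short sketch of matrix(), using the numbers 1 to 6 as sample input. By default R fills the matrix column by column; the byrow argument switches to row-wise filling.

```r
# 2 rows and 3 columns, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2]   # row 1, column 2: 3

# Same values, filled row by row
m_by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_by_row[1, 2]   # row 1, column 2: 2

dim(m)    # number of rows and columns: 2 3
```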
Arrays- The matrices we studied earlier are confined to two dimensions. Arrays can have any number of dimensions.
If we want to create an array that holds 2 matrices of 3 rows and 2 columns each, we pass the dimensions 3, 2 and 2 to the array() function.
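One way to sketch this, using the numbers 1 to 12 as sample data:

```r
# Dimensions are given as rows, columns, number of matrices
a <- array(1:12, dim = c(3, 2, 2))

dim(a)       # 3 2 2
a[, , 1]     # the first 3 x 2 matrix (values 1 to 6)
a[2, 1, 2]   # row 2, column 1 of the second matrix: 8
```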
Factors- They are data objects used for categorical data: the distinct values of the elements are stored as levels.
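A small sketch with made-up size labels shows how factor() groups values into levels:

```r
sizes <- factor(c("small", "large", "medium", "small", "large"))

levels(sizes)    # distinct values, sorted: "large" "medium" "small"
nlevels(sizes)   # number of levels: 3
table(sizes)     # how many elements fall in each level
```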
Data Frames- A data frame is a 2-dimensional, table-like structure in which each column holds the values of one variable and each row holds one record.
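As an illustration, the data frame below has two variables and three records; the names and ages are invented sample data.

```r
students <- data.frame(
  name = c("Ana", "Ben", "Cara"),
  age  = c(21, 23, 22)
)

students$age    # one column: 21 23 22
students[2, ]   # one row: the record for Ben
nrow(students)  # 3
ncol(students)  # 2
```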
Deleting Variables- Variables can be removed or deleted using the remove() function or rm() function.
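A minimal sketch of both functions; exists() is used here only to confirm that the variables are gone.

```r
x <- 10
y <- 20

rm(x)       # delete x
remove(y)   # remove() does the same job as rm()

exists("x", inherits = FALSE)   # FALSE
exists("y", inherits = FALSE)   # FALSE
```

To clear every variable in the workspace at once, rm(list = ls()) is commonly used.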
Operations- When we have numbers or lists of numbers (vectors), we sometimes need to perform basic mathematical operations on them. R provides us with different types of operators.
- Arithmetic Operators
- Relational Operators
- Logical Operators
- Assignment Operators
- Miscellaneous Operators
Arithmetic Operators
Let us look at some of the arithmetic operators
Operator | Description |
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
%% | Remainder |
%/% | Quotient (integer division) |
^ | Power |
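The operators in the table can be tried on two small sample vectors; each operator works element by element.

```r
a <- c(7, 10, 15)
b <- c(2, 3, 4)

a + b     # 9 13 19
a - b     # 5 7 11
a * b     # 14 30 60
a / b     # ordinary division
a %% b    # remainder: 1 1 3
a %/% b   # quotient: 3 3 3
a ^ 2     # power: 49 100 225
```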
Relational Operators
Relational operators help us to compare vectors. They return TRUE or FALSE as output.
Operator | Description |
> | Checks if each element of the first vector is greater than the corresponding element of the second vector |
< | Checks if each element of the first vector is less than the corresponding element of the second vector |
== | Checks if each element of the first vector is equal to the corresponding element of the second vector |
<= | Checks if each element of the first vector is less than or equal to the corresponding element of the second vector |
>= | Checks if each element of the first vector is greater than or equal to the corresponding element of the second vector |
!= | Checks if each element of the first vector is not equal to the corresponding element of the second vector |
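A quick sketch comparing two sample vectors with each relational operator:

```r
v1 <- c(2, 5, 8)
v2 <- c(3, 5, 6)

v1 > v2    # FALSE FALSE  TRUE
v1 < v2    #  TRUE FALSE FALSE
v1 == v2   # FALSE  TRUE FALSE
v1 <= v2   #  TRUE  TRUE FALSE
v1 >= v2   # FALSE  TRUE  TRUE
v1 != v2   #  TRUE FALSE  TRUE
```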
Logical Operators
Operator | Description |
& | It is known as the element-wise logical AND operator. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if both elements are TRUE. |
| | It is known as the element-wise logical OR operator. It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if at least one of them is TRUE. |
! | It is known as the logical NOT operator. It takes each element of the vector and gives the opposite logical value. |
&& | It is known as the logical AND operator. It compares two single logical values and gives TRUE if both are TRUE. Older versions of R used only the first element of longer vectors; since R 4.3.0, vectors longer than one element give an error. |
|| | It is known as the logical OR operator. It compares two single logical values and gives TRUE if at least one of them is TRUE, with the same restriction to single values as &&. |
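The sketch below applies these operators to two sample logical vectors. For && and || single elements are picked out explicitly, since recent versions of R accept only single values there.

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

p & q   #  TRUE FALSE FALSE
p | q   #  TRUE  TRUE  TRUE
!p      # FALSE  TRUE FALSE

p[1] && q[1]   # TRUE  (TRUE and TRUE)
p[2] || q[3]   # FALSE (FALSE or FALSE)
```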
Assignment Operators
These operators are used to assign values to variables.
Operator | Description |
<- = <<- | Called left assignment |
-> ->> | Called right assignment |
Left Assignment
Right Assignment
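Both directions can be sketched as follows. The increment() function and the counter variable are hypothetical names, added only to show that <<- changes a variable defined outside the function.

```r
# Left assignment: the value goes into the name on the left
x <- 5
y = 6

# Right assignment: the value goes into the name on the right
7 -> z

# <<- assigns to a variable that lives outside the function
counter <- 0
increment <- function() {
  counter <<- counter + 1
}
increment()
counter   # 1
```

The ->> operator is the right-hand form of <<-.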
Miscellaneous Operators
Operator | Description |
: | Colon operator: creates a series of numbers in sequence |
%in% | Checks if an element belongs to a vector |
%*% | Matrix multiplication operator: multiplies two matrices, for example a matrix with its transpose |
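A short sketch of the three operators, using two small sample matrices A and B:

```r
1:5                    # the sequence 1 2 3 4 5
3 %in% c(1, 2, 3)      # TRUE
7 %in% c(1, 2, 3)      # FALSE

A <- matrix(1:4, nrow = 2)            # rows: (1, 3) and (2, 4)
B <- matrix(c(0, 1, 1, 0), nrow = 2)  # rows: (0, 1) and (1, 0)

A %*% B      # matrix product, rows: (3, 1) and (4, 2)
A %*% t(A)   # A multiplied by its transpose, rows: (10, 14) and (14, 20)
```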
Decision Making
Decision making means that if a certain condition is true one task is done, and if it is not true some other task is done.
There are various decision making statements possible in R:
If statement:-
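A minimal sketch of an if statement; the temperature value and the threshold of 30 are arbitrary.

```r
temperature <- 35
status <- "normal"

# The block runs only when the condition is TRUE
if (temperature > 30) {
  status <- "hot"
  print("It is hot today")
}
```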
If else statement:-
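For example, an if else statement can decide whether a sample number is even or odd:

```r
n <- 7

# The first block runs when the condition is TRUE, the second otherwise
if (n %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}
print(parity)   # "odd"
```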
Switch:-
switch(expression, list)
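Two ways of using switch() are sketched here with invented values: selecting by position with a number, and selecting by name with a string.

```r
# With a number, switch() returns the alternative at that position
choice <- 2
colour <- switch(choice, "red", "green", "blue")
colour   # "green"

# With a string, switch() matches the names of the alternatives;
# an empty alternative falls through to the next one, and the
# unnamed last value is the default
day <- "sat"
kind <- switch(day, sat = , sun = "weekend", "weekday")
kind     # "weekend"
```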
Loop- There are situations where we need to execute a statement several times. Writing that statement again and again would make our task tedious, so in such places we use loops.
We have different types of loops as:
For loop:-
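A small sketch of a for loop that adds up the numbers 1 to 5:

```r
total <- 0

# i takes the values 1, 2, 3, 4, 5 in turn
for (i in 1:5) {
  total <- total + i
}
total   # 15
```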
While loop:-
while (cond) expr
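In this form, cond is the condition and expr is the statement that repeats while the condition stays TRUE. A minimal sketch:

```r
count <- 1

# Prints 1, 2 and 3, then stops because count becomes 4
while (count <= 3) {
  print(count)
  count <- count + 1
}
```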
Packages
Packages are collections of R functions, data and compiled code. Installed packages are stored in a directory called the library. A standard set of packages is installed by default during the installation of R.
You can see all the packages already installed with the command: library()
You can also install a new package as per your requirement with the following command:-
install.packages("Package Name")
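As a sketch of the usual workflow: install a package once, then load it in every session with library(). The package ggplot2 is only an example and its install line is commented out because it needs an internet connection; tools is used for loading because it ships with R.

```r
# Install a package from CRAN (run once)
# install.packages("ggplot2")

# Load an installed package so its functions can be used
library(tools)

# Names of the first few installed packages
head(rownames(installed.packages()))
```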
Importing Dataset
We have files stored on our systems and sometimes we need to use them. They can be in any format such as csv, xml or excel. To use a file by its name alone, it should be in the current working directory (the directory or folder you are currently working in); otherwise you have to give its full path.
With the command getwd() you can get the directory in which RStudio is currently working, and with setwd() you can set your own directory.
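A minimal sketch of both commands. It switches to R's temporary directory, given by tempdir(), simply because that folder exists on every system, and then switches back.

```r
# Remember the current working directory
old_dir <- getwd()

# Change to another directory
setwd(tempdir())
getwd()

# Go back to where we started
setwd(old_dir)
```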
CSV file- CSV stands for comma separated values. In a csv file each line is one record and the values within it are separated by commas.
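Reading a CSV File- To read a CSV file in R we use the read.csv() function.
Note:- In case you want to import an excel file, use the command read.xlsx(path of file). This function is not part of base R; it comes from an add-on package such as xlsx or openxlsx.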
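To keep the example self-contained, the sketch below first writes a tiny CSV file with invented data to a temporary location and then reads it back with read.csv(). With a real file you would pass its name or path instead.

```r
# Create a small sample CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ana,21", "Ben,23"), csv_path)

# Read it into a data frame; the first line becomes the column names
people <- read.csv(csv_path)

str(people)   # structure of the imported data
people$age    # 21 23
```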
Plots- Graphs help us to analyze our data better. R provides us with different types of graphs:-
Pie chart
We will create a pie chart using the pie() function. It takes a vector of non-negative values as input.
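A minimal sketch with made-up shares for four categories:

```r
shares <- c(40, 30, 20, 10)
slice_names <- c("A", "B", "C", "D")

# One slice per value, labelled with the matching name
pie(shares, labels = slice_names, main = "Share by category")
```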
Bar plot
Bar plots are among the most commonly used graphs. They show the relationship between a categorical value and a numerical value.
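For instance, invented monthly sales figures can be drawn with barplot(); the names of the vector become the bar labels.

```r
sales <- c(Jan = 12, Feb = 18, Mar = 9)

# barplot() draws one bar per value and returns the bar positions
bar_positions <- barplot(sales,
                         main = "Monthly sales",
                         xlab = "Month",
                         ylab = "Units sold")
```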
Histograms
It is a type of graph in which the area of each bar is proportional to the frequency of the values in a class interval and the width of the bar is equal to that class interval.
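A short sketch with ten sample values and class intervals of width 2, set through the breaks argument:

```r
values <- c(2, 3, 3, 4, 5, 5, 5, 6, 7, 9)

# hist() draws the graph and returns the counts per interval
h <- hist(values,
          breaks = c(0, 2, 4, 6, 8, 10),
          main = "Histogram of values",
          xlab = "Value")

h$counts   # frequencies per interval: 1 3 4 1 1
```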
Scatter plot
It is a type of graph where 2 variables are plotted along 2 axes, and the resulting pattern shows the correlation present between the variables.
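As a sketch, the invented data below relates hours studied to marks obtained; plot() draws the points and cor() gives the correlation as a number.

```r
hours <- c(1, 2, 3, 4, 5)
marks <- c(52, 58, 65, 70, 78)

# One point per pair of values
plot(hours, marks,
     main = "Marks against hours studied",
     xlab = "Hours studied",
     ylab = "Marks")

# A value close to 1 means a strong positive relationship
cor(hours, marks)
```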
Box plot
It is another type of representation where the data is summarized by a rectangle (box) that marks the quartiles, with a line inside it at the median.
xlab- This argument sets the label on the x-axis.
ylab- This argument sets the label on the y-axis.
main- This argument gives the heading to our plot.
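A box plot that uses all three labelling arguments can be sketched as follows, with invented test scores:

```r
scores <- c(55, 60, 62, 65, 70, 72, 75, 80, 95)

# boxplot() draws the plot and returns the summary statistics it used
b <- boxplot(scores,
             main = "Distribution of scores",
             xlab = "Class",
             ylab = "Score")

b$stats   # whisker ends, quartile box and median
```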
Functions
Functions are groups of statements which perform certain tasks as defined by the programmer. R has a set of predefined functions to make our work easier, such as log(x), exp(x), mean(x), median(x) and many others. You can also define your own function in the given format:-
function_name <- function(argument list...)
{
function body
}
Let us understand with an example:-
To create a script in RStudio, follow the steps:- File -> New File -> R Script or use ctrl+shift+N
This will give you a new script:
Suppose we are making an addition function in an R script:-
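One possible version of such a function is sketched below; the name add_numbers and its arguments a and b are chosen for the example.

```r
# Define a function that adds two numbers
add_numbers <- function(a, b) {
  result <- a + b
  return(result)
}

# Call the function
add_numbers(3, 4)   # 7
```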
Default Arguments
Default arguments are arguments that are given a value when we define the function. That value is used whenever the caller does not supply one. Let us look at an example:
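In this sketch the hypothetical function power() has a default value of 2 for its second argument:

```r
# exponent defaults to 2 when the caller leaves it out
power <- function(base, exponent = 2) {
  base ^ exponent
}

power(5)                       # uses the default: 25
power(5, 3)                    # overrides the default: 125
power(exponent = 3, base = 2)  # arguments can be passed by name: 8
```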
Predefined Functions
R has a set of predefined functions which can be categorized in the following ways:-
Numeric Functions:-
Some of the numeric functions which R provides us are:- abs(x), sqrt(x), ceiling(x), floor(x), trunc(x), round(x), signif(x), cos(x), sin(x), tan(x), exp(x).
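A few of these functions applied to sample numbers:

```r
abs(-4.7)                    # absolute value: 4.7
sqrt(16)                     # square root: 4
ceiling(4.2)                 # round up: 5
floor(4.8)                   # round down: 4
trunc(-4.8)                  # drop the decimal part: -4
round(3.14159, digits = 2)   # 3.14
signif(123456, digits = 2)   # two significant digits: 120000
exp(0)                       # 1
cos(0)                       # 1
sin(0)                       # 0
```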
Statistical Functions:- We have some statistical functions such as mean(x), median(x), sd(x), min(x), max(x) and many others.
The mean function gives the mean value of the argument, and median gives the median.
sd stands for standard deviation. It calculates the standard deviation of the argument.
min and max give the minimum and maximum value of the argument.
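These functions can be tried on a small sample vector. Note that sd() computes the sample standard deviation, which divides by n - 1.

```r
x <- c(2, 4, 4, 5, 10)

mean(x)     # 5
median(x)   # 4
sd(x)       # 3
min(x)      # 2
max(x)      # 10
```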