Introduction to Data Analysis and Graphics using R

Hellen Gakuruh
2017-03-06

Introduction to R and RStudio

Outline

  • Introduction to base R and RStudio
  • Download and Install both R and RStudio
  • Layout (Windows and Panes)
  • Interactively work with R and RStudio's console
  • Global environment/working space and history files
  • Code using scripts
  • Install R Packages
  • Working directory and RStudio's Projects
  • Getting Help

Introduction to base R and RStudio

In this session we learn a little about R and RStudio: what they are, how they differ and where to get them.

About R

  • A system for statistical computation and graphics (R Documentation)
  • A Programming Language
  • A dialect of the S language, first made public in 1993
  • Named partly after the first-name initials of its founders, Ross Ihaka and Robert Gentleman, and partly as a play on the name S
  • Began as a program for teaching statistics at a university
  • Has since grown to have a diverse user base from all over the world
  • A collaborative project with many contributors to the base package as well as extensions (packages)

Why R?

  • It's absolutely free
  • Has magnificent graphing capabilities: R's greatest strength.
  • As a programming language, it's highly extensible: it supports user-defined functions and add-on packages
  • A growing number of packages for producing documents (Word, PDF, HTML) and reproducible analyses (rmarkdown, bookdown, blogdown)
  • Innovative packages, such as shiny, for building interactive statistical applications (apps)
  • A growing number of users

Why R? (cont.)

  • A growing number of free and commercial Integrated Development Environments (IDEs) [1].
  • To distinguish R from its IDEs, it is usually referred to as base R.
  • With a good foundation, R is easy to work with.

[1]: IDEs are software applications that ease the coding process; examples include RStudio and Revolution R.

What is RStudio?

  • One of R's Integrated Development Environments (IDEs). Some of its key advantages over base R are:
    • A well thought out and organized layout (panes), making it easy to see all the core windows at the same time.
    • A syntax-highlighting editor that supports direct code execution.
    • Workspace management.

What is RStudio (cont)

  • Makes data import easier
  • Support for R Markdown files makes documentation and reproducible analysis easier.
  • Available in open-source and commercial editions; it runs on the desktop (Windows, Mac, and Linux) or in a browser connected to RStudio Server or RStudio Server Pro (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux).

Download and install both programs

In this sub-session we discuss and demonstrate how to download and install R and RStudio.

Downloading base R

  • Base R is available from the Comprehensive R Archive Network (CRAN)
  • CRAN is a network of web servers that store identical, up-to-date versions of R's code and documentation
  • There are multiple mirrors (copies) of CRAN located across the globe. It is recommended to download from a mirror close to you
  • Select the version that matches your computer (operating system, 32/64 bit)
  • Now let's download and start it up (live demonstration)
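Once R is installed and started, a few base R calls confirm which build you ended up with, which is a quick way to check that you picked the right platform. This is a minimal sketch; the exact strings depend on your installation, so the comments only indicate the kind of value to expect.

```r
# Version of R that is running, e.g. "R version 4.3.2 (2023-10-31)"
R.version.string

# Platform string, e.g. "x86_64-w64-mingw32" for 64-bit Windows
R.version$platform

# Pointer size in bytes times 8 gives the bitness of the build: 64 or 32
bits <- .Machine$sizeof.pointer * 8
bits

# Operating system name, e.g. "Windows", "Linux" or "Darwin" (macOS)
Sys.info()[["sysname"]]
```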

Downloading RStudio

  • Download from the RStudio website
  • Select the platform-specific version
  • Run the downloaded installer to begin the installation process

(live demonstration)

Layout (Windows and Panes)

In this sub-session we quickly look at R's toolbar and windows.

Base R layout

Toolbar (live demo)

  • File: open/create scripts, load/save the workspace and history, print and exit. Workspace and history files are discussed later in this session
  • Edit: the usual edit options plus run options and GUI preference (appearance) options
  • Packages: select a CRAN mirror, and load, install and uninstall packages. Packages are add-ons
  • Windows: arrange the open windows
  • Help: Manuals and help documentation

Console

  • Also called the command-line interface
  • It's an interactive platform (it acts just like a calculator)
  • Enter an expression and it is evaluated and the result printed immediately (interactive)
  • As a window, it can be minimized, maximized or resized
  • Best suited to single lines of code

Console (cont)

Interactive session on console (live demo)

# Simple arithmetic
1 + 1
[1] 2
# Exponential function (e raised to the power 1)
exp(1)
[1] 2.718282
# Some geometry
2 * pi * (90/360)
[1] 1.570796
# Some trigonometry (angles are in radians, not degrees)
cos(90 - 32)
[1] 0.1191801
atan(28/63)
[1] 0.4182243

Console (cont)

Note:

  • R is case sensitive, so Cos and cos are not the same.
  • To clear the console, press Ctrl+L
  • Use the up and down arrow keys to recall previous commands
  • A “+” prompt indicates an incomplete expression; R is waiting for you to complete it
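The first and last notes can be checked in code. In this small sketch, exists() reports whether a name is visible, and parse() shows that "1 +" is an incomplete expression; typed at the console, the same input would give the “+” continuation prompt instead of an error.

```r
# R is case sensitive: cos() exists in base R, Cos() does not
cos(0)
has_cos <- exists("cos")
has_Cos <- exists("Cos")   # FALSE unless you define Cos yourself
c(has_cos, has_Cos)

# "1 +" is incomplete: parsing it on its own fails ("unexpected end of input"),
# whereas typing it in the console shows the "+" prompt and waits for more input
incomplete <- tryCatch(parse(text = "1 +"), error = function(e) "incomplete")
incomplete
```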

R script

  • Scripts are plain-text files in which code is written and saved
  • They are better suited to reproducibility and to multiple lines of code (such as a program)
  • In base R, the script editor (a program for writing scripts) is opened as a window from the File menu (New script or Open script)
  • Unlike the console, scripts are not interactive: code gives no results until it is run
  • To run code, click Edit > Run line or selection (Ctrl+R) or Edit > Run all
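Running a whole script is equivalent to calling source() on the file. The sketch below writes a three-line script to a temporary file (the file name and its contents are purely illustrative) and then runs all of it, much like Edit > Run all in the script editor.

```r
# Write a small script to a temporary .R file
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "radius <- 2",
  "area <- pi * radius^2",
  "print(area)"
), script_path)

# Run every line of the script; objects are created in a fresh environment
script_env <- new.env()
source(script_path, local = script_env)

# Objects created by the script can be inspected afterwards
script_env$area
```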

R Script (cont)

Live demonstration on scripting in base R

Global environment/workspace

  • An environment in R is a collection of object names, each pointing to the location of its associated value. Every environment (except the empty environment) also has a parent environment, so environments form a hierarchy.
  • Environments do not hold the objects (values/data) themselves; they only point to where they are located.
  • The first environment you work in is called the global environment, or the workspace.
  • When code refers to an object by name, R searches the global environment first; if the name is not found, it searches the parent environment, and so on up the chain (this will become clear as we create objects)
  • Base R has no window showing the global environment; list its contents with the function ls()
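The lookup rule can be seen with a throwaway environment. In this sketch, e is a hypothetical child environment created with new.env(); at the console, environment() is the global environment, so e's parent is the workspace.

```r
x <- 10                                     # an object in the current environment
ls()                                        # object names in the current environment

# A child environment whose parent is the current (at the console: global) one
e <- new.env(parent = environment())
assign("y", 5, envir = e)
ls(e)                                       # "y"

exists("x", envir = e, inherits = FALSE)    # FALSE: x is not in e itself
exists("x", envir = e)                      # TRUE: found by searching the parent
get("x", envir = e)                         # 10, retrieved from the parent

# pi is found even further up, in package:base on the search path
exists("pi", envir = e)                     # TRUE
search()                                    # the chain of attached environments
```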

RStudio Layout

  • Very user-friendly
  • Has four panes with multiple tabs
  • The toolbar is similar to base R's, with some additions

RStudio Panes

  • Top left: usually the script/text editor and data viewer tabs
  • Top right: usually the global environment, history (log) and additional tabs (such as Build, Git and Presentation)
  • Bottom left: usually the console tab
  • Bottom right: the Files, Plots, Packages, Help and Viewer tabs
  • However, these can be rearranged (Tools > Global Options > Pane Layout)

Interactive session on RStudio

  • No different from base R's console
  • Type code, press Enter and the output is printed

Live demonstration

Scripting in RStudio

  • Works like base R's script editor, with the added bonus that each script has its own tab with an easily accessible Run button.
  • Many file types work well in RStudio's editor, such as .R (pure R code), .Rmd (R Markdown: text and code for reproducible analysis), .md (Markdown), .html and .css.

Live demo (.R file)

Function Calls in R

  • Everything that happens in R is the result of a function call
  • Functions are the actions (commands) that R performs
  • A function is called by following its name with parentheses: ()
  • Within the parentheses are arguments (parameters). Arguments are name-value pairs that supply input or specify how the function should behave
  • There are two types of functions: named functions and anonymous (unnamed) functions. Named functions include mean (to compute a mean), median (to compute a median) and read.table (to import data)
  • Functions can also be classed as high or low level. High-level functions are the commonly used commands, while low-level functions are those called by high-level functions
  • “Function call” means using a function to perform an action
  • When making a function call, arguments can be named or unnamed, and they may or may not have default values
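A short illustration of named and anonymous functions. The name square is hypothetical and chosen only for this example; sapply() is a base R function used here simply as a convenient place to pass an anonymous function.

```r
# A named function: the function object is assigned to the name `square`
square <- function(x) {
  x^2
}
square(4)                          # 16

# An anonymous function: never given a name, passed straight to sapply()
sapply(1:3, function(x) x^2)       # 1 4 9

# Built-in named functions are called in exactly the same way
mean(c(2, 4, 6))                   # 4
median(c(5, 1, 3))                 # 3
```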

Function call example: mean function

  • To access the documentation of this or any other function, type ? followed by the function name, e.g. ?mean
  • mean has arguments (x, trim = 0, na.rm = FALSE, …)
  • Arguments trim and na.rm have default values 0 and FALSE respectively. These can be changed as needed; if the defaults are fine, leave them out of the call.
  • Argument x has no default value, so it must be supplied.
  • A call that gives both the name and the value, e.g. mean(x = data), uses a named argument. A call such as mean(data) uses an unnamed (positional) argument
  • Named arguments can be given in any order without a problem (though this is not good practice)
  • Unnamed arguments, however, must be given in the same order as in the function definition (live demo)
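The calls below apply these rules to mean(). The vector data is made up for illustration; the comments give the values R returns for it.

```r
data <- c(2, 4, 6, NA, 100)

# Named argument call; the default na.rm = FALSE lets the NA propagate
mean(x = data)                          # NA
mean(x = data, na.rm = TRUE)            # 28

# Named arguments can appear in any order
mean(na.rm = TRUE, x = data)            # 28

# Unnamed (positional) arguments follow the definition order: x, trim, na.rm
mean(data, 0, TRUE)                     # 28

# Changing a default: trim = 0.25 drops 25% of the values from each end
# (here one value at each end of 2, 4, 6, 100) before averaging
mean(data, trim = 0.25, na.rm = TRUE)   # 5
```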

Errors, warnings and messages

  • An error means R could not carry out the call, for example because a data set named in the call does not exist in R. Errors are fatal: they stop execution
  • Warnings flag possible problems but do not stop execution. It is good practice to check why a warning occurred, to pre-empt a possible problem
  • Messages are purely informational and do not indicate a problem; a good example is a message reporting installation progress
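The three kinds of condition can be triggered and captured with base R's tryCatch() and withCallingHandlers(). In this sketch undefined_object is a deliberately missing, hypothetical name, and the quoted messages are those R gives in an English locale.

```r
# Error: stops evaluation; tryCatch() captures it so the rest of the code still runs
err <- tryCatch(
  mean(undefined_object),
  error = function(e) conditionMessage(e)
)
err                        # "object 'undefined_object' not found"

# Warning: execution continues and a result is still returned
warning_text <- NULL
res <- withCallingHandlers(
  as.numeric(c("1", "two")),
  warning = function(w) {
    warning_text <<- conditionMessage(w)   # record the warning text
    invokeRestart("muffleWarning")         # then carry on quietly
  }
)
res                        # 1 NA
warning_text               # "NAs introduced by coercion"

# Message: informational only; here it is captured instead of printed
msg <- tryCatch(
  message("Installing package..."),
  message = function(m) conditionMessage(m)
)
```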

(Live demo)

R Packages

  • A package is simply a mechanism for loading optional code, data and documentation.
  • All R functions and data sets are stored in packages, and the base R distribution includes about 30 packages [2].
  • Some of these packages are considered part of the R source code and are therefore loaded automatically
  • The others are installed (they exist on disk) but are not available for use until they are loaded with the function library()
  • Many more contributed packages are available from CRAN (Comprehensive R Archive Network)
  • Generally, the base R packages are sufficient for most basic statistics; if specialized functions are needed, check CRAN (start with the CRAN Task Views)
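A sketch of the difference between an installed package and a loaded one, using splines, which ships with base R but is not attached at start-up. The install.packages() lines are left as comments because they need an internet connection; ggplot2 is only an example name.

```r
# Environments currently on the search path, including attached packages
search()

# Is splines attached? Usually FALSE in a fresh session
"package:splines" %in% search()

# It is installed, so library() can attach it
"splines" %in% rownames(installed.packages())
library(splines)
"package:splines" %in% search()      # TRUE after library()

# Contributed packages are installed once from CRAN, then loaded in each session:
# install.packages("ggplot2")
# library(ggplot2)
```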

[2]: R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

R Session and Working Directory

  • An R session is the current working session: it begins when R is started and ends when R is closed
  • A working directory is the default folder that R reads files from and saves files to during a session
  • It is important to set the working directory for each R session, or to configure R to always start in the folder you use as your working directory

Getting and setting working directory

  • To check the current working directory, call getwd()
  • To set another working directory, call setwd(dir)
  • Argument dir is a path (location) name; it can be either relative or absolute
  • Example: setwd("~/Data Mania Inc/Data_Mgt_Analysis_and_Graphics_R")
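A runnable sketch of getwd() and setwd(). It uses tempdir() instead of the folder in the example above so that it works on any machine, and it restores the original directory at the end; the sub-folder name analysis is hypothetical.

```r
old_wd <- getwd()                 # remember where we started

# Absolute path: a temporary folder that always exists
new_wd <- tempdir()
setwd(new_wd)
normalizePath(getwd()) == normalizePath(new_wd)   # TRUE

# Relative path: resolved against the current working directory
dir.create("analysis", showWarnings = FALSE)
setwd("analysis")                 # now <tempdir>/analysis
basename(getwd())                 # "analysis"

setwd(old_wd)                     # restore the original working directory
```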

RStudio Projects

  • One of RStudio's more recent features
  • Lets you keep different (yet possibly inter-related) pieces of work separate, each with its own working directory (folder)
  • Can be created in a new directory, in an existing directory, or by cloning a version-controlled repository

Getting Help

  • Start by reading R's built-in help, such as function documentation: access it with either ?function or help(function), e.g. ?read.table or help("read.table"). With help(), quotes are optional for ordinary names but required for reserved words and operators, e.g. help("if").
  • The R Core Team has invested a lot of time and energy in a set of user manuals: access them with help.start() (no arguments)
  • Manuals to read as a beginner in R are:
    • An Introduction to R, and
    • R Installation and Administration

  • Under Reference, “Search Engine & Keywords” is handy for locating specific topics
  • Under Miscellaneous Material, “Frequently Asked Questions” is a must-read
  • Other documentation on this page can be read bit by bit as it becomes relevant
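The help calls above open a viewer, so they appear as comments below; the remaining lines use base R helpers (args(), formals(), apropos()) that return values you can inspect. The pattern passed to apropos() is just an example.

```r
# In the console, each of these opens a help page or the manuals:
#   ?read.table
#   help(read.table)     # quotes are optional for ordinary names
#   help("if")           # quotes are required for reserved words and operators
#   help.start()         # HTML manuals (An Introduction to R, FAQ, ...)

# Quick look at a function's arguments and their defaults
args(mean.default)
names(formals(mean.default))       # "x" "trim" "na.rm" "..."

# Find visible functions whose names start with "read."
read_functions <- apropos("^read\\.")
read_functions
```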

Beyond R documentation

  • Do a web search, for example with RSeek (an R-focused search engine)
  • Ask a knowledgeable (and helpful) friend
  • Search the archives of R's help mailing list (R-help)
  • Search through Stack Overflow Q&A
  • Finally, post a question to either Stack Overflow or R's mailing list (consider the latter), but never post the same question to both

Useful resources