4. Packages and Functions

In this lesson we will review the basics of packages (including Tidyverse) and functions.

Packages

A package is a collection of code, data, and documentation bundled together in a standardized way so that it can be easily shared with, and installed by, other R users.

Palmerpenguins

To start, let’s have a look at the palmerpenguins package.

# install.packages("palmerpenguins")

library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# … with 334 more rows, and 3 more variables: body_mass_g <int>,
#   sex <fct>, year <int>
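If you are not sure whether a package is already installed, you can check before loading it. The following is a minimal sketch using base R's requireNamespace(); the variable names pkg and is_installed are ours, chosen for illustration.

```r
# Name of the package we want to use.
pkg <- "palmerpenguins"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  # Installed: load it and print the first three rows of the dataset.
  library(palmerpenguins)
  print(head(penguins, 3))
} else {
  # Not installed: remind ourselves how to install it.
  message("Run install.packages(\"", pkg, "\") in the console first.")
}
```

Either way the code runs without an error, which makes this pattern handy at the top of a script you share with others.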

Tidyverse

Tidyverse is installed and loaded like a single package, but it is actually a collection of packages. Loading it attaches 8 core packages (ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats), introduced by Hadley Wickham to help with data manipulation, exploration, and visualization.

# install.packages("tidyverse") # run in console

library(tidyverse)
── Attaching packages ───────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
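The Conflicts lines mean that dplyr and the built-in stats package both define functions named filter() and lag(). After library(tidyverse), the plain names refer to the dplyr versions. As a sketch, the package::function() form lets you say explicitly which one you want. The variable names moving_avg and adelie are illustrative.

```r
# stats::filter() applies a linear filter (here, a 3-point moving average).
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg

# dplyr::filter() keeps the rows of a data frame that match a condition.
# Guarded so the sketch still runs if dplyr or palmerpenguins is missing.
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("palmerpenguins", quietly = TRUE)) {
  adelie <- dplyr::filter(palmerpenguins::penguins, species == "Adelie")
  print(nrow(adelie))
}
```

The stats versions are not gone after loading tidyverse. They are only hidden behind the dplyr ones, and the stats:: prefix still reaches them.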

Errors installing a new package: what do you do?

If you are an active R user, I can almost guarantee that sooner or later you will run into error messages when trying to install an R package. So what do you do when this happens? To see, let’s go through an example.

Let’s say we discovered there’s an emo package that lets us insert emojis, and we want to install it.

  1. Like any other package, the first thing we do is try to install it in the console with install.packages("emo"). The emo package is not on CRAN, so the install does not succeed.

  2. Turn to Google! I typically go to Google and search “install package X in R”.

  3. Find the package repository on GitHub. If a package isn’t on CRAN, you should be able to find its repository on GitHub, and then you can install the package via devtools (another package).

  4. On the GitHub repository, scroll down and look for instructions on how to install the package.
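# Option 1: call install_github() without loading devtools
# install.packages("devtools")
devtools::install_github("hadley/emo")

# Option 2: load devtools first, then call install_github()
# install.packages("devtools")
library(devtools)
install_github("hadley/emo")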

Remember, it’s not enough to install the package. We also have to load it with library().

library(emo)
ji("smile")
😄 
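The difference between installing and loading can be checked directly. This is a small sketch in base R; the variable names pkg, installed, and attached are ours, and the results depend on your own setup.

```r
pkg <- "emo"

# Installed: the package's files exist in your R library on disk.
installed <- pkg %in% rownames(installed.packages())

# Attached: library() has been called for it in the current session.
attached <- paste0("package:", pkg) %in% search()

cat("installed:", installed, "\n")
cat("attached: ", attached, "\n")
```

A package stays installed between sessions, but it is only attached after you call library() in the session you are working in.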

Functions

A function is a set of statements (or code) organized to perform a specific task. Essentially, a function is like a black box where you enter inputs and you get an output. R has a lot of built-in functions.

For example, there’s a mean() function that calculates the average of the data you include in the function.

my_values <- c(10, 20, 30)

mean(my_values)
[1] 20

Arguments

An argument is a piece of information passed into a function. In the example above we passed in one argument, my_values, and the function returned the average of 10, 20, and 30.
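Arguments can be passed by position or by name. The sketch below reuses my_values from above and adds round(), a base R function not used elsewhere in this lesson, only to show a function that takes two arguments.

```r
my_values <- c(10, 20, 30)

# Passing the argument by position...
mean(my_values)

# ...or by name. mean() calls its first argument x.
mean(x = my_values)

# Functions can take more than one argument.
# round() takes the number to round and how many digits to keep.
round(3.14159, digits = 2)
```

Naming arguments is optional here, but it makes the code easier to read once a function takes several inputs.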

As another example of this function, let’s say we wanted to get the average bill length of the penguins in the palmerpenguins dataset.

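# Some penguins are missing a bill length, so the result is NA
mean(penguins$bill_length_mm)
[1] NA
# na.rm = TRUE removes the missing values before averaging
mean(penguins$bill_length_mm, na.rm = TRUE)
[1] 43.92193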
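To see what the na.rm argument does without loading any data, here is a small sketch on a made-up vector with one missing value. The name bill_lengths and the choice of numbers are for illustration only.

```r
# A short vector with one missing value (NA).
bill_lengths <- c(39.1, 39.5, 40.3, NA)

# By default, a single NA makes the whole result NA.
mean(bill_lengths)

# With na.rm = TRUE, the NA is dropped and the mean uses the 3 remaining values.
mean_without_na <- mean(bill_lengths, na.rm = TRUE)
mean_without_na

# How many values were missing?
sum(is.na(bill_lengths))
```

R returns NA by default so that missing data does not go unnoticed. Setting na.rm = TRUE is a deliberate choice to ignore it.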

Help Documentation

mean() Help Documentation
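As a sketch of how to reach this help page yourself, the commented lines below open it from the console, and args() prints a function's arguments without leaving your script. Using mean.default here is our choice: it is the function that the mean() help page documents under "Default S3 method".

```r
# Open the help page (run these in the console):
# ?mean
# help("mean")

# Print a function's arguments and their default values.
args(mean.default)
```

The printed arguments include na.rm, the same argument used above to skip missing values.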

Why Use Functions?
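One reason, sketched below, is that a function wraps up steps you would otherwise have to write out, and get right, every time. Here the average is computed by hand and then with mean(). The name manual_mean is chosen for this illustration.

```r
my_values <- c(10, 20, 30)

# By hand: add the values up, then divide by how many there are.
manual_mean <- sum(my_values) / length(my_values)
manual_mean

# With a function: one call that states the intent directly.
mean(my_values)
```

Both lines give the same answer, but the function call is shorter, harder to mistype, and easier for a reader to understand.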

Summary

In this lesson we first learned about packages. Specifically, we learned how to install a package for the first time and why it’s necessary to load it with the library() function every time you start a new R session. We also went through an example of what to do when you encounter an error message after running install.packages().

We also learned about functions and why we should use functions. Importantly, we introduced R’s Help Documentation, which is a great resource if you are using a function for the first time and not quite sure where to start.

Exercises

  1. Load in the packages palmerpenguins and tidyverse and print the penguins dataset.
  2. Notice that when you printed the penguins variable, it printed either in your console or directly below your code, but it did not save the variable in your environment.

Now, save the penguins dataset to a new variable called penguins_data and then print penguins_data.

  3. Using the help documentation, look up the function head(). See if you can understand what this function does.

Hint: in the console, use a ? followed by the function name.

  4. Use head() on penguins_data.
  5. Return to the help documentation for head(). The description mentions a second function that is the opposite of head(). What is that function? Test it out on penguins_data.
  6. What is the average flipper length for all of the penguins in our dataset?

If you get stuck, review the example above where we calculate the average bill length.

THE END 🎉