In this lesson we will review the basics of packages (including Tidyverse) and functions.
A package is a collection of code, data, and documentation bundled together in a standardized way so that it can be easily shared and installed by other R users.
To install a package, use:
install.packages("package_name")
- Note that the package name must be in quotes. You only need to install a package once on your local computer, so I suggest using the console to install packages.
To load a package, use:
library(package_name)
- Note that the package name does not need to be in quotes here.
To start, let’s have a look at the palmerpenguins package.
In the palmerpenguins package, there is a dataset called penguins. Therefore, when you install and load the package, you will automatically have access to this dataset.
If we try to print penguins before we install or load the package, we will get an error message.
Once we install and load palmerpenguins, we can see the penguins dataset!
# install.packages("palmerpenguins")
library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# … with 334 more rows, and 3 more variables: body_mass_g <int>,
# sex <fct>, year <int>
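Because a package only needs to be installed once, it can be handy to check whether it is already installed before installing it again. The following is a minimal sketch of that idea, not part of the original lesson: is_installed() is a hypothetical helper name, and it relies on base R's requireNamespace(), which reports whether a package can be found.

```r
# Hypothetical helper: TRUE if the package is already installed, FALSE otherwise
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this check should succeed
is_installed("stats")

# Load palmerpenguins if it is there; otherwise remind ourselves to install it
if (is_installed("palmerpenguins")) {
  library(palmerpenguins)
} else {
  message("Run install.packages(\"palmerpenguins\") in the console first")
}
```

The install step is left as a message here so that the package is only installed deliberately, from the console.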
Tidyverse is a package. More precisely, it is a collection of 8 packages (ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats) introduced by Hadley Wickham to help with data manipulation, exploration, and visualization.
You can check out the tidyverse website to find out more about each of the individual packages.
Whenever you load tidyverse, you’ll see a message saying it’s attaching all 8 packages. Therefore, you don’t have to load any of the individual packages because they all get loaded together. As long as you load tidyverse, you can use functions from any one of the 8 packages.
── Attaching packages ───────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
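The Conflicts part of the message means that two packages define a function with the same name, and the most recently loaded one wins when you type the bare name. A minimal sketch of how to be explicit about which version you want, using the package::function form (the vector x is made up for illustration, and the second part only runs if dplyr and palmerpenguins are installed):

```r
# stats::filter() applies a linear filter; here, a 3-point moving average
x <- c(10, 20, 30, 40, 50)
moving_avg <- stats::filter(x, rep(1 / 3, 3))
moving_avg

# dplyr::filter() keeps the rows of a data frame that match a condition
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("palmerpenguins", quietly = TRUE)) {
  adelie <- dplyr::filter(palmerpenguins::penguins, species == "Adelie")
  nrow(adelie)
}
```

Writing the package name in front of the function removes any doubt about which filter() is being called, whether or not tidyverse is loaded.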
If you are an active R user, I can almost guarantee that sooner or later you will run into error messages when trying to install an R package. So what do you do when this happens? To see, let’s go through an example.
Let’s say we discovered there’s an emo package that lets us insert emojis, and we want to install the package.
If we try to install it with install.packages(), we get the message: package 'emo' is not available for this version of R. At first, this may be a little misleading because you may think the issue is that you do not have the right version of R.
However, the install.packages() function only works for packages that are part of CRAN, and emo is not there.
- Turn to Google! I typically go to Google and search “install package X in R”.
- Find the package repository on GitHub. If a package isn’t on CRAN, you should be able to find its repository on GitHub, and then you can install the package via devtools (another package).
When I search for the emo package on Google, this is the first link: Emo GitHub Repository.
From there, we can install the emo package!
# install.packages("devtools")
devtools::install_github("hadley/emo")
We don’t need to load devtools in the code above because we are explicitly telling R that the install_github() function is from (::) the devtools package.
The following code does the same thing by loading devtools first:
# install.packages("devtools")
library(devtools)
install_github("hadley/emo")
We have now installed the emo package!
Remember, it’s not enough to install the package. We also have to load it with library().
A function is a set of statements (or code) organized to perform a specific task. Essentially, a function is like a black box where you enter inputs and you get an output. R has a lot of built-in functions.
For example, there’s a mean() function that calculates the average of the data you pass to it. Suppose we call it on a vector named my_values that holds 10, 20, and 30.
An argument is the information passed into a function. In this call we pass in one argument, my_values, and the output is the average of 10, 20, and 30.
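The mean() call described here can be reproduced with a short sketch. It assumes my_values is a numeric vector holding the three numbers mentioned in the text.

```r
# A vector holding the three values from the text
my_values <- c(10, 20, 30)

# Pass my_values to mean() as its one argument
mean(my_values)       # returns 20

# The same call with the argument named explicitly
mean(x = my_values)   # returns 20
```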
As another example of this function, let’s say we wanted to get the average bill length of the penguins in the penguins dataset from palmerpenguins.
We can pull out the bill_length_mm column with the $ operator.
mean(penguins$bill_length_mm)
[1] NA
The result is NA because the bill_length_mm column contains missing values. Setting na.rm = TRUE tells mean() to drop the missing values before calculating the average.
mean(penguins$bill_length_mm, na.rm = TRUE)
[1] 43.92193
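To see the effect of na.rm on a smaller scale, here is a minimal sketch using a made-up vector (bill_lengths is a hypothetical name, not a column from the dataset) in which one value is missing:

```r
# Three made-up bill lengths, one of them missing
bill_lengths <- c(39.1, NA, 40.3)

# With a missing value present, the default result is NA
mean(bill_lengths)

# Dropping missing values first gives the average of the remaining two
mean(bill_lengths, na.rm = TRUE)

# Count how many values are missing
sum(is.na(bill_lengths))
```

Checking the number of missing values first is a quick way to tell whether an NA result comes from the data rather than from a mistake in the code.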
If you are not sure how a function works, you can run ?mean, which will take you directly to the help documentation for the mean function.
In the mean() help documentation, the ... tells us that there are other arguments, but they are not required. For example, na.rm is one argument, but it is not required.
Functions save us a lot of time! We often rely on functions that other people have written. So, while we could have taken the time to write our own mean function, there’s no need to because the function already exists. The last example shows how much time the mean() function can save us.
You can create your own functions specific to your code and your tasks. As a general rule, if you find yourself writing the same code more than twice, you should consider writing a function. Using functions can reduce your chances of making errors. For example, if you need to update your code and you have a function, you only need to update it in one place. However, if you have the same code written 5 times, there’s a pretty high chance that you’ll forget to update your code somewhere, which will result in an error.
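As a small taste of what this looks like, here is a minimal sketch of a user-defined function. The name mean_no_na and its digits argument are hypothetical and chosen only for illustration. It wraps the "average, ignoring missing values" step so the logic lives in one place.

```r
# Hypothetical helper: average that ignores missing values, rounded for display
mean_no_na <- function(x, digits = 1) {
  round(mean(x, na.rm = TRUE), digits)
}

# Reuse the same logic on different inputs instead of retyping it
mean_no_na(c(10, 20, NA, 30))          # returns 20
mean_no_na(c(1.25, 2.5, NA), digits = 2)
```

If the rounding rule ever needs to change, only the body of mean_no_na() has to be updated.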
Although writing your own function is beyond the scope of this lesson, here are some useful websites if you wish to learn more:
In this lesson we first learned about packages. Specifically, we learned how to install packages for the first time and how it’s necessary to “load” a package using the library() function every time you start a new R session. We also went through an example of what to do when you encounter an error message after running install.packages().
We also learned about functions and why we should use functions. Importantly, we introduced R’s Help Documentation, which is a great resource if you are using a function for the first time and not quite sure where to start.
- Load palmerpenguins and tidyverse and print the penguins dataset.
- Now, save the penguins dataset to a new variable called penguins_data and then print penguins_data.
- Look up the help documentation for head(). See if you can understand what this function does. Hint: in the console, use a ? followed by the function name.
- Use head() on penguins_data.
- Go back to the help documentation for head(). In the description, it mentions a second function that is the opposite of head(). What is that function? Test it out on penguins_data.
If you get stuck, review the example above where we calculate the average bill length.