3. Data Structure - Part 1

A data structure is a particular way of organizing data so that it can be stored, processed, and retrieved efficiently. In this lesson we will review three of the data structures in R: vectors, matrices, and arrays.

Other data structures are reviewed in the Data Structure - Part 2 lesson.

Depending on what you are using R for, you will probably use a specific type of data structure most frequently. For example, in my research, I use data frames and tibbles all the time and rarely use matrices or arrays. However, it is still useful to be aware of and understand the different types of data structures.

Vectors

In R, a vector is the simplest type of data structure. It is a sequence of data elements of the same basic type.

In the example below, we have three people (who happen to be me and my two siblings): Josh, Jenny, and Brandon. Here, we are creating three separate vectors:

names <- c("Josh", "Jenny", "Brandon")
names
[1] "Josh"    "Jenny"   "Brandon"
age <- c(31, 30, 27)
age
[1] 31 30 27
blue_eyes <- c(TRUE, FALSE, FALSE)
blue_eyes
[1]  TRUE FALSE FALSE

Once you run the R chunk above, you can click on the Environment tab in RStudio to see how the data is stored. It even shows the data types (num, logi, chr).

In this example, it’s important to notice that each vector only contains one type of data. We can also see the type of data stored with the class() function.

The lines of code below are not being assigned (or saved) to any variables, so the results will be returned in the console, but not saved to the Environment.

class(names)
[1] "character"
class(age)
[1] "numeric"
class(blue_eyes)
[1] "logical"
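Because a vector can hold only one type of data, R quietly converts (coerces) mixed values to a single common type when you combine them with c(). The sketch below uses made-up values, and the names `mixed` and `num_logi` are only for illustration.

```r
# A number, a string, and a logical: everything is converted to character
mixed <- c(31, "Josh", TRUE)
mixed          # "31" "Josh" "TRUE"
class(mixed)   # "character"

# Numbers and logicals: TRUE becomes 1 and FALSE becomes 0
num_logi <- c(31, TRUE, FALSE)
num_logi          # 31 1 0
class(num_logi)   # "numeric"
```

This is why checking class() is a useful habit: a single stray string in a vector of numbers turns the whole vector into character data.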

Matrices

A matrix has two dimensions (rows and columns) and, like a vector, contains only one type of data. Matrices look like a typical table. In my experience, matrices usually contain numeric values, but there can also be character matrices.

my_matrix <- matrix(data = 1:25, nrow = 5, ncol = 5)
my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
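Since a matrix has two dimensions, R keeps track of its shape. A quick way to check it is with dim(), nrow(), and ncol(), while length() still counts every element. This sketch recreates my_matrix so it can run on its own.

```r
# Recreate the 5 x 5 matrix from the example above
my_matrix <- matrix(data = 1:25, nrow = 5, ncol = 5)

dim(my_matrix)     # rows and columns: 5 5
nrow(my_matrix)    # 5
ncol(my_matrix)    # 5
length(my_matrix)  # total number of elements: 25
```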

By default, matrix() fills in the values column by column. If you want to fill in your matrix by rows instead, set the byrow argument to TRUE, as in the example below.

Please note, we will review functions and arguments in more detail in a couple of lessons.

my_matrix_byrow <- matrix(data = 1:25, nrow = 5, ncol = 5, byrow = TRUE)

my_matrix_byrow
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
[5,]   21   22   23   24   25
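For a square matrix like this one, filling by rows gives the same result as filling by columns and then swapping rows and columns with the transpose function t(). The sketch below checks this with identical(); it recreates both matrices so it can run on its own.

```r
my_matrix <- matrix(data = 1:25, nrow = 5, ncol = 5)
my_matrix_byrow <- matrix(data = 1:25, nrow = 5, ncol = 5, byrow = TRUE)

# t() turns rows into columns and columns into rows
identical(t(my_matrix), my_matrix_byrow)  # TRUE
```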

Here’s an example of a matrix with character strings, specifically the colors of the rainbow.

rainbow_matrix <- matrix(data = c("red", "orange", "yellow", 
                                  "green", "blue", "purple"), nrow = 2, ncol = 3)
rainbow_matrix
     [,1]     [,2]     [,3]    
[1,] "red"    "yellow" "blue"  
[2,] "orange" "green"  "purple"

You can access an item within your matrix by using [], where the first number represents the row and the second represents the column.

my_matrix[2,4]
[1] 17
my_matrix_byrow[2,4]
[1] 9
rainbow_matrix[1,3]
[1] "blue"
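Leaving one position inside [] empty selects everything along that dimension, so you can pull out a whole row or a whole column. A minimal sketch using the same my_matrix:

```r
my_matrix <- matrix(data = 1:25, nrow = 5, ncol = 5)

# Empty column position: every column in row 2
my_matrix[2, ]   # 2 7 12 17 22

# Empty row position: every row in column 4
my_matrix[, 4]   # 16 17 18 19 20
```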

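Next, we’ll test what happens if we try to create a matrix that is smaller than our data. A 4 x 4 matrix only has room for 16 values, so R uses the first 16 values (1 to 16), drops the rest, and also prints a warning about the mismatch: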

matrix(data = 1:25, nrow = 4, ncol = 4)
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16

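Next, let’s test what happens if we try to create a matrix that is larger than our data. A 6 x 6 matrix needs 36 values, so after reaching 25, R starts over from the beginning of the data (recycling) and fills the last 11 cells with 1 to 11, again with a warning: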

matrix(data = 1:25, nrow = 6, ncol = 6)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    7   13   19   25    6
[2,]    2    8   14   20    1    7
[3,]    3    9   15   21    2    8
[4,]    4   10   16   22    3    9
[5,]    5   11   17   23    4   10
[6,]    6   12   18   24    5   11
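Both cases above happen because nrow times ncol does not match the length of the data. One way to avoid this, sketched below, is to give only nrow and let R work out the number of columns. The vector `x` and matrix `m` are just examples.

```r
x <- 1:24

# Only nrow is given, so R calculates ncol as length(x) / nrow = 6
m <- matrix(data = x, nrow = 4)
dim(m)   # 4 6

# Quick check before building a matrix: will the data fill it exactly?
length(x) == 4 * 6   # TRUE
```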

Matrices are often used for data transformation, because arithmetic operations on a matrix are applied to every element. As a final example of matrices, let’s multiply every value in my_matrix by 2.

# original matrix
my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
# matrix multiplied by 2
my_matrix*2
     [,1] [,2] [,3] [,4] [,5]
[1,]    2   12   22   32   42
[2,]    4   14   24   34   44
[3,]    6   16   26   36   46
[4,]    8   18   28   38   48
[5,]   10   20   30   40   50
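Arithmetic between two matrices of the same size also works element by element, and the original matrix is not changed unless you save the result. A small sketch (the name `doubled` is only for illustration):

```r
my_matrix <- matrix(data = 1:25, nrow = 5, ncol = 5)

# Adding the matrix to itself matches multiplying it by 2, element by element
all(my_matrix + my_matrix == my_matrix * 2)   # TRUE

# Save the result if you want to keep it
doubled <- my_matrix * 2
my_matrix[1, 1]   # still 1: the original is unchanged
doubled[1, 1]     # 2
```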

Arrays

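An array can have one or more dimensions but, like vectors and matrices, contains only a single data type.

When we call array() without a dim argument, the result is a one-dimensional array, which prints just like a vector. Setting dim = c(5,5) gives a two-dimensional array, which is the same as a matrix.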

vector <- 1:25

array(vector)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25
array(vector, dim = c(5,5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
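The functions is.array() and is.matrix() let you check what array() returned. This sketch uses `values` in place of the lesson's `vector` variable (same numbers, different name, to avoid reusing the name of R's built-in vector() function).

```r
values <- 1:25   # same values as vector above

one_d <- array(values)
dim(one_d)        # 25: a one-dimensional array
is.array(one_d)   # TRUE

two_d <- array(values, dim = c(5, 5))
is.matrix(two_d)  # TRUE: a two-dimensional array is a matrix
identical(two_d, matrix(values, nrow = 5, ncol = 5))  # TRUE
```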

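Arrays can also have more than two dimensions. Let’s see what happens if we add a third dimension. A 5 x 5 x 2 array has room for 50 values, so R recycles the 25 values in vector and the two layers come out identical.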

array(vector, dim = c(5,5,2))
, , 1

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25

, , 2

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
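Instead of comparing the two layers by eye, you can check the shape and the recycling with code. In this sketch, `arr` is an illustrative name and `values` holds the same numbers as vector.

```r
values <- 1:25
arr <- array(values, dim = c(5, 5, 2))

dim(arr)     # 5 5 2
length(arr)  # 50 cells, filled from only 25 values

# arr[, , 1] selects the first 5 x 5 layer; both layers hold the same values
identical(arr[, , 1], arr[, , 2])  # TRUE
```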

Here’s one final example. I encourage you to play around with array() and test out different numbers and dimensions.

array(vector, dim = c(2,3,4))
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
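Indexing an array works like indexing a matrix, with one position per dimension: [row, column, layer]. A sketch using the 2 x 3 x 4 array above (`arr` and `values` are illustrative names):

```r
values <- 1:25
arr <- array(values, dim = c(2, 3, 4))

# Row 2, column 3, layer 4
arr[2, 3, 4]   # 24

# Leave row and column empty to get the whole fourth layer as a matrix
arr[, , 4]

# 2 x 3 x 4 = 24 cells, so the 25th value is simply dropped
length(arr)    # 24
25 %in% arr    # FALSE
```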

For more examples, check out this website.

Summary

In this lesson we introduced three types of data structures: vectors, matrices, and arrays.

Exercises

  1. Create a vector with three names: jennifer, taylor, and miley. Save this vector as artists. Print your results.

What is the class of artists?

  2. Create a vector with the three artists' heights in inches (Jennifer = 64.57, Taylor = 70.87, Miley = 64.96). Save the vector as artists_heights. Make sure to save the heights in the same order as you saved the names.

What is the class of artists_heights?

  3. Create a matrix containing the values 1:10 with 2 rows and 5 columns.
  4. Create a matrix containing the values 1:10 with 5 rows and 2 columns.
  5. Create a matrix containing the values 1:10 with 5 rows and 2 columns, but this time fill in the values by rows (instead of the default, which is by column).
  6. Challenge exercise: Create a matrix containing the first 10 letters of the alphabet with 5 rows and 2 columns, and fill in the values by rows.

Hint: if you get stuck, try to use Google to learn how to print letters of the alphabet in R.

THE END 🎉