5 Lab 5: Data transformation 1
5.1 Learning Outcomes
By the end of this lab, students will be able to:
- Pick rows based on their values
- Reorder rows based on their values
- Use the pipe operator
5.2 Introduction
This assignment uses a just-in-time learning approach: you will acquire knowledge and skills exactly when you need them, rather than in advance.
5.3 Prerequisites
Complete the following instructions to prepare for the lab:
- Set the following global options in RStudio
  - Go to the Tools menu > Global Options
  - Under General > Basic, uncheck the boxes under Workspace and History
  - Under Code > Editing, check the box for “Use native pipe operator”
  - Under Code > Completion, check the box for “Use tab for multiline autocompletions”
- Claim your repository on GitHub by following the link on D2L in the Lab 5 Content module.
- Clone your repository to your computer using the New Project > Version Control > Git method
- Make your first commit by adding the R project file and the .gitignore file to Git.
5.4 Assignment
5.4.1 Instructions
- Open the questions.R script. This is where you will write the code to answer the questions below.
- If it is not already installed, then install the tidyverse package. Do not add code to the script to do this.
- Run the code under the sections:
  - load packages ----
    - Loads the required packages
  - look at penguin data ----
    - A variety of commands to remind you how to look at a dataset by printing various things in the console
  - example of pipe usage ----
    - Commands that illustrate how to use the pipe operator |>.
    - Read section R4DS 3.1.3 dplyr basics so you have a general idea of how the pipe is used.
- Answer the questions in the Questions ---- section of the script by writing code
  - Below each question, write code to perform the requested action
  - Do not assign the results a name (i.e. do not create an object) using the assignment operator. For example, if the question asked you to glimpse the penguins data:
    - Correct: penguins |> glimpse()
    - Incorrect: new_df <- penguins |> glimpse()
  - Include only one command per question. Do not include every version of code you tried. Only include a single command that performs the required action.
  - The command may extend over multiple lines, for example:
    penguins |>
      glimpse()
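If you are unsure whether the tidyverse is already installed, one way to check is from the Console rather than from the script. The following is a minimal sketch using base R's requireNamespace(); the variable name tidyverse_installed is only illustrative and is not part of the lab materials.

```r
# Run this in the Console, not in questions.R.
# requireNamespace() returns TRUE if the package can be loaded and FALSE otherwise.
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  message("tidyverse is installed; nothing more to do.")
} else {
  message('tidyverse is not installed; run install.packages("tidyverse") in the Console.')
}
```

Because the check runs in the Console, nothing is added to the script, which keeps the script free of installation code as required above.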
5.4.2 Questions
- Filter the penguins data to include only birds with bill lengths greater than 55 mm.
  - Familiarize yourself with the filter function: R4DS 3.2.1 filter()
  - Know how to use the logical operators
- Filter the penguins data to include only birds with flipper lengths less than or equal to 180 mm.
- Filter the penguins data to include only Chinstrap Penguins.
  - Avoid the = mistake described in R4DS 3.2.2 Common mistakes
  - Note that species is a categorical variable (see the <fct> below the variable name when you print the dataset in the console). See the first code block in R4DS 3.4 The pipe for an example of filtering on a categorical variable.
- Filter the penguins data to exclude penguins on Biscoe Island.
  - Use the “not equal to” operator described in R4DS 3.2.1
- Filter the penguins data to include penguins captured in 2007 and 2009.
  - Use the “in” operator %in% described in R4DS 3.2.1
- Filter the data to include only penguins with a missing value for bill length.
  - Use the is.na() function
  - See examples of this in R4DS 4.3 Pipes
- Filter the penguins data to exclude penguins with a missing value for sex.
- Filter the penguins data to include only Chinstrap Penguins with flipper lengths less than or equal to 180 mm.
  - Master the “and” operator & or the comma (,) as described in R4DS 3.2.1
- Arrange the penguins dataset in increasing order by flipper length.
- Arrange the penguins dataset in decreasing order by body mass.
- Filter the penguins data to include only Chinstrap Penguins with flipper lengths less than or equal to 180 mm and then arrange the rows by decreasing body mass.
  - This requires using the pipe operator: R4DS 3.4 The pipe
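The operators and functions referenced in the questions above can be tried out on a small made-up data frame before applying them to the penguins data. The sketch below assumes dplyr is installed; the pets data frame and all of its columns are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data (NOT the penguins data), invented for this illustration
pets <- data.frame(
  name    = c("Ava", "Bo", "Cy", "Di", "Ed"),
  kind    = factor(c("cat", "dog", "dog", "cat", "bird")),
  weight  = c(4.1, 22.5, NA, 3.8, 0.4),
  adopted = c(2019, 2020, 2021, 2020, 2019)
)

# Comparison operators: >, >=, <, <=
pets |> filter(weight > 4)

# Equality uses == (a single = is an error); categorical values are quoted
pets |> filter(kind == "dog")

# "Not equal to" uses !=
pets |> filter(kind != "dog")

# %in% keeps rows whose value matches any element of the vector
pets |> filter(adopted %in% c(2019, 2021))

# is.na() keeps rows with a missing value; !is.na() drops them
pets |> filter(is.na(weight))
pets |> filter(!is.na(weight))

# Conditions separated by a comma (or joined with &) must all be TRUE
pets |> filter(kind == "cat", weight <= 4)

# arrange() sorts in increasing order by default; desc() reverses the order
pets |> arrange(weight)
pets |> arrange(desc(weight))

# The pipe passes the result of one step to the next:
# filter first, then arrange the rows that remain
pets |>
  filter(adopted >= 2020) |>
  arrange(desc(weight))
```

Note that filter() drops rows where the condition evaluates to NA, and arrange() places missing values last regardless of sort direction.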
5.4.3 Grading
Your grade will be based on the completeness and accuracy of the code in your script.
Commands will be run starting at the top of the script.
Questions will be marked correct only if the code runs without error and produces the requested output in the console.
Before submitting, you are strongly advised to run your entire script from top to bottom, one line at a time, to ensure that it does not produce errors and that it prints the requested output in the console.
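As a complement to stepping through the script line by line, base R's source() function can run a whole script from the top in one go. The sketch below is only an illustration: it writes a tiny temporary script as a hypothetical stand-in for questions.R and then runs it.

```r
# Hypothetical stand-in for questions.R: a temporary script with two commands
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- c(3, 1, 2)", "sort(x)"), script_path)

# echo = TRUE prints each command followed by its output, starting at the top;
# if a command fails, the run stops at that command with an error message
source(script_path, echo = TRUE)
```

For this lab, the equivalent call would presumably be source("questions.R", echo = TRUE), run from the Console with the project folder as the working directory.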
5.5 Submission
- Save changes in your R script
- Stage your changes by checking the boxes next to the files in the Git tab
- Commit the changes
- Push the changes
- Go to GitHub and copy the URL to your lab repository online
  - The easiest way to copy the URL is to click the green Code button, then the copy button
- Submit the URL to your assignment on D2L