6 Lab 6: Data transformation 2
6.1 Learning Outcomes
By the end of this lab, students will be able to:
- Pick columns in a dataset
- Rename variables
- Create new variables
6.2 Introduction
The following assignment implements a just-in-time learning approach where you will acquire knowledge and skills exactly when you need them, rather than in advance.
The goal for today's lab is to work with columns in a dataset.
6.3 Prerequisites
Complete the following instructions to prepare for the lab:
- Set the following global options in RStudio
  - Go to the Tools menu > Global Options
  - Under General > Basic, uncheck the boxes under Workspace and History
  - Under Code > Editing, check the box for “Use native pipe operator”
  - Under Code > Completion, check the box for “Use tab for multiline autocompletions”
- Claim your repository on GitHub by following the link on D2L in the corresponding Content module.
- Clone your repository to your computer using the New Project > Version Control > Git method
- Make your first commit by adding the R project file and the .gitignore file to Git.
6.4 Assignment
6.4.1 Instructions
- Create a new questions.R script. This is where you will write the code to answer the questions below.
  - If the tidyverse package is not already installed, install it. Do not add code to the script to do this.
  - Write the code to load your packages.
  - Put placeholder comments for the questions below.
- Answer the questions in the questions ---- section of the script by writing code.
  - Below each question, write code to perform the requested action.
  - Do not assign the results a name (i.e. do not create an object) using the assignment operator. For example, if the question asked you to glimpse the penguins data:
    - Correct:
      penguins |> glimpse()
    - Incorrect:
      new_df <- penguins |> glimpse()
  - Include only one command per question.
    Do not include every version of code you tried.
    Only include a single command that performs the required action.
  - The command may extend over multiple lines, for example:
    penguins |>
      glimpse()
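As a rough illustration of the layout described above, a questions.R script might start like the sketch below. The packages ---- header and the wording of the placeholder comments are assumptions rather than requirements; only the questions ---- section name comes from the instructions.

```r
# questions.R

# packages ----
# (this section name is an assumption; the lab only names "questions ----")
library(tidyverse)

# questions ----

# Question 1: placeholder comment; the single command for this question goes below

# Question 2: placeholder comment; the single command for this question goes below
```

In RStudio, comments that end in four or more dashes are treated as section headers, so they appear in the script's outline.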
6.4.2 Questions
- Use the names() function to see the names of the variables in the penguins dataset.
- Select only the columns species, sex, and flipper length.
  - Use the select() function as described in R4DS 3.3.2
- Select the column species and all columns that begin with “bill_”
  - Use the starts_with() helper function, which goes inside the select() function.
- Select all columns except those ending with “_mm”
  - Use the ends_with() helper function
- Rename the island variable to location
  - Use the rename() function as described in R4DS 3.3.3
- Select the variables bill length and bill depth, then create a new variable named aspect_ratio that is the ratio of bill length to bill depth.
  - Use the mutate() function as described in R4DS 3.3.1
- Extra credit: Use pipes to select female Adelie penguins with a bill depth of at least 18 mm, arrange the results by decreasing bill depth, and select only the columns year and flipper_length_mm, in that order.
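The column verbs used in these questions can be tried out on unrelated data first. The sketch below uses a small made-up tibble named plants (a hypothetical example, not part of the lab or the penguins data) to show the general shape of select(), its starts_with() and ends_with() helpers, rename(), and mutate(). It loads dplyr directly, which is one of the packages that library(tidyverse) attaches.

```r
library(dplyr)

# Hypothetical toy data invented for illustration; this is not the penguins dataset
plants <- tibble(
  plant_id       = c("a", "b", "c"),
  site           = c("north", "south", "north"),
  leaf_length_cm = c(10, 12, 8),
  leaf_width_cm  = c(4, 3, 2),
  height_cm      = c(30, 45, 22)
)

# Keep columns by listing their names
plants |> select(plant_id, site)

# Keep one named column plus every column whose name starts with "leaf_"
plants |> select(plant_id, starts_with("leaf_"))

# Keep every column except those whose name ends with "_cm"
plants |> select(!ends_with("_cm"))

# Rename a column: the new name goes on the left, the old name on the right
plants |> rename(plot = site)

# Keep two columns, then add a new column computed from them
plants |>
  select(leaf_length_cm, leaf_width_cm) |>
  mutate(leaf_area_cm2 = leaf_length_cm * leaf_width_cm)
```

None of these commands changes plants itself; each one prints a new tibble to the console, which matches the lab's rule about not assigning results to a name.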
6.4.3 Grading
Your grade will be based on the completeness and accuracy of the code in your script.
Commands will be run starting at the top of the script.
Questions will be marked correct only if the code runs without error and produces the requested output in the console.
Before submitting, you are strongly advised to run your entire script from top to bottom, one line at a time, to ensure it does not produce errors and it prints the requested output in the console.
6.5 Submission
- Save the changes to your R script
- Stage your changes by checking the boxes next to the files in the Git tab
- Commit the changes
- Push the changes
- Go to GitHub and copy the URL to your lab repository online
  - The easiest way to copy the URL is to click the green Code button, then the copy button
- Submit the URL to your assignment on D2L