Wrangling Data in R

Data Dance
From Allison Horst



Catalog Description

Students will use the “tidyverse” to “wrangle” (i.e., manipulate) data for ease of analysis and visualization. Foundational principles of data structures; reading external files; relational data; selecting, renaming, and rearranging variables; filtering, sorting, and isolating observations; summarizing results by groups; and handling dates, times, strings, and factors will be emphasized so that students can work with a wide variety of data formats. Class examples will be drawn from a variety of fields including the environmental, natural resources, and social sciences; business; and sports. Prerequisite is MTH107 or instructor’s consent after demonstrating a simple familiarity with the R software (instructor can provide preparatory resources).

Learning Outcomes

In this course, you will have the opportunity to:

  1. Describe the characteristics of “tidy data.”
  2. Read external data from a variety of formats.
  3. Combine data from tables related by a primary key into a single table.
  4. Reorganize data from a “longer” to a “wider” format and vice versa.
  5. Select, move, rename, and add variables to an existing data frame.
  6. Select, filter, arrange, and append observations to an existing data frame.
  7. Summarize data by groups.
  8. Properly handle date-time data.
  9. Properly create and use factor variables.
  10. Properly handle complex strings data.
  11. Wrangle personally collected data to a format useful for further analysis or visualization.

While this course does not fulfill any requirements in the Liberal Education for the Environment and Society program, it does support the “[c]ommunicate mathematical information … symbolically, visually …” outcome.


Delivery

XXX


Assistance

This class will involve a lot of coding in R. Learning R can be difficult at first; it is like learning a new language, such as Spanish, French, or Chinese.[1] Hadley Wickham, chief data scientist at RStudio and author of ggplot2, said this:[2]

It’s easy when you start out programming to get really frustrated and think, “Oh it’s me, I’m really stupid,” or, “I’m not made out to program.” But, that is absolutely not the case. Everyone gets frustrated. I still get frustrated occasionally when writing R code. It’s just a natural part of programming. So, it happens to everyone and gets less and less over time. Don’t blame yourself. Just take a break, do something fun, and then come back and try again later.

This cartoon illustrates what Hadley explains and how you may feel at some point this term.

From Allison Horst

If you find yourself in this situation, follow the advice of Hadley above and Allison below.

This advice is predicated on two things: first, that you start early enough to have time to ask for help and wait for a response; second, that you ask for help when you need it. I am here to help you and I want to help you learn. You can succeed in this class, especially if you are organized and reach out for help when you have questions or are stuck.

I will, however, not be able to monitor my e-mail constantly during this May term. Thus, please use MSTeams (see quick link on homepage) to ask questions that others in class may be able to answer or whose answers might be useful to others in class.[3] Please also answer questions posted by others on Teams if you feel that you know the answer. I will make every attempt to check my e-mail and Teams every day in the late afternoon.[4] Questions that are specific to you should be sent directly to me.[5]

If you do ask questions on Teams or directly to me, please include the following items in your question:

  1. A clear explanation of the problem, with as much detail as you can offer.
  2. Which data you are using. If you are using your own data, then please attach the data file.
  3. An attachment of your R script so that, with your data, I can re-create your analyses.

Note that you can only include the attachments in direct e-mails to me.

 

Accommodations

I want to create an inclusive and accessible learning environment for those of you who have a condition (e.g., attention, learning, vision, hearing, mental, physical, or other health-related concern) that may require special accommodations. If you have already established accommodations with the Office of Accessibility Resources (OAR), please communicate your approved accommodations to me as soon as possible so that we can discuss your needs in this course. If you have a condition that requires accommodations but you have not yet established services through OAR, then you should contact Jennifer Newago as soon as possible (Ponzio 230, x1387, or accomodations@northland.edu). It is the policy and practice of Northland College to create inclusive and accessible learning environments consistent with federal and state law. More information is available here.


Grading

An overall grade will be computed from your performance on daily exercises (75%) and a final project (25%), which are both described below. Your letter grade will be assigned from your overall percentage (rounded to a whole number) and the table below.

A   92-100    A-  90-91
B+  87-89     B   82-86    B-  80-81
C+  77-79     C   70-76
D+  67-69     D   60-66    F   0-59
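
As a rough illustration of the computation described above, the following base R sketch combines the two weighted components and looks up the letter grade. The function name `overall_grade()` and the example scores are hypothetical. How percentages ending in exactly .5 are rounded follows R's `round()` here, which is an assumption rather than stated policy.

```r
# Hypothetical helper; the name and the example scores are illustrative only.
overall_grade <- function(exercise_pct, project_pct) {
  # Weights from the Grading section: exercises 75%, final project 25%
  overall <- round(0.75 * exercise_pct + 0.25 * project_pct)
  # Lower bound of each letter range, in increasing order
  lower  <- c(0, 60, 67, 70, 77, 80, 82, 87, 90, 92)
  grades <- c("F", "D", "D+", "C", "C+", "B-", "B", "B+", "A-", "A")
  # findInterval() returns the index of the range that contains 'overall'
  list(percent = overall, letter = grades[findInterval(overall, lower)])
}

overall_grade(88, 92)  # percent 89, letter "B+"
```

For example, exercise and project percentages of 88 and 92 give 0.75 × 88 + 0.25 × 92 = 89, which falls in the B+ range.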

Exercises

Most course modules will have exercises that will be due by XXX.

All exercises should be formatted as described here. Each exercise set is worth 10 points and will be graded with a two-part rubric. The first part of the rubric is based on your completion of the exercise.

  • 5 points: All parts of the exercise were completed in full and followed the required homework format.
  • 5-1 points: Some parts of the exercise were either not attempted or were incomplete. The required format was followed.
  • 0 points: Very little of the exercise was completed or the required format was not followed.

The second part of the rubric is based on your correctness in performing the work.

  • 5 points: All or nearly all parts of the exercise were correct.
  • 5-1 points: Various amounts of the exercise were done incorrectly.
  • 0 points: Very little of the exercise was done correctly.

Exercises handed in late may still receive “completeness” points but may not receive any “correctness” points. Please make every effort to not fall behind by turning in your exercises on time.

 

Final Project

To demonstrate your ability to wrangle data in R, I am asking you to identify data that is of interest to you and wrangle it. YOU SHOULD SEE ME ABOUT THE SPECIFICS OF THIS PROJECT RELATIVE TO THE DATA THAT YOU ARE INTERESTED IN.

Items to be considered when grading your final project are:

  • XXX

 

Note About Incomplete Grades

An incomplete grade will be given ONLY in extreme circumstances that are beyond your control, such as a major illness, and will ONLY be given if you have successfully completed the entire course except for the final project. This is in accordance with Northland College policy (scroll down to “Incomplete Grades”).

Academic Alerts

As you adjust to college, you may benefit from working with a professional on your organization, motivation, and stress level. If I observe you struggling with the course early in the term, then I may file an “Academic Alert” about you. If this happens, you will receive an e-mail from me that explains steps you can take to improve your performance in the course. Our Academic Success Coordinator, Gina Kirsten, will receive the alert and will likely also reach out to you to set up a time to further discuss ways to improve in the course. Academic Alerts are not punitive; they are simply an attempt to help you get back on track in this course as soon as possible.


Classroom Conduct and Academic Integrity

I hope that everyone in this class will feel comfortable expressing themselves, asking questions, and freely participating. Thus, please treat each other with courtesy and respect, and refrain from offensive or inappropriate language during any part of the class. Please do not post personal, non-academic, or other non-course-related material on MSTeams. Issues related to conduct and integrity should be sent directly to me by e-mail.

Students are expected to submit work that is their own. Plagiarism or cheating will not be tolerated. If either occurs, you may have your grade for the assignment lowered or you may fail the course. The College’s Academic Integrity Statement & Policy will be followed in this course. Please make sure that you are familiar with its content.


Footnotes

  1. The idea for this section came from Dr. Ted Laderas.

  2. This is from the last question in this interview.

  3. There are several students in this course who have used R in previous courses. They should be able to answer R-related questions for those of you who are just getting started with R.

  4. I will often check my e-mail more frequently, but I won’t be able to do that on a consistent basis. So, please, do not wait until the last minute as you may not get an immediate response from me.

  5. While I am pretty open-minded and not much of a stickler when it comes to e-mail etiquette, some professors are. And it is always better to send an appropriate rather than an inappropriate e-mail. Here and here are some good suggestions for e-mailing professors.