LLO 8200 Intro to Data Science

Course files and content for the Introduction to Data Science Course.

View the Project on GitHub brittanymosby/edd_datascience

Welcome to Data Science Sp21

This is the supplementary page for Professor Mosby’s section of LLO 8200 Introduction to Data Science. Bookmark this page and check it frequently for the most up-to-date files for the Async lectures and homework assignments. Code may be optimized or corrected over the course of the term, so please alert me if you plan to work ahead. (If you do work ahead, you are responsible for updating your work if the assignments change.)

Async Lecture RStudio (.Rmd) Files

These are the RStudio files and datasets that you will use in tandem with the Async lecture videos each week.

  1. Module 1 - Introduction, College datafile
  2. Module 2 - Conditional Means, Student Debt datafile
    • New video instructions: First, watch 2.1 in the LMS; then watch “conditional means part 1” followed by “conditional means part 2” (both posted to the course wall). These videos will tell you when to watch LMS sections 2.7 and 2.8.
  3. Module 3 - Plotting Conditional Means, Attrition datafile
  4. Module 4 - Getting Data: Flat Files and Tidy Data
  5. Module 5 - Using Regression for Prediction, ELS training data, ELS testing data
  6. Module 6 - Plotting Linear Regression with Scatterplots, ELS full data
  7. Module 7 - Getting Data: Web Sources (This is the newest version, which matches this week’s Live Session.)
  8. Module 8 - Classification, Za training file, Za testing file
    • (Here is another good link for understanding the confusion matrix.)
  9. Module 9 - Plotting for Classification, Za data file
  10. Module 10 - Crossfold Validation
  11. Module 11 - Databases (Optional, but recommended)
  12. Module 12 - Unsupervised Learning (Optional)
  13. Module 13 - Creating Interactive Graphics with ‘plotly’ (Optional, but recommended)

Live Session PPT Files

Known Bugs/Issues in Async Lecture

Assignments

Each assignment (problem set) is worth 100 points. The problem sets are due on the Sunday prior to the following week’s live session. Every submission must include both the .Rmd file and a knitted file (html, doc, or pdf). Submissions that do not meet this requirement may be penalized.

File            Due Date
Assignment 1    Sunday 1/17
Assignment 2    Tuesday 1/26
Assignment 3    Tuesday 2/2
Assignment 4    Tuesday 2/9
Assignment 5    Tuesday 2/16
Assignment 6    Tuesday 2/23
Assignment 7    Saturday 3/13
Assignment 8    Saturday 3/20
Assignment 9    Saturday 3/27
Assignment 10   Saturday 4/3

Final Project

For the final project, you will use techniques learned over the semester to answer a research question with data of your choosing. You may work individually or in groups of up to three people. Here is the rubric that will be used to grade the final project.

This is the formatting example we went over in class on 4/19.

Progress Report   Due Date
PR #1             Monday 2/22
PR #2             Week of 3/22
PR #3             Week of 4/12

RStudio Reference Guides

A non-exhaustive list of some of my favorite cheat sheets and guides for RStudio.