By working on projects, students will gain experience with the data science process, including data collection, stating the question, data wrangling, data analysis, data interpretation, and communication. The projects will use data collected by a collaborator (e.g., an organization or scientist), published data, or data scraped from web pages. All projects will involve some form of collaboration or communication. Students are expected to be familiar with basic statistical methods for inference (e.g., general linear models) and prediction (e.g., linear and logistic regression), and to be comfortable with basic data analysis in a programming language such as R or Python. Students will be expected to adopt a reproducible research workflow using tools such as R Markdown, R Notebook, or Jupyter Notebook.
Class time will be a mixture of informal lectures, class discussions, meetings with collaborators, and student presentations.
This course will not cover specific “methods”; nevertheless, it is important that students are able to independently learn and apply unfamiliar methods.
All work will be graded on a scale from 1 to 4 (sometimes with pluses and minuses) where:
Grade value | Description |
---|---|
1 | Work does not meet expectations. |
2 | Work minimally meets expectations, possibly missing some. |
3 | Good work; meets all or most expectations. |
4 | Excellent work; exceeds expectations. |
Grades will almost always be 2 or 3 (1s and 4s are rare). Generally speaking, a 2 is a B, a 3 is an A, and a 4 is an A+.
Project | Item | Value |
---|---|---|
Project #1 | Proposal | 5% |
Project #1 | Draft report | 5% |
Project #1 | Final report | 10% |
Project #1 | Presentation | 20% |
Project #2 | Proposal | 5% |
Project #2 | Draft report | 5% |
Project #2 | Final report | 10% |
Project #2 | Presentation | 20% |
Participation | Attendance, participation in discussions, preparation for class | 20% |
Class | Date | Description | Reading | Due |
---|---|---|---|---|
1 | 09-12 | Introduction, data analysis case study | | |
2 | 09-19 | Data analysis, questions, web scraping, discuss ideas for project #1 | Peng and Matsui (2015) (1-3), Leek and Peng (2015) | |
3 | 09-26 | Exploratory data analysis, discuss ideas for project #1 | Peng and Matsui (2015) (4), Donoho (2015) | |
4 | 10-03 | Models | Peng and Matsui (2015) (5, 6, 7) | Project #1 proposal |
5 | 10-10 | Inference vs. prediction, discuss project #2 | Peng and Matsui (2015) (8), Breiman (2001) | Project #1 draft report |
6 | 10-17 | Project #1 presentations | | Project #1 final report; meet with collaborators by 10-27 |
7 | 10-24 | Project #1 presentations | | Project #2 proposal |
8 | 10-31 | Project #2 check-in | | |
- | 11-07 | No class (fall reading week) | | |
9 | 11-14 | Project #2 check-in | Lazer et al. (2014) | |
10 | 11-21 | Project #2 check-in and quick presentations | | Project #2 draft report |
11 | 11-28 | Project #2 presentations | | |
12 | 12-05 | Project #2 presentations | | Project #2 final report |
Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16 (3). Institute of Mathematical Statistics: 199–231.
Donoho, David. 2015. “50 Years of Data Science.” http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “Google Flu Trends Still Appears Sick: An Evaluation of the 2013-2014 Flu Season.”
Leek, Jeffery T., and Roger D. Peng. 2015. “What Is the Question?” Science 347 (6228). American Association for the Advancement of Science: 1314–1315. doi:10.1126/science.aaa6146.
Peng, Roger D., and Elizabeth Matsui. 2015. “The Art of Data Science: A Guide for Anyone Who Works with Data.” Skybrude Consulting. https://bookdown.org/rdpeng/artofdatascience/.