Course Description

Students will gain experience with the data science process by working on projects. The process includes data collection, stating the question, data wrangling, data analysis, data interpretation, and communication. The projects will involve data collected by a collaborator (e.g., an organization or scientist), published data, or data scraped from web pages. All projects will involve some type of collaboration or communication. Students are expected to be familiar with basic statistical methods used for inference (e.g., general linear models) and prediction (e.g., linear and logistic regression). They should also be comfortable with basic data analysis in a programming language such as R or Python. Students will be expected to adopt a reproducible research workflow using tools such as R Markdown, R Notebook, or Jupyter Notebook.

Class time will be a mixture of informal lectures, class discussions, meetings with collaborators, and student presentations.

This course will not cover specific “methods”; nevertheless, it is important that students are able to learn and apply unfamiliar methods independently.

Evaluation

All work will be graded on a scale from 1 to 4 (sometimes with pluses and minuses) where:

Grade value	Description
1	Work does not meet expectations.
2	Work minimally meets expectations, possibly missing some of them.
3	Good work; meets all or most expectations.
4	Excellent work; exceeds expectations.

Grades will almost always be 2 or 3 (1’s and 4’s are rare). Generally speaking, a 2 is a B, a 3 is an A, and a 4 is an A+.

Project	Item	Weight
Project #1	Proposal	5%
	Draft report	5%
	Final report	10%
	Presentation on project #1	20%
Project #2	Proposal	5%
	Draft report	5%
	Final report	10%
	Presentation on project #2	20%
Participation	Attendance, participation in discussions, preparation for class	20%
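
The weights in the table above sum to 100%. As an illustration only, the sketch below assumes the course grade is a weighted average of the item grades on the 1 to 4 scale; the syllabus does not say how items are combined. The item keys (e.g., `p1_proposal`) and the function name `weighted_grade` are hypothetical labels.

```python
# Hypothetical sketch: ASSUMES the course grade is a weighted average of
# item grades on the 1-4 scale. The syllabus does not state this rule.

# Weights in whole percentages, copied from the evaluation table.
WEIGHTS: dict[str, int] = {
    "p1_proposal": 5,
    "p1_draft": 5,
    "p1_final": 10,
    "p1_presentation": 20,
    "p2_proposal": 5,
    "p2_draft": 5,
    "p2_final": 10,
    "p2_presentation": 20,
    "participation": 20,
}


def weighted_grade(grades: dict[str, float]) -> float:
    """Return the weighted average of item grades (1-4 scale)."""
    missing = WEIGHTS.keys() - grades.keys()
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    # Divide by 100 because the weights are percentages.
    return sum(WEIGHTS[item] * grades[item] for item in WEIGHTS) / 100


# Example: a 3 on everything except both presentations, which earn a 2.
grades = {item: 3 for item in WEIGHTS}
grades["p1_presentation"] = 2
grades["p2_presentation"] = 2
print(weighted_grade(grades))
```

Keeping the weights as whole percentages makes it easy to check that they sum to exactly 100 without floating-point rounding.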

Tentative Course Schedule

Class	Date	Description	Reading	Due
1	09-12	Introduction, data analysis case study
2	09-19	Data analysis, questions, web scraping, discuss ideas for project #1	R. D. Peng and Matsui (2015) (1-3), Leek and Peng (2015)
3	09-26	Exploratory data analysis, discuss ideas for project #1	R. D. Peng and Matsui (2015) (4), Donoho (2015)
4	10-03	Models	R. D. Peng and Matsui (2015) (5,6,7)	Project #1 proposal
5	10-10	Inference vs. prediction, discuss project #2	R. D. Peng and Matsui (2015) (8), Breiman (2001)	Project #1 draft report
6	10-17	Project #1 presentations		Project #1 final report; meet with collaborators by 10-27
7	10-24	Project #1 presentations		Project #2 proposal
8	10-31	Project #2 check-in
-	11-07	No class - Fall reading week
9	11-14	Project #2 check-in	Lazer et al. (2014)
10	11-21	Project #2 check-in and quick presentations		Project #2 draft report
11	11-28	Project #2 presentations
12	12-05	Project #2 presentations		Project #2 final report

Reading References

Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16 (3). Institute of Mathematical Statistics: 199–231.

Donoho, David. 2015. “50 Years of Data Science.” http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf.

Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “Google Flu Trends Still Appears Sick: An Evaluation of the 2013-2014 Flu Season.”

Leek, Jeffery T., and Roger D. Peng. 2015. “What Is the Question?” Science 347 (6228). American Association for the Advancement of Science: 1314–5. doi:10.1126/science.aaa6146.

Peng, Roger D., and Elizabeth Matsui. 2015. “The Art of Data Science: A Guide for Anyone Who Works with Data.” Skybrude Consulting. https://bookdown.org/rdpeng/artofdatascience/.

STA4002HF - Data Science, Collaboration, and Communication (Fall 2017)
