A. Is data analysis a linear process where the statistician/data scientist formulates a series of models to answer a question? Briefly explain.
B. What type of question is each of the following?
Is air pollution related to life expectancy in Ontario?
What proportion of males in the data have asthma?
Polling data is used to evaluate how Torontonians will vote in the October 2018 Toronto mayoral election.
Data Analysis as Art
Data analysis is a branch of science, not mathematics. (Tukey, 1962)
“Data analysis uses mathematical arguments … and results as bases for judgement rather than as bases for proof or stamps of validity” (Tukey, 1962).
Data analysts use many statistical methods, such as regression, classification trees, and neural nets, but the analyst must assemble these tools and apply them to data to answer a relevant question.
This is the art of data analysis.
Epicycles of Data Analysis
Data analysis is not a linear process.
Data analysis is iterative: “… information is learned at each step, which then informs whether (and how) to refine, and redo, the step that was just performed, or whether (and how) to proceed to the next step” (Peng and Matsui).
Many algorithms embedded in software are final data analysis products: they present the end result of an analysis, not the iterative work that produced it.
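The iterative idea can be illustrated with a minimal sketch. This is not Peng and Matsui's procedure, only a toy loop under stated assumptions: the data are made up, the function names (fit_constant, fit_line, rmse, iterate_analysis) are hypothetical, and a numerical error tolerance stands in for the analyst's judgement about whether to refine the current step or move on.

```python
from statistics import mean


def fit_constant(xs, ys):
    """First attempt: predict the mean of y for every x."""
    y_bar = mean(ys)
    return lambda x: y_bar


def fit_line(xs, ys):
    """Refinement: ordinary least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def rmse(model, xs, ys):
    """Root mean squared error of the model on the data."""
    return (sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def iterate_analysis(xs, ys, candidates, tolerance):
    """Try each candidate model in order, learn from the fit,
    and stop once the error is within the (assumed) tolerance."""
    history = []
    for name, fit in candidates:
        model = fit(xs, ys)
        error = rmse(model, xs, ys)
        history.append((name, error))
        if error <= tolerance:
            break  # good enough: proceed to the next step of the analysis
    return history


# Made-up illustrative data, roughly linear in x.
xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 4.9, 8.2, 10.8, 14.1, 16.9]

candidates = [("constant", fit_constant), ("line", fit_line)]
history = iterate_analysis(xs, ys, candidates, tolerance=0.5)

for name, error in history:
    print(f"{name}: RMSE = {error:.2f}")
```

Here the constant model fits poorly, so the loop revises to a line before stopping. A real analysis would revisit the question, the data, and the model at each pass, and the decision to stop would rest on judgement, not on a single threshold.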