STA 441: Data Analysis
University of Toronto Mississauga, Spring 2024
http://utstat.toronto.edu/brunner/441s24
Lecture: Tuesday 4:10 - 6:00 p.m. in KN 137
and Thursday 4:10 - 5:00 p.m. in IB 335
- Instructor: Jerry Brunner
- Office: 3028 Deerfield Hall
- Phone: 905-828-3816
- email: jerry.brunner[at]utoronto.ca
- Office Hours: Tuesday 2:10-3 and Thursday 1:10-3
Note: Jerry does not read his email every day. It is much more efficient to
talk with him before or after class, or during office hours.
Tutorial: Friday 5:10 - 6:00 p.m. in DV2072 -- except that on February 2nd, the tutorial will be in DH2060.
- Instructor: Marija Pajdakovska
- email: marija.pajdakovska[at]mail.utoronto.ca
- Online Office Hours: Thursday 9-10 a.m. via Zoom. The Meeting ID is 856 8758 5768. The web link is
https://utoronto.zoom.us/j/85687585768.
Text (draft): Data analysis with SAS: An open textbook by Jerry Brunner. It is a free download. Chapters will be posted one at a time on the course home page.
Learning Objectives: The discipline of Statistics is based on probability models for noisy numerical data. The primary objective of STA441 is for students to learn to navigate the interface between the tidy world of the formal model and the messy world of real data. Their knowledge will be demonstrated by the ability to
- Flexibly choose appropriate statistical methods to answer a variety of subject-matter questions.
- Carry out the analyses with software.
- Interpret the results in language that can be understood by a scientist or journalist who has no statistical training.
- Design studies and criticize the design of existing studies.
Secondary goals are for students to (a) become familiar (or reinforce their familiarity) with a variety of relatively advanced statistical methods at an applied level, and (b) become moderately proficient in the SAS programming language.
Topics: Vocabulary and concepts of data analysis; Review of statistical inference; Introduction to SAS; Basic descriptive statistics; One- and two-sample t-tests, one-way ANOVA, simple regression and correlation, cross-tabulation and chi-squared tests of independence; Tests of contrasts in one-way and higher-way designs; Multiple comparisons, including Bonferroni corrections, Scheffé tests and Tukey tests; Univariate multiple regression, including regression with dummy variables and interactions; Logistic regression, extended to multinomial logit models; Multivariate regression and analysis of variance; Random effects and mixed models for normal data; Various models for repeated measures (within cases) data assuming normality; Mixed models for binary data; Principal components analysis; Cluster analysis. If time permits, the meaning of life.
Prerequisites: STA302 or equivalent. Students who lack the prerequisite can be removed at any time unless they have received an explicit waiver from the department.
Grading:
- 40%: Regular quizzes, given in tutorial. Quiz dates are Jan. 19, 26, Feb. 2, 9, 16, March 1, 8, 15, 22, April 5 -- for a total of 10 quizzes. The lowest quiz mark will be dropped.
- 10%: Eleven pop quizzes given in lecture. Dates are chosen randomly, with Tuesdays twice as likely as Thursdays, and no more than one pop quiz per class meeting. The lowest pop quiz mark will be dropped.
- 50%: Final exam. The exam is comprehensive.
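As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes that every mark is recorded as a percentage and that, within each category, the remaining marks are averaged with equal weight after the lowest one is dropped; neither assumption is stated above. The function names are hypothetical, and the optional make-up test is not modelled.

```python
def drop_lowest_mean(marks):
    """Average of the marks after dropping the single lowest one."""
    if len(marks) < 2:
        raise ValueError("need at least two marks to drop one")
    return (sum(marks) - min(marks)) / (len(marks) - 1)


def course_mark(quizzes, pop_quizzes, final_exam):
    """Weighted course mark: 40% quizzes, 10% pop quizzes, 50% final exam.

    All inputs are percentages (0-100). Equal weighting of marks within
    a category is an assumption made for this sketch.
    """
    return (0.40 * drop_lowest_mean(quizzes)
            + 0.10 * drop_lowest_mean(pop_quizzes)
            + 0.50 * final_exam)


# Example: one missed regular quiz (a zero) is the mark that gets dropped.
example = course_mark([80] * 9 + [0], [100] * 10 + [50], 70)
```

In this example the missed quiz does not hurt the quiz average, because it is the mark that is dropped.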
Please note that your quiz papers and any printouts you turn in with the quiz may be scanned or photocopied before they are returned.
There will be an assignment for each quiz. The knowledge you need to do each quiz is a subset of the knowledge you need to do the corresponding assignment. Most or all of the assignments will include a computer part. You will bring printouts to the quiz and answer questions based on the printouts. Possibly, one of the quiz questions will be to hand in your printouts. The non-computer parts of the assignments are just to prepare you for the quizzes; they will never be handed in.
Policy for missed work: If you miss a quiz, the mark is zero. However, your lowest quiz mark will be dropped. If you miss one or more regular quizzes or pop quizzes with a valid excuse, you will have the option of taking a comprehensive make-up test covering all the material in the course, at the end of term. The lowest mark, possibly the make-up, will still be dropped.
What is a valid excuse? Medical issues and family emergencies are valid. Vacations are not. Automotive breakdown or other transportation problems are never valid excuses. If you miss term work because you are taking another class at the same time as this one, that is not a valid excuse. The printer jammed, my dog ate it, etc. fall into the same category. If the University is officially open, weather is a valid excuse only if more than 50% of the class miss the quiz.
Academic Honesty: It is an academic offence to present someone else's work as your own, or to allow your work to be copied for this purpose. To repeat: the person who allows her/his work to be copied is equally guilty, and subject to disciplinary action by the university.
The main rule is don't copy, and don't let anyone else copy from you. You are expected to do the work yourself, and then perhaps compare answers after you have done so. A good rule is to never help someone who hasn't started yet. Here are some detailed guidelines.
- For the non-computer parts of the homework, I believe that students
learn better if they do the work independently, and then compare answers
afterward. I wish I could enforce this, but in practice I cannot. Please
be aware, though, that some of the questions ask for original
examples. In such cases, if two students give the same example they
will both get zero for the question. This means that if you allow
a classmate to photograph and memorize your answers, you could get a zero.
- For some quizzes, you will be asked to bring your printouts to the quiz in
tutorial; maybe you will hand them in, and maybe you will use them
to answer questions. Never, ever, bring a copy of somebody
else's printout, or allow anyone to have a copy of yours. Your
"friends" may ask you. You are expected to refuse.
- If you allow anyone to have an electronic
copy of your computer work, for any reason, you are
not only guilty of an academic offence, you have lost
your mind.
- If you are asked to hand in your log file and results file, your name and student number should be on both printouts. You are allowed to write your name and student number on the printouts in advance, but do not write anything else on your printouts in advance.
- This should be obvious, but if you are asked for a number from your printout and you don't have a printout, do not answer the question and pretend you remembered the number. If you do, you will be charged with an academic offence.
- This also should be obvious, but you are not allowed to put answers or any other material related to the non-computer questions in comment statements, or otherwise cause such material to appear on your printout.
- You are also not allowed to put interpretation or explanation of results on your printout. You are allowed to put question numbers, or the question the analysis is trying to answer, or both if you wish. The rule is that you may not put anything on the printout that you could not have typed before seeing the results. An exception is numbers from the results file that are used as input to proc iml.
- For the computer parts of the homework, it is surprisingly easy to detect
copying, and copying from other students is not allowed. If two
students have computer work that is
excessively similar, but not
similar to what was presented in lecture or office
hours, that is evidence of cheating. Of course it's
easier to detect if the work is also wrong.
- It is permitted to copy from Jerry or Marija. If your work is
very similar to what is presented in lecture, office
hours or suggested readings, that is okay.
- Direct copying of computer code from the
internet (other than from the class website of STA441H5 2024) is prohibited. You
are expected to do the work yourself.
- On the other hand, you are allowed (encouraged) to consult online documentation and examples to learn how to do things. How can you tell the difference between this and direct copying? Suppose you were asked why you did something, and your answer is "I don't know. I saw it online." Then you are copying blindly, which is an academic offence. If your answer is something like "That's how you get a confidence interval; I saw it online," then you're fine.
- It is acceptable to get help with your computer assignments from someone outside the class, but the help must be limited to general discussion and examples that are not the same as the assignment. As soon as you get an outside person to actually start working on one of your computer assignments, you have committed an academic offence.
- Be particularly careful about paying outside "tutors" to do the computer assignments for you. If they solve the problems (or otherwise obtain solutions) for money and give you the answers with or without explanation, you are guilty of an academic offence. If they are students, they are also guilty, and face possible expulsion from the university for a first offence.
- Because we will be using
SAS in this course, it's possible to be very specific about what you
are allowed to do and are not allowed to do.
- In SAS (unlike R) the program and statistical output are in separate files. It is absolutely forbidden to look at anyone else's SAS program file, or to allow anyone to look at yours. It is very tempting for friends to "work together" on an assignment side by side, typing in basically the same material, running their jobs at the same time and making the same corrections. (They call it working together, but one is usually a parasite.) This is an academic offence. Don't do it. The work does not need to be completely identical for you to be charged with an academic offence. All we need is convincing evidence that one person has been influenced by the other person's code. It is easier to detect than you think.
- You are allowed to compare numerical answers and discuss their meaning. This means it is okay to show other students your SAS results file if they also show you theirs. However, it is very dangerous to let anyone have a copy of your output file, especially an electronic copy. If they turn it in, or transmit it to someone else who turns it in, you are guilty of an academic offence because you have provided an unauthorized aid to another student.
- Beware of this specific situation. You have finished the computer assignment, and your friend has not started yet. He or she asks for a copy of your output file (not the program file), to compare answers later, some time in the middle of the night. You agree, but your friend never does finish the assignment. Instead, your friend brings his or her messed up log file and your output file. They are handed in with the quiz, the fraud is detected, and you are both convicted of an academic offence.
- Log files are a dangerous grey area, because the log file includes a copy of the SAS program as well as any error and warning messages. It is permitted for a student who has completely finished the assignment to look at another student's log file to help with SAS syntax problems, but only if he or she is not influenced. If the student who is "helping" copies code from the other person's log file, both parties are guilty of an academic offence. It is safest to get help with SAS syntax errors by showing the log file to Jerry or Marija. In any case, never show a log file without errors or warnings to another student.
- Don't copy, and don't let anyone copy from you. If we catch you, you will get in big trouble. And even if we do not catch you, after you die you will be reincarnated as a tadpole in a polluted stream.
If this is not clear enough, the latest version of the student handout "How not to
Plagiarize" is available at
http://www.writing.utoronto.ca/advice/using-sources/how-not-to-plagiarize
The Academic Regulations of the University are outlined in the Code of
Behaviour on Academic Matters, which can be found in the Arts and Science
Calendar or on the web at
http://www.governingcouncil.utoronto.ca/policies/behaveac.htm.
Generative AI: In this class, the use of artificial intelligence tools like ChatGPT is not particularly recommended, but it is not forbidden either. Specifically, if a homework problem requires you to do something in SAS and you are asked to bring a hard copy of your input and output to the quiz, it is technically okay if part or all of the SAS code was generated by AI. My impression at this point is that a SAS program written by ChatGPT will almost never run until you fix it up, and that fixing it up may require more knowledge and work than it would take to do the job yourself. Also, if you are having trouble with AI-generated code, then you are on your own. Marija and I will make no effort to understand it.
AI can do a surprisingly good job on some of the non-computer homework problems. It can also produce answers that have serious flaws or are off topic. Again, you are responsible for what you write. Some time during the term, I expect to hear "But this is what ChatGPT said!" That's never a valid argument. Your job is to understand.
Finally, it's still an academic offence to present someone else's work as your own, or to allow your work to be copied by another student. So, if a classmate does a computer assignment using AI and then gives you the result, you are both guilty of an academic offence.
Accessibility Needs: We are committed to accessibility. If you require accommodations for a disability, or have any accessibility concerns
about the course, the classroom or course materials, please contact Jerry or
Accessibility Services (visit http://www.utm.utoronto.ca/accessability or email
accessconfirm.utm@utoronto.ca) as soon as possible.
Last Date to drop course from Academic Record and GPA is March 11, 2024.