STA2101 Project
Overview
Graduate students in STA2101 are required to do a short project in addition to taking the midterm test and final examination. The report on the project is expected to be about 5 typeset (or typed) pages in length, plus appendices showing how you did the work. A pdf of the project is due before midnight on the day of the final exam. Yes, this means you can wait until after you take the final exam to start the project, but it's not really recommended. Please do not email the project to me before the day of the final exam.
Here are some possibilities.
- Learn SAS and use it to analyze a data set. Full details are given below.
SAS (the Statistical Analysis System) is a strong old statistical software package. It is losing ground to R, but it is still used widely in the biomedical research sector and banks. It is a valuable job credential. You will need to learn it on your own, any way you can. I have some old class materials that may be helpful.
- Do the SAS assignment, but using Python. If you want to use software other than SAS or Python, please consult with me first. R is not acceptable.
- Learn about a new statistical method that interests you. Your report will have 3 sections. This project may be a bit longer than the others (more pages), but appendices are not necessary.
  - Part 1: Description of the method in your own words. Include a clear statement of the model, with typeset formulas. Write for an audience (me) who knows statistics, but is unfamiliar with the topic.
  - Part 2: Simulate a data set with R, based on the model in Part 1. Briefly describe the data set in words, and give the R code. Also, please display the first few lines of the data file. The unknown parameters in the model will have numerical values. At the end of this section, list the parameters along with their numerical values.
  - Part 3: Using any software of your choice (R is okay), estimate the model parameters. Give a table showing the true parameter values, and the estimates (numbers). If testing and confidence intervals are appropriate, carry out a few tests and produce at least one confidence interval.
- Design another project of your choice. Please discuss with me first. I have already had one such discussion that lasted less than 10 seconds. The student said "Text mining with SAS" and I said "Okay, that's great!"
This option is to learn SAS and use it to analyze some data. You may supply your own data set if you wish, but it must be rich and interesting. The default data set is from a longitudinal clinical trial of an interactive, multimedia program known as "Beat the Blues" designed to deliver cognitive behavioural therapy to depressed patients via a computer terminal. Patients with depression recruited in primary care were randomised to either the Beating the Blues program, or to "Treatment as Usual" (TAU). (This isn't my writing; I seem to have lifted it from somewhere and I don't even believe all of it.)
The variables are
- id: Patient identification code
- drug: Did the patient take anti-depressant drugs (No or Yes).
- length: The length of the current episode of depression, a factor with values <6m (less than six months) and >6m (more than six months).
- treatment: Treatment group, a factor with levels TAU (treatment as usual) and BtheB (Beat the Blues)
- bdi_pre: Beck Depression Inventory score before treatment.
- bdi_2m: Beck Depression Inventory score after two months
- bdi_4m: Beck Depression Inventory score after four months
- bdi_6m: Beck Depression Inventory score after six months
- bdi_8m: Beck Depression Inventory score after eight months
The data are available in the file BeatTheBlues.data.txt.
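If you take the Python option, a first step is getting the file into memory. The sketch below is only a starting point. It assumes, without having been checked against the actual file, a header line naming the nine variables listed above, whitespace-separated fields, and NA for a missing score. The rows in SAMPLE are made up for illustration, and read_patients is a hypothetical helper name.

```python
import io

# ASSUMED layout: header line, whitespace-separated fields, "NA" = missing.
# These rows are invented for illustration; they are not from the real file.
SAMPLE = """\
id drug length treatment bdi_pre bdi_2m bdi_4m bdi_6m bdi_8m
1 No >6m TAU 30 25 22 NA NA
2 Yes <6m BtheB 32 16 14 12 10
3 Yes >6m TAU 21 19 NA NA NA
"""

BDI_COLUMNS = ["bdi_pre", "bdi_2m", "bdi_4m", "bdi_6m", "bdi_8m"]


def read_patients(handle):
    """Return one dict per patient; BDI scores become float, or None if missing."""
    header = handle.readline().split()
    patients = []
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        row = dict(zip(header, fields))
        for col in BDI_COLUMNS:
            row[col] = None if row[col] == "NA" else float(row[col])
        patients.append(row)
    return patients


patients = read_patients(io.StringIO(SAMPLE))
print(len(patients), patients[0]["treatment"], patients[0]["bdi_6m"])
```

For the real data, pass an open file instead of the in-memory sample, for example `with open("BeatTheBlues.data.txt") as f: patients = read_patients(f)`, and adjust the parsing if the file turns out to be laid out differently.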
Your task is to analyze the data and write a brief report. Here are some guidelines. I may add to this list based on comments and questions.
- The report will be typed or typeset. A very carefully written report might be as short as one page. I hope you can hold it to five pages or less.
- There will be two appendices. Appendix A will be a listing of your SAS program or programs. It should start with a paragraph or bulleted list describing what's there. Appendix B is your results file or files.
- On the day of the final exam and not before, please email me a copy of the report, including both appendices, in a single pdf file.
- The main question is whether the treatment program worked. Focus on that.
- Mention the statistical methods you used to come to your conclusion and any issues you had to deal with.
- There is no need for the famous "plain, non-statistical language." You are writing this report/memo to another statistician (me). At the same time, this is not an invitation to show how many technical terms you know. Try to be clear and direct.
- This is not a group project, and there are so many reasonable ways to look at the data that it would be astonishing (and suspicious) if any two people did exactly the same thing. You can discuss it but just don't copy, don't let anyone else copy, and don't "help" anyone who has not started yet. If I see evidence of plagiarism in the SAS code or in the written report, I will bring it to the Director of Graduate Studies, who will be pissed off at me (and you) for wasting his time.
- I would start with basic (or not so basic) descriptive statistics and a bit of exploration. Get familiar with the data file and see what's in there. You will not put much detail about this in the report, but you need to do it in order to do a good job. It might be good to do this in a separate SAS program.
- You will probably think of several promising ways to analyze the data. There is no harm in trying more than one thing.
- Don't hesitate to learn on your own.
- Questions and discussion are welcome.
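For those taking the Python option, the exploratory step suggested above might begin with something like the following. It is a minimal sketch, not a recommended analysis: it tabulates the number of observed scores and their mean for each treatment group at each visit. The records are made up for illustration, None is an assumed marker for a missing score, and summarize is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

VISITS = ["bdi_pre", "bdi_2m", "bdi_4m", "bdi_6m", "bdi_8m"]

# Invented records for illustration only; None marks a missing score (assumed).
patients = [
    {"id": "1", "treatment": "TAU", "bdi_pre": 30.0, "bdi_2m": 25.0,
     "bdi_4m": 22.0, "bdi_6m": None, "bdi_8m": None},
    {"id": "2", "treatment": "TAU", "bdi_pre": 20.0, "bdi_2m": 18.0,
     "bdi_4m": 16.0, "bdi_6m": 15.0, "bdi_8m": 14.0},
    {"id": "3", "treatment": "BtheB", "bdi_pre": 28.0, "bdi_2m": 15.0,
     "bdi_4m": 12.0, "bdi_6m": 10.0, "bdi_8m": 9.0},
    {"id": "4", "treatment": "BtheB", "bdi_pre": 24.0, "bdi_2m": 14.0,
     "bdi_4m": None, "bdi_6m": None, "bdi_8m": None},
]


def summarize(patients):
    """Map (treatment, visit) -> (number observed, mean of observed scores)."""
    scores = defaultdict(list)
    for p in patients:
        for visit in VISITS:
            if p[visit] is not None:  # skip missing scores
                scores[(p["treatment"], visit)].append(p[visit])
    return {key: (len(vals), mean(vals)) for key, vals in scores.items()}


summary = summarize(patients)
for treatment in ("TAU", "BtheB"):
    for visit in VISITS:
        n, avg = summary[(treatment, visit)]
        print(f"{treatment:6s} {visit:8s} n={n} mean={avg:.1f}")
```

Printing n next to each mean makes any dropout visible. Means over observed scores alone can mislead when dropout is related to the outcome, which is one of the issues a report might need to address.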