STA1007 Project

Due Thursday Dec 6th


First, locate a data set in your field. If you have a faculty advisor, he or she may be able to help. If this fails, I may be able to help. It is recommended but not required that you briefly discuss the project with me before you do a lot of work. Email is fine. If you want to come to my office hours in groups, it may save me having to say the same thing over and over to different people, so groups are welcome.

Analyze your data using some of the statistical methods covered in this course. Unless there is a very good reason for using other software and we have explicitly agreed on this, please use SAS or R. Come to some conclusions. Write a cover section describing the data, where it came from and what you did with it. Make your conclusions explicit. This section does not have to be polished; I am certainly not asking for a professional paper with an introduction, method, results, discussion and references. But it should be clear and readable by a non-specialist (me). Statistical terminology is okay. Length is a minimum of one typed page and an absolute maximum of 5.

Attach the log and list files, showing all the analyses you mention in your cover section -- that is, the analyses upon which your main conclusions are based.

If you spend more than twice the time on this that you spend on the typical weekly assignment, it's too much. All I want is evidence that you've learned something you can use in your work.

Uploading data: This is a pain, but finally I know how to do it. Assuming your data are in an Excel spreadsheet, save it as comma-delimited text (CSV). Open the file in Word, and save as text with (DOS) line breaks. Word for the Mac calls it "Text Only with Line Breaks (MS-DOS)."

Then we'll transfer it to cquest. I will do it for you during office hours, unless I look into the sftp (Secure File Transfer Protocol) tool available on the PuTTY website, and it looks friendly enough. So ask me to do this for you, before I have time to play around with the PuTTY tool!

Once your data are in a file on cquest, you'll do something like

dos2unix < name1.txt > name2.data

to convert the Windows line breaks to unix line breaks. Ugh! But there is one nice thing to report. The delimiter=',' option on the infile statement will allow you to read your comma-delimited data directly without any more editing. I tried this and it works. My infile statement was

infile 'testdata.dat' delimiter=',';
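
Before pointing the infile statement at the converted file, it may be worth confirming that no Windows carriage returns are left in it. The following is only a minimal shell sketch of such a check, not a required step. The file contents are made up for illustration, and tr is used as a stand-in for dos2unix in case that command is not available on your machine.

```shell
# Create a tiny stand-in file with Windows (CRLF) line breaks.
# The contents are hypothetical; your real file comes from Excel/Word.
printf 'id,score\r\n1,10\r\n2,12\r\n' > name1.txt

# Strip the carriage returns. tr is a stand-in here for
#   dos2unix < name1.txt > name2.data
tr -d '\r' < name1.txt > name2.data

# Count the carriage-return characters left in the converted file.
# Zero means the line breaks are now unix line breaks.
cr_lines=$(tr -cd '\r' < name2.data | wc -c | tr -d ' ')
echo "carriage returns remaining: $cr_lines"
```

If the count is not zero, the conversion did not take, and SAS may read a stray character at the end of each line.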

Turning it in: The project is due Thursday Dec 6th. There are three ways to turn it in.

  1. Give me hard copy any time before it's due.
  2. Close to the end of term, I'll ask the people in the Statistics department office to set aside a cardboard box where you can drop it during academic business hours (10-4:30 M-F, maybe).
  3. You may email it to me as a single PDF file. You can save a Microsoft Word document as PDF, so this should not be a hardship. The log and list files can be pasted in after your answers. Please put the log and list files in the courier font.

If you leave hard copy with the department office, it has to be in by 5 pm on Thursday Dec 6th. If you email me a PDF, it must be in by midnight on that day.