STA2453F08: Statistical Consulting
www.utstat.toronto.edu/~brunner/2453f08
Last Class Meeting: Tuesday April 7th at 10 a.m. in Ramsey Wright 141
Logistic Regression Examples:
proc logistic order=internal descending; /* Always use descending for 0-1 DV */
     title2 'Logistic regression on perjury vote';
     model perjury = ritewing cpercent next0 next2 firsterm;
     nextelec: test next0=next2=0;
     others: test cpercent=next0=next2=firsterm=0; /* Ctrl for ritewing */
     allvars: test ritewing=cpercent=next0=next2=firsterm=0;
          /* Just for comparison with Testing Global Null Hypothesis: BETA=0 */

proc logistic order=internal descending; /* Always use descending for 0-1 DV */
     title2 'SAS will make your dummy variables';
     class nextelec;
     model perjury = ritewing cpercent nextelec firsterm;
SAS code for the Principal Components analysis of the Walker Music data is now in the directory containing the homework data sets.
When we get together this term, it will be Tuesday between 10:10 and noon in Ramsey Wright 141.
Name | Email | Phone | Office |
Jerry Brunner (Professor) | brunner@utstat.utoronto.ca | 416-978-7589 | SS6026E |
Laurel Duquette (Consulting Service Director) | consult@utstat.utoronto.ca | 416-978-4455 | SS 3112 |
For security reasons, you need to connect using software that probably did not come with your computer. The protocol is SSH, which stands for "Secure SHell." When you use SSH, information travels over the Internet in encrypted form, so hackers have trouble intercepting your password and other information. You can download a free copy of SSH below.
With an Internet connection, SSH applications give you a text-only connection to utstat and other unix machines from your home computer. From utstat's prompt, you can run programs such as SAS, R and emacs.
Different SSH programs are recommended, depending on the operating system that you are using. To use these programs, you must be connected to the Internet, say with a broadband connection or via PPP over your phone line.
To connect, note that for utstat's IP address (or Host Name), you can put utstat.toronto.edu.
In any of these SSH programs, the first time you connect to a host, you will be told that the program can't verify that this host is really what it appears to be. Do you want to trust it? SSH is just being sanely paranoid. Say yes.
Most clients seem to record and keep their data in Microsoft Excel spreadsheets. But on unix machines, SAS likes plain text data files. Transferring the data can be a pain, because even if you save the spreadsheet as plain text, SAS will choke on the tab characters, and also the conventions for line breaks differ in Windows and unix/linux. To overcome this minor technical nightmare, proceed as follows.
This process is not pleasant, but there is one nice thing to report. The delimiter=',' option on the SAS infile statement will allow you to read your comma-delimited data directly without any more editing. I tried this and it works. My infile statement was
infile 'name2.data' delimiter=',';
For smaller data sets, it also seems reasonable that you could open the .csv file in Word, and then just copy-paste the whole thing into a PuTTY window where emacs is running. Would you still have to convert the line breaks in this case? I imagine so, but I haven't tried it.
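Line breaks can also be converted on the unix side with standard tools. Here is a minimal sketch, assuming the comma-delimited file exported from Excel has been transferred to utstat under the made-up name name2.csv. The output name name2.data matches the infile statement above. The first command only builds a small sample file so that the sketch runs on its own.

```shell
# Build a small sample file with Windows-style (CR+LF) line breaks.
# In real use this would be the file exported from Excel.
printf 'id,score\r\n1,10\r\n2,20\r\n' > name2.csv

# Delete every carriage return, leaving unix-style (LF only) line breaks.
tr -d '\r' < name2.csv > name2.data

# Count the carriage returns that remain in the converted file.
tr -cd '\r' < name2.data | wc -c
```

The last command should report zero. If it does not, the file still has Windows line breaks somewhere.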
Warning: It is very natural to leave missing data cells empty in an Excel spreadsheet, but if you do this and then export the data as described here, the data file will contain two consecutive commas, which SAS will treat as a single comma; the results are usually disastrous. SAS is being sensible in a way. This is just how it treats spaces. Two spaces are the same as one space unless it is reading the data using a fixed format.
So, if you are reading a raw data file consisting of comma delimited text, it is important to make sure you never have two consecutive commas. The best way to avoid this is, if missing data are to be blank in the spreadsheet, make sure the cell contains an actual blank space (press the space bar), and is not completely empty. SAS treats blank space between two commas as a missing value. One space or several -- it does not matter. The result is still a single missing value.
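If the data file has already been exported with empty cells, the problem can be found and patched from the unix prompt. This is a rough sketch, not a tested recipe for every data set. The file names raw.csv and fixed.csv and the sample values are made up for illustration, and the sketch only deals with commas that are directly next to each other in the middle of a line.

```shell
# Sample comma-delimited file. On data lines 1 and 2 some cells were left
# completely empty in the spreadsheet, so commas sit side by side.
printf 'id,age,income,score\n1,,50,10\n2,,,20\n3,40,60,30\n' > raw.csv

# Show (with line numbers) every line that contains two consecutive commas.
# "|| true" keeps the script going when grep finds nothing.
grep -n ',,' raw.csv || true

# Put one blank space between each pair of adjacent commas.
# The loop (label a, branch ta) repeats the substitution on a line until
# no ",," is left, so runs of several empty cells are handled too.
sed -e ':a' -e 's/,,/, ,/' -e 'ta' raw.csv > fixed.csv

cat fixed.csv
```

After this, each formerly empty cell holds a single blank, which is the form that the infile statement with delimiter=',' reads as a missing value.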
When you start editing files with emacs, you will notice that additional files ending with a tilde (~) keep turning up in your directory. These are backup files, automatically created by emacs for your protection. I suppose they might be useful sometimes, but I find them annoying. If you get tired of deleting them, use emacs to create a file called .emacs in your home directory. This is an initialization file used to set options for emacs. Beginning it with a period makes it invisible to the ls command (but try ls -a). However, it's still there if you create it, and you can edit it like any other file. In the .emacs file, put a single line saying (setq make-backup-files nil). Don't forget the parentheses! Exit, saving the file. Next time you run emacs, no backup file will be created. Needless to say, your .emacs file can be very long and do a lot if you wish.
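The behaviour of files that begin with a period can be tried safely in a scratch directory before touching your real home directory. This is a minimal sketch, assuming a shell where mktemp is available. The variable demo_home and the file name perjury.sas are made up for the illustration.

```shell
# Scratch directory standing in for your home directory (hypothetical).
demo_home=$(mktemp -d)

# Create the initialization file containing the single line, parentheses included.
echo '(setq make-backup-files nil)' > "$demo_home/.emacs"

# An ordinary file for comparison.
touch "$demo_home/perjury.sas"

# Plain ls does not list files whose names begin with a period.
ls "$demo_home"

# ls -a lists everything, including .emacs.
ls -a "$demo_home"
```

To do it for real, put the same single line in the file .emacs in your actual home directory, either with emacs as described above or from the prompt.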