Projects


Projects are due in class on Thursday April 12th, the last day of class. Well, if not actually in class, then at least they must be in my hands before I go home on that day. What do I mean by "home"? Like, if you intercept me on my front porch ... Just kidding (I hope). You know what I mean -- no late work. Please be reasonable.

Projects must be no more than 5 pages, exclusive of any appendices. It's fine if they are shorter. Listings of computer output must be in an appendix. Documents must be machine printed, not handwritten. An earlier draft of this document specified LaTeX. This is desirable but no longer required.

The following are some possibilities. Perhaps the best project is one that you design yourself; let's discuss. Email me if you are interested in one of these, and I will elaborate on it, in this document. The list will continue to grow as I think of more possibilities. I will provide data sets for as many as I can, especially if I think someone is starting to work on one. Again, these are just some possibilities.

  1. Sequential Analysis: When we did that first example of power for a test of difference between proportions, I remarked that in practice, since the outcome was infant mortality, no one would plan to analyze data using an elementary test like the one whose power we studied. Rather, one would employ some kind of sequential procedure in which the decision is reject H0, accept H0 or collect more data. The assignment: look up sequential hypothesis testing, and write a document that explains it briefly, quotes the necessary theorems, and computes a test case in which you test difference between proportions. That is, the data are Bernoulli with possibly two different values of theta. Let me know if you are interested in this and I will simulate the data for you. Tell me as soon as possible. If it's at the last minute you'll need to simulate it yourself.
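
As a rough illustration of the three-way decision described above, here is a minimal Python sketch of Wald's sequential probability ratio test for a single Bernoulli parameter. It is a simplification, not the two-sample procedure the project asks for: the function name `sprt_bernoulli`, the hypothesised values 0.1 and 0.3, the error rates and the simulated data are all assumptions chosen for illustration, and the boundaries use Wald's usual approximations.

```python
import math
import random


def sprt_bernoulli(data, theta0, theta1, alpha=0.05, beta=0.20):
    """Wald's SPRT of H0: theta = theta0 versus H1: theta = theta1.

    Returns (decision, n), where n is the number of observations used.
    The boundaries are Wald's approximations, so alpha and beta are
    only approximately attained.
    """
    upper = math.log((1 - beta) / alpha)   # cross this: reject H0
    lower = math.log(beta / (1 - alpha))   # cross this: accept H0
    llr = 0.0                              # running log likelihood ratio
    n = 0
    for n, x in enumerate(data, start=1):
        if x == 1:
            llr += math.log(theta1 / theta0)
        else:
            llr += math.log((1 - theta1) / (1 - theta0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "collect more data", n


# Simulated Bernoulli data; the true theta of 0.3 is an arbitrary choice.
rng = random.Random(2718)
sample = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
print(sprt_bernoulli(sample, theta0=0.1, theta1=0.3))
```

The function stops as soon as the running log likelihood ratio leaves the interval between the two boundaries, which is what separates a sequential procedure from a fixed-sample test.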

     

  2. Logistic regression with more than two outcomes: A natural extension of logistic regression is to the case where the outcome (that is, the dependent variable) has more than two categories. Your task is to explain how SAS proc catmod does this. Give a brief explanation in clear English and present a computed example. I will help look for some data if you let me know early enough.
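
One common way to extend logistic regression to several categories is the generalized (baseline-category) logit model, in which each non-reference category gets its own linear predictor for the log odds against a reference category. The Python sketch below only shows how such predictors turn into category probabilities. It is not SAS output, and the function name and the coefficient values are made up for illustration; check the proc catmod documentation for the response functions and reference category it actually uses.

```python
import math


def category_probabilities(x, coefs):
    """Probabilities under a baseline-category logit model.

    coefs holds one (intercept, slope) pair per non-reference category;
    the reference category has its linear predictor fixed at zero.
    The reference category's probability is returned last.
    """
    linear = [b0 + b1 * x for b0, b1 in coefs] + [0.0]
    top = max(linear)                       # subtract max for numerical stability
    expo = [math.exp(v - top) for v in linear]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical coefficients for a three-category outcome.
coefs = [(-1.0, 0.5), (0.3, -0.2)]
probs = category_probabilities(2.0, coefs)
print([round(p, 3) for p in probs])

# The log odds of category 1 against the reference recover its linear predictor.
print(round(math.log(probs[0] / probs[2]), 6))
```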

     

  3. Analyze the Cartoon data: In a study designed to measure the effectiveness of different learning materials, subjects listened to a 5-minute lecture on tape, accompanied by 18 slides. Half the slides showed a cartoon animal, and half showed a realistic picture. There were 9 animals; each was shown once as a cartoon and once in a realistic picture. All subjects saw all 18 slides, but a randomly selected half of the participants saw them in colour, while the other half saw them in black and white.

    After they had seen the slides, the participants took a test (immediate test) on the material. The 18 slides were presented in a random order, and the participants wrote down the character type represented by that slide. They received two scores, one for the number of cartoon characters they correctly identified, and one for the number of realistic characters they correctly identified. Each score could range from 0 to 9, since there were 9 characters.

    Four weeks later, participants were given another test (delayed test) on the material. Some participants did not show up for this second test, so their scores were given the missing value code, a period.

    All participants were given the OTIS Quick Scoring Mental Ability Test, which yielded a rough estimate of their natural ability. If this variable is used, it will be used as a covariate, so that all conclusions may be accompanied by the phrase "controlling for natural ability."

    The data file cartoon.dat has one row for each participant, and columns representing the following variables:

    1. Identification number
    2. Colour (0=Black and white, 1=Colour). No participant saw both.
    3. Forget it.
    4. Forget it.
    5. OTIS intelligence test
    6. Cartoon 1: Score on cartoon part of immediate test (0 through 9)
    7. Real 1: Score on realistic part of immediate test (0 through 9)
    8. Cartoon 2: Score on cartoon part of delayed test (0 through 9)
    9. Real 2: Score on realistic part of delayed test (0 through 9)
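
The layout above might be read along the following lines. This is a sketch only: it assumes the columns in cartoon.dat are separated by whitespace, the field names are labels chosen here for convenience, and the two sample rows are invented to show the missing value code, not taken from the real file.

```python
FIELDS = ["id", "colour", "skip1", "skip2", "otis",
          "cartoon1", "real1", "cartoon2", "real2"]


def parse_row(line):
    """Turn one whitespace-separated row into a dict of numbers.

    A lone period is the missing value code and becomes None.
    """
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row = {}
    for name, text in zip(FIELDS, values):
        row[name] = None if text == "." else float(text)
    return row


def read_cartoon(path):
    """Read every non-blank row of the data file at path."""
    with open(path) as handle:
        return [parse_row(line) for line in handle if line.strip()]


# Invented rows, for illustration only; the second one missed the delayed test.
example = ["1 0 3 7 102 6 5 4 3", "2 1 2 9 98 7 8 . ."]
for line in example:
    print(parse_row(line))
```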

    Think of this as a three-factor design, in which the factors are Colour (Yes-No), Realistic (Yes-No) and Time. You want to display the means, and test three main effects, three 2-factor interactions, and one 3-factor interaction. I can think of three reasonable approaches.

    Regardless of which approach you take, display some relevant descriptive statistics (but don't go crazy), give the p-values and state your conclusions. By conclusions, I mean something about conditions for effective learning. Put copies of your computer printouts in an appendix. If you use SAS, give your command file, your list file and your log file.
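
To display the means for the three-factor layout, each participant's four scores can be spread over the Realistic-by-Time cells within that participant's Colour group. A minimal Python sketch, assuming the rows are already available as dictionaries; the field names and the scores shown are invented for illustration. Missing delayed scores (None) are simply left out of their cell, which is only one of several ways to handle them.

```python
from collections import defaultdict

# Which score belongs to which (picture type, time) cell.
SCORES = {"cartoon1": ("cartoon", "immediate"),
          "real1": ("realistic", "immediate"),
          "cartoon2": ("cartoon", "delayed"),
          "real2": ("realistic", "delayed")}


def cell_means(rows):
    """Mean score for each (colour, picture type, time) cell."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        for field, (picture, time) in SCORES.items():
            score = row[field]
            if score is None:          # participant missed this test
                continue
            cell = (row["colour"], picture, time)
            totals[cell] += score
            counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in counts}


# Invented scores, for illustration only.
rows = [
    {"colour": 0, "cartoon1": 6, "real1": 5, "cartoon2": 4, "real2": 3},
    {"colour": 0, "cartoon1": 8, "real1": 7, "cartoon2": None, "real2": None},
    {"colour": 1, "cartoon1": 7, "real1": 8, "cartoon2": 5, "real2": 6},
]
for cell, mean in sorted(cell_means(rows).items()):
    print(cell, round(mean, 2))
```

This produces descriptive statistics only; the tests of main effects and interactions still have to come from whichever analysis you choose.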