Assignment 2

STA2201s01 Assignment Two

Quiz on Thursday January 25. Bring printouts to the quiz. You may be asked to answer questions based on your printouts. You may be asked to hand in one or more of the printouts. The non-computer parts are not to be handed in. Do them in preparation for the quiz.

The effectiveness of a new educational program is to be evaluated as follows. Students are randomly assigned to either the new program or to a control condition (the traditional teaching method). After 6 months, all students are given a standardized educational achievement test. The educator conducting the study is confident that the test scores will be normally distributed, but she insists that the treatment could affect either the mean or the variance of the scores, or both. Write down the large-sample likelihood ratio statistic for this problem, and simplify it. Show your work, and circle the expression you would compute for a real data set.
A copy of the data for the example above is available in the file meanvar.dat. Use some variant of the S language (such as R or Splus) to carry out the large-sample likelihood ratio test; your program should print the sample means and variances, the value of the test statistic G, the degrees of freedom and the p-value. For this problem, please provide printed output and also a printout of the code that produced it.
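As a way to check your S output, here is a minimal Python sketch of the same computation. It assumes the simplified statistic takes the form G = n log(v0) - n1 log(v1) - n2 log(v2), where v1 and v2 are the within-group maximum likelihood variances (divisor n, not n-1) and v0 is the maximum likelihood variance of the combined sample, with 2 degrees of freedom. Verify that form against your own derivation before relying on it. The scores below are hypothetical and are not the contents of meanvar.dat; the function name `lr_test` is also just an illustration.

```python
import math
from statistics import mean, variance

def mle_variance(xs):
    """Maximum likelihood variance estimate (divides by n, not n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def lr_test(x, y):
    """Large-sample LR test of equal means AND equal variances, two normal samples.

    Returns (G, df, p). Assumes the simplified form described in the lead-in.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    v1, v2 = mle_variance(x), mle_variance(y)
    v0 = mle_variance(list(x) + list(y))  # common mean and variance under H0
    g = n * math.log(v0) - n1 * math.log(v1) - n2 * math.log(v2)
    df = 2  # four parameters under the full model, two under H0
    p = math.exp(-g / 2)  # exact chi-square upper tail when df = 2
    return g, df, p

# Hypothetical test scores, for illustration only.
new_program = [72.0, 85.0, 90.0, 66.0, 78.0, 95.0, 81.0]
control = [70.0, 74.0, 69.0, 73.0, 71.0, 75.0, 72.0]

print("means:", mean(new_program), mean(control))
print("sample variances:", variance(new_program), variance(control))
print("G, df, p-value:", lr_test(new_program, control))
```

The closed-form p-value works only because the chi-square distribution with 2 degrees of freedom is exponential with mean 2; for other degrees of freedom you would need a chi-square tail function.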
Suppose you have a data set, and before looking at the data you can think of k significance tests you want to do. What is the problem with just doing all of them and discussing any results that are significant at some significance level such as 0.05?
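To get a feel for the size of the problem, the short Python sketch below computes the chance of at least one false rejection among k tests. It rests on two assumptions that the question does not make: every null hypothesis is true, and the k tests are independent. The function name is illustrative.

```python
def familywise_error_rate(k, alpha=0.05):
    """P(at least one false rejection) for k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 4))
```

Real tests on one data set are rarely independent, so treat these numbers as a rough guide rather than exact values.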
Derive the Bonferroni correction that will allow you to deal with the problem raised in the preceding question. Express it as a rule that a student in STA220 could understand and follow.
Using S and the same data set meanvar.dat, carry out a Bonferroni-corrected test for differences in mean and variance. Your program should print out a t-statistic, an F-statistic, their uncorrected p-values, and also a single Bonferroni-corrected p-value. You may need to think a bit about this last one, but just apply the definition of a p-value and you will get the same answer I did (assuming you split alpha equally between the two tests). For this problem, please provide printed output and also a printout of the code that produced it.
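One reading of "a single Bonferroni-corrected p-value" is sketched below in Python. It assumes alpha is split equally, so the family null hypothesis is rejected at level alpha exactly when the smallest uncorrected p-value falls below alpha/k; the smallest alpha at which that happens is k times the smallest p-value, capped at 1. The two p-values fed in are hypothetical placeholders, not results from meanvar.dat, and the t and F statistics themselves are left to your S program.

```python
def bonferroni_p(p_values):
    """Single corrected p-value for a family of tests, equal split of alpha."""
    k = len(p_values)
    return min(1.0, k * min(p_values))

# Hypothetical uncorrected p-values from a t-test and an F-test.
p_t, p_f = 0.03, 0.20
print("corrected p-value:", bonferroni_p([p_t, p_f]))
```

If your own application of the definition of a p-value leads somewhere else, trust the definition over this sketch.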
Under what circumstances might you want to split alpha unequally? Try to answer the question in general, not just for this mean-variance example.
Suppose we have a data file in which the following information is recorded about a sample of Canadian citizens: Education (in years), Place of birth (1=Canada, 2=Asia, 3=Africa, 4=Europe, 5=Other), and whether they have a job (1=Yes, 0=No).
1. What is the dependent variable?
2. How would you set up dummy variables to represent place of birth? Give a full, formal statement of the statistical model, including these dummy variables and terms for the interaction of education and place of birth.
3. Once we control for education, does place of birth help predict whether or not a person has a job? Give the full and reduced models and state the null hypothesis in symbols (there are no interaction terms in this one).
4. Does the relationship of education to having a job differ according to place of birth? Give the full and reduced models and state the null hypothesis in symbols.
5. Suppose an interaction is present. Among those with 12 years of education, the odds of having a job are only ___ times as great for those born in Europe as for those born in Canada. Your answer is in terms of β values.
6. Suppose there is zero interaction. Staying in school 4 years longer multiplies the odds of having a job by ____. Your answer is in terms of β values.
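The last two items can be checked numerically once you have chosen a coding scheme. The Python sketch below assumes one particular scheme, which may differ from yours: a logistic regression with Canada as the reference category, β0 the intercept, β1 the education slope, β2 to β5 the dummy variables for Asia, Africa, Europe and Other, and β6 to β9 the corresponding education-by-birthplace products. The coefficient values are hypothetical and serve only to exercise the algebra.

```python
import math

PLACES = ("Canada", "Asia", "Africa", "Europe", "Other")

# Hypothetical coefficients beta[0]..beta[9] under the coding described above.
beta = [-2.0, 0.15, 0.30, -0.20, 0.50, 0.10, 0.01, -0.02, -0.03, 0.02]

def odds_of_job(education, place, b):
    """Odds of having a job under the assumed logistic model."""
    k = PLACES.index(place)  # 0 for Canada, the reference category
    log_odds = b[0] + b[1] * education
    if k > 0:
        # Add the dummy-variable term and the interaction term for this place.
        log_odds += b[1 + k] + b[5 + k] * education
    return math.exp(log_odds)

# Item 5: Europe versus Canada at 12 years of education.
ratio = odds_of_job(12, "Europe", beta) / odds_of_job(12, "Canada", beta)
print(ratio, math.exp(beta[4] + 12 * beta[8]))

# Item 6: set the interaction coefficients to zero, then add 4 years.
no_interaction = beta[:6] + [0.0] * 4
multiplier = odds_of_job(16, "Asia", no_interaction) / odds_of_job(12, "Asia", no_interaction)
print(multiplier, math.exp(4 * beta[1]))
```

Each printed pair should agree. With a different reference category or ordering of the dummy variables, the subscripts in your answer will change accordingly.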
In the first lecture, we considered the following problem: "The infant mortality rate for babies with birth weight under 1 kg. is thought to be approximately 30%. A new treatment is expected to cut this in half. If a randomized clinical trial with treatment and control groups is run to confirm the effectiveness of the treatment, what sample size is required?"
1. Show that the first random variable Z we considered (the one including an unknown theta) has a limiting standard normal distribution under H₀. My intention is that you will use limiting moment-generating functions here, but if you want to use more powerful tools it is okay. However, please give a real proof; no hand waving.
2. In class, we got sample sizes of n₁=n₂=95 for a one-sided test. What sample size is required to get the same power for a two-sided test? Be able to show a few lines of handwritten work, and also do the computation and get a numerical answer. You don't need to do the simulation part. Feel free to use code from the handout distributed in class. Bring printouts to the quiz.
3. Would the large-sample likelihood ratio test imply different sample sizes for the one-sided and the two-sided alternatives considered in class? Explain if you can.
4. Give the power of the large-sample likelihood ratio test for the two sets of sample sizes (n₁=n₂=95 and the other one). Use simulation. Bring printouts to the quiz.
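For items 2 and 4, a rough Python sketch of the two computations follows. It does not reproduce the class handout. It assumes the pooled-variance normal approximation for comparing two proportions, a significance level of 0.05, and a target power of 0.80; neither value is stated above, though under these assumptions the one-sided case comes out to 95 per group, matching the class figure. The simulation uses the likelihood ratio statistic for a 2x2 table with 1 degree of freedom. All function names, the seed and the number of replications are illustrative choices.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_per_group(p1, p2, alpha=0.05, power=0.80, two_sided=False):
    """Equal group size from the pooled-variance normal approximation."""
    z_alpha = Z.inv_cdf(1 - alpha / 2) if two_sided else Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))       # under H0: common p
    sd_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # under the alternative
    return math.ceil(((z_alpha * sd_null + z_beta * sd_alt) / (p1 - p2)) ** 2)

def g_statistic(x1, n1, x2, n2):
    """Likelihood ratio statistic for equal proportions in a 2x2 table."""
    n = n1 + n2
    rows = (n1, n2)
    cols = (x1 + x2, n - x1 - x2)
    cells = ((x1, n1 - x1), (x2, n2 - x2))
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = cells[i][j]
            if obs > 0:  # a zero cell contributes nothing to the sum
                g += obs * math.log(obs * n / (rows[i] * cols[j]))
    return 2 * g

def simulated_power(n1, n2, p1, p2, alpha=0.05, reps=2000, seed=2201):
    """Monte Carlo estimate of the power of the LR test."""
    rng = random.Random(seed)
    critical = Z.inv_cdf(1 - alpha / 2) ** 2  # chi-square(1) upper-alpha quantile
    rejections = 0
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # deaths, control group
        x2 = sum(rng.random() < p2 for _ in range(n2))  # deaths, treatment group
        if g_statistic(x1, n1, x2, n2) > critical:
            rejections += 1
    return rejections / reps

n_one = n_per_group(0.30, 0.15)
n_two = n_per_group(0.30, 0.15, two_sided=True)
print("one-sided n:", n_one, " two-sided n:", n_two)
for n in (n_one, n_two):
    print("n =", n, " estimated LR power:", simulated_power(n, n, 0.30, 0.15))
```

The power estimates carry Monte Carlo error, roughly 0.01 at 2000 replications, so increase `reps` before quoting more than two decimal places.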