STA2201s01 Assignment Two
Quiz on Thursday, January 25. Bring printouts
to the quiz. You may be asked to answer questions based on your printouts. You
may be asked to hand in one or more of the printouts. The non-computer
parts are not to be handed in. Do them in preparation for the quiz.
- The effectiveness of a new educational program is to be evaluated as
follows. Students are randomly assigned to either the new program or to a
control condition (the traditional teaching method). After 6 months, all
students are given a standardized educational achievement test. The educator
conducting the study is confident that the test scores will be normally
distributed, but she insists that the treatment could affect either the
mean or the variance of the scores, or both. Write down the large-sample
likelihood ratio statistic for this problem, and simplify it. Show your work,
and circle the expression you would compute for a real data set.
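Here is one way the simplification can go. The notation is ours, not the handout's: group sizes are n_1 and n_2, n = n_1 + n_2, and hats mark maximum likelihood (divide-by-n) estimates. At its MLEs, a normal log likelihood for a sample of size m reduces to -(m/2) log(2π σ̂²) - m/2. As a result, the 2π terms and the constants cancel in the difference:

```latex
\begin{aligned}
&\hat\sigma_1^2 = \frac{1}{n_1}\sum_{j=1}^{n_1}(y_{1j}-\bar y_1)^2, \qquad
\hat\sigma_2^2 = \frac{1}{n_2}\sum_{j=1}^{n_2}(y_{2j}-\bar y_2)^2, \qquad
\hat\sigma_0^2 = \frac{1}{n}\sum_{i=1}^{2}\sum_{j=1}^{n_i}(y_{ij}-\bar y)^2 \\
G &= -2\log\frac{L(\bar y,\hat\sigma_0^2)}{L(\bar y_1,\hat\sigma_1^2,\bar y_2,\hat\sigma_2^2)} \\
  &= 2\Bigl[-\tfrac{n_1}{2}\log(2\pi\hat\sigma_1^2)-\tfrac{n_1}{2}
           -\tfrac{n_2}{2}\log(2\pi\hat\sigma_2^2)-\tfrac{n_2}{2}
           +\tfrac{n}{2}\log(2\pi\hat\sigma_0^2)+\tfrac{n}{2}\Bigr] \\
  &= n\log\hat\sigma_0^2 - n_1\log\hat\sigma_1^2 - n_2\log\hat\sigma_2^2,
  \qquad \text{df} = 2 .
\end{aligned}
```

The test has two degrees of freedom because H0 imposes two restrictions, μ1 = μ2 and σ1² = σ2².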
- A copy of the data for the example above is available in the file meanvar.dat. Use some variant of the S language
(such as R or Splus) to carry out the large-sample likelihood ratio test; your
program should print the sample means and variances, the value of the test
statistic G, the degrees of freedom and the p-value. For this problem, please
provide printed output and also a printout of the code that produced it.
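Below is a minimal Python sketch of the same computation. It is not the requested S/R program, just an illustration of the arithmetic. It assumes the two groups have already been read into lists, because the layout of meanvar.dat is not described here. It uses G = n log σ̂0² - n1 log σ̂1² - n2 log σ̂2² with divide-by-n variances. For 2 degrees of freedom, the chi-square upper tail is exactly exp(-G/2), so no distribution library is needed.

```python
import math


def mle_variance(y, centre):
    """Maximum likelihood (divide-by-n) variance about a given centre."""
    return sum((v - centre) ** 2 for v in y) / len(y)


def lr_mean_variance_test(y1, y2):
    """Large-sample LR test of H0: equal means AND equal variances (normal data).

    Returns the two group means, the two MLE variances, G, df and the p-value.
    """
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    mean1, mean2 = sum(y1) / n1, sum(y2) / n2
    pooled = list(y1) + list(y2)
    grand_mean = sum(pooled) / n
    var1 = mle_variance(y1, mean1)
    var2 = mle_variance(y2, mean2)
    var0 = mle_variance(pooled, grand_mean)  # common variance under H0
    g = n * math.log(var0) - n1 * math.log(var1) - n2 * math.log(var2)
    df = 2  # H0 imposes two restrictions
    # For 2 df the chi-square upper-tail probability is exactly exp(-g / 2).
    p_value = math.exp(-g / 2)
    return mean1, mean2, var1, var2, g, df, p_value


if __name__ == "__main__":
    # Made-up numbers for illustration only; the real data are in meanvar.dat.
    control = [48.0, 52.5, 50.1, 47.3, 51.8]
    treated = [55.2, 49.9, 60.3, 44.7, 58.6]
    m1, m2, v1, v2, g, df, p = lr_mean_variance_test(control, treated)
    print(f"means {m1:.3f} {m2:.3f}  MLE variances {v1:.3f} {v2:.3f}")
    print(f"G = {g:.4f}, df = {df}, p = {p:.4f}")
```

The variances printed are the MLEs. If you report the usual n-1 sample variances instead, multiply each by n_i/(n_i - 1). G itself is unchanged by this.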
- Suppose you have a data set, and before looking at the data you can
think of k significance tests you want to do. What is the problem with just
doing all of them and discussing any results that are significant at some
significance level such as 0.05?
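To see the size of the problem, assume the k tests are independent and all k null hypotheses are true. These are simplifying assumptions; real tests on one data set are usually dependent. The chance of at least one "significant" result is then 1 - (1 - α)^k. A short sketch:

```python
def familywise_error(k, alpha=0.05):
    """P(at least one rejection) when all k nulls are true and tests are independent."""
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 20):
        print(k, round(familywise_error(k), 3))
```

With k = 10 and α = 0.05, the probability is already about 0.40.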
- Derive the Bonferroni correction that will allow you to deal with the
problem raised in the preceding question. Express it as a rule that a student
in STA220 could understand and follow.
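A sketch of the usual derivation, using Boole's inequality (the union bound). The bound needs no independence assumption. Let A_j be the event that test j rejects its true null hypothesis when run at level α/k:

```latex
P\Bigl(\bigcup_{j=1}^{k} A_j\Bigr) \;\le\; \sum_{j=1}^{k} P(A_j)
  \;=\; \sum_{j=1}^{k} \frac{\alpha}{k} \;=\; \alpha .
```

In plain words: if you plan k tests, call a result significant only when its p-value is below α/k.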
- Using S and the same data set meanvar.dat, carry out a
Bonferroni-corrected test for differences in mean and variance. Your program
should print out a t-statistic, an F-statistic, their uncorrected p-values,
and also a single Bonferroni corrected p-value. You may need to think a bit
about this last one, but just apply the definition of a p-value and you will
get the same answer I did (assuming you split alpha equally between the two
tests). For this problem, please provide printed output and also a printout of
the code that produced it.
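A hedged Python sketch of the pieces involved. It assumes a pooled-variance t statistic and the variance ratio F = s1²/s2²; the course may intend other versions. The uncorrected p-values would come from the t and F distributions, for example pt() and pf() in R, and are taken as inputs here. With α split equally over k tests, the overall test rejects when some p_j ≤ α/k. The smallest such α is min(1, k · min p_j).

```python
import math
from statistics import mean, variance


def t_and_f(y1, y2):
    """Pooled two-sample t statistic and variance-ratio F statistic."""
    n1, n2 = len(y1), len(y2)
    v1, v2 = variance(y1), variance(y2)  # sample variances, divisor n - 1
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    t = (mean(y1) - mean(y2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    f = v1 / v2
    return t, n1 + n2 - 2, f, (n1 - 1, n2 - 1)


def bonferroni_pvalue(pvalues):
    """Smallest alpha at which an equal-split Bonferroni test rejects."""
    k = len(pvalues)
    return min(1.0, k * min(pvalues))


if __name__ == "__main__":
    # Made-up numbers for illustration; the real data are in meanvar.dat.
    t, df_t, f, df_f = t_and_f([48.0, 52.5, 50.1, 47.3], [55.2, 49.9, 60.3, 44.7])
    print(f"t = {t:.3f} on {df_t} df; F = {f:.3f} on {df_f} df")
```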
- Under what circumstances might you want to split alpha unequally? Try
to answer the question in general, not just for this mean-variance example.
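An unequal split can be written as giving test j the level w_j · α, with positive weights w_j that sum to 1. Because the levels still add up to α, the union bound still caps the familywise error at α. The weights are an analyst's choice, and the function name below is illustrative. The corresponding adjusted p-value is min(1, min_j p_j / w_j):

```python
import math


def weighted_bonferroni_pvalue(pvalues, weights):
    """Adjusted p-value when test j is run at level alpha * weights[j]."""
    if len(pvalues) != len(weights):
        raise ValueError("need one weight per test")
    if any(w <= 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    return min(1.0, min(p / w for p, w in zip(pvalues, weights)))
```

With equal weights this reduces to the equal-split rule. With weights (0.8, 0.2), a p-value of 0.03 on the first test gives an adjusted value of 0.0375 instead of 0.06.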
- Suppose we have a data file in which the following information is
recorded about a sample of Canadian citizens: Education (in years), Place of
birth (1=Canada, 2=Asia, 3=Africa, 4=Europe, 5=Other), and whether they have a
job (1=Yes, 0=No).
  - What is the dependent variable?
  - How would you set up dummy variables to represent place of
    birth? Give a full, formal statement of the statistical model, including these
    dummy variables and terms for the interaction of education and place of birth.
  - Once we control for education, does place of birth help
    predict whether or not a person has a job? Give the full and reduced models
    and state the null hypothesis in symbols (there are no interaction terms in
    this one).
  - Does the relationship of education to having a job differ
    according to place of birth? Give the full and reduced models and state the
    null hypothesis in symbols.
  - Suppose an interaction is present. Among those with 12 years
    of education, the odds of having a job are only ___ times as great for those
    born in Europe as for those born in Canada. Express your answer in terms of
    the β values.
  - Suppose there is zero interaction. Staying in school 4 years
    longer multiplies the odds of having a job by ____. Express your answer in
    terms of the β values.
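One possible coding, sketched in Python. We assume Canada (code 1) is the reference category, with indicators for Asia, Africa, Europe and Other, and a logistic model with education-by-place interaction terms. The parameter ordering in `beta` is hypothetical; any consistent labelling works. The code only evaluates odds ratios numerically from given β values and does not fit the model.

```python
import math

# Hypothetical parameterization: Canada (code 1) is the reference category.
PLACE_CODES = (1, 2, 3, 4, 5)  # 1=Canada, 2=Asia, 3=Africa, 4=Europe, 5=Other


def dummies(place):
    """Indicators (Asia, Africa, Europe, Other); Canada is all zeros."""
    if place not in PLACE_CODES:
        raise ValueError(f"unknown place-of-birth code: {place}")
    return tuple(1 if place == code else 0 for code in (2, 3, 4, 5))


def log_odds(edu, place, beta):
    """Logit of P(job) with an education-by-place interaction.

    beta = (b0, b_asia, b_africa, b_europe, b_other, b_edu,
            b_asia_edu, b_africa_edu, b_europe_edu, b_other_edu)
    """
    d = dummies(place)
    main = sum(b * x for b, x in zip(beta[1:5], d))
    interaction = sum(b * x * edu for b, x in zip(beta[6:10], d))
    return beta[0] + main + beta[5] * edu + interaction


def odds_ratio(edu_a, place_a, edu_b, place_b, beta):
    """Odds of having a job for person a divided by the odds for person b."""
    return math.exp(log_odds(edu_a, place_a, beta) - log_odds(edu_b, place_b, beta))
```

Each odds ratio is the exponential of a difference of two linear predictors. Writing that difference out by hand for the two cases above gives the β expressions the questions ask for.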
- In the first lecture, we considered the following problem: "The
infant mortality rate for babies with birth weight under 1 kg is thought
to be approximately 30%. A new treatment is expected to cut this in half. If
a randomized clinical trial with treatment and control groups is run to
confirm the effectiveness of the treatment, what sample size is required?"
  - Show that the first random variable Z we considered (the one
    including an unknown theta) has a limiting standard normal distribution under
    H0. My intention is that you will use limiting moment-generating
    functions here, but if you want to use more powerful tools, that is okay.
    However, please give a real proof; no hand waving.
  - In class, we got sample sizes of n1=n2=95 for a one-sided test. What
    sample size is required to get the same power for a two-sided test? Be able
    to show a few lines of handwritten work, and also do the computation and get
    a numerical answer. You don't need to do the simulation part. Feel free to
    use code from the handout distributed in class. Bring printouts to the quiz.
  - Would the large-sample likelihood ratio test imply different
    sample sizes for the one-sided and the two-sided alternatives considered in
    class? Explain if you can.
  - Give the power of the large-sample likelihood ratio test for
    the two sets of sample sizes (n1=n2=95 and the two-sided sample size from
    the preceding part). Use simulation. Bring printouts to the quiz.
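Below is a Python sketch of the sample-size and simulation steps. It is not the class handout's code. It assumes control and treatment mortality of 0.30 and 0.15, α = 0.05, power 0.80, and the common normal-approximation formula n = [z_a √(2p̄(1 - p̄)) + z_β √(p1(1 - p1) + p2(1 - p2))]² / (p1 - p2)². The α, power and formula are our assumptions, and the class's statistic (the Z with unknown theta) may lead to a slightly different formula. The LR statistic is G = 2 Σ O log(O/E) for the 2×2 table, compared with the chi-square(1) critical value.

```python
import math
import random
from statistics import NormalDist


def sample_size(p1, p2, alpha=0.05, power=0.80, two_sided=False):
    """Per-group n from the usual normal-approximation formula (an assumption)."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_b = z(power)
    pbar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * pbar * (1 - pbar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil((num / (p1 - p2)) ** 2)


def lr_stat(x1, n1, x2, n2):
    """G = 2 * sum O log(O/E) for the 2x2 table of deaths and survivors."""
    observed = [x1, n1 - x1, x2, n2 - x2]
    p = (x1 + x2) / (n1 + n2)  # common death rate under H0
    expected = [n1 * p, n1 * (1 - p), n2 * p, n2 * (1 - p)]
    # Cells with O = 0 contribute 0 (the limit of o * log(o / e) as o -> 0).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def simulated_power(n1, n2, p1=0.30, p2=0.15, alpha=0.05, reps=10000, seed=1):
    """Proportion of simulated trials in which the LR test rejects H0."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi-square(1) critical value
    hits = 0
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        if lr_stat(x1, n1, x2, n2) > crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    n_one = sample_size(0.30, 0.15)
    n_two = sample_size(0.30, 0.15, two_sided=True)
    for n in (n_one, n_two):
        print(n, simulated_power(n, n))
```

With these assumptions, the one-sided call returns 95, which agrees with the figure quoted from class. The two-sided call returns 121. The agreement suggests, but does not confirm, that the class used the same choices.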