STA442/1008 Assignment 1

Quiz on Friday Jan. 18th.


This assignment is based on Chapter 1 of the online text and the associated lecture material. Do it in preparation for Quiz 1; it is not to be handed in. Read Chapter 1 and think about the concepts. You could be asked for definitions. You might be asked to make up an original example of a study requiring a matched or independent groups t-test (what is the difference?), a chi-square test of independence, a one-way ANOVA or a correlation. You could be asked to identify the Independent Variable and the Dependent Variable, and to explain how to set up the data file. You could be given an example of one or more studies and asked to choose the appropriate test to answer the main research question. Here are some additional questions to think about.

Note: In these questions, the word original is important. If you give an example that is overly similar to one from lecture or the class notes, your answer will receive a zero. If two people give exactly the same example, they will both get a zero for the question.

  1. Invent and briefly describe (in a few sentences at most) studies with the following characteristics. Do not use any examples from lecture or the class notes. If the requested example is impossible, say so and explain why it is impossible.
    1. A categorical independent variable and a continuous dependent variable.
    2. A continuous independent variable and a continuous dependent variable.
    3. A nominal scale independent variable and an ordinal scale dependent variable.
    4. An independent variable that is both quantitative and nominal scale, and a dependent variable that is continuous.
  2. In a study relating IQ score to birth order, which is the independent variable and which is the dependent variable?
  3. Make up an original example of a multivariate study in which both dependent variables are categorical.
  4. Give an example of a variable for which it would be unreasonable to compute the standard deviation.
  5. If p>.05, the results are significant and we can draw conclusions. True or False?
  6. Is it possible for a variable to be both categorical and quantitative? Give an example.
  7. What is the difference between a statistic and a parameter?
  8. A useful non-parametric alternative to the two-sample t-test is the median test. Pool all the numerical values of the Dependent Variable into one batch, find their median, and recode the Dependent Variable as 0 = at or below the median vs. 1 = above the median. Then do a chi-square test of independence on the resulting 2 by 2 table. Explain how this method could be extended to provide an alternative to a one-way ANOVA for comparing 5 treatments.
  9. A medical researcher conducts a study using twenty-seven litters of cancer-prone mice. Two members are randomly selected from each litter, and all mice are subjected to daily doses of cigarette smoke. For each pair of mice, one is randomly assigned to Drug A and one to Drug B. Time (in weeks) until the first clinical sign of cancer is recorded.
    1. What is the independent variable (or variables)?
    2. What is the dependent variable (or variables)?
    3. Indicate how the data file would be set up.
    4. How could the design be modified to allow comparison of 3 drugs and a placebo? Is this still a within-cases design?
    5. Presumably the mice are so cancer-prone that they all come down with the disease eventually. But this might not happen, especially if one of the drugs is very effective. Suggest one way of handling the data from a mouse that died of old age, and never showed signs of cancer. Now point out a problem with this way of treating the data.
    6. Suppose that the sample means are exactly identical for the various drug treatments. Is it possible for the population means to be different?
  10. What does it mean for two variables to be related in the population? Your answer must include the word "conditional," or it is wrong.
  11. Is it possible to have a study with a within-cases design and a categorical dependent variable? If it is possible, make up an original example. If it is impossible, explain why.
  12. Is it possible for a study to be both experimental and observational? Explain.
  13. It is well known that people who graduate from university have higher lifetime earnings on average than those who do not. Explain the correlation-causation problem here.
  14. Is it possible for the Independent Variable and the Dependent Variable to be related in the population and unrelated in the sample?
  15. Is it possible for the Independent Variable and the Dependent Variable to be related in the sample but unrelated in the population?
  16. Is it possible for the Independent Variable and the Dependent Variable to be related in the population and related in the sample, but not significantly related?
  17. Is it possible for the Independent Variable and the Dependent Variable to be related in the population and also significantly related in the sample, but in the wrong direction?
  18. Suppose that volunteer patients undergoing elective surgery at a large hospital are randomly assigned to one of three different pain-killing drugs, and one week after surgery they rate the amount of pain they have experienced on a scale from zero (no pain) to 100 (extreme pain).
    1. Indicate how the data file would be set up.
    2. What is the independent variable?
    3. What is the dependent variable?
    4. What statistical test would you recommend?
    5. Is this an experimental study, or observational?
    6. Why is it important for the patients to be unaware of which drug they are receiving? Relate this to the idea of a confounding variable.
    7. Is it also important for the physicians to remain unaware of what drugs their patients are getting? Why or why not?
    8. Is it also important for the person administering the questionnaire to remain unaware of what drug each patient is getting? Why or why not?
    9. In this study, suppose the population means were exactly identical for the various drug treatments. Would it still be possible for the Independent Variable and the Dependent Variable to be related in the population? Explain.
    10. What "population" are we talking about here, anyway?
  19. In a study relating physical attractiveness to academic performance, six judges rated the attractiveness of 100 randomly chosen first-year students on a 10-point scale, working from photos. The data file contained 10 variables: six attractiveness ratings, sex of student, number of credits completed by the end of first year, cumulative Grade Point Average (GPA) at the end of first year, and a binary variable indicating whether the student was still enrolled at the end of first year. An eleventh variable, mean attractiveness rating, was calculated from the 6 ratings and was taken to be the definition of "attractiveness."
    1. Is this an experimental study, or observational?
    2. Would it make sense to compute the average correlation of the attractiveness ratings with one another? What would it tell us, if anything? How many numbers would we be averaging?
    3. Which variables are independent variables, and which are dependent?
    4. What statistical test would you recommend for assessing the relationship between attractiveness and GPA?
    5. Give an example of an unmeasured variable that is a potential confounding variable in this study. Explain how it might produce an apparent relationship between attractiveness and GPA even if no real relationship existed.
    6. It is suggested that the order of presentation of the photos be counterbalanced, so that each student has approximately the same mean order of presentation when averaging over judges. Why is this a good idea?
  20. Answer the following questions T for true or F for false; write the letters on the lines. Assume the significance level is alpha = .05 in all cases. If there are true-false questions on the quiz, you will have to get them all correct, or at most one wrong, to get any marks for them.
    1. ____ In an experimental study, a statistically significant relationship between the independent variable and the dependent variable can provide some evidence of a causal relationship.
    2. ____ In simple regression, a positive regression coefficient b1 implies that high values of X tend to go with low values of Y and low values of X tend to go with high values of Y.
    3. ____ We observe r = -0.70, p = .009. We conclude that X and Y are unrelated.
    4. ____ The power of a test is the probability of rejecting the null hypothesis when the null hypothesis is true.
    5. ____ When the null hypothesis is true, p-values from the standard tests are uniformly distributed on the interval from zero to one.
    6. ____ The power of a test is the probability of rejecting the null hypothesis when the null hypothesis is false.
    7. ____ An observational study is one in which cases are randomly assigned to the different values of an independent variable.
    8. ____ The p-value is the probability of failing to replicate significant results in a second independent random sample of the same size.
    9. ____ Experimental studies are based on random sampling from a well-defined population.
    10. ____ If p < .05 we say the results are statistically significant at the .05 level.
    11. ____ It is possible for an independent variable to be categorical.
    12. ____ We seek to predict the dependent variable from the independent variable.
    13. ____ We observe r = 0.50, p = .002. This means that 50% of the variation in the dependent variable is explained by a linear relationship with the independent variable.
    14. ____ The p-value is the probability that the null hypothesis is true.
    15. ____ The p-value is the probability that the null hypothesis is false.
    16. ____ When a relationship between the independent variable and the dependent variable is statistically significant, we conclude there is no evidence that the two variables are actually related.
    17. ____ The greater the p-value, the stronger the evidence that the independent and dependent variable are related.
    18. ____ Type II Error is the probability of getting non-significant results.
    19. ____ We would like to predict type of automobile owned (North American vs. other) from income, education and sex of owner. The appropriate statistical test is a matched t-test.
  21. In the table below, write the name of the most appropriate (least inappropriate?) statistical method in each cell. Use the tests in the following list. If more than one test is appropriate, write the names of both tests. I have filled in two of the cells, because those methods are not elementary.

                                        | Dependent Variable
    Independent Variable                | Categorical: Two Categories | Categorical: More than 2 Categories | Quantitative
    ------------------------------------+-----------------------------+-------------------------------------+-------------
    Categorical: Two Categories         |                             |                                     |
    Categorical: More than 2 Categories |                             |                                     |
    Quantitative                        | Logistic Regression         | Extension of Logistic Regression    |
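Question 8 describes the two-sample median test step by step. The following is a minimal Python sketch of that two-group procedure, using only the standard library. It assumes the ordinary Pearson chi-square statistic with no continuity correction (some software applies Yates' correction to 2 by 2 tables by default, which gives a different p-value). The function name median_test and the data are made up for illustration. The p-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math
import statistics


def median_test(group_a, group_b):
    """Two-sample median test (hypothetical sketch of the procedure in question 8).

    1. Pool all values of the dependent variable and find their median.
    2. Recode each value: 0 = at or below the pooled median, 1 = above it.
    3. Run a Pearson chi-square test of independence (no continuity
       correction) on the 2 x 2 table of group by recoded value.
    Returns (table, chi2, p_value).
    """
    group_a, group_b = list(group_a), list(group_b)
    pooled = group_a + group_b
    med = statistics.median(pooled)

    # Rows are groups; columns are counts of 0 (at/below median) and 1 (above).
    table = []
    for group in (group_a, group_b):
        above = sum(1 for y in group if y > med)
        table.append([len(group) - above, above])

    n = len(pooled)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("an empty row or column makes the chi-square test undefined")

    # Pearson statistic: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, chi2 = Z^2 for a standard normal Z, so
    # P(chi-square > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return table, chi2, p_value


# Made-up illustrative data, not from any real study.
table, chi2, p = median_test([3, 5, 6, 8, 9, 12], [10, 11, 14, 15, 17, 20])
print(table, round(chi2, 3), round(p, 3))
```

For the made-up data, the pooled median is 10.5, the table is [[5, 1], [1, 5]], the statistic is 16/3 (about 5.33), and p is about .021.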
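Part 6 of question 19 asks for presentation order to be counterbalanced so that each student's mean order of presentation over judges is about the same. One simple way to make the means exactly equal, offered only as an assumed illustration (the function names are hypothetical, and other schemes, such as randomizing each judge's order separately, are also possible), is to shuffle the photos once, show that order to half the judges, and show the reversed order to the other half. This requires an even number of judges, which holds for the six judges in the study.

```python
import random


def forward_reverse_orders(student_ids, n_judges, seed=None):
    """Hypothetical counterbalancing scheme (illustrative sketch only).

    Shuffle the photos once; half of the judges see that order and the other
    half see it reversed. A student in 0-based position p out of n is in
    position n - 1 - p for the other half, so with an even number of judges
    every student's mean position is exactly (n - 1) / 2.
    """
    if n_judges % 2 != 0:
        raise ValueError("this simple scheme needs an even number of judges")
    base = list(student_ids)
    random.Random(seed).shuffle(base)
    return [list(base) if j % 2 == 0 else base[::-1] for j in range(n_judges)]


def mean_positions(orders):
    """Mean 0-based presentation position of each student, averaged over judges."""
    totals = {}
    for order in orders:
        for position, student in enumerate(order):
            totals[student] = totals.get(student, 0) + position
    return {student: total / len(orders) for student, total in totals.items()}


# 100 students and 6 judges, as in the study description.
orders = forward_reverse_orders(range(100), n_judges=6, seed=1)
means = mean_positions(orders)
print(min(means.values()), max(means.values()))
```

With 100 students and 6 judges, every student's mean position comes out as 49.5, so the printed minimum and maximum are equal.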