STA442/1008 Assignment 4

Due for Test One: Friday, Jan. 30th


  1. Short Answers
    1. A correlation of -0.7 means that each variable explains what percentage of the variation in the other variable?
    2. By hand, make a scatterplot where the correlation is zero but the points all lie exactly along a straight line. Yes, it's possible.
    3. Is it possible to have a curvilinear relationship and a zero correlation? Answer yes or no. If the answer is yes, draw an example scatterplot by hand.
    4. Is it possible to have a curvilinear relationship and a substantial positive correlation? Answer yes or no. If the answer is yes, draw an example scatterplot by hand.
    5. A market research firm is interested in testing two versions of a television commercial. A very large sample of consumers is randomly divided into two groups, by tossing a fair coin. If the coin comes up Heads, the person sees commercial version One. If the coin comes up Tails, the person sees commercial version Two. Each consumer views the commercial alone, in a separate room, and then is given an opportunity to purchase the product. The dependent variable is binary -- purchase versus non-purchase. Is there a problem here with potential confounding variables? Explain.
    6. A market research firm is interested in testing the effect of an advertising campaign (consisting of radio and TV ads, billboards, coupons, etc.) for Jolt Cola. A very large random sample of consumers is interviewed before the campaign begins, and asked how much Jolt Cola they have purchased during the past seven days. Then the campaign runs for a month, and the same consumers are interviewed again. Once again, they are asked how much Jolt Cola they have purchased during the past seven days. Is there a problem here with potential confounding variables? Explain.
    7. Answer each of the following T for true or F for false. Assume the significance level is alpha = .05 in all cases. You must get at least 9 out of 10 right in order to get any credit at all on this question. No marks will be deducted if you get one wrong. This is supposed to be easy. No tricks!
      1. In an experimental study, a statistically significant relationship between the independent variable and the dependent variable can provide some evidence of a causal relationship.
      2. In simple regression, a positive regression coefficient b1 implies that high values of X tend to go with low values of Y and low values of X tend to go with high values of Y.
      3. We observe r = -0.70, p = .009. We conclude that X and Y are unrelated.
      4. We would like to predict fuel efficiency from type of automobile owned (North American vs. other). It makes sense to use a matched (paired) t-test.
      5. An observational study is one in which cases are randomly assigned to the different values of an independent variable.
      6. We seek to predict the dependent variable from the independent variable.
      7. In a study attempting to predict income from education and race, we observe substantial race differences in highest grade completed. This means that income cannot be correlated with education.
      8. We observe r = 0.50, p = .002. This means that 50% of the variation in the dependent variable is explained by a linear relationship with the independent variable.
      9. When a relationship between the independent variable and the dependent variable is statistically significant, we conclude there is no evidence that the two variables are actually related.
      10. If p < .05, we say the results are statistically significant at the .05 level.
  2. Do the job described here, and bring your log file and your list file to Test 1. Starting with your command file for the TV data used in Assignment 3, create a new variable called location: location = 1 if the household is in a rural district, location = 2 if the household is in a small-town district, and location = 3 if the household is in an urban (city) district. Use proc format to set up printing formats for location. Write a SAS program to do the following; as you will see, in all cases the dependent variable is Total hours of TV watched last week. Please do not, repeat not, put the data step in a separate file and use %include; it does not show up in the log file.
    1. What is the simple Pearson correlation r between number of people in a household and total number of TV hours watched by the people in that household?
    2. Which one is the independent variable?
    3. What proportion of the variation in total number of TV hours watched is explained by number of people in the household? The answer is a number.
    4. Obtain n, the mean and the standard deviation of the number of TV hours watched for households in each location. Do a one-way ANOVA to test whether average TV hours watched differs significantly as a function of urban vs. rural vs. small-town location. If the results are significant, follow up with Scheffé tests. In plain language that could be understood by someone with no statistical training, what do you conclude?
    5. What proportion of the variation in number of TV hours watched is explained by urban vs. rural vs. small town location?
    6. Explain why number of people in the household is a potential confounding variable that should be taken into account when one examines the relationship between location and number of TV hours watched.

Again, bring your log file and your list file to the test. You may need to hand one or both of them in.
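Items 4 and 5 of question 2 ask for a one-way ANOVA and for the proportion of variation in TV hours explained by location. A minimal Python sketch of the underlying arithmetic follows, using invented numbers for three hypothetical locations; it is only a way to check one's understanding, not a replacement for the SAS program the question requires. The proportion explained is computed as SS_between / SS_total, which for a one-way ANOVA is the same quantity SAS reports as R-Square in PROC GLM output. The sketch does not compute p-values or Scheffé follow-up tests, and the function name one_way_anova is hypothetical.

```python
# Sketch only: one-way ANOVA sums of squares and the proportion of
# variation explained by group membership, on made-up numbers.
from statistics import fmean

# Hypothetical TV hours by location (NOT the assignment's data).
hours_by_location = {
    "Rural": [30.0, 25.0, 35.0, 28.0],
    "Small town": [22.0, 26.0, 20.0, 24.0],
    "Urban": [18.0, 15.0, 21.0, 17.0],
}


def one_way_anova(groups: dict[str, list[float]]) -> tuple[float, float]:
    """Return (F statistic, proportion of variation explained).

    The proportion explained is SS_between / SS_total, i.e. R-squared
    for a one-way ANOVA.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)
    n_total, k = len(all_values), len(groups)

    # Total variation around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Leftover variation of observations around their own group means.
    ss_within = ss_total - ss_between

    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f_stat, ss_between / ss_total


if __name__ == "__main__":
    f_stat, prop = one_way_anova(hours_by_location)
    print(f"F = {f_stat:.2f}, proportion explained = {prop:.3f}")
```

With three groups the F statistic has k - 1 = 2 numerator and n - k denominator degrees of freedom; a p-value would come from the F distribution, which the standard library does not provide.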