STA2201s01 Assignment One

To be handed in by 10:15 am, Thursday September 18


The file cars.dat has three variables: country of origin (1=USA, 2=Japan, 3=Other), fuel efficiency in miles per gallon, and weight in pounds. Your client gives you these data and asks three questions.

  1. Is there a straight-line relationship between weight of car and fuel efficiency?
  2. Once you control for weight of car (possibly including a quadratic term), is fuel efficiency related to country of origin?
  3. Does the relationship between weight and fuel efficiency depend on country of origin?

The purpose of this assignment is mostly to ensure that you know the basics of multiple regression with normal error terms, and that you know how to do it with the S language -- not to put your data-analytic skills to a serious test. Therefore, I will tell you what to do. First, it should be obvious that the dependent variable is fuel efficiency.

  1. To answer question one, you might fit a reduced model with just weight, and compare it (with an F-test) to a full model that also includes a quadratic term. You could also fit just one model and look at a t-test (what one model am I talking about?).
  2. For question two, the reduced model will depend on the results of the first analysis. The full model will also have dummy variables for country of origin.
  3. In question three, we are asking about an interaction; are the curves parallel?
  4. Finally, is there anything odd going on here? Did you have to make any semi-arbitrary decisions in order to get the job done properly? Hint: you are not a robot. Don't just do what you are told and nothing else.
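
In case the mechanics of a full-versus-reduced comparison are rusty, here is a minimal sketch in S (it should also run in R). It uses simulated data, not cars.dat; the names x, y and g are made up for illustration, and nothing here is meant as the required analysis. It shows the F-test that anova() reports for two nested models, the same statistic computed by hand from the error sums of squares, and how factor() gets lm() to build dummy variables and product (interaction) terms.

```r
# Simulated stand-in data: NOT the contents of cars.dat.
# x, y and g are hypothetical names chosen for this sketch only.
set.seed(2201)
n <- 90
g <- factor(rep(c("a", "b", "c"), each = n / 3))  # three-level grouping variable
x <- runif(n, 0, 10)                              # continuous explanatory variable
y <- 5 + 2 * x - 0.1 * x^2 + rnorm(n)             # response with normal errors

# Reduced model: straight line. Full model: adds a quadratic term.
reduced <- lm(y ~ x)
full    <- lm(y ~ x + I(x^2))

# F-test comparing the two nested models.
tab <- anova(reduced, full)
print(tab)

# The same F statistic by hand:
# F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / (SSE_full / df_full)
sse_r <- sum(resid(reduced)^2)
sse_f <- sum(resid(full)^2)
df_r  <- reduced$df.residual
df_f  <- full$df.residual
F_hand <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
print(F_hand)

# Because g is a factor, lm() creates the dummy variables itself.
with_group <- lm(y ~ x + I(x^2) + g)

# Crossing the curve terms with g adds the product terms.
with_inter <- lm(y ~ (x + I(x^2)) * g)
print(anova(with_group, with_inter))
```

The second and later models in each anova() call must contain every term of the model before them; otherwise the F-test is not a comparison of nested models.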

Here is what you hand in.

  1. Brief answers to the client's three questions, in language a non-statistician might understand. You will base conclusions only on hypothesis tests in which the null hypothesis is rejected at alpha = 0.05, but you will avoid formal statistical terminology at all costs. I am serious about this. I will deduct marks for the use of statistical terms that a History professor would not understand.
  2. Your answer to the fourth question (item 4 in the list above).
  3. Printout or printouts that show both your S commands and the output. Circle and label the three p-values that support your answers to the client's three questions.