STA442/1008 Assignment 5

Do this for Test Two: Friday Feb 27th


Remember the bank data from Assignment 2? We had starting salaries for female and male employees of a bank. The data file discrim2.dat is a more complete set of the same data. The variables are:

Please start by converting age, seniority and prior work experience from months to years. Also, create a new variable equal to age at the time the person was hired.
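The conversion and the new variable are simple arithmetic on each record. The sketch below is in Python rather than SAS, and it rests on assumptions because the variable list is not reproduced here: age, seniority and prior experience are taken to be recorded in months, and seniority is taken to count the months since the person was hired, so age when hired is current age minus seniority. The function and argument names are hypothetical, not names from discrim2.dat. In SAS the same arithmetic would go in the data step.

```python
def convert_record(age_months, seniority_months, experience_months):
    """Return (age, seniority, experience, age_at_hire), all in years.

    Hypothetical helper. Assumes all three inputs are in months and that
    seniority is the number of months since the person was hired.
    """
    age = age_months / 12
    seniority = seniority_months / 12
    experience = experience_months / 12
    # Age when hired = current age minus time since hiring (assumption above).
    age_at_hire = age - seniority
    return age, seniority, experience, age_at_hire
```

For example, a made-up record with age 480 months, seniority 96 months and 60 months of prior experience gives 40 years of age, 8 years of seniority, 5 years of experience and an age at hiring of 32.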

For this assignment, we are going to focus on starting salary as the dependent variable. The primary question is this. Controlling for how long ago the person was hired, age when hired and prior work experience, did men still receive higher average starting salaries?
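The primary question is a test of a full model against a reduced model. As a reminder of the general form of that test (generic notation, not specific to this data set): let SSE_R and SSE_F be the error sums of squares of the reduced and full models, r the number of betas the null hypothesis sets to zero, n the sample size, and p the number of betas in the full model including the intercept. The statistic below has an F distribution with r and n - p degrees of freedom when the null hypothesis is true. When only one variable is being tested (r = 1), F equals the square of the t statistic for that coefficient.

```latex
F = \frac{(SSE_R - SSE_F)/r}{SSE_F/(n - p)}
```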

Please use proc reg simple corr instead of just proc reg. This way, you get simple descriptive statistics and correlations among the independent variables. The only problem is that, as you will see, your variable labels get in the way and cause SAS to use too much paper printing this stuff. I got satisfactory results by putting the statement "options nolabel;" on the line right before my proc reg. Using the nolabel option does not affect the correctness of your results; it just makes the output a little easier to look at.

Here are some questions you should be able to answer. Many of the answers are numbers from your output.

  1. Is your answer to the primary question Yes or No?
  2. Are the results statistically significant at the 0.05 level?
  3. What is the value of the test statistic?
  4. What is the p-value?
  5. Whether the test was significant or not, who got higher average salaries when all other variables in the reduced model were held constant? How can you tell from your output?
  6. When the variables in the reduced model are held constant, the (predicted) average salary of women minus the predicted average salary of men is ... What? The number is on your printout directly.
  7. What is the reduced model?
  8. What is the full model?
  9. What proportion of the variation in starting salary is explained by all the variables in the full model?
  10. Once you control for the variables in the reduced model, what proportion of the remaining variation is explained by sex of employee? Use a calculator.
  11. What proportion of the variation is explained by the variables in the reduced model? There are several ways to get the answer to this one; the simplest is to fit the reduced model, but you can also get it from the sequential sums of squares. Another way is to do algebra on the answer to the preceding question, but I would never ask you to do something like that in this class.
  12. When the variables in the reduced model are held constant at their mean levels (this is why you need the simple option), what is the predicted average starting salary of females? Of males? Please use proc iml for this one.
  13. Now some secondary questions. You can get all the answers directly from the printout, without extra calculations.
    1. If you control for age at time of hiring, prior work experience at the time of hiring and gender, is starting salary related to how long ago the person was hired? What is the value of the test statistic? Is it significant? Is the relationship (significant or not) positive, or is it negative? What is the p-value? In non-statistical language, what do you conclude? Your answer could start out "Allowing for age at time of hiring, prior work experience at the time of hiring, and gender, ..."
    2. You want to know if any of the independent variables are related to starting salary. Give the value of the test statistic for the simultaneous test. What is the p-value? Is the test significant? What do you conclude?
    3. If you control for how long ago the person was hired, prior work experience at the time of hiring and gender, is starting salary related to age at time of hiring? What is the value of the test statistic? Is it significant? Is the relationship (significant or not) positive, or is it negative? What is the p-value? In non-statistical language, what do you conclude?
    4. If you control for how long ago the person was hired, age at time of hiring and gender, is starting salary related to prior work experience at the time of hiring? What is the value of the test statistic? Is it significant? Is the relationship (significant or not) positive, or is it negative? What is the p-value? In non-statistical language, what do you conclude?
    5. What is the correlation between age when hired and amount of prior work experience at time of hiring? Give a number. There is no significance test, but describe the trend anyway in simple, non-statistical language.
    6. For the following questions, determine the answer from the correlations, without using any significance test. That is, describe trends that may or may not be significant.
      1. Who has more average seniority, men or women?
      2. Who was older on average when hired, men or women?
      3. Who had more average prior work experience at the time of hiring, men or women?
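Questions 10 through 12 involve a little arithmetic on numbers from the printout. The Python sketch below writes that arithmetic out (in class you would use a calculator or proc iml instead). It assumes that F is the F statistic comparing the full and reduced models, r is the number of variables the full model adds, n is the sample size, p is the number of betas in the full model including the intercept, and sex enters the model as a single indicator variable. All function names are hypothetical. The formula for a follows from a = (R2_full - R2_reduced)/(1 - R2_reduced) together with the definition of F.

```python
def prop_remaining_explained(F, r, n, p):
    """Proportion of the variation left unexplained by the reduced model
    that is explained by the r extra variables in the full model:
    a = r*F / (n - p + r*F)."""
    return r * F / (n - p + r * F)


def r2_reduced_from(a, r2_full):
    """Back out the reduced-model R-squared from a and the full-model
    R-squared, by solving a = (r2_full - r2_reduced) / (1 - r2_reduced)."""
    return (r2_full - a) / (1 - a)


def predicted_at_means(intercept, slopes, means, sex_coef, sex_value):
    """Predicted mean response with the reduced-model variables set to
    their sample means and the sex indicator set to sex_value (0 or 1,
    depending on how sex is coded in your data step)."""
    # b0 + b1*xbar1 + ... for the reduced-model variables
    yhat = intercept + sum(b * xbar for b, xbar in zip(slopes, means))
    # add the shift for the chosen sex
    return yhat + sex_coef * sex_value
```

Plug in the estimates, means, F, n and p from your own output; the functions do not check that the numbers are consistent with one another.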

Bring your log file and your list file to the test. You may need to hand one or both of them in.