STA442/1008 Assignment 3

Quiz on Friday Feb. 8th


Explain the difference between random sampling (simple independent random sampling) and random assignment. Use original examples.

You will find Chapter 2 and Handout 2 useful for the rest of this assignment.

The file tv1.data contains data from a 1982 survey conducted in Stevens County in the United States. At the time, Stevens County was divided into 75 districts covering rural, small-town and urban areas. For each of the 500 households interviewed, the data file contains the district number, the household number within the district, the assessed value of the home in US dollars (an indirect measure of income, which the survey did not ask about), and answers to 9 questions about the respondents' interest in getting cable TV. The variables are:

  1. District: 1-25 are rural, 26-50 small town, 51-75 city.
  2. Household (numbered within district)
  3. Assessed value of home in US dollars
  4. Number of persons 12 and older in household
  5. Number of persons 11 and younger in household
  6. Number of TV sets in household
  7. Price willing to pay for cable TV (US dollars per month)
  8. Total TV hours watched last week (add hours for all persons in household)
  9. Hours of public affairs watched last week
  10. Hours of sports watched last week
  11. Hours of children's programming watched last week
  12. Hours of movies watched last week

Write a SAS program that reads the data and uses the label statement to label the variables. Create a new variable called "Location" with 3 values, Rural, Small Town and City, based on the district number (variable 1).
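The Location rule is just a range lookup on district number. The sketch below is in Python rather than SAS and only illustrates the logic; the function name `location` is hypothetical. In a SAS data step the same ranges would normally go in if/then/else statements, with proc format supplying the printed labels.

```python
def location(district):
    """Map a Stevens County district number (1-75) to its Location category.

    Hypothetical helper for illustration; ranges follow variable 1 above.
    """
    if 1 <= district <= 25:
        return "Rural"
    if 26 <= district <= 50:
        return "Small Town"
    if 51 <= district <= 75:
        return "City"
    # Anything else signals a data-entry error worth investigating.
    raise ValueError(f"district {district} is outside 1-75")
```

For example, `location(30)` gives "Small Town". Raising an error on out-of-range districts, rather than silently assigning a category, makes typos in the data file show up during debugging.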

You will need to edit the data file. Notice that some respondents were not at home. Please do not delete the entire data lines for these respondents. Instead, put the SAS missing value code (a period) in place of every missing value. I did it with search and replace in emacs; see the shorter emacs handout.
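If you prefer to see the editing rule spelled out, here is a minimal Python sketch of what the search and replace accomplishes. It rests on a hypothetical layout that you should check against the real tv1.data: fields are whitespace-separated, a complete record has 12 fields, and a not-at-home record keeps its district and household numbers but has text, or nothing, where the other values should be. The names `fill_missing` and `N_FIELDS` are illustrative only.

```python
import re

N_FIELDS = 12  # variables 1-12 in the list above (assumed record length)
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def fill_missing(line):
    """Replace non-numeric or absent fields with '.', the SAS missing code.

    Assumes (hypothetically) whitespace-separated fields and that any
    non-numeric token, such as a 'not at home' note, marks missing data.
    """
    tokens = line.split()[:N_FIELDS]
    fields = [tok if NUMBER.fullmatch(tok) else "." for tok in tokens]
    # Pad short records so every line has the same number of fields.
    fields.extend(["."] * (N_FIELDS - len(fields)))
    return " ".join(fields)
```

Padding every line to the same number of fields matters because, with list input, SAS reads a lone period as a missing numeric value; if fields were simply dropped, it would go on to the next line to find the remaining values and the records would fall out of alignment.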

  1. Use proc means to obtain n, mean and standard deviation for all the quantitative variables.
  2. Use proc freq to obtain frequency distributions of Location, and also of Number of persons 12 and older, Number of persons 11 and younger, and Number of TV sets. Location should be nicely labelled; I used proc format. To help you check your work: I had 98 households in the Small Town location.
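SAS computes these summaries for you; the Python sketch below only spells out what the requested numbers are, which can help when checking a value or two by hand. Missing values are represented by `None` (an assumption of this sketch). As with proc means and proc freq by default, missing values are left out of n, the mean, the standard deviation and the frequency counts, and the standard deviation uses the n - 1 divisor. The function names are hypothetical.

```python
import statistics
from collections import Counter


def describe(values):
    """Return (n, mean, sd) of the non-missing values, like proc means."""
    present = [v for v in values if v is not None]
    # statistics.stdev uses the n - 1 divisor, matching proc means' default.
    return len(present), statistics.mean(present), statistics.stdev(present)


def frequencies(values):
    """Return (value, count) pairs for non-missing values, sorted by value."""
    counts = Counter(v for v in values if v is not None)
    return sorted(counts.items())
```

For instance, `describe([2, 4, None, 6])` gives n = 3, mean 4 and standard deviation 2. The sort order here is plain Python ordering and need not match the order proc freq uses.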

For debugging, your initial runs should include frequency distributions of everything, but once the data are clean and your program is correct, just produce what is requested above.

Bring your log file and your list file to the quiz. You will hand both of them in. If you decide to use %include, you'll need to bring two log files. Also,