STA442/1008 Assignment 2

Quiz on Friday Sept. 25th


This assignment is based on Chapter 2 of the class notes and associated lecture material. You will do the job described below, and bring your log file and your list file to the quiz. Then you will be asked a few very simple questions, like what's the mean number of hours of sports programming watched per household, or how many households in the sample do not have a TV. The answers will all be numerical, and they will all be directly from your printout. The quiz will consist of your circling a few numbers and writing a few words on your printouts, and turning them in. It should only take a few minutes.

The file tv1.data contains data from a 1982 survey conducted in Stevens County in the United States. At the time, Stevens County was divided into 75 districts including rural, small-town and urban areas. For each of 500 households interviewed, the data file contains district number, household number within district, assessed value of home in US dollars (an indirect measure of income, which was not asked), and answers to 9 questions related to the respondents' interest in getting cable TV, which was new at the time. The variables are:

  1. District: 1-25 are rural, 26-50 small town, 51-75 city.
  2. Household (numbered within district)
  3. Assessed value of home in US dollars
  4. Number of persons 12 and older in household
  5. Number of persons 11 and younger in household
  6. Number of TV sets in household
  7. Price willing to pay for cable TV
  8. Total TV hours watched last week (add hours for all persons in household)
  9. Hours Public Affairs watched last week
  10. Hours Sports watched last week
  11. Hours Children's programming watched last week
  12. Hours Movies watched last week

Here's what you do:

Write a SAS program that reads the data and labels the variables with the label statement. Create a new variable with 3 values: Rural, Small Town and City. Use proc means to obtain n, mean and standard deviation for all the quantitative variables. Use proc freq to obtain frequency distributions of all the categorical variables, including the quantitative variables that take on just a few values (say, fewer than 20 values). That's it.
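To make the steps concrete, here is a rough sketch of how such a program might be laid out. It is only an outline under stated assumptions, not a model answer: the variable names, the dataset name tv, the format name locfmt and the numeric coding of the new variable are all made up for illustration, and the input statement assumes tv1.data sits in the working directory with the 12 values on each line separated by blanks. Check the actual layout of the file before trusting any of it. Which quantitative variables have few enough values to belong in proc freq is also a guess here.

```sas
/* Sketch only. Variable names, the dataset name and the  */
/* blank-separated (list) input layout are assumptions.   */
title 'STA442/1008 Assignment 2: Cable TV survey';

proc format;
     /* Printed values for the new 3-category variable */
     value locfmt 1 = 'Rural'
                  2 = 'Small Town'
                  3 = 'City';
run;

data tv;
     infile 'tv1.data';   /* assumed path */
     input district household homeval adults children ntv
           price tvhours pubaff sports kidshow movies;

     /* New variable based on district number; anything */
     /* outside 1-75 is left missing.                    */
     if 1 <= district <= 25 then location = 1;
     else if 26 <= district <= 50 then location = 2;
     else if 51 <= district <= 75 then location = 3;

     label district  = 'District number'
           household = 'Household number within district'
           homeval   = 'Assessed value of home in US dollars'
           adults    = 'Number of persons 12 and older'
           children  = 'Number of persons 11 and younger'
           ntv       = 'Number of TV sets in household'
           price     = 'Price willing to pay for cable TV'
           tvhours   = 'Total TV hours watched last week'
           pubaff    = 'Hours public affairs watched last week'
           sports    = 'Hours sports watched last week'
           kidshow   = "Hours children's programming watched last week"
           movies    = 'Hours movies watched last week'
           location  = 'Type of district';
     format location locfmt.;
run;

/* n, mean and standard deviation for the quantitative variables */
proc means data=tv n mean std;
     var homeval adults children ntv price tvhours
         pubaff sports kidshow movies;
run;

/* Frequency distributions: the categorical variable plus the */
/* quantitative variables guessed to take only a few values   */
proc freq data=tv;
     tables location adults children ntv;
run;
```

Coding the new variable as 1, 2, 3 and attaching a format is just one option; a character variable holding the three names directly would serve the same purpose.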

Your initial run should include frequency distributions of District and Household, for data cleaning. But once the data are cleaned, you should omit these, because printing them is a waste of paper.
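A data-cleaning step along these lines might look like the following. It assumes a dataset called tv with variables named district and household, which are hypothetical names chosen for illustration; substitute whatever your own data step uses.

```sas
/* Data cleaning only: delete or comment out before the final run. */
/* Dataset and variable names are assumed, not given in the notes. */
proc freq data=tv;
     tables district household;
run;
```

District numbers outside 1-75, or household numbers that look out of place, are the kind of thing these tables make easy to spot.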

Bring your log file and your list file to the quiz. You will hand both of them in. Here are a few suggestions and comments:

Just so you can check your work, for value of home in U.S. dollars, I get a standard deviation of 16455.73.