STA429/1007 Assignment 5

Quiz on Thursday Oct. 25th at 10:10 a.m.


You will find Chapter 2 and Handout 2 to be useful for this assignment.

The file tv1.data contains data from a 1982 survey conducted in Stevens County in the United States. At the time, Stevens County was divided into 75 districts including rural, small-town and urban areas. For each of 500 households interviewed, the data file contains district number, household number within district, assessed value of home in US dollars (an indirect measure of income, which was not asked), and answers to 9 questions related to the respondents' interest in getting cable TV. The variables are:

  1. District: 1-25 are rural, 26-50 small town, 51-75 city.
  2. Household (numbered within district)
  3. Assessed value of home in US dollars
  4. Number of persons 12 and older in household
  5. Number of persons 11 and younger in household
  6. Number of TV sets in household
  7. Price willing to pay for cable TV (in US dollars per month)
  8. Total TV hours watched last week (add hours for all persons in household)
  9. Hours of public affairs programming watched last week
  10. Hours of sports watched last week
  11. Hours of children's programming watched last week
  12. Hours of movies watched last week

Write a SAS program that reads the data and labels the variables with the label statement. Create a new variable called "Location" with 3 values: Rural, Small Town and City.
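
As a starting point, here is a minimal sketch of the kind of data step this calls for. The data set name tv and all the variable names are placeholders, not names the assignment requires. The sketch assumes that tv1.data sits in the directory SAS is run from and that the twelve values on each line are separated by blanks, so that simple list input works. Check both assumptions against the actual file. Storing Location as a numeric code (1, 2, 3) is one possible choice; it leaves the display text to a format.

```sas
/* Sketch only: data set and variable names are hypothetical, and list
   input assumes blank-separated values, one household per line.       */
data tv;
    infile 'tv1.data';   /* assumed to be in the current directory */
    input district household value adults kids tvsets
          price tvhours pubaff sports children movies;
    label district  = 'District number'
          household = 'Household number within district'
          value     = 'Assessed value of home (US dollars)'
          adults    = 'Number of persons 12 and older'
          kids      = 'Number of persons 11 and younger'
          tvsets    = 'Number of TV sets in household'
          price     = 'Price willing to pay for cable TV (US dollars per month)'
          tvhours   = 'Total TV hours watched last week'
          pubaff    = 'Hours of public affairs watched last week'
          sports    = 'Hours of sports watched last week'
          children  = 'Hours of children''s programming watched last week'
          movies    = 'Hours of movies watched last week';
    /* Location from district: 1-25 rural, 26-50 small town, 51-75 city.
       Any other district value leaves location missing.                */
    if 1 <= district <= 25 then location = 1;
    else if 26 <= district <= 50 then location = 2;
    else if 51 <= district <= 75 then location = 3;
    label location = 'Location';
run;
```

The if-else chain deliberately has no final catch-all else, so a mistyped district number shows up as a missing Location rather than being silently counted as City.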

You will need to edit the data file. Notice that some people were not at home. Please do not delete the entire data lines for these respondents. Instead, put the SAS missing value code (a period) in place of each missing value. I did it with search and replace in Emacs. See the shorter Emacs handout. I will also illustrate search and replace in class.
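
To see why the periods matter, here is a small sketch with made-up values. None of the numbers come from tv1.data, the data set and variable names are placeholders, and which fields are blank for a household that was not at home is an assumption. With list input, a period is read as a missing numeric value. If the values were simply left out instead, SAS would by default go on to the next line to find them, and the data would be misaligned.

```sas
/* Hypothetical values, not taken from tv1.data.                  */
/* Household 2 was "not at home": its answers are coded as periods. */
data demo;
    input district household value adults kids tvsets;
    datalines;
1 1 45000 2 1 1
1 2 38000 . . .
1 3 52000 3 0 2
;
run;

/* Missing values print as periods, and each line stays aligned */
proc print data=demo;
run;
```

After editing the real file, a message in the log saying that SAS went to a new line when the input statement reached past the end of a line usually means a period is still missing somewhere.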

  1. Use proc means to obtain n, mean and standard deviation for all the quantitative variables.
  2. Use proc freq to obtain frequency distributions of Location, and also of Number of persons 12 and older, Number of persons 11 and younger and Number of TV sets. Location should be nicely labelled; I used proc format. So you can check your work, here is my frequency distribution of Location:
                                 The FREQ Procedure
    
                                                   Cumulative    Cumulative
              location    Frequency     Percent     Frequency      Percent
            ---------------------------------------------------------------
            Rural               91       18.20            91        18.20  
            Small Town          98       19.60           189        37.80  
            Urban              311       62.20           500       100.00  
    
    

For debugging, your initial runs should include frequency distributions of everything, but once the data are clean and your program is correct, just produce what is requested above.
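
The two numbered requests above might be sketched as follows. This is only an illustration under stated assumptions: the data set tvdemo, the variable names and the format name locfmt are all hypothetical, and the few data lines are made-up values standing in for tv1.data so that the example runs on its own. The third level is labelled City here, following the variable description; the sample output above prints it as Urban.

```sas
/* Sketch only: names are hypothetical and the datalines are made up. */
proc format;
    value locfmt 1 = 'Rural'
                 2 = 'Small Town'
                 3 = 'City';
run;

data tvdemo;
    input district adults kids tvsets price;
    if 1 <= district <= 25 then location = 1;
    else if 26 <= district <= 50 then location = 2;
    else if 51 <= district <= 75 then location = 3;
    format location locfmt.;   /* attach the labels to the codes */
    datalines;
3 2 1 1 10
30 2 0 2 15
60 1 0 1 .
72 4 2 3 20
;
run;

/* n, mean and standard deviation for the quantitative variables */
proc means data=tvdemo n mean std;
    var adults kids tvsets price;
run;

/* Frequency distributions; location prints with its format labels */
proc freq data=tvdemo;
    tables location adults kids tvsets;
run;
```

For the debugging runs, dropping the tables statement from proc freq gives frequency distributions of every variable in the data set.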

Bring your log file and your list file to the quiz. You will hand both of them in. Also,