STA 301F97 Assignment 5: Quiz in tutorial Oct. 17th

1. Give an original example of a variable for which it would be unreasonable to compute the standard deviation.

2. Give an original example of a variable that is both quantitative and categorical.

A continuous variable is one in which potential measurements may be placed in one-to-one correspondence with an interval of the real numbers (that is, with a segment of the number line). Clearly this is a theoretical abstraction. What it means in practice is that there are lots of different data values, and the scale of measurement is such that you could always imagine having more accuracy. For example, weight is continuous. Now don't use weight in any of your original examples.

3. Is it possible to have a variable that is continuous but not quantitative?

4. Make up original examples of studies with a

The rest of this assignment is based on the "pulse" data, which are described below:

Students in an introductory class participated in a simple experiment. The students first took their own pulse rates. They were then asked to flip a coin. If their coin came up heads, they were asked to run in place for one minute. Then all the students took their pulses again. The variables in the study are described below.

1. First pulse rate

2. Second pulse rate

3. 1 = Ran in Place, 2 = Did not run in place

4. 1 = Smokes regularly, 2 = Does not smoke regularly

5. 1 = Male, 2 = Female

6. Height in inches

7. Weight in pounds

8. Usual level of physical activity: 1 = Slight, 2 = Moderate, 3 = A lot

Get a copy of the raw (and I mean raw) data with

cp /student/jbrunner/public/pulse.dirty .

The period is important! It refers to your current directory.

Step 1: Make a command file that reads the data, providing variable labels for all variables and value labels as appropriate. Generate descriptive stats, and also frequency distributions of all variables (even continuous ones) on your screen just to make sure everything looks okay. If it does not, use common sense to deal with any problems. By my count there are four problems (or maybe three, depending on how you look at it) with this data file. Once you have fixed the problems, delete the statements that produce descriptive statistics and frequency distributions.

Remember the nature of the problems you fixed. You might be asked.
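
To illustrate why Step 1 asks for frequency distributions of everything, here is a minimal sketch in Python (used only for illustration; it is not the package used for this assignment). The values in `ran` are made up and are NOT taken from pulse.dirty, and the helper names `frequency_table` and `invalid_values` are hypothetical. The point is only that a frequency table makes an impossible code stand out immediately.

```python
from collections import Counter

# Hypothetical values for the "ran in place" variable; NOT taken from pulse.dirty.
# According to the variable list, the only valid codes are 1 and 2.
ran = [1, 2, 2, 1, 11, 2, 1, 2]

VALID_CODES = {1: "Ran in place", 2: "Did not run in place"}


def frequency_table(values):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(values).items())


def invalid_values(values, valid_codes):
    """Return the sorted distinct values that are not valid codes."""
    return sorted(set(values) - set(valid_codes))


# Print the frequency distribution with value labels attached.
for value, count in frequency_table(ran):
    label = VALID_CODES.get(value, "*** not a valid code ***")
    print(f"{value:>3}  {count:>3}  {label}")

print("Suspicious values:", invalid_values(ran, VALID_CODES))
```

What to do about a suspicious value (correct it, or treat it as missing) is a judgment call that depends on what you find in the real file.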

Step 2: Now run another job, adding to what you already have to create the following NEW variables. Provide variable labels and value labels as appropriate.

1. Rise in pulse rate.

2. Height above vs at or below the median.

3. Weight above vs at or below the median.

4. A composite variable with 4 categories: Female smoker, Female non-smoker, Male smoker, Male non-smoker. (Hint: use "if" statements)
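
The logic behind these new variables can be sketched as follows, again in Python purely for illustration. The four records are invented, not taken from the pulse data. The numeric codes chosen for the new variables (1 = above median, 2 = at or below; 1 to 4 for the composite) are assumptions made for this sketch, and `composite` is a hypothetical helper name. Weight would be split at its median in the same way as height.

```python
from statistics import median

# Invented records, NOT from the pulse data.
# Codes follow the variable list: smokes 1 = yes, 2 = no; sex 1 = male, 2 = female.
records = [
    {"pulse1": 64, "pulse2": 88, "smokes": 2, "sex": 1, "height": 70},
    {"pulse1": 72, "pulse2": 70, "smokes": 1, "sex": 2, "height": 64},
    {"pulse1": 80, "pulse2": 96, "smokes": 1, "sex": 1, "height": 68},
    {"pulse1": 68, "pulse2": 68, "smokes": 2, "sex": 2, "height": 66},
]

GROUP_LABELS = {
    1: "Female smoker",
    2: "Female non-smoker",
    3: "Male smoker",
    4: "Male non-smoker",
}


def composite(sex, smokes):
    """Combine sex and smoking status into one 4-category code (assumed coding)."""
    if sex == 2 and smokes == 1:
        return 1
    if sex == 2 and smokes == 2:
        return 2
    if sex == 1 and smokes == 1:
        return 3
    if sex == 1 and smokes == 2:
        return 4
    return None  # anything else is treated as missing


height_median = median(r["height"] for r in records)

for r in records:
    r["rise"] = r["pulse2"] - r["pulse1"]               # rise in pulse rate
    r["tall"] = 1 if r["height"] > height_median else 2  # 1 = above, 2 = at or below
    r["group"] = composite(r["sex"], r["smokes"])
    print(r["rise"], r["tall"], GROUP_LABELS[r["group"]])
```

Note that the "at or below" category must catch values exactly equal to the median, which is why the comparison is a strict "greater than".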

Run examine to produce descriptive statistics for all continuous variables, and make frequency distributions of all categorical variables.

Also, perform statistical analyses to answer these questions:

  1. Does rise in pulse rate depend on whether the person ran in place?
  2. What proportion of the variation in pulse rate rise is explained by whether the person ran in place?
  3. Does rise in pulse rate depend on whether the person smokes regularly?
  4. What proportion of the variation in pulse rate rise is explained by whether the person smokes regularly?
  5. Does rise in pulse rate depend on whether the person is male or female?
  6. What proportion of the variation in pulse rate rise is explained by whether the person is male or female?
  7. Does rise in pulse rate depend on usual level of physical activity?
  8. What proportion of the variation in pulse rate rise is explained by usual level of physical activity?

Note that the most efficient command file will not answer these questions in the order given above.
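
For the "proportion of variation explained" questions, one standard reading is the between-groups sum of squares divided by the total sum of squares. The sketch below computes that ratio directly in Python, for illustration only. The rise values are invented, and the function name `proportion_explained` is hypothetical. Your statistical package will report the sums of squares needed for the same calculation.

```python
def proportion_explained(groups):
    """Return SS_between / SS_total for a list of groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Total variation: squared deviations of every value from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    # Explained variation: each group mean's squared deviation from the
    # grand mean, weighted by the group size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Invented pulse-rate rises, NOT from the pulse data.
rise_ran = [20, 24, 28]
rise_did_not_run = [0, 2, 4]

print(round(proportion_explained([rise_ran, rise_did_not_run]), 3))
```

The same function works for a predictor with more than two categories, such as usual level of physical activity, by passing one list per category.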

Bring printouts of TWO, repeat TWO files to the quiz -- your command file for Step 2 and your list file for Step 2. Don't write anything on them except your name and student number.