Assignment 4

The Walker Music Data

Statistical Consulting Assignment Four

Due at 10:10 a.m. on Thursday, November 13th

Okay folks, this is important. Susan Walker is a real client who is going to pay close attention to what we do for her. Your work will play an important role in the conclusions that are drawn from this research. Please give it your best effort.

The Excel spreadsheet Walker.xls (easier to look at) and the plain text file Walker.data (useable by SAS) contain the same variables.

Identification Number
Responses to the 47 questions in the Walker Music Questionnaire, on a scale from 1 = Strongly Disagree to 7 = Strongly Agree. Note that Questions 5, 10, 16, 22, 26, 28 and 36 make up the School Commitment Scale (sort of hidden among the music questions). These questions were not included in the Principal Components Analysis, nor were Questions 2, 30, 32, 39 and 44 (which represent preference for specific types of music -- take a look!).
Sex
Age
Psychiatric patient versus non-patient
Beck Depression Inventory (BDI), missing for all the psychiatric patients. I don't plan to use this variable directly.
Sentence Completion Personality Test (PTR), scored from 1 to 6. My interpretation of this variable is that high numbers indicate answers that are more thoughtful and emotionally complex. I am not using proper technical language here; I hope you know what I mean.
BDI Group: 1 = less than 16 on the Beck Depression Inventory; 2 = 16 or above, which is serious. This variable was used to classify the non-patients into depressed versus non-depressed.
Two-group PTR: Subjects got a 1 ("Pre-conformist") if their Sentence Completion Personality Test was 1, 2 or 3; they got a 2 ("Conformist") if their score was 4, 5 or 6. I just think of it as high vs. low personality development. This is the version of the scale that is deemed clinically meaningful.
Age Group (Theoretically meaningful):
1. 12-14
2. 15-17
3. 18+
Three variables: Evaluative (Good-Bad), Romance, and Potency (Strong-Weak) factor scores (actually, rotated principal components) from the Semantic Differential ratings of the 10 music clips, all combined I think. But if they were all combined and the Catholic school students were not allowed to rate all the songs, I would have thought there should be a lot more missing values. Or maybe the ratings of all the songs were averaged for each scale (like Simple-Complex), and it was these means that were input to the PCA. In that case the variables for the Catholic school students would just be averages of a smaller number of items. This is my best guess of what was done. Susan probably told us, but as you can see it's easy to miss details when you're new to a data set -- especially a complex one like this.
Now we have 10 "Preference" variables, each of which is the average of two Semantic Differential scales: Good-Bad and Pleasing-Displeasing. They are ratings of:
1. PreferenceSD.1 MC Hammer: Play
2. PreferenceSD.2 David Bowie: Space Oddity
3. PreferenceSD.3 Incantation: Entrapment of Evil
4. PreferenceSD.4 Sinead O'Connor: Nothing Compares 2 U
5. PreferenceSD.5 Madonna: Justify my Love
6. PreferenceSD.6 Moody Blues: Nights in White Satin
7. PreferenceSD.7 Niggaz With Attitudes: Fuck the Police
8. PreferenceSD.8 Paula Abdul: Rush, Rush
9. PreferenceSD.9 Crystal Waters: Homeless
10. PreferenceSD.10 LaTar: Everyone's Still Having Sex
Next we have five (rotated) principal components from the Walker Music Questionnaire. They are
- Introspection
- Identity Music
- Discerning Music Identity
- Fantasy-Rebellion
- Identity-Self
Now we have the seven items making up the School Commitment Scale. These are identical to items 5, 10, 16, 22, 26, 28 and 36 from the Walker Music Questionnaire, but some of them are turned around so that higher numbers always represent more positive statements about school.
School Commitment: The mean of the preceding 7 items. Let's use this one.
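Since schlcom is described as the mean of the seven recoded items, it is easy to confirm that before relying on it. The following is only a sketch. It assumes the data set rockroll created by the SAS program at the end of this handout, and it assumes that sc1-sc7 appear in the same order as Questions 5, 10, 16, 22, 26, 28 and 36. The names sccheck, mymean and diff are made up for illustration. Note that the mean function skips missing items, which may or may not match how schlcom was originally computed.

```sas
/* Sketch only: assumes data set rockroll from the data step in this handout. */
data sccheck;                      /* hypothetical name */
     set rockroll;
     mymean = mean(of sc1-sc7);    /* mean() ignores missing items */
     diff   = schlcom - mymean;    /* should be zero apart from rounding */
run;

proc means data=sccheck n nmiss min max;
     var diff;
run;

/* Assumes sc1-sc7 are in the same order as the seven WMQ items.        */
/* If so, each matching pair should correlate +1 (item left as is)      */
/* or -1 (item turned around).                                          */
proc corr data=rockroll nosimple;
     var  sc1-sc7;
     with wmq5 wmq10 wmq16 wmq22 wmq26 wmq28 wmq36;
run;
```

If the matching correlations are not all plus or minus one, the ordering assumption is wrong and the pairing needs to be worked out by hand.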

Properly analyzing a big data set like this typically takes months, so you can't really do a comprehensive job in the time available. On the other hand, you can and should try to locate some findings that have been missed up to now. I see two legitimate approaches, which could be called the "shotgun" approach and the "focused" approach.

In the shotgun approach, you explore a lot of possibilities. The output of your work will be a typed list of simple, clear statements describing conclusions you draw from the data. Each statement you make should be accompanied by a p-value. Only p-values less than 0.01 are of interest; this is the client's directive. You will also hand in the log and list file. On the list file, circle and clearly label each p-value with the number of the conclusion that is drawn from it. Please be aware that you should never just state that variables are "related," or that a difference is "statistically significant" without saying what the relationship is. For example, you would not just say that there was a sex difference in average scores on the sentence completion test. You would also say that females have higher average scores.

If you take the shotgun approach, please report at least five results. Also, be aware that the data have already been explored with elementary tests -- that is, methods involving only a single independent variable at a time. So for the most part, such analyses are of limited interest. Your preference should be for multiple regression and factorial ANOVA with more than one factor.
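As an illustration of what a factorial ANOVA with more than one factor might look like here, the sketch below crosses two of the categorical variables. The choice of sex and agegrp as factors and schlcom as the dependent variable is arbitrary and is only an assumption for the example; it again uses the data set rockroll from the SAS program at the end of this handout.

```sas
/* Sketch only: variables chosen arbitrarily for illustration. */
proc glm data=rockroll;
     class sex agegrp;
     model schlcom = sex agegrp sex*agegrp / ss3;  /* Type III tests only */
     lsmeans sex*agegrp;                           /* cell means, to say what the effect is */
run;
quit;
```

The cell means from lsmeans are there so that a significant effect can be described in words, not just reported as a p-value.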

If you take the focused approach, poke around a little first, settle on one question, and answer it with a fairly careful set of analyses. The product of your labours will be a typed page or three describing what you did and what you conclude. As in the shotgun approach, you will also hand in the log and list file. On the list file, circle and clearly label each p-value with the number of the conclusion that is drawn from it. We are paying attention only to p-values less than 0.01. To repeat the warning above, you must always state clearly what your findings are, not just that there is something there.

Regardless of which approach you take, please be aware that some one-IV-at-a-time tests have been carried out, but two important things have not been explored much yet.

Interactions between categorical independent variables in their relationship to the dependent variables.
Relationships between quantitative variables from different domains. The domains I see are
1. Factors from the WMQ (5 vars)
2. Factors from the Semantic Differential ratings (3 vars)
3. Preferences for specific songs (10 vars)
4. School commitment (1 var)
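One possible way to start on relationships between domains is sketched below. It assumes the variable names from the SAS program at the end of this handout; which domains to pair up is your decision, and the pairings shown are only examples.

```sas
/* Sketch only: the pairings of domains are arbitrary examples. */

/* Correlations between the 5 WMQ factors and the 3 Semantic Differential factors */
proc corr data=rockroll nosimple;
     var  introsp ID_Music DisMusID F_Rebel ID_Self;
     with evalSD romancSD potentSD;
run;

/* School commitment regressed on all five WMQ factors at once */
proc reg data=rockroll;
     model schlcom = introsp ID_Music DisMusID F_Rebel ID_Self;
run;
quit;
```

In the regression, the overall F test addresses whether school commitment is related to the WMQ factors as a set, and each t-test is for one factor controlling for the other four. Remember to look at the signs of the coefficients so you can say what the relationship is.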

Avery made one other very good comment about the factor analyses, and he suggested (or maybe just implied) a way around a potential criticism: the factors in the factor analysis are uncorrelated, but that is artificial, more a statistical convenience than a real feature of the data. If you understand this issue (you're not really expected to), it's a very good candidate for the focused approach.
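One standard way to look at this issue, which is not necessarily what Avery had in mind, is to redo the principal components analysis with an oblique rotation that allows the factors to be correlated. The sketch below assumes the data set rockroll from the SAS program at the end of this handout. It uses the 35 WMQ items that remain after leaving out the School Commitment items and the five music preference items, as described in the variable list. The promax rotation is my assumption, and the results need not reproduce the factor scores supplied in the data file.

```sas
/* Sketch only: rotate=promax is an assumed choice of oblique rotation. */
proc factor data=rockroll method=principal nfactors=5 rotate=promax;
     var wmq1 wmq3 wmq4 wmq6-wmq9 wmq11-wmq15 wmq17-wmq21
         wmq23-wmq25 wmq27 wmq29 wmq31 wmq33-wmq35 wmq37 wmq38
         wmq40-wmq43 wmq45-wmq47;
run;
```

The output includes the inter-factor correlations, which show how far from uncorrelated the factors are when they are not forced to be.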

To save you some time, here is part of my SAS program.


proc format;
     value ptrfmt 1 = 'Pre-Conformist' 2 = 'Conformist';
     value agefmt 1 = '12-14' 2 = '15-17' 3 = '18+';
     value mhfmt 1 = 'Inpatient'
                  2 = 'Not inpatient,  not depr'
                  3 = 'Not inpatient,  depr';

data rockroll;
     infile 'Walker.data' delimiter=',';
     input id wmq1-wmq47 sex $ age psych $ beck ptr bdigroup ptr2 agegrp
           evalSD romancSD potentSD pref1-pref10
           introsp ID_Music DisMusID F_Rebel ID_Self
           sc1-sc7 schlcom ; /* SAS is not case sensitive, but still ... */

     if psych = 'Y' then mhgroup = 1;
        else if (psych = 'N' & bdigroup=1) then mhgroup = 2;
        else if (psych = 'N' & bdigroup=2) then mhgroup = 3;


     label ptr2     = 'PTR Sentence Completion Personality Test'
           evalSD   = 'Evaluative factor from Semantic Differential'
           romancSD = 'Romance factor from Semantic Differential'
           potentSD = 'Potency factor from Semantic Differential'
           pref1    = 'Pref for MC Hammer: Play'
           pref2    = 'Pref for David Bowie: Space Oddity'
           pref3    = 'Pref for Incantation: Entrapment of Evil'
           pref4    = "Pref for Sinead O'Connor: Nothing Compares 2 U"
           pref5    = 'Pref for Madonna: Justify my Love'
           pref6    = 'Pref for Moody Blues: Nights in White Satin'
           pref7    = 'Pref for Niggaz With Attitudes: Fuck the Police'
           pref8    = 'Pref for Paula Abdul: Rush, Rush'
           pref9    = 'Pref for Crystal Waters: Homeless'
           pref10   = "Pref for LaTar: Everyone's Still Having Sex"
           introsp  = 'Introspection factor from WMQ'
           ID_Music = 'Identity Music factor from WMQ'
           DisMusID = 'Discerning Music Identity factor from WMQ'
           F_Rebel  = 'Fantasy-Rebellion factor from WMQ'
           ID_Self  = 'Identity-Self factor from WMQ'
           schlcom  = 'School Commitment Scale'
           mhgroup  = 'Mental Health Group';
     format ptr2 ptrfmt.;
     format agegrp agefmt.;
     format mhgroup mhfmt.;
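
The program fragment above stops before the data step is closed. A minimal way to finish it and check that mhgroup was built as intended might look like the following sketch. It assumes the fragment reads the file without errors, and it makes no claim about what the rest of the original program did.

```sas
run;   /* closes the data step above */

/* Check the construction of mhgroup against the variables it came from. */
/* The missing option also shows cases where mhgroup could not be set,   */
/* for example non-patients with no BDI group.                           */
proc freq data=rockroll;
     tables psych*bdigroup*mhgroup / list missing;
run;
```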