The Walker Music Data

Statistical Consulting Assignment Four

Due at 10:10 a.m. on Thursday November 13th


Okay folks, this is important. Susan Walker is a real client who is going to pay close attention to what we do for her. Your work will play an important role in the conclusions that are drawn from this research. Please give it your best effort.

The Excel spreadsheet Walker.xls (easier to look at) and the plain text file Walker.data (usable by SAS) contain the same variables.

Properly analyzing a big data set like this typically takes months, so you can't really do a comprehensive job in the time available. On the other hand, you can and should try to locate some findings that have been missed up to now. I see two legitimate approaches, which could be called the "shotgun" approach and the "focused" approach.

In the shotgun approach, you explore a lot of possibilities. The output of your work will be a typed list of simple, clear statements describing conclusions you draw from the data. Each statement you make should be accompanied by a p-value. Only p-values less than 0.01 are of interest; this is the client's directive. You will also hand in the log and list file. On the list file, circle and clearly label each p-value with the number of the conclusion that is drawn from it. Please be aware that you should never just state that variables are "related," or that a difference is "statistically significant" without saying what the relationship is. For example, you would not just say that there was a sex difference in average scores on the sentence completion test. You would also say that females have higher average scores.

If you take the shotgun approach, please report at least five results. Also, be aware that the data have already been explored with elementary tests (that is, methods involving only a single independent variable at a time). For the most part, then, such analyses are of limited interest. Your preference should be for multiple regression and for factorial ANOVA with more than one factor.
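
As an illustration only, a two-factor ANOVA and a multiple regression might look like the sketch below. It assumes the rockroll data set created by the data step later in this handout. The choice of schlcom as the response, and of sex, agegrp, beck, introsp, F_Rebel and ID_Self as predictors, is purely hypothetical and is not a suggested finding. The LSMEANS statement prints estimated group means, which is what tells you the direction of any effect you report.

```sas
/* Hypothetical example: two-factor ANOVA with interaction.          */
/* Assumes the rockroll data set from the data step in this handout. */
proc glm data=rockroll;
     class sex agegrp;
     model schlcom = sex agegrp sex*agegrp;
     /* Estimated means show WHICH group is higher; PDIFF adds        */
     /* p-values for the pairwise differences among those means.      */
     lsmeans sex agegrp sex*agegrp / pdiff;
run;
quit;

/* Hypothetical example: multiple regression with several predictors. */
/* The sign of each slope gives the direction of the relationship.    */
proc reg data=rockroll;
     model schlcom = beck introsp F_Rebel ID_Self;
run;
quit;
```

Compare each p-value with 0.01, per the client's directive. PROC GLM prints both Type I and Type III sums of squares; with unequal cell sizes the Type III tests are the usual choice.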

If you take the focused approach, poke around a little first, settle on one question, and answer it with a fairly careful set of analyses. The product of your labours will be a typed page or three describing what you did and what you conclude. As in the shotgun approach, you will also hand in the log and list file. On the list file, circle and clearly label each p-value with the number of the conclusion that is drawn from it. We are paying attention only to p-values less than 0.01. To repeat the warning above, you must always state clearly what your findings are, not just that there is something there.

Regardless of which approach you take, please be aware that some one-IV-at-a-time tests have been carried out, but two important things have not been explored much yet.

Avery made one other very good comment about the factor analyses. The factors in the factor analysis are uncorrelated, and a critic could reasonably object that this is artificial: it is more a statistical convenience than a real feature of the data. Avery suggested (or maybe just implied) a way around this potential criticism. If you understand this issue (you're not really expected to), it's a very good candidate for the focused approach.
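
One standard way to relax the uncorrelated-factors assumption, offered here only as a sketch and not necessarily what Avery had in mind, is to refit the factor analysis with an oblique rotation such as promax, which allows the factors to correlate, and then look at the estimated correlations among the factors. The sketch assumes that wmq1-wmq47 are the questionnaire items behind the five WMQ factor scores in the program below (introsp, ID_Music, DisMusID, F_Rebel, ID_Self), and it requests five factors to match; the number of factors, the extraction method and the output data set name wmqscores are all assumptions.

```sas
/* Sketch: oblique (promax) rotation of the WMQ items.              */
/* nfactors=5 is an ASSUMPTION matching the five WMQ factor scores. */
/* method=prinit priors=smc is iterated principal axis factoring;   */
/* heywood caps any communality that exceeds 1 at 1.                */
proc factor data=rockroll method=prinit priors=smc heywood
            nfactors=5 rotate=promax out=wmqscores;
     var wmq1-wmq47;
run;

/* OUT= adds estimated factor scores named Factor1-Factor5;        */
/* correlate them to see how far the factors depart from zero.     */
proc corr data=wmqscores;
     var Factor1-Factor5;
run;
```

With ROTATE=PROMAX, PROC FACTOR also prints an "Inter-Factor Correlations" table that reports the estimated factor correlations directly. Observations with a missing value on any WMQ item are left out of the factor analysis.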

To save you some time, here is part of my SAS program.


proc format;
     value ptrfmt 1 = 'Pre-Conformist' 2 = 'Conformist';
     value agefmt 1 = '12-14' 2 = '15-17' 3 = '18+';
     value mhfmt 1 = 'Inpatient'
                  2 = 'Not inpatient,  not depr'
                  3 = 'Not inpatient,  depr';
run;

data rockroll;
     infile 'Walker.data' delimiter=',';
     input id wmq1-wmq47 sex $ age psych $ beck ptr bdigroup ptr2 agegrp
           evalSD romancSD potentSD pref1-pref10
           introsp ID_Music DisMusID F_Rebel ID_Self
           sc1-sc7 schlcom ; /* SAS is not case sensitive, but still ... */

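     /* Mental health group: inpatients first; non-inpatients are split  */
     /* by bdigroup (1 = not depressed, 2 = depressed, per mhfmt above). */
     /* Any other combination, e.g. missing psych, leaves mhgroup missing. */
     if psych = 'Y' then mhgroup = 1;
        else if (psych = 'N' & bdigroup=1) then mhgroup = 2;
        else if (psych = 'N' & bdigroup=2) then mhgroup = 3;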


     label ptr2     = 'PTR Sentence Completion Personality Test'
           evalSD   = 'Evaluative factor from Semantic Differential'
           romancSD = 'Romance factor from Semantic Differential'
           potentSD = 'Potency factor from Semantic Differential'
           pref1    = 'Pref for MC Hammer: Play'
           pref2    = 'Pref for David Bowie: Space Oddity'
           pref3    = 'Pref for Incantation: Entrapment of Evil'
           pref4    = "Pref for Sinead O'Connor: Nothing Compares 2 U"
           pref5    = 'Pref for Madonna: Justify my Love'
           pref6    = 'Pref for Moody Blues: Knights in White Satin'
           pref7    = 'Pref for Niggaz With Attitudes: Fuck the Police'
           pref8    = 'Pref for Paula Abdul: Rush, Rush'
           pref9    = 'Pref for Crystal Waters: Homeless'
           pref10   = "Pref for LaTar: Everyone's Still Having Sex"
           introsp  = 'Introspection factor from WMQ'
           ID_Music = 'Identity Music factor from WMQ'
           DisMusID = 'Discerning Music Identity factor from WMQ'
           F_Rebel  = 'Fantasy-Rebellion factor from WMQ'
           ID_Self  = 'Identity-Self factor from WMQ'
           schlcom  = 'School Commitment Scale'
           mhgroup  = 'Mental Health Group';
     format ptr2 ptrfmt.;
     format agegrp agefmt.;
     format mhgroup mhfmt.;
run;
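
Before analyzing mhgroup, it is worth confirming that the recode behaves as intended. This sketch cross-tabulates the new variable against the two variables it is built from; the MISSING option makes any cases that fall through the IF-ELSE chain (for example, a missing psych value) show up with a missing mhgroup instead of silently disappearing from the table.

```sas
/* Check the mhgroup recode against psych and bdigroup.           */
/* LIST prints one row per combination; MISSING keeps cases with  */
/* missing values in the table so fall-through cases are visible. */
proc freq data=rockroll;
     tables psych*bdigroup*mhgroup / list missing;
run;
```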