Thursday, September 13, 2007

The problem with statistics in science

...is that a disturbingly large fraction of researchers not specifically educated within statistics or so-called "hard" physical sciences is less than brilliant when it comes to evaluating data. Case in point - the main story in the September issue of Gemini (Norwegian only, I'm afraid) is an article about how young men (18-year-olds) evaluate politicians largely based on whether or not they're men. The title is "Young men elect men", and that is also the conclusion of the piece. But wait; despite the boys fitting very well into a stereotype, there is also good news: Girls evaluate the politician based on content rather than appearances. "The good news is that girls are more mature, but it's sad that the young men aren't", the researcher says. Paraphrased, but still. Apparently young men resort to stereotypical behavior due to their lack of experience, whatever that means.

So who's the genius behind this study and how did the researchers manage to collect the data set and extract the conclusions? The study was conducted by Associate Professor of Political Science and Media Sociology Toril Aalberg, from the Department of Sociology and Political Science at NTNU. The method: One female and one male actor were given 5 fake political identities (i.e. names and party affiliations). Each gave five political speeches (from transcripts of actual speeches given by real-life politicians), which were videotaped, leaving ten videos - five pairs of identical speeches. Ya follow?

These were then shown to ~400 high school students in the Trondheim area, so that one class got to see one of the taped speeches. Each student was then given a questionnaire about how they perceived the politician with respect to the person giving the speech, the content, the party supposedly affiliated with the speaker, etc. Answers to each question were given on a sliding 1-10 scale, so that 1 corresponds to "strongly disagree" and 10 corresponds to "strongly agree". Or vice versa - doesn't really matter. The results: Independent of party affiliation, the young men evaluated the male politician to do the better job. Students who watched a speech given by the male actor found him to be more knowledgeable, convincing, trustworthy and inspiring than the students who watched the female "politician". This despite the fact that there were five identical speeches.

OK; that was the data set, methodology and conclusion. Let's work with round numbers and assume that there are 20 students per class. That means the statistical material corresponds to 20 classes, i.e. each video was shown twice. There were two actors - one female and one male. N = 2 shows up a lot in this dataset, and they don't report any use of control sets. Also; there is no information about what the gender distribution was in the classes, or what types of high schools were selected for the study. In other words, not only was each video shown a measly two times, but one of those could've been shown to an all-male mechanic class, while the other one could've been shown to an all-female audience of nursing students. Or vice versa.
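
For anyone who wants to check the back-of-the-envelope arithmetic, here is a minimal sketch in Python. The 20-students-per-class figure is my round-number assumption rather than something the article reports, and the function name is made up purely for illustration.

```python
def showings_per_video(n_students, class_size, n_speeches, n_actors):
    """Return (number of classes, number of videos, showings per video).

    class_size is an assumed round number, not a figure reported in the study.
    """
    n_classes = n_students // class_size   # each class watches exactly one video
    n_videos = n_speeches * n_actors       # every speech is taped once per actor
    return n_classes, n_videos, n_classes / n_videos


n_classes, n_videos, per_video = showings_per_video(
    n_students=400, class_size=20, n_speeches=5, n_actors=2
)
print(n_classes, n_videos, per_video)  # 20 10 2.0
```

Change the assumed class size to 25 or 30 and the number of showings per video drops below two, which doesn't exactly help.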

Are you freakin' kidding me? And one of the lead-ins is even that people "hear what they want to hear". Pot, meet kettle. Aalberg finally mentions that her female colleagues accept her findings immediately and without question, while her male colleagues "have to look at the results carefully before being convinced". Again; are you frickin' kidding me?

I don't have any particular opinion about whether or not men elect men, but I do have a problem with people who draw pretty strong conclusions based on a less than optimal data set, to put it mildly. One of the main problems with this dataset (where N = 1 and thus effectively irrelevant) is so blatantly obvious that I'm not even gonna bother pointing it out. Anyone?

5 comments:

Anders said...

Why did they choose five different speeches and just two different actors? It might be that the male actor just happened to be more credible in his role than his female counterpart (there's your N=1, I guess?). Especially consider the phrase "The girls seem to be prone to see other sides of the message than whether the messenger is male or female". Does that mean that they find the same trend of preferring male over female politicians within the female population, just not as strong as with the men? Or that young women elect female politicians, but are not as biased as the men? In both cases, you should take a closer look at your study design.

And look at the people (hoping the Gemini photos are actual snapshots from the videos); one is dressed in dark clothes and one in light clothes. Isn't that an enormous source of error (I know you love color theory, G-loc, ever since you started your mandatory pedagogy course), especially when what we're measuring is credibility? (There is a reason why businessmen wear dark suits and not pink ones.)

I could also say something about the body language in the snaps, but that would be interpreting too much from the pictures.

That said, I don't know how the data was analyzed (some processing was done, since they corrected for the political standing of the population), so I'll give Toril Aalberg the benefit of the doubt, since she adds "Such studies should be repeated before any general conclusion can be drawn". Still, I would suggest rethinking the study design before drawing any general conclusions.

And Reverend W., she seems to be a colleague of yours. Maybe you should discuss this over lunch some day. ;-)

Wilhelm said...

B, I N G O, and BINGO was his name.

Although she says "such studies should be repeated before any general conclusion can be drawn", that's exactly what she did - drew conclusions WAY outside the confidence interval of her data set. Strictly speaking, she'd be extrapolating beyond her data set if she said anything other than "we figured out that high school kids will fill out questionnaires if you ask them to".

I'd love to see the principal component breakdown for this "analysis", but I fear that the only load derived from this is the study itself.

I know; Monster chemometric joke. Thank you, thank you - I'll be here all week

Wilhelm said...

Not sure about the lunch thing, dawg. I have a distinct feeling that it'd be like two bald men fighting over a comb - I know jack about political science, and she probably wouldn't know a correlation coefficient if it rolled up and slapped her with a newspaper.

Anders said...

I'd love to see the principal component breakdown for this "analysis", but I fear that the only load derived from this is the study itself.

I guess the score(s) of this analysis wouldn't be that good either...

Who said chemometricians are nerds?

Wilhelm said...

Good thing none of us specialize in chemometrics, then.

I certainly wouldn't want to be categorized as a nerd.

Oh, wait a minute...

Never mind :-(