March 30, 2010

Health of a nation and women’s preference for masculine men

Filed under: Uncategorized — admin @ 1:54 pm

As a review, the authors of this study purport to show that women from less healthy nations prefer masculine appearing men, while women from healthier nations prefer feminine appearing men.

There are at least five major flaws with the research, and several more with the summary written by Jena Pincott.  They are as follows:

#1)  The respondents self-declared their gender and ethnicity on a web form. The researchers had no way to verify whether the people claiming to be women actually were (or any other aspect of the collected demographics, for that matter).  I was particularly amused to see their justification for this:

Many studies of masculinity preferences have been conducted using similar web-based methods and have demonstrated that online and laboratory studies of variation in masculinity preferences produce equivalent patterns of results (e.g. Jones et al. 2005, 2007; Little et al. 2007b; Welling et al. 2008).

Hmm, those names look familiar.  Would those be the exact same people as the ones who wrote this study?  Why yes, they are!

If these folks have really established that online surveys are just as accurate as lab results, then a whole lot of social psychologists are wasting a whole lot of money on lab research.  Online surveys are particularly problematic in nations with disparate rates of PC ownership and Internet access: those who can answer your online survey are less likely to be representative of the population as a whole.

I’m calling shenanigans on this data.

It is vanishingly unlikely that their results are not polluted by misrepresented data.  They cannot even prove that the people who filled out their survey were women!

#2) They invented their own ranking system for national health. Creating a composite indicator for something as multifaceted as the health of a nation is an extremely complex undertaking.  These folks picked just a few indicators and ran with them.  Their method for distilling a nation’s health down to a single number does not reflect an understanding of the basics needed to create such an indicator.

Their inclusion of neonatal mortality as an indicator strongly suggests they are not familiar with the issues surrounding statistical rankings of health systems.  This particular indicator is an excellent example of how difficult transnational health statistics are to pull off.

Many countries, including the United States, Sweden or Germany, count an infant exhibiting any sign of life as alive, no matter the month of gestation or the size, but according to United States Centers for Disease Control researchers,[6] some other countries differ in these practices. All of the countries named adopted the WHO definitions in the late 1980s or early 1990s,[7] which are used throughout the European Union.[8] However, in 2009, the US CDC issued a report which stated that the American rates of infant mortality were affected by the United States’ high rates of premature babies compared to European countries and which outlines the differences in reporting requirements between the United States and Europe, noting that France, the Czech Republic, Ireland, the Netherlands, and Poland do not report all live births of babies under 500 g and/or 22 weeks of gestation.[6][9][10] However, the report also concludes that the differences in reporting are unlikely to be the primary explanation for the United States’ relatively low international ranking.[10]

This indicator is problematic.  To include it in a basket of eight total indicators meant to assess national health as a single number is very risky indeed, and it suggests they were not familiar with the nuances of this data.

In my opinion, they basically cherry-picked limited data and rolled their own composite index number.  For example, their data don’t come from the same years, the individual factors aren’t weighted, and causes of death aren’t distinguished (with some exception for examining years of life lost to communicable diseases).  This matters because most deaths in the US come from non-communicable lifestyle causes, not the presence of pathogens.
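To illustrate how fragile such an index is, here is a minimal sketch with entirely made-up indicator scores for two hypothetical countries (none of these numbers come from the paper). The ranking flips purely as a function of how the indicators are weighted:

```python
# Made-up indicator scores (scaled so higher = healthier) for two
# hypothetical countries -- illustrative only, not data from the study.
indicators = {
    "Country A": {"life_expectancy": 0.9, "infant_mortality": 0.4},
    "Country B": {"life_expectancy": 0.6, "infant_mortality": 0.8},
}

def composite(scores, weights):
    """Weighted average of indicator scores -- one 'national health' number."""
    return sum(scores[name] * w for name, w in weights.items())

equal_weights = {"life_expectancy": 0.5, "infant_mortality": 0.5}
life_heavy    = {"life_expectancy": 0.8, "infant_mortality": 0.2}

for country, scores in indicators.items():
    print(country,
          round(composite(scores, equal_weights), 2),  # equal weights
          round(composite(scores, life_heavy), 2))     # life-expectancy-heavy
```

With equal weights Country B ranks above Country A (0.70 vs 0.65); weight life expectancy more heavily and Country A comes out on top (0.80 vs 0.64). A paper that builds its own index should at minimum report this kind of sensitivity analysis.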

#3)  In seeking to prove that the relationship between masculinity preference and national health is not entangled with other indicators, such as national wealth, the authors make the absolutely elementary mistake of using unadjusted Gross National Product per capita. I can’t believe they did this; falling into this trap reflects a very poor understanding of economics.

Raw per-capita figures are particularly bad for transnational comparisons because they don’t take into consideration how much goods and services cost in each nation.  An elementary solution would have been to use purchasing-power-parity (PPP) adjusted numbers, which compare how much a common basket of products costs in the context of GDP.   This approach would have been simple, easy and more accurate, yet the authors didn’t use it.  I’m not going to do their work for them and run those numbers, but I suspect their data relationships would shift.
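As a rough illustration (the figures below are invented for the example, not taken from the paper or any real country), PPP adjustment amounts to deflating nominal income by the local price level:

```python
# Hypothetical nominal GDP per capita (USD) and price levels (cost of the
# same basket of goods, relative to a baseline of 1.0) -- illustrative only.
nominal_gdp_pc = {"Country A": 45000, "Country B": 12000}
price_level    = {"Country A": 1.0,   "Country B": 0.4}  # B is much cheaper

# PPP adjustment: divide nominal income by the local price level so both
# figures are expressed in comparable purchasing power.
ppp_gdp_pc = {c: nominal_gdp_pc[c] / price_level[c] for c in nominal_gdp_pc}

print(ppp_gdp_pc)
```

On nominal figures Country A looks 3.75x richer than Country B; after PPP adjustment, B’s 12,000 buys what 30,000 would in A, so the real gap shrinks to 1.5x. Exactly this kind of shift is what can scramble a correlation built on unadjusted GNP.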

Amusingly, the authors admit that other ways of ranking wellbeing, such as the Human Development Index, show that all their examined nations were reasonably equivalent.  Hmm, the data didn’t show what you wanted, so you went elsewhere?


#4) I question the ethnic representation in their data. The authors are trying to establish that their results apply to humans in general, and not to a specific cultural context (they use the phrase “cross-cultural” a lot).  When they set up their experiment, they tried to control for difficulties in trans-racial assessments by only including data from women (if they even were women) who self-reported their ethnicity as white (unverified).  They then used these responses as representative of women from that country.  Upon reviewing their data, I find a concern: they include Mexico, Brazil, Argentina and Turkey.  How likely do you think it is that women from those nations would self-declare as white?

I don’t think that is likely to be representative.

#5) They have a serious problem with data sparsity. While they had 4,794 respondents, nearly all of them came from the US, UK, Germany and France.  Yet the other 26 nations are still tossed into the pot.  In fact, their cut-off for including a nation is just 10 women.  Ten women can represent a whole country!  Fully a third of their nations have fewer than 30 respondents.  That is very sparse data indeed.

This is particularly problematic because several of the outlier points used to establish their trend line (the dots on the right) come from nations with very low response counts.

The authors try a bit of a trick in claiming that a weighted least squares analysis accounts for data sparsity.  Unfortunately, this is a misapplication.  WLS is useful for adjusting for differences in data reliability, not for making up for quantity.  If you have biased results from a small sample, weighting isn’t going to help.  It can be used to assess the relationship between the data and their comparator – the invented national health ranking – but not between the data and the unknown reality.

If that were the case, then again, a whole lot of people are wasting a whole lot of money running surveys with more than 30 people.

Why not pick 30 people from every nation, slap WLS on it, and extrapolate? WLS can’t create accurate data from sparsity.
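To make the sparsity point concrete, here is a quick sketch using the standard textbook formula for the standard error of a sample proportion, with a hypothetical preference rate (the 60% figure is invented for illustration). No downstream weighting scheme can narrow these intervals:

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n respondents."""
    return math.sqrt(p * (1 - p) / n)

# Suppose 60% of a country's respondents preferred the masculine face.
p = 0.6
for n in (10, 30, 1000):
    se = standard_error(p, n)
    # Rough 95% confidence interval: p +/- 1.96 * se
    lo, hi = p - 1.96 * se, p + 1.96 * se
    print(f"n={n:4d}: 95% CI roughly {lo:.2f} to {hi:.2f}")
```

At the paper’s cut-off of n=10 the interval spans roughly 0.30 to 0.90 – the estimate is compatible with almost any preference level – while at n=1000 it tightens to about 0.57 to 0.63. That is the difference between the handful of small-sample countries and the four that supplied most of the data.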


To summarize:

They can’t prove their respondents were actually women.

They created a highly dubious national health ranking system.

They used imprecise economic data to exclude other factors as causes of the correlation.

Their sample of self-declared white women is unlikely to be representative of several of the included nations.

Their data was very sparse for establishing their trend.





  1. Note: I didn’t read the original article.

    Of those criticisms I feel qualified to judge, I think several are well-taken. Criticism #1, however, may not be the big deal you think it is. Why would someone misrepresent their gender in a research survey? (Why they would lie about drug use or sexual practices is another matter–but is there anything embarrassing about admitting you’re female?) If there is any motivation driving people’s responses to a gender question, it is probably to be accurate. Moreover, sabotaging someone’s survey with fake answers is very tedious (because lying on surveys takes more cognitive effort, and thus quite a bit more motivation, than simply reporting what you remember/are aware of) and, if the survey researchers have done their job well, is often detectable. Also, there’s been a little bit of work comparing the results of the same survey administered through both paper and the web, and the results uniformly show that people respond the same ways with both modes. Whether a study should be done online or in the lab just depends on your research goals and procedures, but it’s no surprise that a lot more social science research is being done online now than it was 10 years ago.

    Comment by P — April 1, 2010 @ 12:58 am

  2. Of my criticisms, I wondered if that would be a weak point. One of the readers here is a talented statistician, so I wonder what her views are.

    It seems weird to me that this method of data collection would be reliable.

    I can see your argument that unreliable data would be detectable, but I doubt that these researchers have gone to that effort, given what they have done with their data in the rest of the paper. That’s my inference, though, and it could be wrong.

    Even if we hold that online submission either is, or can be made reliable, they still don’t have enough data to justify cross-cultural extrapolation.

    Any thoughts on my criticisms of their health indicators and non PPP adjusted income figures?

    Comment by admin — April 1, 2010 @ 8:26 am

  3. “Any thoughts on my criticisms of their health indicators and non PPP adjusted income figures?”

    As it turns out, those are the two criticisms I felt least qualified to judge. As you describe the problems, they sound genuine; but I have learned the hard way that when I am in the role of outsider, what looks like idiocy might be a product of my own ignorance. I frequently hear college students imply that surveys of sensitive issues (e.g., adultery, drug use) are pretty useless because most people OBVIOUSLY lie about such things on surveys. As it turns out, this has been researched. And while it is true that the more sensitive the questions are, the more you have to worry about respondents being dishonest, it is also true that with some pretty simple procedures in place, most people are remarkably honest when completing surveys.

    Also, my own perspective on research is that it is all flawed, especially if it involves data concerning human behavior. Every research procedure has drawbacks (e.g., if you have the budget for it, videotaping actual behavior instead of asking people about what they do/did is an improvement, but still, video cameras can produce behavior that is unnatural). So an essential question, I think, is this: did the researchers do a good enough job to move our knowledge forward a few tentative steps, or did they mess up so badly that it would have been better had they not done the study at all? And a second essential question is this: were the researchers honest about the shortcomings of their study, or did they imply that their research design was much stronger than it really was?

    I think my inner educator is leaking out…

    Comment by P — April 4, 2010 @ 1:52 am

  4. Leaking educatorhood is very welcomed.

    I’m not sure how to rank the study. Unless I re-ran their numbers with my alternative indicators, I don’t think I can *prove* they reached invalid conclusions. At this point, I can only speculate based on how these shifts have impacted other research. I am well aware I suffer from the same problem I charge them with – I don’t have expertise outside my field. I know they are using dubious means with the GDP and health-care indicators, but the others, not so much.

    I’m with you on the detectability of lying issue, but I would like to discuss representation a bit more. We do know that self-selected survey participants give different results than randomly selected groups, and that a robust cross-section is desirable (perhaps needed). These authors are making rather large generalizations using data from countries where a small sample group of self-selected, white, internet users is not likely to be a representative sample. Even if we presume all these people are honest, are they sufficiently representative? I think not for the reasons I stated in my posting. Do you find my issues with their methodology to be plausible and reasonably likely to impact their results?

    I do think that poor scholarship can certainly be worse than no scholarship, particularly when the conclusions are heavily hyped and actually produce impact. This study certainly meets the first criterion.

    One of the things that really annoys me is how preventable these issues are. If they had talked to an economist, or someone well versed in health care assessments, I am confident they would have received the feedback I posted above. With minor adjustment they could have avoided these pitfalls. They should certainly have collected more representative data on their outliers before making extrapolations (in my view).

    “And a second essential question is this: were the researchers honest about the shortcomings of their study,…”

    Eh, yes, I think in all fairness that they did reasonably discuss the limitations of their study. However, I suspect their conclusions were flawed because of their analytical methodology and data collection, which they certainly didn’t discuss.

    Comment by admin — April 6, 2010 @ 12:39 pm
