Incentivizing Discovery through the PSA001 Secondary Analysis Challenge


Patrick S. Forscher, Abigail Noyce, Lisa M. DeBruine, Benedict C. Jones, Jessica K. Flake, Nicholas A. Coles, Christopher R. Chartier


September 14, 2020

Please cite this blog post as:

Forscher, P. S., Noyce, A., DeBruine, L. M., Jones, B., Flake, J. K., Coles, N. A., & Chartier, C. R. (2020, May 26). Incentivizing Discovery through the PSA001 Secondary Analysis Challenge.

On September 1, 2019, we announced a new initiative to promote the re-use of a large dataset we collected for the PSA’s first study. Called the Secondary Analysis Challenge, this initiative started from the premise that much of the potential richness in psychological datasets is currently untapped.

The Challenge aimed to tap that richness using two methods. The first method was relatively standard: we created extensive documentation of the project data and meta-data. The second method was a bit more unusual: we provided incentives to research teams who were willing to follow an analysis pipeline that we thought would minimize the chance of false positives and maximize the chance of true discoveries.

Specifically, with the help of a generous donation out of Hans Rocha IJzerman’s book income, we were able to offer $200 awards to up to 10 teams that:

What happened?

We received a total of eight submissions to the challenge. Each submission consisted of a link to a preregistered script that conducted an analysis on the exploratory segment of the PSA001 data.

Two members of our team (Abigail Noyce and Patrick Forscher) checked the submitted scripts for computational reproducibility. These checks involved running the submitted analysis script on one of our computers. We focused primarily on ensuring that the scripts ran without error, but we also sometimes commented on other issues, such as unusual parameter estimates, and possible bugs. We did not comment at all on the validity or theoretical grounding of the analyses. In most cases we did not even have access to this information.

If the computational reproducibility check led to changes in the code, we asked the submitting teams to add a comment to the top of the revised script listing the changes. These revised scripts were uploaded to the project OSF page.

After these computational reproducibility checks, we released the confirmatory segment of the data. We asked the teams to use the revised scripts to analyze the confirmatory segment. To receive the award, the proposing team also had to write a brief preprint reporting the results to PsyArXiv. We did not give any strong requirements on the contents of these preprints; we merely asked that they be made public.

Carlota Batres, one of our Challenge participants, mentioned specifically that the Challenge structure provided her with opportunities she would not otherwise have had. “I hope there will be more initiatives like this one which leverage the collaborative power of the PSA,” she commented.

What are the preprints like?

Most, but not all, of the preprints rely on connecting the PSA001 face rating data to some other secondary dataset, such as the Chicago Face Database, the Project Implicit Demo Website datasets, or the UN Migration database. Aside from that similarity, the preprints varied substantially in structure, format, and content. Some are fairly brief descriptions of results. Some are submission-ready articles with publication-quality figures. The topics range from an investigation of the “halo effect” to the possible link between face perception and the race and gender IAT scores averaged across regions. The Challenge seems to have elicited a variety of submissions that bear on a variety of scientific topics.

We think the preprints cover many interesting topics. However, no one within the PSA reviewed the preprints themselves for scientific veracity. You can help ensure that we advance science through this exercise by looking through the preprints yourself and providing comments and feedback to the authors. To facilitate this process, we have included links and descriptions of the preprints at the end of this post.

Lessons for Future Challenges

We think this Challenge was a success. The Challenge also holds some lessons for people who wish to facilitate their own, similar, initiatives:

  • Communicate a clear schedule. Clarity of communication helps keep the Challenge process predictable for the participants.
  • Conduct computational reproducibility checks. The computational reproducibility checks were effortful. I estimate that I spent at least an hour and a half per submission, including running code, debugging, and communicating with submitters. However, the checks uncovered multiple issues and sometimes led to substantive revisions to the analysis plan. These checks were effortful but worthwhile.
  • Enrich the target dataset with meta-data. Meta-data are data about the data -- in other words, the time of data collection, information about the stimuli, and other details about the data collection process. These meta-data are important for primary studies, but are crucial for secondary analyses. In this Challenge, we archived a huge amount of meta-data alongside our primary data and documented these in our data management plan. These meta-data greatly facilitated the task of generating interesting exploratory analyses.
  • Build pipelines to other datasets. The extensive documentation of the PSA001 data made it easy to merge our dataset to other interesting datasets. In fact, we included Chicago Face Database ID variables in the data that we released and explicitly noted in our data management plan how to merge our data to the Chicago Face Database. This is another step that we took that, I think, allowed the Challenge to be generative for exploratory analysis.

Our Challenge format should not be viewed as a substitute for other forms of peer review. By design, the Challenge did not evaluate the merits of the theoretical logic, the analysis plan, or criteria for inference. Hopefully, these are issues that peer reviewers evaluate if and when these proposals are submitted for consideration at scientific journals.

Overall, we view the Challenge format as a promising supplement that can enhance the scientific value of large datasets. We look forward to observing other innovations people adopt to enhance the value of psychological data.

Appendix: Secondary Analysis Challenge Preprints

Preprint 1: PSA001 Secondary Analysis:

Examining the “attractiveness halo effect”

Authors: Carlota Batres

Abstract: Research has found that attractiveness has a positive “halo effect”, where physically attractive individuals are ascribed with socially desirable personality traits. Most of the research on this “attractiveness halo effect”, however, has been conducted using Western samples. Therefore, this report aims to examine the “attractiveness halo effect” across eleven world regions using thirteen ratings on faces, including attractiveness, that the Psychological Science Accelerator network collected. We found that for both male and female faces, attractiveness generally correlated positively with the socially desirable traits and negatively with the socially undesirable traits. More specifically, across all eleven world regions, individuals rated as more attractive were rated as more confident, emotionally stable, intelligent, responsible, sociable, and trustworthy as well as less weird. These results replicate previous findings of the “attractiveness halo effect” in Western samples and suggest that the positive effect of attractiveness can be found cross-culturally.

Preprint 2: Is facial width-to-height

ratio reliably associated with social inferences? A large cross-national examination

Authors: Patrick Durkee and Jessica Ayers

Abstract: Previous research suggests that facial width-to-height ratio (fWHR) may be associated with behavioral tendencies and social judgments. Mounting evidence against behavioral links, however, has led some researchers to invoke evolutionary mismatch to explain fWHR-based inferences. To examine whether such an explanation is needed, we leveraged a large cross-national dataset containing ratings of 120 faces on 13 fundamental social traits by raters across 11 world regions (N = 11,481). In the results of our preregistered analyses, we found mixed evidence for fWHR-based social judgments. Men’s fWHR was not reliably linked to raters’ judgments for any of the 13 trait inferences. In contrast, women’s fWHR was reliably negatively associated with raters’ judgments of how dominant, trustworthy, sociable, emotionally stable, responsible, confident, attractive, and intelligent women appeared, and positively associated with how weird women appeared. Because these findings do not follow from assumptions and theory guiding fWHR research, the underlying theoretical framework may need revising.

Preprint 3: Variance & Homogeneity of

Facial Trait Space Across World Regions [PSA001 Secondary Data Analysis]

Authors: Sally Xie and Eric Hehman

Abstract: This preregistration is part of the PSA secondary analysis challenge. We investigate how the facial 'trait space' shifts across countries and world regions, using the PSA_001 dataset shared by the Psychological Science Accelerator. The facial trait space refers to the interrelationships between many of the trait impressions that people infer from faces. Here, we examine whether this trait space is more homogeneous (or less differentiated) in some cultures than others.

Preprint 4: Hester PSA001

Preregistration Preprint—Region- and Language-Level ICCs for Judgments of Faces

Authors: Neil Hester and Eric Hehman

Abstract: We report the results of preregistered analyses of the PSA001 face perception data. We tested whether the target-level intra-class correlations (ICCs) would be higher in specific regions (i.e., more culturally homogeneous samples) than in the global data set (i.e., a less culturally homogeneous sample). We also report perceiver-level ICCs as well as by-trait perceiver- and target-level ICCs.

Preprint 5: Do regional gender and

racial biases predict gender and racial biases in social face judgments?

Authors: DongWon Oh and Alexander Todorov

Abstract: Trait impressions from faces are more simplified for women than men. This bias stems from gender stereotypes; when strong stereotypes exist for a group of faces (e.g., of women’s or Blacks’), they are evaluated more positively/negatively when they fit/violate the stereotypes, making the impressions simpler (i.e., more one-dimensional). In this preregistered study, using trait impression ratings of faces collected from various world regions (+11,000 participants in 48 countries), scores of implicit associations (+18,000 and +212,000 participants in +200 countries), and mixed-effects models, we ask (1) whether simplified facial impressions are found for women and Blacks across regions and (2) whether the regional level of stereotypes about genders and races is correlated with the level of simplification in the face-based impressions of women and Blacks, respectively. The results were not coherent across analyses. The interpretation of the results and the limitations of the study are discussed.

Preprint 6: Hierarchical Modelling of

Facial Perceptions: A Secondary Analysis of Aggressiveness Ratings

Authors: Mark Adkins, Nataly Beribisky, Stefan Bonfield, and Linda Farmus

Abstract: The Psychological Science Accelerator’s (PSA) primary project tested for latent structure using exploratory factor analysis and confirmatory factor analysis but we decided to diverge from this approach and model individual traits separately. Our interest mainly was in examining the interplay between “stimulus ethnicity” and “stimulus sex” to discover how differing levels of these criterion differ across region, country, lab etc. While the necessary and prerequisite hierarchical structural information about each trait could certainly be found within the primary project’s dataset, we did not assume that any specific factor structure from the PSA’s primary analysis would necessarily hold, therefore we based our decision to model the data from each trait separately using a mixed model framework.

Preprint 7: Population diversity is

associated with trustworthiness impressions from faces

Authors: Jared Martin, Adrienne Wood, and DongWon Oh

Abstract: People infer a number of traits about others’ based simply on facial appearance. Even when inaccurate, face-based trait impressions can have important behavioral consequences including voting behavior and criminal sentencing. Thus, understanding how perceivers infer others’ traits is an important social and psychological issue. Recent evidence suggests that face-based trait impressions may vary by culture. In the present work, we attempt to understand cultural differences in face-based trait impressions. As part of the Psychological Science Accelerator’s Secondary Data Analysis Challenge, we report a set of pre-registered analyses testing how cultural differences in present-day diversity relate to a) 13 face-based trait impressions, b) sensitivity to physical features of the face, c) and the mental structure underlying trait impressions. We find that greater present-day diversity might be related to lower trustworthiness ratings, in particular. We discuss this finding in the context of other recent work and suggest further analysis of the mental structure of face-based trait impressions across cultures.

Preprint 8: The Facial Width-to-Height

Ratio (fWHR) and Perceived Dominance and Trustworthiness: Moderating Role of Social Identity Cues (Gender and Race) and Ecological Factor (Pathogen Prevalence)

Authors: Subramanya Prasad Chandrashekar

Abstract: People effortlessly form trait impressions from faces, and these impressions can affect a variety of important social and economic outcomes. Trait impressions based on facial features can be approximated to distinct dimensions: trustworthiness and dominance (Oosterhof & Todorov, 2008). One of the facial features, the facial width-to-height ratio (face ratio) is associated with the trait impressions. I tested whether social category (gender, race) of the target being perceived shapes the relationship between face ratio and perception of dominance and trustworthiness. In this preregistered study, using trait impression ratings of faces collected from 8800 participants across 44 countries, I employ mixed-effects analysis and report results on (1) the direct influence of social categories (gender and race) of the target on perceived dominance and trustworthiness, (2) the moderating role of social categories (gender and race) on the direct relationships between face ratio and perceived dominance and trustworthiness, and (3) the moderating role of pathogen prevalence on the direct relationships between face ratio and perceived dominance and trustworthiness.