Is the “crisis” in social psychology really that bad? Have things improved?

Alex Fradera
Chinese translation by Chen Ming
Please do not republish on WeChat official accounts.

Part One: the researchers’ perspective

The field of social psychology is reeling from a series of crises that call into question the everyday scientific practices of its researchers. The fuse was lit by the physician and epidemiologist John Ioannidis in 2005, in a review that outlined why, thanks particularly to what are now termed “questionable research practices” (QRPs), over half of all published research in the social and medical sciences might be invalid. Kaboom. This shook a large swathe of science, but the fires continue to burn especially fiercely in social and personality psychology. These fields marshalled their response through a 2012 special issue of Perspectives on Psychological Science that brought the concerns fully into the open, discussing replication failure, publication bias, and how to reshape incentives to improve the field. The fire flared up again in 2015, when Brian Nosek and the Open Science Collaboration published their high-profile attempt to replicate 100 studies in these fields; it succeeded in only 36 per cent of cases. Meanwhile, and to the field’s credit, efforts to institute better safeguards, such as registered reports, have gathered pace.

So how bad did things get, and have they really improved? A new article, currently a pre-print at the Journal of Personality and Social Psychology, tackles the issue from two angles: first, by asking active researchers what they think of the past and present state of their field and how they now go about conducting psychology experiments; and second, by analysing features of published research to estimate the prevalence of broken practices more objectively.

The paper comes from a large group of authors at the University of Illinois at Chicago. The work was supervised by Linda Skitka, a distinguished social psychologist who helped create the journal Social Psychological and Personality Science and sits on the editorial boards of many other social psychology journals, and led by Matt Motyl, a social and personality psychologist who has previously published with Nosek, including on improving scientific practice.

Psychology research is the air we breathe at the Digest, so it is crucial that we understand its quality. In this two-part series we explore the issues raised in the University of Illinois at Chicago paper to see if we can make sense of the state of social psychology. This first post covers the findings of Motyl et al’s survey of approximately 1,200 social and personality psychologists, from graduate students to full professors, mainly from the US, Europe and Australasia.

Motyl’s team began by asking participants about the state of the field now compared with 10 years ago. On average, participants believed that older research would replicate in only 40 per cent of cases (quite close to Nosek’s figure), but that research conducted now would fare better, at about 50 per cent, and that the field was generally improving itself in response to the crisis.

Motyl’s team also canvassed respondents on a range of questionable research practices: sketchy behaviours such as neglecting to report all the measures taken, or quietly dropping experimental conditions from a study. Thanks particularly to work by Joseph Simmons, Leif Nelson and Uri Simonsohn, we understand just how much these practices undermine the assumptions of statistical significance testing, making it easy to produce false-positive results even without any fraudulent intent. In their words, QRPs are not wrong “in the way it’s wrong to jaywalk”, as researchers have often implicitly been encouraged to think of them, but “wrong the way it’s wrong to rob a bank.”
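
To see why flexible analysis inflates false positives even without fraud, here is a minimal Python sketch of one scenario of the kind Simmons, Nelson and Simonsohn describe: two dependent variables correlated at r = .5, with the researcher free to report whichever of DV1, DV2 or their average comes out significant. It assumes there is no true effect, 20 participants per group and a known SD of 1 (so plain z-tests are exact); the function names and settings are hypothetical illustrations, not anything from the paper under discussion.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate_null_study(n, r, rng):
    """One two-group study with NO true effect and two DVs correlated at r.

    Assumes each DV has a known SD of 1, so z-tests are exact.
    Returns the p-values for DV1, DV2 and the average of the two.
    """
    def draw_group():
        dv1, dv2 = [], []
        for _ in range(n):
            a = rng.gauss(0, 1)
            dv1.append(a)
            # Second DV: variance 1, correlation r with the first.
            dv2.append(r * a + math.sqrt(1 - r * r) * rng.gauss(0, 1))
        return dv1, dv2

    (t1, t2), (c1, c2) = draw_group(), draw_group()
    se = math.sqrt(2 / n)                 # SE of a mean difference when SD = 1
    se_avg = se * math.sqrt((1 + r) / 2)  # the averaged DV has variance (1 + r) / 2
    d1 = fmean(t1) - fmean(c1)
    d2 = fmean(t2) - fmean(c2)
    return two_sided_p(d1 / se), two_sided_p(d2 / se), two_sided_p((d1 + d2) / 2 / se_avg)


def false_positive_rates(sims=5000, n=20, r=0.5, alpha=0.05, seed=1):
    """Share of null studies declared 'significant' under two reporting policies."""
    rng = random.Random(seed)
    planned = flexible = 0
    for _ in range(sims):
        p1, p2, p_avg = simulate_null_study(n, r, rng)
        planned += p1 < alpha                   # one DV chosen in advance
        flexible += min(p1, p2, p_avg) < alpha  # report whichever DV "worked"
    return planned / sims, flexible / sims


if __name__ == "__main__":
    planned, flexible = false_positive_rates()
    print(f"pre-specified DV:     {planned:.3f}")
    print(f"best of DV1/DV2/mean: {flexible:.3f}")
```

The pre-specified policy should stay near the nominal 5 per cent, while the flexible one lands noticeably above it; adding more DVs, optional stopping or dropped conditions would push it higher still.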

Previous surveys of researchers’ own QRP use uncovered high rates of admission, as if the field were rushing to the confession box to purge its sins. Here, Motyl’s team used finer-grained questions that asked about frequency (a “yes” often turned out to mean “rarely” or “once”) and justification. In some cases, a researcher’s justification showed that they had misinterpreted the question and were actually expressing strong disapproval of the QRP; in fact, this seemed to be the case for virtually all “confessions” of data fabrication. In other cases, the context provided by a justification cast the research practice in a completely different light.

For example, consider the seemingly dodgy decision to drop a condition from your analysis. If your rationale is that the condition didn’t do what it was designed to do (in an emotion and memory study, say, your sad video didn’t put participants in a sad mood), then keeping what is effectively a bogus condition in your analysis is actually more problematic than excluding it, ideally in a principled way according to a registered procedure. In the new survey, independent judges evaluated all the stated justifications and felt they legitimised the “questionable” practices in 90 per cent of cases.

Discovering these misunderstandings and justifiable practices scattered through the QRP data led Motyl’s team to conclude that pre-explosion psychology practices weren’t as derelict as once feared, although the fact that 70 per cent of respondents said they are now less likely to engage in many of these practices than ten years ago suggests that not everything was entirely virtuous back then.

So “not perfect, but getting better” is the view within the field: a cautious optimism compared with some of the dire pronouncements on the state of psychology. In Part Two, we’ll look at the body of psychological research itself to see whether this optimism is justified.

Part Two

A new paper in the Journal of Personality and Social Psychology has taken a hard look at psychology’s crisis of replication and research quality, and we’re covering its findings in two parts.

In Part One, published yesterday, we reported the views of active research psychologists on the state of their field, as surveyed by Matt Motyl and his colleagues at the University of Illinois at Chicago. Researchers reported a cautious optimism: research practices hadn’t been as bad as feared, and are in any case improving.

But is their optimism warranted? After all, several high-profile replication projects have found that, more often than not, re-running previously successful studies produces only null results. Defenders of the state of psychology, however, argue that replications fail for many reasons, including flaws in how the original study was reproduced and differences between samples, so the implications aren’t settled.

To get closer to the truth, Motyl’s team complemented their survey findings with a forensic analysis of published data, uncovering results that seem to bolster their optimistic position. In Part Two of our coverage, we look at these findings and why they’re already proving controversial.

Motyl and his colleagues used a relatively new type of analysis to assess the quality and honesty of the data in over 500 previously published social psychology papers. Their approach is technical, involving weirdly named statistics computed on yet more statistics, so an analogy helps. Just as a vegetable garden produces a variety of tomatoes, some bigger than others, some misshapen, some puny and poor for eating, an honestly conducted body of research should bear a similar range of fruit. True experimental effects shouldn’t always come out exactly the same: they should vary in size from experiment to experiment, including instances when the effect is too small to be statistically significant.
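
The tomato analogy can be made concrete with a small simulation. This Python sketch assumes a true effect of d = 0.4, 50 participants per group and a known SD of 1, so each honestly reported study’s z-statistic is drawn from a normal distribution centred on 2 with an SD of 1. It illustrates that honest results vary (a sizeable share miss p < .05) and that their z-scores have a variance close to 1, whereas keeping only the significant studies, as a selective literature would, shrinks that variance sharply. The parameter values and the function name are hypothetical, not taken from Motyl et al.

```python
import math
import random
from statistics import NormalDist, variance

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold, about 1.96


def simulate_honest_literature(k=2000, d=0.4, n=50, seed=7):
    """z-statistics from k honestly reported two-group studies of one true effect.

    Assumes a known SD of 1, so each observed mean difference is
    Normal(d, sqrt(2 / n)) and z = difference / SE.
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n)
    return [rng.gauss(d, se) / se for _ in range(k)]


zs = simulate_honest_literature()
significant = [z for z in zs if abs(z) > Z_CRIT]

share_significant = len(significant) / len(zs)
var_all = variance(zs)                # near 1: honest results vary
var_selected = variance(significant)  # well below 1: only "perfect tomatoes" kept

print(f"share of studies with p < .05: {share_significant:.2f}")
print(f"variance of z, all studies:    {var_all:.2f}")
print(f"variance of z, significant:    {var_selected:.2f}")
```

With these assumed settings, roughly half the honest studies should fall short of significance, which is exactly the “puny fruit” a healthy garden ought to contain.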

These are the sorts of things you can evaluate across a body of research: in this case with the Test of Insufficient Variance, which Motyl’s study used alongside six other indices. When there were too many irregularities in the data, or a bizarre regularity like identikit supermarket tomatoes, Motyl and his colleagues took this as a sign that questionable research practices may have been used to swell weak results to the desired appearance.
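
For readers who want to see the mechanics, here is a minimal Python sketch of the Test of Insufficient Variance in the form it is commonly described: convert each focal two-sided p-value to a z-score, compute the sample variance of those z-scores, and ask whether it is implausibly far below the value of at least 1 expected when independent studies are honestly reported (a left-tailed chi-square test with k − 1 degrees of freedom). The chi-square CDF uses a textbook series so that only the standard library is needed; the function names and example p-values are hypothetical, and Motyl et al.’s exact implementation may differ.

```python
import math
from statistics import NormalDist, variance


def chi2_cdf(x, df):
    """Lower-tail chi-square CDF, via the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return total * math.exp(a * math.log(y) - y - math.lgamma(a))


def tiva(p_values):
    """Test of Insufficient Variance for independent two-sided p-values.

    Returns (variance of the z-scores, left-tail p-value). A small p-value
    means the results are more alike than honest sampling error allows.
    """
    if len(p_values) < 2:
        raise ValueError("need at least two p-values")
    # z = -inv_cdf(p/2) avoids rounding 1 - p/2 up to 1.0 for tiny p.
    zs = [-NormalDist().inv_cdf(p / 2) for p in p_values]
    k = len(zs)
    var_z = variance(zs)
    return var_z, chi2_cdf(var_z * (k - 1), k - 1)


# Hypothetical "identikit tomatoes": every result lands just under .05.
too_neat = [0.041, 0.032, 0.048, 0.027, 0.044, 0.036, 0.049, 0.038]
# Hypothetical honest-looking set: some strong, some weak, some null.
varied = [0.001, 0.03, 0.2, 0.04, 0.5, 0.008, 0.12, 0.02]

for label, ps in [("too neat", too_neat), ("varied", varied)]:
    var_z, p = tiva(ps)
    print(f"{label}: var(z) = {var_z:.3f}, TIVA p = {p:.2g}")
```

The left tail is the one that matters here: an overly neat set of results is suspicious precisely because it lacks the scatter that sampling error guarantees.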

Crucially, however, the study found that, more often than not, the indices showed low levels of anomalies, suggesting that research practices were more likely to be acceptable than questionable. This was the case for studies from 2003–04, before the crisis was fully acknowledged, and the researchers found an even better picture for more recent (2013–14) papers. The fruits of the research may have been tampered with from time to time, but there was no evidence that the entire enterprise was “rotten to the core”.

This optimistic conclusion conflicts with similar analyses performed in the past, but the discrepancy might be explained by different ways of collecting the data (of gathering the fruit, if you will). Past approaches automatically scraped articles for every instance of a statistic, such as every reported p-value. But this is like a bulldozer ripping out a corner of the garden and measuring everything that looks anything like a tomato, including stones and severed gnome heads. To take just one example, articles often report p-values for manipulation checks: confirmations that an experimental condition was set up correctly (did participants agree that the violent kung-fu clip was more violent than the video of grass growing?). These aren’t tests that yield new scientific knowledge; to turn to another analogy, they are the equivalent of a chemist checking that their equipment works before running an experiment. So Motyl’s team took a more nuanced approach, reading through every article and picking out only the relevant statistics by hand.
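
The difference between the two collection strategies can be sketched in a few lines of Python. The results paragraph below is invented for illustration (a manipulation check, a focal test and an exploratory comparison), and the regular expression is a deliberately simple, hypothetical stand-in for an automated scraper, not the tool used in any of the analyses discussed. The scraper returns every p-value it finds; the hand-coded version keeps only the tests a reader has tagged as testing the hypothesis.

```python
import re

# Hypothetical results paragraph: one manipulation check, one focal test,
# one exploratory comparison.
RESULTS_TEXT = (
    "Participants rated the kung-fu clip as more violent than the grass clip, "
    "t(78) = 11.20, p < .001. "
    "As predicted, participants who watched the kung-fu clip behaved more "
    "aggressively, t(78) = 2.05, p = .044. "
    "Mood did not differ between conditions, t(78) = 0.61, p = .54."
)

# Matches simple APA-style p-values such as "p = .044" or "p < .001".
P_VALUE = re.compile(r"\bp\s*([<=>])\s*(0?\.\d+)")


def scrape_p_values(text):
    """Bulldozer approach: every reported p-value, whatever the test was for."""
    return [(op, float(value)) for op, value in P_VALUE.findall(text)]


# Hand coding: a reader records what each test is for (hypothetical labels).
HAND_CODED = [
    {"test": "t(78) = 11.20", "p": 0.001, "role": "manipulation check"},  # reported as p < .001
    {"test": "t(78) = 2.05", "p": 0.044, "role": "focal hypothesis"},
    {"test": "t(78) = 0.61", "p": 0.54, "role": "exploratory"},
]


def focal_p_values(coded_tests):
    """Gardener approach: keep only statistics that test the hypothesis."""
    return [t["p"] for t in coded_tests if t["role"] == "focal hypothesis"]


print("scraped:   ", scrape_p_values(RESULTS_TEXT))
print("hand-coded:", focal_p_values(HAND_CODED))
```

Assigning the role labels is itself a judgement call, so two careful coders can still disagree about which statistics belong in the analysis.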

However, all is not rosy in the garden. On their Datacolada blog, “state of science” researchers Joseph Simmons, Leif Nelson and Uri Simonsohn have already responded to the new analysis, and they’re sceptical. Simmons and co first note the daunting scale of the new enterprise: correctly identifying 1,800 relevant test statistics from 500 papers. In an online response, Motyl’s team agreed that yes, it was time-consuming, and yes, it required a lot of hands: “there are reasons this paper has many authors: It really took a village,” they said.

But Datacolada sampled some of the statistics that Motyl’s team used in their assessments and argue that far too many of them were inappropriate, including manipulation-check results that Motyl’s group had themselves categorised as statistica non grata. To the Datacolada team, this renders the whole enterprise suspect: “We are in no position to say whether their conclusions are right or wrong. But neither are they.” In their response, Motyl’s team make some concessions, but argue that some of the statistic selection comes down to differences of opinion, and they defend both their overall procedure and the number of coding errors they expect their study to contain. So…

So?

So doing high-quality science isn’t straightforward. Neither is doing high-quality science on the quality of science, nor is pulling everything together into high-quality conclusions. But if we care about the validity of psychology’s sexier findings (the amazing powers of power poses to make you physically more confident, how you can hack your happiness simply by changing your facial expression, and how even subtle social signals about age, race or gender can transform how we perform at tasks), we need to care about psychological science itself: how it is working and how it isn’t. (By the way, those findings I just listed? They’ve all struggled to replicate.)

There are surely ways to improve the methods of this new study (perhaps not coincidentally, Datacolada’s Leif Nelson is running a similar project), but even if the new assessment does include some irrelevant statistics, it is likely an advance on past analyses that included every irrelevant statistic.

So… the new insights have budged my position on the state of science a little: I’m still worried, but I can see a little more light among the dark. Motyl’s group make the case that social psychology isn’t ruined, that the garden isn’t totally contaminated. I hope so. But hope on its own won’t move our field forward; research, debate and making sense of the evidence will. After all, psychology is too good to give up on.

—The State of Social and Personality Science: Rotten to the Core, Not so Bad, Getting Better, or Getting Worse?

Shen Fei (Eden)
