A meta-analysis is a type of data analysis in which the results of several studies, none of which need find anything of statistical significance, are lumped together and analyzed as if they were the results of one large study.
For example, the results of ganzfeld experiments have varied wildly, indicating that the results depend greatly on who is doing the experiment. Between 1974 and 1981, some 42 ganzfeld experiments reported or published results. It is unknown how many experiments were done but not reported or published. Charles Honorton claimed that 55% of these reported studies found positive evidence for the existence of something interesting if not paranormal. That is, slightly more than half of the studies produced statistical results that were not likely due to chance. The data could be due to psi, but they could also be due to sensory leakage or some other methodological weakness.
In 1981 or 1982, Honorton sent all the reported studies to skeptic Ray Hyman, who proceeded to do a meta-analysis of them. Hyman concluded that the data did not warrant belief in psi, primarily because of the many flaws he found in the experiments themselves. He stripped the data down to 22 studies by 8 investigators (746 trials, which accounted for 48% of the data base). He found a hit rate of 38% for these studies, but after adjusting for selection bias and quality of study, he calculated the replication success rate at 31%, not 55%.
In Hyman's view, 58% of the studies used inadequate randomization procedures. He also found problems with sensory leakage (e.g., rooms weren’t soundproof, video recordings could be heard by experimenters) and with some of the statistical procedures used. He writes:
As far as I can tell, I was the first person to do a meta-analysis on parapsychological data. I did a meta-analysis of the original ganzfeld experiments as part of my critique of those experiments. My analysis demonstrated that certain flaws, especially quality of randomization, did correlate with outcome. Successful outcomes correlated with inadequate methodology. In his reply to my critique, Charles Honorton did his own meta-analysis of the same data. He too scored for flaws, but he devised scoring schemes different from mine. In his analysis, his quality ratings did not correlate with outcome. This came about because, in part, Honorton found more flaws in unsuccessful experiments than I did. On the other hand, I found more flaws in successful experiments than Honorton did. Presumably, both Honorton and I believed we were rating quality in an objective and unbiased way. Yet, both of us ended up with results that matched our preconceptions. (Hyman 1996)
Fifteen of the studies appeared in refereed journals; 20 were abstracts of papers delivered at meetings of the Parapsychological Association; five were published monographs; and two were undergraduate honors theses in biology. When Honorton did his meta-analysis, he selected 28 of these studies. Carl Sargent did 9 of the studies; Honorton did 5; John Palmer did 4; Scott Rogo did 4; William Braud did 3 and Rex Stanford did 3. Sargent accounted for about 1/3 of the data base.
Honorton, in his meta-analysis of the 28 studies, concluded that instead of the 25% rate of correct identification by the receivers expected by chance, the actual rate was 34%, a result that could not reasonably be explained as a random or chance occurrence; that is, it was statistically significant.
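A rough sense of what "statistically significant" means for hit rates like these can be had from a one-sample z-test against the 25% chance rate. The sketch below is only an illustration, not Honorton's or Hyman's actual procedure. It uses the 746 trials and 38% hit rate reported above for Hyman's 22-study subset, because the trial count behind the 34% figure is not given here. It also assumes that every trial had a 1-in-4 chance of a hit and that trials were independent, neither of which pooled studies guarantee. The function names are made up for this example.

```python
import math

def hit_rate_z(hit_rate, trials, chance=0.25):
    """z-score of an observed hit rate against a chance rate
    (normal approximation to the binomial)."""
    standard_error = math.sqrt(chance * (1 - chance) / trials)
    return (hit_rate - chance) / standard_error

def upper_tail_p(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Figures from Hyman's 22-study subset; the 25% chance rate is an
# assumption applied uniformly to all trials.
z = hit_rate_z(0.38, 746)
print(f"z = {z:.1f}, one-sided p = {upper_tail_p(z):.1e}")
```

A z-score of about 8 says only that chance alone is an unlikely explanation for the excess hits. It says nothing about whether psi, sensory leakage, or poor randomization produced them.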
However, Hyman raises a crucial point about meta-analysis: believers and skeptics rate the studies quite differently, even though both think they are being fair and unbiased. Honorton did agree with Hyman that there were some problems with some of the studies and that no grand conclusions should be drawn until further studies were done, studies that were very tightly designed and controlled.
Hyman did not think that the data could be explained by the file-drawer effect, but he could have been wrong. There is no standard method for determining how many studies would have to be in the file drawer for a meta-effect to be nullified. Different statisticians apply different formulae with significantly different results. The issue of the file drawer could be avoided simply by doing larger single experiments under stringent conditions.
Parapsychologist Dean Radin is very fond of meta-analyses. In his book, The Conscious Universe, he uses the results of meta-analyses to demonstrate the existence of psi. Regarding the ganzfeld studies he claims that Honorton's results were not due to the file-drawer effect. Honorton had done his own file drawer analysis and "concluded that there would have to be 423 unreported studies averaging null results in order to attribute the overall effect found in the 28 experiments in his sample as being due to data selection….about more than fifteen unpublished studies for each study that was published" (Radin 1997). However, another way of analyzing the data indicates that there need only be 62 studies in the drawer, which amounts to only a little over two unpublished studies for each study that was published (Stokes 2001). The fact is that to some extent any statistical formula used to speculate about how many studies would have to get null results before a meta-analysis is nullified is arbitrary. It is worth noting that in 1975 the American Parapsychological Association established an official policy against the selective reporting of only positive results.
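One widely used file-drawer formula is Rosenthal's "fail-safe N", which asks how many unreported studies averaging a z-score of zero would be needed to pull the combined (Stouffer) z down to the one-tailed 0.05 threshold of 1.645. The sketch below shows that formula only as an example of the kind of calculation involved. The text above does not say which formula produced the 423 or the 62, and the z-scores in the usage lines are invented for illustration, not taken from the ganzfeld data base.

```python
import math

def stouffer_z(z_scores):
    """Combine per-study z-scores into one overall z (Stouffer's method)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

def fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N: the number of unreported studies with a
    mean z of zero needed to bring the combined z down to z_crit."""
    k = len(z_scores)
    n = (sum(z_scores) ** 2) / (z_crit ** 2) - k
    return max(0.0, n)

# Hypothetical z-scores for five reported studies (illustrative only).
example = [2.0, 1.5, 0.5, 2.5, 1.0]
hidden = fail_safe_n(example)
print(f"combined z = {stouffer_z(example):.2f}")
print(f"fail-safe N = {hidden:.1f}")
print(f"unreported per reported study = {hidden / len(example):.1f}")
```

Dividing the result by the number of reported studies gives the kind of per-study ratio quoted above: 423/28 is about 15, while 62/28 is a little over 2. Changing the assumed average outcome of the hidden studies, or the significance threshold, changes the answer substantially, which is the sense in which such estimates are arbitrary.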
Susan Blackmore visited Carl Sargent's lab and had this to say:
These experiments, which looked so beautifully designed in print, were in fact open to fraud or error in several ways, and indeed I detected several errors and failures to follow the protocol while I was there. I concluded that the published papers gave an unfair impression of the experiments and that the results could not be relied upon as evidence for psi. Eventually the experimenters and I all published our different views of the affair (Blackmore 1987; Harley and Matthews 1987; Sargent 1987). The main experimenter left the field altogether.
I would not refer to this depressing incident again but for one fact. The Cambridge data are all there in the Bem and Honorton review but unacknowledged. Out of twenty-eight studies included, nine came from the Cambridge lab, more than any other single laboratory, and they had the second highest effect size after Honorton's own studies. Bem and Honorton do point out that one of the laboratories contributed nine of the studies but they do not say which one. Not a word of doubt is expressed, no references to my investigation are given, and no casual reader could guess there was such controversy over a third of the studies in the database. (“What can the paranormal teach us about Consciousness?” 2001)
Physicist Victor Stenger calls meta-analysis in parapsychology "a dubious procedure ... in which the statistically insignificant results of many experiments are combined as if they were a single, controlled experiment" ("Meta-Analysis and the Filedrawer Effect"). Theoretically, it would be possible to do one hundred experiments with small samples, every one of them with a negative outcome, and still have a meta-analysis of the same data produce results that are statistically significant. This should remind us that statistical significance does not imply scientific importance.
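The possibility of one hundred non-significant experiments pooling into a significant result can be illustrated with made-up numbers. In the sketch below, each of one hundred hypothetical experiments has 20 trials and 6 hits (30% against an assumed 25% chance rate). None is close to significant on its own, yet the pooled 2,000 trials are. All figures are invented, and the normal approximation used is rough for samples as small as 20.

```python
import math

def approx_p_value(hits, trials, chance=0.25):
    """Approximate one-sided p-value for a hit count above chance
    (normal approximation to the binomial)."""
    standard_error = math.sqrt(chance * (1 - chance) / trials)
    z = (hits / trials - chance) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical data: 100 small experiments, each with 6 hits in 20 trials.
experiments = [(6, 20)] * 100

single_p = approx_p_value(*experiments[0])
pooled_hits = sum(hits for hits, _ in experiments)
pooled_trials = sum(trials for _, trials in experiments)
pooled_p = approx_p_value(pooled_hits, pooled_trials)

print(f"each experiment alone: p = {single_p:.2f}")
print(f"all pooled together:   p = {pooled_p:.1e}")
```

The pooled p-value is tiny even though the underlying effect, a five-point edge over chance, is the same small one that no single experiment could detect.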
books and articles
Blackmore, S.J. 1987. "A report of a visit to Carl Sargent's laboratory." Journal of the Society for Psychical Research 54: 186-198.
Harley, T., and G. Matthews. 1987. "Cheating, psi, and the appliance of science: A reply to Blackmore." Journal of the Society for Psychical Research 54: 199-207.
Sargent, C. 1987. "Sceptical fairytales from Bristol." Journal of the Society for Psychical Research 54: 208-218.
Stokes, Douglas. 2001. “The Shrinking Filedrawer.” Skeptical Inquirer.
"Another reason I'm leery of meta-analyses." Respectful Insolence.