

Book Review

 

The Conscious Universe: The Scientific Truth of Psychic Phenomena

by Dean Radin
(HarperOne 1997)

part four


In chapters three (“Replication”) and four (“Meta-analysis”) Radin compares the performance of baseball players to the performance of psychics. (Gary Schwartz likes sports comparisons, too, though he prefers to compare the performance of psychics to that of basketball players.) The sports analogy is misleading because, while we know what it means for a baseball player to hit .357 or for a basketball player to make 50% of his free throws, we don’t know what it means for a subject, a group, or a group of groups to “hit” 34% when guessing cards where 25% is chance. While chance or luck plays a role in baseball and basketball, the analogue to mathematical chance in sports is not as straightforward as Radin implies. The odds of guessing the suit of a playing card are clear and calculable, and they have nothing to do with the past performance of the individual doing the guessing. The odds of a baseball player getting a hit or of a basketball player making a shot can be calculated only from that individual player’s statistics.
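
A worked example may make the contrast concrete. The odds of guessing a suit are fixed at 25% no matter who is guessing, so “hitting 34%” acquires statistical meaning only relative to a trial count. A minimal sketch (the 1,000-guess figure is hypothetical, not from Radin’s book):

```python
# What "hitting 34% where 25% is chance" would amount to statistically.
# The number of guesses (1,000) is assumed for illustration only.
from scipy.stats import binom

n = 1000          # hypothetical number of card guesses
p_chance = 0.25   # chance probability of guessing a card's suit
hits = 340        # a 34% hit rate

# Probability of at least 340 hits if only chance is operating.
p_value = binom.sf(hits - 1, n, p_chance)
print(f"P(at least {hits} hits by chance) = {p_value:.3g}")
```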

Radin uses the baseball analogy to introduce some important concepts in statistics, such as the “95% confidence interval.” He notes the obvious: Mickey Mantle’s lifetime batting average is a better measure of his “true skill level” than a single year in which he had only a few at-bats or a year in which his performance fluctuated wildly. Radin thinks it is relevant to compare this way of measuring Mantle’s “true skill level” to taking the data from individual psi studies and lumping them together in a meta-analysis. The implication is that meta-analyses give a better measurement of the “true value” than the individual studies do. The fact is, meta-analysis gives you larger samples, which in turn give you a statistic for the 95% confidence interval that you do not necessarily get for any of the individual studies. He might as well have compared 17 different players’ averages over 17 years and claimed that the overall average of the group gives you the “true value” of the “average player,” which would be nonsense.
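
To be fair, the narrow statistical point is sound: pooling trials does shrink the 95% confidence interval. A minimal sketch of that effect (all sample sizes hypothetical), which also shows why a tighter interval is not the same thing as a more meaningful “true value”:

```python
# Normal-approximation 95% confidence intervals for a hit rate:
# pooling more trials at the same hit rate narrows the interval.
import math

def ci95(hits, n):
    """Normal-approximation 95% CI for a hit rate of hits/n."""
    p = hits / n
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

print("single study, 50 trials:     %.3f to %.3f" % ci95(17, 50))
print("pooled studies, 1500 trials: %.3f to %.3f" % ci95(510, 1500))
```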

When one cuts through the baseball analogy, what remains is this: in psi experiments, mathematical chance performance is compared to the actual performance of individuals and groups of individuals. However, Radin does not mention that the difference between chance and actual performance in card guessing, for example, is of interest only if we assume that what is expected by theoretical chance occurs in the real world, which is a questionable assumption. (See my entry on the psi assumption.)

We can accept one part of his analogy: psi performances vary, and so do the performances of athletes. However, since we have no idea how psi works, we have no idea how psi varies. If we assume psi is a human skill like hitting a baseball, then some people will be better at it than others and practice can improve the skill up to a point. If, however, psi is a spontaneous occurrence rather than a skill, all bets are off as far as conducting replicable controlled experiments goes.

Radin admits that psi experiments function under the assumption—I would say false assumption—that if a performer guesses correctly, “this is taken as evidence of one of two things: ESP or chance” (p. 37). (He says we can’t tell whether a batter’s hit was due to skill or dumb luck. Yet most of us, when we watch a ball sail out of the ballpark, would probably not say it was dumb luck. Of course, there is some luck in whether a smashed ball goes right at a defensive player or sails into a part of the field where it can’t be caught, but this kind of luck is quite different from the luck in card-guessing experiments.)

The issue of replication is complicated enough without bringing in all this superfluous baseball analysis. The bottom line is that Radin calls it psi if the guesses are not likely due to chance (dumb luck), but he recognizes that one trial or one study won’t be sufficient to establish psi. Replication is necessary.

Radin notes some interesting things about replication in science. Most replications fail. Most successful studies aren’t replicated. Replications are not rewarded; original studies are. Nevertheless, he says, psi investigators have “conducted thousands of replication studies” (p. 39).

Radin spends most of chapter three reviewing eight reasons replication is difficult. One reason psi phenomena may not be replicable is that there isn’t anything to replicate. Some skeptics make this claim, he says (p. 41). He disagrees, of course.

Experiments with people are hard to replicate because the interaction of experimenter and subject affects the outcome. Several psychological factors have to be taken into consideration when evaluating studies involving human beings, including the expectations of the experimenters and their critics. Psi researchers can miss data right before their eyes, as can skeptics. People are not static, and results of the same tests on the same people will often differ because of changes in the people themselves. Confirmation bias can affect how experiments are designed, conducted, and interpreted.

But parapsychology suffers from another problem, says Radin: bias in the media, including the scientific media. He claims that Newsweek (1995) created a “pure fiction” when it declared that “other labs, using [Robert] Jahn’s machine [in psychokinesis experiments], have not obtained his results” (p. 43). This is not a fiction, pure or impure. Stanley Jeffers, a physicist at York University, Ontario, repeated the Jahn experiments but got chance results (see “Physics and Claims for Anomalous Effects Related to Consciousness” in Alcock et al. 2003: 135-152). He used Jahn’s equipment and his own; with neither did he succeed in replicating. (Jahn et al. also failed to replicate the PEAR results in experiments done in Germany. See “Mind/Machine Interaction Consortium: PortREG Replication Experiments,” Journal of Scientific Exploration, Vol. 14, No. 4, pp. 499-555, 2000. This work was done after publication of The Conscious Universe, so we can’t fault Radin for not mentioning it. However, we can fault him for not mentioning either failure to replicate in his 2006 follow-up book, Entangled Minds.) Jeffers himself noted the bias in the scientific media when he wrote that one waggish editor did offer to publish a PEAR paper “if it could be transmitted telepathically.”*

Radin claims that “Jahn’s research has been replicated by more than seventy researchers worldwide, both before and after Jahn produced the main body of his work” (p. 43). How do you replicate something before the main work is done? Some might consider Jahn’s work a replication of Helmut Schmidt’s REG trials, but it seems absurd to consider Schmidt’s work a replication of Jahn’s. Yet that is what Radin does. (Since Radin considers meta-analysis identical to replication, this notion of replicating work in the past makes sense to him, I suppose.)

On another note, I have to agree with Radin that using terms like ‘pseudoscience’ to describe psi research is non-productive. Such labels are a hindrance to discussion. On the other hand, I don’t think that parapsychologists should expect to be treated like physicists, biologists, or chemists just because they have scientific intentions. They need to do more than produce a few interesting statistical studies. As popular as Radin’s book might be, proclaiming that you have the scientific truth about psychic phenomena and that it’s been replicated and demonstrated isn’t sufficient. The data have to speak for themselves.

Because there is no general theory as to how psi functions, Radin seems to think that parapsychology is data-driven rather than theory-driven, but nothing could be further from the truth. The assumption all psi researchers work under is that if their performers do better than chance, then they are demonstrating psi (as long as there is no sensory leakage, cheating, fraud, statistical error, etc.). But determining whether the data can be explained as due to chance can be significantly affected by bias, says Radin. However, all his examples are of researchers or skeptics who interpret data as due to chance, not psi. He has no examples of anyone who interprets data as due to psi when chance would be a better explanation.

For example, Radin attributes the difference between how John Coover interpreted his card-guessing data and how Robert Thouless interpreted it to expectation on the part of the skeptic. Coover was Stanford University’s first Fellow in Psychical Research. By 1917, he had done four large studies (trials of 10,000 or more) and reported that he had found nothing to support belief in ESP. The main experiment involved 100 pairs of subjects in 100 trials. Roughly half of these were for telepathy (experimental) and half were for clairvoyance (control). That is, in half the trials a sender looked at the card before trying to send a telepathic communication to a receiver; in the other half, the sender looked at the card only after the receiver had made his or her guess. Radin writes that the receivers’ ability to guess the right cards rated 160 to 1 against chance (1997: 65). In 1939, psychologist Robert Thouless (d. 1984) found that if the data from the main experiment were lumped together, there were 44 more hits than expected by chance. Thouless suggested that the data supported some slight psychic effect; he calculated the odds of this happening by chance to be about 200 to 1. Coover attributed the excess hits to recording errors on the part of the experimenter (Hansel 1989: 26). Also, F. C. S. Schiller found the data showed odds greater than 50,000 to 1 against chance, but he used only the data from the fourteen highest-scoring subjects. Coover replied that he could find all kinds of interesting anti-chance events if he were selective in his use of the data (Hansel 1989: 28).
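
Coover’s reply about selectivity is easy to demonstrate with a simulation. In this sketch (the protocol is hypothetical, not Coover’s actual deck or trial structure), 100 subjects guess purely by chance, and then only the fourteen highest scorers are pooled, Schiller-style:

```python
# Selection bias in action: pool only the best post-hoc scorers from a
# group of pure-chance guessers and compute their "odds against chance."
import random
from scipy.stats import binom

random.seed(1)
n_subjects, trials, p = 100, 100, 0.25   # pure-chance guessers

scores = [sum(random.random() < p for _ in range(trials))
          for _ in range(n_subjects)]

top14 = sorted(scores, reverse=True)[:14]   # keep only the best scorers
hits, n = sum(top14), 14 * trials
p_value = binom.sf(hits - 1, n, p)          # post-hoc "anti-chance" odds
print(f"top-14 hit rate: {hits/n:.3f}; p = {p_value:.2g}")
```

Even though nothing but chance is operating, the cherry-picked group shows impressively long odds “against chance.”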

Nobel prize winner Charles Richet was particularly vocal in his criticism of Coover’s work. Coover responded by proclaiming that it can’t be denied that fraud is frequent, general, and well known in psychical research. The witnessing of psychic phenomena by astute and eminent men, he said, has had a negative effect on the studies: it has led them to discount contrary interpretations of the same phenomena, to ignore the lack of controls during those psychic experiences, and to rely on the corroboratory testimony of others to such an extent that it has weakened the rigor with which a researcher should be expected to guard against fraud. Coover noted that in the other sciences the experimenter controls the conditions; in testing psychical powers, the medium controls the conditions.*

Neither Schiller, Richet, nor Thouless, however, attempted to repeat Coover’s experiment. That would have to wait until J. B. Rhine set up shop at Duke University in 1927. Radin says that Coover may have been more pessimistic about his data than others because of “disapproving pressure from his peers at Stanford” (p. 65). However, Radin also notes that several studies have shown that a 1% error rate in recording is typical. Thus, Coover’s suspicion might well have been justified, as the arithmetic below suggests.
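
A back-of-the-envelope check makes the point. Assuming (my assumption, based on the description above) that the main experiment comprised roughly 100 pairs × 100 trials:

```python
# If a 1% recording-error rate is typical, an experiment of this size
# would contain on the order of 100 misrecorded trials. If, as is often
# alleged, recording errors tend to favor hits, that is more than enough
# to account for the 44-hit excess Thouless found. (The 10,000-trial
# figure is an assumption, not a number taken from Coover's report.)
trials = 100 * 100          # ~100 pairs x 100 trials
excess_hits = 44            # Thouless's excess over chance
expected_errors = trials * 0.01
print(f"expected misrecordings: {expected_errors:.0f}; excess hits: {excess_hits}")
```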

Radin also dismisses the work of J. L. Kennedy (1939), comparing Kennedy, who said nothing of interest happened, to an unnamed believer who, from the same data, got odds of 10 million to 1 against chance. Radin also dismisses Ray Hyman’s evaluation of the ganzfeld data as tainted by expectation (see The Elusive Quarry: A Scientific Appraisal of Psychical Research, Prometheus Books, 1989; see also Hyman 1995). Radin notes that 13 non-significant replications, when combined into one grand experiment, “produced an overall result that was statistically significant” (p. 46). Instead of seeing this as embarrassing, Radin takes it as evidence of the value of meta-analysis. Who wouldn’t support using a method that can take 13 failures to replicate, add them together, and get one grand replication of statistical significance?
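
There is no mystery about how thirteen individually non-significant results can pool into one “significant” one; it falls straight out of how z-scores combine. A minimal sketch using Stouffer’s method (the per-study z-scores are hypothetical):

```python
# Stouffer's Z: combine per-study z-scores by summing and dividing by
# sqrt(number of studies). Individually weak results pool into a
# "significant" aggregate.
import math
from scipy.stats import norm

# Thirteen studies, each with z = 0.9 (one-tailed p ~ 0.18: not
# significant on its own).
zs = [0.9] * 13

z_combined = sum(zs) / math.sqrt(len(zs))
p_combined = norm.sf(z_combined)
print(f"combined z = {z_combined:.2f}, one-tailed p = {p_combined:.4f}")
```

Whether the pooled significance reflects a real effect or the steady accumulation of small biases is exactly the point in dispute.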


Radin also takes Susan Blackmore to task for not seeing the significance of her own studies. Blackmore, who has a degree in parapsychology and pursued research in the field for a number of years, quit the field because she couldn’t find any evidence for psi. She also found that one of the labs that was getting the best results in the ganzfeld studies was not what it claimed to be.*

Regarding replication, Radin notes that there are significant problems having to do with statistical factors. He points out that the odds of replicating “the exact same experiment” with 50 subjects are about 50% (p. 47), no matter how good the original experiment is. “Experiments involving human beings never turn out exactly the same way twice….” (p. 47). “Skeptics who demand extremely high rates of repeatability for psi experiments simply do not understand the statistics of replication” (p. 47). Yet somebody who can add 13 bad tests together to get one good test does understand these statistics?
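
The 50% figure has a standard statistical reading. Assuming (my gloss, not Radin’s derivation) that the original study just barely reached the one-tailed p = .05 threshold and that the true effect exactly equals the observed one, an exact replication is as likely to fall short as not:

```python
# If the true effect equals the observed effect of a study that just
# reached significance, an exact replication has ~50% power.
from scipy.stats import norm

alpha = 0.05
z_crit = norm.isf(alpha)    # one-tailed critical z, about 1.645
true_z = z_crit             # assume true effect = observed effect

# The replication's z-score is (approximately) normal around true_z with
# standard deviation 1, so the chance it clears the same threshold is:
power = norm.sf(z_crit - true_z)   # = norm.sf(0) = 0.5
print(f"probability the replication reaches p < .05: {power:.2f}")
```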

Flaws in experimental design are another factor affecting replicability. Radin correctly notes that not all flaws are created equal: some are fatal, some are not. It is not a valid criticism to fault an experiment over some merely possible flaw unknown to the skeptic. No experiment is perfect, but that doesn’t excuse poor design or poor controls during an experiment.

Radin is correct in claiming that it would be “too strong a requirement for any phenomenon involving human performance” to require that it work every time. But it should work most of the time, shouldn’t it? Would you take a drug that worked in only one out of six trials? Just how many times does a study have to be replicated? Since most studies are never replicated, or even subjected to an attempted replication, the question seems moot. Hansel said he’d accept psi if there were three good trials at 100 to 1 odds against chance. Radin says this has been done dozens of times and that “informed skeptics today agree that chance is no longer a viable explanation for the results obtained in psi experiments” (p. 50). However, for Radin these replications occur in meta-analysis, the kind of study that lets you take 13 bad studies and convert them into one good study. As R. Barker Bausell says, meta-analysis “elevates publication bias to an art form.”
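
Bausell’s quip is easy to illustrate. In this hypothetical simulation, 200 chance-only studies are run, only those that happen to reach p < .05 get “published,” and the published ones are then pooled:

```python
# Publication bias: file away the non-significant chance-only studies and
# meta-analyze the survivors, and the pooled result looks overwhelming.
import random
from scipy.stats import binom

random.seed(2)
published = []
for _ in range(200):                        # 200 chance-only studies
    hits = sum(random.random() < 0.25 for _ in range(100))
    p = binom.sf(hits - 1, 100, 0.25)
    if p < 0.05:                            # only "successes" get published
        published.append(hits)

pooled_hits, pooled_n = sum(published), 100 * len(published)
p_meta = binom.sf(pooled_hits - 1, pooled_n, 0.25)
print(f"{len(published)} of 200 studies published; pooled p = {p_meta:.2g}")
```

A handful of lucky studies survive the filter, and their pooled result looks wildly significant even though nothing but chance ever operated.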

end of part four 
