Book Review
The Conscious Universe: The Scientific Truth of Psychic Phenomena
by Dean Radin
(HarperOne 1997)
part four
In chapters three (“Replication”) and four
(“Meta-analysis”) Radin compares the performance of baseball
players to the performance of psychics. (Gary Schwartz likes
sports comparisons, too, though he likes to compare the
performance of psychics to that of basketball players.) The sports
analogy is misleading because, while we know what it means for a
baseball player to hit .357 or for a basketball player to hit
50% of his free throws, we don’t know what it means for a
subject, a group, or a group of groups to “hit” 34% when
guessing cards where 25% is chance. While chance or luck plays a
role in baseball and basketball, the analogue to
mathematical chance in sports is not as straightforward as Radin
implies. The odds of guessing the suit of
a playing card are clear and calculable, and they have nothing
to do with the past performance of an individual at guessing
cards. The odds of a baseball player getting a hit or of a
basketball player hitting a shot can only be calculated by
knowing the individual player’s statistics.
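To see what such a comparison amounts to, here is a minimal sketch in
Python. The number of guesses is my own hypothetical; the point is that
a 34% hit rate means nothing by itself, since its improbability depends
entirely on how many guesses were made.

```python
# A minimal sketch, under an assumed sample size, of what "hitting 34%
# where 25% is chance" amounts to. The 1,000 guesses are hypothetical;
# psi studies vary widely in size, and the size is what matters.
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more successes in n trials with chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 1000                      # hypothetical number of card guesses
p_chance = 0.25               # chance of naming the right suit
hits = int(0.34 * n)          # a 34% hit rate

print(prob_at_least(hits, n, p_chance))  # minuscule for n = 1000; about 0.1 for n = 50
```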
Radin uses the baseball analogy to introduce some important
concepts in statistics, such as the “95% confidence interval.”
He notes the obvious: Mickey Mantle’s lifetime batting average
is a better measure of his “true skill level” than a single year
where he had only a few at-bats or a year where his performance
fluctuated wildly. Radin thinks it is relevant to compare this
procedure of measuring Mantle’s “true skill level” to taking the
data from individual psi studies and lumping them together in a
meta-analysis. The implication is that meta-studies give a
better measurement of “true value” than do the individual
studies. The fact is that meta-analysis gives you larger samples,
which in turn give you a narrower 95% confidence interval than you
could get from any of the individual studies. He might as well have
compared 17 different
players’ averages over 17 years and claimed that the overall
average of the group gives you the “true value” of the “average
player,” which would be nonsense.
When one cuts through the baseball analogy, what one is left with
is this: in psi experiments, mathematical chance performance will be
compared to the actual performance of individuals and groups of
individuals. However, Radin does not mention that the difference
between chance and actual performance in card guessing, for
example, is of interest only if we assume that what is expected
by theoretical chance occurs in the real world, which is a
questionable assumption. (See my entry on the
psi assumption.)
We can accept one part of his analogy: psi performances vary and
so do the performances of athletes. However, since we have no
idea how psi works, we have no idea how psi varies. If we assume
psi is a human skill like hitting a baseball, then some people
will be better at it than others and practice can improve the
skill up to a point. If, however, psi is a spontaneous
occurrence rather than a skill, all bets are off when it comes to
conducting replicable controlled experiments.
Radin admits that psi experiments function under the
assumption—I would say false assumption—that if a performer
guesses correctly “this is taken as evidence of one of two
things: ESP or chance” (p. 37). (He says we can’t tell if a
batter’s hit was due to skill or dumb luck. Yet, most of us,
when we watch a ball sail out of the ball park, would probably
not say it was dumb luck. Of course there is some luck in
whether a smashed ball goes right at a defensive player or
whether the ball sails into the field where it can’t be caught,
but this kind of luck is quite different from the luck in
card-guessing experiments.)
The issue of replication is complicated enough without bringing
in all this superfluous baseball analysis. The bottom line is
that Radin calls it psi if the guesses are not likely due to
chance (dumb luck), but he recognizes that one trial or one study
won’t be sufficient to establish psi. Replication is necessary.
There are some interesting things about replication in science
that Radin notes. Most replications fail. Most successful
studies aren’t replicated. Replications are not rewarded;
original studies are rewarded. Nevertheless, he says, psi
investigators have “conducted thousands of replication studies”
(p. 39).
Radin spends most of chapter three reviewing eight reasons
replication is difficult. One reason psi phenomena may not be
replicable is because there isn’t anything to replicate. Some
skeptics make this claim, he says (p. 41). He disagrees, of
course.
Experiments with people are hard to replicate because the
interaction of experimenter and subject affects the outcome.
There are several psychological factors that have to be taken
into consideration when evaluating studies involving human
beings, including the expectations of the experimenters and
their critics. Psi researchers can miss data right before their
eyes, as can skeptics. People are not static, and results of the
same tests on the same people will often be different because of
changes in the people themselves. Confirmation bias can affect
how experiments are designed, conducted, and interpreted. But
parapsychology suffers from another problem, says Radin: bias in
the media, including the scientific media. He claims
that Newsweek (1995) created a “pure fiction” when it declared
that “other labs, using [Robert] Jahn’s machine [in
psychokinesis experiments], have not obtained his results” (p.
43). This
is not a fiction, pure or impure. Stanley Jeffers, a physicist
at York University, Ontario, repeated the Jahn experiments but
with chance results (Alcock 2003: 135-152).
(See "Physics and Claims for Anomalous Effects Related to
Consciousness" in Alcock et al. 2003.
Abstract.)
He used Jahn’s equipment and his own; neither succeeded in
replicating. (Jahn et al. also failed to replicate the
PEAR
results in experiments done in Germany. See "Mind/Machine
Interaction Consortium: PortREG Replication Experiments,"
Journal of Scientific Exploration, Vol. 14, No. 4, pp. 499–555,
2000. This work was done after publication of The Conscious
Universe, so we can’t fault Radin for not mentioning it.
However, we can fault him for not mentioning either failure to
replicate in his 2006 follow-up book,
Entangled Minds.) Jeffers
himself noted the bias in the scientific media when he wrote:

    One waggish editor did offer to publish a PEAR paper "if it could
    be transmitted telepathically."*
Radin claims that “Jahn’s research has been replicated by more
than seventy researchers worldwide, both before and after Jahn
produced the main body of his work” (p. 43). How do you
replicate something before the main work is done? Some might
consider Jahn’s work a replication of
Helmut Schmidt’s REG trials, but
it seems absurd to consider Schmidt’s work a replication of Jahn’s.
Yet that is what Radin does. (Since Radin treats meta-analysis as
identical to replication, this notion of replicating work done in the
past makes sense to him, I suppose.)
On another note, I have to agree with Radin that using terms
like ‘pseudoscience’ to describe psi research is non-productive.
Such labels are a hindrance to discussion. On the other hand, I
don’t think that parapsychologists should expect to be treated
like physicists, biologists, or chemists just because they have
scientific intentions. They need to do more than produce a few
interesting statistical studies. As popular as Radin’s book
might be, proclaiming that you have the scientific truth about
psychic phenomena and that it’s been replicated and demonstrated
isn’t sufficient. The data have to speak for themselves.
Because there is no general theory as to how psi functions,
Radin seems to think that parapsychology is data-driven rather
than theory-driven, but nothing could be further from the truth.
The assumption all psi researchers work under is that if their
performers do better than chance, then they are demonstrating
psi (as long as there is no sensory leakage,
cheating, fraud,
statistical error, etc.). But determining whether the data can be explained as due
to chance can be significantly affected by bias, says Radin. However, all his
examples are of researchers or skeptics who interpret data as
due to chance, not psi. He has no examples of anyone who
interprets data as due to psi when chance would be a better
explanation. For example, Radin attributes the difference between
how John Coover interpreted his card-guessing data and how Robert
Thouless interpreted it to expectation on the part of the skeptic.
Coover was Stanford University’s first Fellow in
Psychical Research. By 1917, he had done four large studies
(trials of 10,000 or more) and reported that he had found
nothing to support belief in ESP. The main experiment involved
100 pairs of subjects in 100 trials. Roughly half of these were
for telepathy (experimental) and half were for clairvoyance
(control). That is, in half the trials a sender looked at the
card before trying to send a telepathic communication to a
receiver. In the other half, the sender looked at the card after
the receiver made his or her guess. Radin writes that the
receivers’ ability to guess the right cards rated 160 to 1
against chance (1997: 65). In 1939, psychologist Robert Thouless
(d. 1984) found that if the data from the main experiment were lumped
together, there were 44 more hits than
expected by chance. Thouless suggested that the data supported
some slight psychic effect. He calculated the odds of this
happening by chance to be about 200 to 1. Coover attributed the
excess hits to recording errors on the part of the experimenter
(Hansel 1989: 26). Also, F. C. S. Schiller
found the data showed odds greater than 50,000 to 1 against
chance, but he used only the data from the fourteen
highest-scoring subjects. Coover replied that he could find all
kinds of interesting anti-chance events if he were selective in
his use of the data (Hansel 1989:
28).
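Coover’s reply is easy to demonstrate. The sketch below uses
hypothetical parameters, not Coover’s actual design, and subjects who
guess purely at random; picking out the highest scorers after the fact
still yields an impressive-looking excess over chance.

```python
# A sketch of Coover's point about selective use of data. The parameters
# (100 subjects, 100 guesses each, 1-in-5 chance) are hypothetical
# stand-ins, not Coover's actual design; the selection effect is the point.
import random
random.seed(1)

n_subjects, n_guesses, p = 100, 100, 0.2
scores = [sum(random.random() < p for _ in range(n_guesses))
          for _ in range(n_subjects)]        # pure chance, no psi at all

top14 = sorted(scores, reverse=True)[:14]    # the fourteen highest scorers
expected = 14 * n_guesses * p
print(sum(top14), "hits where chance predicts", expected)
# The best chance performers beat "expectation" handily, which is exactly
# why post-hoc selection of high scorers proves nothing.
```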
Nobel Prize winner
Charles Richet was particularly vocal in his
criticism of Coover’s work. Coover responded by proclaiming that
it can’t be denied that fraud is frequent, general, and well
known in psychical research. The witnessing of psychic phenomena
by astute and eminent men, he said, has had a negative effect on
the studies because it has led them to discount contrary
interpretations of the same phenomena, ignore the lack of
controls during those psychic experiences, and rely on the
corroboratory testimony of others to such an extent that it has
weakened the rigor with which the researcher should be expected
to guard against fraud. Coover noted that in the other sciences
the experimenter controls the conditions; but in testing
psychical powers, the medium controls the conditions.*
Neither Schiller, Richet, nor Thouless, however, attempted to
repeat Coover’s experiment. That would have to wait until J. B.
Rhine set up shop at Duke University in 1927. Radin says that
Coover may have been more pessimistic about his data than others
because of “disapproving pressure from his peers at Stanford”
(p. 65). However, Radin also notes that several studies have
shown that a 1% error rate in recording is typical. Thus,
Coover’s suspicion might well have been justified.
Radin also dismisses the work of J. L. Kennedy (1939) and compares
it to that of an unnamed believer who, from the same data, got odds
of 10 million to 1 against chance where Kennedy said nothing of
interest happened. Radin also dismisses Ray Hyman’s evaluation of the
ganzfeld data as being tainted by expectation (See
The Elusive Quarry: A Scientific Appraisal of Psychical Research,
Prometheus Books, 1989. See also
Hyman 1995.)
Radin notes that 13 non-significant replications, when combined
into one grand experiment, “produced an overall result that was
statistically significant” (p. 46). Instead of seeing this as
embarrassing, Radin takes it as evidence of the value of
meta-analysis. Who wouldn’t support using a method that can take
13 failures to replicate, add them together, and get one grand
replication of statistical significance‽
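The arithmetic behind that trick is worth seeing. The sketch below uses
Stouffer’s method of combining z-scores, one standard pooling
technique; I am not claiming it is the particular method behind Radin’s
figure, and the thirteen identical z-scores are hypothetical.

```python
# A sketch of how thirteen individually non-significant results can pool
# into a "significant" one (Stouffer's method; hypothetical z-scores).
from math import sqrt, erf

def one_sided_p(z):
    """One-sided p-value for a z-score (normal approximation)."""
    return 0.5 * (1 - erf(z / sqrt(2)))

z_scores = [1.0] * 13                 # thirteen hypothetical studies, each z = 1.0
print(one_sided_p(1.0))               # each alone: p ≈ 0.16, not significant
combined_z = sum(z_scores) / sqrt(len(z_scores))
print(combined_z, one_sided_p(combined_z))   # pooled: z ≈ 3.6, p ≈ 0.0002
```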
Radin also takes
Susan Blackmore to task for not seeing the
significance of her own studies. Blackmore, who has a degree in
parapsychology and pursued research in the field for a number of
years, quit the field because she couldn’t find any evidence for
psi. She also found that one of the labs that was getting the
best results in the ganzfeld studies was not what it claimed to
be.*
Regarding replication, Radin notes that there are significant
problems having to do with statistical factors. He points out
that the odds of replicating “the exact same experiment” with 50
subjects are about 50% (p. 47). This is true no matter how good
the original experiment is. “Experiments involving human beings
never turn out exactly the same way twice….” (p. 47). “Skeptics
who demand extremely high rates of repeatability for psi
experiments simply do not understand the statistics of
replication” (p. 47). Yet, somebody who can add 13 bad tests to
get one good test does understand these statistics?
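The 50 percent figure is at least arithmetically plausible, as a sketch
with assumed numbers shows: if a genuine but modest effect was only
just strong enough to push the original study past the usual
significance cutoff, an exact repetition clears that cutoff only about
half the time.

```python
# A sketch of the "50% replication" point. The design and effect size are
# hypothetical: 50 binary trials, 25% chance, a true hit rate of 35.5%,
# and a cutoff of 18 hits (roughly p = 0.05 one-sided under pure chance).
import random
random.seed(2)

n, p_true, threshold = 50, 0.355, 18

def one_study():
    hits = sum(random.random() < p_true for _ in range(n))
    return hits >= threshold           # did this repetition reach "significance"?

replications = sum(one_study() for _ in range(10_000))
print(replications / 10_000)           # comes out near 0.5 with these numbers
```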
Flaws in experimental design are another factor affecting
replicability. Radin correctly notes that not all flaws are
created equal. Some are fatal, but some are not. It is not a
valid criticism to fault an experiment because there is some
possible flaw unknown to the skeptic. No experiment is perfect,
but that doesn’t excuse poor design or controls during an
experiment.
Radin is correct in claiming that it would be “too strong a
requirement for any phenomenon involving human performance” to
require that it work every time. But it should work most of the
time, shouldn’t it? Would you take a drug that worked in trials
one out of six times? Just how many times does a study have to
be replicated? Since replication of most studies is never even
attempted, much less achieved, the question seems moot. Hansel said
he’d accept psi if there were 3 good trials at 100 to 1 odds
against chance. Radin says this has been done dozens of times
and “informed skeptics today agree that chance is no longer a
viable explanation for the results obtained in psi experiments”
(p. 50). However, for Radin these replications occur in
meta-analysis, the kind of study that lets you take 13 bad
studies and convert them into one good study. As R. Barker Bausell
says, meta-analysis "elevates publication bias to an art form."
end of part four