
Nature's Imperfect Experiment, La Recherche, (France)
July/August 1998

Jon Beckwith, Department of Microbiology and Molecular Genetics, Harvard Medical School
Joseph S. Alper, Department of Chemistry and the Center for the Study of Genetics and Public Policy, University of Massachusetts

They've been referred to as Nature's perfect experiment. They are identical twins - twin siblings who are born with identical sets of genes. Psychologists and geneticists claim that a host of human behavioral characteristics are strongly influenced by genes, basing their conclusions on studies of identical twins. Newspapers follow with headlines reporting that everything from intelligence and homosexuality to religious beliefs and television watching is largely genetically determined.

Most scientists agree that both genetics and environment are important in explaining behavior. But they disagree about the relative importance of genetics and the environment in explaining the differences we see among people. They also disagree about whether indications of strong genetic contributions to behavior are useful in formulating social policy.

Twin studies have provided one of the most important arguments for a genetic basis of human behavior. Identical twins result from the splitting of a fertilized egg, the zygote, into two separate zygotes. Since they arise from a single zygote, they are referred to as monozygotic (MZ) twins. Like MZ twins, fraternal twins, called dizygotic (DZ) twins, are born at the same time. However, unlike MZ twins, fraternal twins arise from the separate fertilization of two eggs. Since MZ twins come from the same fertilized egg, they are genetically identical. DZ twins, who come from two separate eggs, are no more genetically similar than are ordinary siblings. On average, two DZ cotwins (or two ordinary siblings) have inherited one-half their genetic material in common.

Researchers who study identical twins have developed two approaches to measure genetic influences on behavior. In the first approach, they study twins all of whom have grown up with their biological parents. They compare the correlations in the behaviors of identical twins with correlations found with fraternal twins. If the MZ twins resemble each other much more closely for a particular behavioral characteristic than do the DZ twins, they argue that genetic influences must be important in explaining the relative similarities.
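
To make the logic of this comparison concrete, the following is a minimal sketch in Python of one textbook shortcut, usually called Falconer's formula, which doubles the difference between the MZ and DZ correlations. This article does not itself use the formula; the function name `falconer_estimates` and the correlation values are hypothetical, chosen only for illustration. The result is meaningful only if MZ and DZ twins share their environments to the same degree and if gene effects are additive.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Split trait variance using Falconer's approximation.

    r_mz, r_dz: trait correlations within MZ and DZ twin pairs.
    Assumes additive gene effects and equally shared environments
    for both kinds of twins (assumptions, not established facts).
    """
    h2 = 2.0 * (r_mz - r_dz)   # share attributed to genetic differences
    c2 = 2.0 * r_dz - r_mz     # share attributed to shared environment
    e2 = 1.0 - r_mz            # share attributed to unshared environment and error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, not taken from any study.
estimates = falconer_estimates(r_mz=0.8, r_dz=0.5)
for name, value in estimates.items():
    print(f"{name} = {value:.2f}")
```

If MZ twins in fact experience more similar environments than DZ twins do, part of the gap between the two correlations is environmental, and the doubled difference overstates the genetic share.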

In the second approach, researchers study twins who were separated and placed in different homes at birth or early in their lives. It is assumed that the environments in the two homes are unrelated to each other. If this assumption is true, then strong similarities in the behavior of the separated twins suggest the importance of genetic influences on behavior.

At first sight, the genetically identical make-up of MZ twins appears to be Nature's gift of a perfect experiment to behavioral geneticists. Unfortunately, studies based on twins are not as foolproof as they might seem. In fact, the flaws in these studies are so serious that research on twins provides little insight into the genetic origin of the way we behave.

Twins raised with their biological parents

To explore some of the limitations of the first twin study approach, we consider a 1991 study carried out by Drs. Michael Bailey of Northwestern University and Richard Pillard of Boston University that examined the concordance for male homosexual behavior among pairs of twins and non-twin siblings. We choose this study because it is one of the few that states explicitly some of the methodological problems encountered in twin studies and makes attempts to overcome them1. Bailey and Pillard found that among homosexual males who had an identical twin, that twin was also homosexual in 52% of the pairs. But DZ twins showed only a 22% concordance for homosexuality. A statistical analysis of these data suggested that the significant difference between the two percentages implies a strong genetic contribution to male homosexual behavior.

As the authors admit, one of their potential problems is "ascertainment bias." Ascertainment bias refers to a situation in which the way subjects are recruited for a study leads to a collection of individuals who are not representative of the general population exhibiting that behavior. Ideally, to avoid bias, individuals in any study should be chosen randomly from the population of interest. In this case, the population of interest is those gay men who have a twin brother. However, there is no list of such men. Bailey and Pillard found their twins through responses to advertisements in gay newspapers and magazines published in Midwestern or southern cities of the United States.

Did this group of gay men really provide a random sampling of homosexual male twins in the population? We cannot say for certain, but it is not unreasonable to suspect that the readers of gay publications who volunteered to participate in the study represented a special subset of homosexual males. Such readers who volunteered may, for instance, be less shy and less concerned than non-participants about public exposure and, therefore, more willing to involve themselves in such a study. Further, the fact that they read such journals in the first place might indicate a greater openness and acceptance of their sexual orientation than is found among other gay men. Their willingness to participate in the study and their degree of comfort with their own gayness may, in turn, reflect an upbringing in a more liberal and accepting environment. If any of these possibilities is in fact true, then the study might have excluded a large number of potential twin participants who would show much lower correlations with their twin siblings for homosexual behavior. The result of this ascertainment bias would be an overestimate of the role that "homosexuality" genes play in determining whether or not men manifest homosexual behavior. When subjects are not chosen at random, it takes little imagination to envisage potential ascertainment biases for whatever behavioral trait one wishes to study.

A closer look at Bailey and Pillard's data reveals another major problem that affects many twin studies. Recall that the concordance for MZ twins was 52% and that for DZ twins was 22%. However, the concordance for homosexuality in ordinary brothers was only 9%. But we know that DZ twins are no more genetically similar than are brothers who were born at different times. Consequently, according to a genetic model, DZ twins and ordinary siblings should exhibit similar concordances. The finding of the strong difference between the results with DZ twins and ordinary brothers suggests a significant influence of environmental factors on the development of homosexuality.

It is easy to identify environmental differences between DZ twins and ordinary brothers that might explain Bailey and Pillard's results. The non-twin brothers, in contrast to the twins, were born at different times and thus may have grown up in a different familial and cultural milieu. Perhaps more importantly, they did not grow up in a "twin environment". Twins exist together in an unusual and often especially close environment, influencing each other in untold ways that differ from ordinary brothers.

Our focus on the comparison between DZ twins and ordinary siblings, instead of on that between MZ and DZ twins, leads us to the exact opposite conclusion from that of Bailey and Pillard! That is, the comparison between DZ twins and ordinary brothers suggests that environmental factors are important in explaining the difference in concordances between these two types of brothers. In this light it seems reasonable to go back and ask whether some of the difference in concordances (52% vs. 22%) between the MZ and DZ twins may also be due to environmental influences. Critical to such a question is a fundamental assumption of such twin studies referred to as "the equal environment assumption."2

According to the equal environment assumption, researchers on twins propose that the degree of shared environments for MZ twins is the same as it is for DZ twins. This assumption is based on the fact that each pair of twins, whether MZ or DZ, is raised at the same time in the history of their families and of their society. If this equal environment assumption is valid, then the difference in concordances between MZ and DZ twins would represent a measure of genetic differences.

But is this assumption correct? Are the similarities in environments of identical twins and fraternal twins effectively the same? And, if the environments are not the same, does it make any difference in the behavior of the twins? It seems unlikely that MZ twins and DZ twins experience the same environments. Identical twins are so alike that it is often difficult to distinguish one from the other. As a result, identical cotwins are likely to be treated more similarly by parents than are fraternal cotwins and, thus, will experience a much more similar environment within the home. Even outside the family setting, identical cotwins who look identical may experience more comparable environments than do fraternal cotwins. The response of people to the individuals they encounter depends to a great extent on the physical characteristics of those individuals, for example, their height, weight, or degree of attractiveness. The physical similarities in MZ cotwins resulting from their identical genes may elicit similar responses from other people. These similar responses may, in turn, further increase the behavioral similarities between the MZ cotwins. We thus see that the often strikingly similar behaviors in cotwins can result both from their identical genes and from their exposure to unusually similar environments.

A curious but important semantic debate occurs here where the genes are exerting their influence through the mediation of physical characteristics and environmental responses to those characteristics. The debate is whether to consider such effects as due to the genes or to the environment. In the technical jargon, these effects are called "gene-environment covariances."3 We believe that since the covariances arise as a result of the actions of siblings, parents, teachers, friends, etc., rather than arising from the direct actions of genes, the effects of gene-environment covariances should be treated as an environmental effect. The cultural features of a particular society (or substrata of society) at a particular time in its history may have a strong influence on what responses are "elicited" from a society in response to twins' physical or behavioral characteristics. The set of environments available to the twins depends on society and the people who interact with the twins; it is not produced by the genes of the twins. In some cultures overweight people are the subject of scorn and teasing, in others they are admired. Consequently, these potentially changeable features of the interactions of twins with their family, society and culture are more appropriately treated as environmental effects. The similarities in behavior and personality in MZ twins arising from similarities in the environment are not directly caused by these genes; they are directly caused by the environmental similarities. Clearly, treating gene-environment covariances as an environmental effect decreases the apparent role that genetic differences play in explaining behavioral differences.

In contrast to our proposal, most behavioral geneticists treat gene-environment covariances as genetic effects. They follow the example of University of Minnesota psychologist Thomas Bouchard, who argues: "[Identical] twins tend to elicit, select, seek out, or create very similar effective environments and, to that extent, the impact of these experiences is counted as a genetic influence."4 We leave it to the reader to decide.

What evidence is there that the shared environments of MZ twins contribute to the similarities in behavior? Clearly, as we see from Bouchard's statement about gene-environment covariance effects, he believes that such effects do exist for those environments elicited by the twins. A number of studies that confront this issue have been carried out to test the equal environment assumption.5,6 In principle, comparison of fraternal twins who look most alike or have even been mistaken for identical twins with those fraternal twins whose physical appearance is not very similar might shed light on this question. Under certain assumptions, this mistaken classification of DZ twins by parents and doctors should lead to a familial environment that is the same for MZ twins as it is for these particular DZ twins. The few studies done in this area have obtained mixed results.2

Researchers have also attempted to assess the degree of equal treatment and upbringing that identical twins received and whether that similarity was correlated with their similarity in behavior. These studies rely on the memories of children or parents or both. Clearly these measures are not very reliable and, perhaps as a result, different studies of different behaviors have reached different conclusions.

Several recent papers cast further doubt on the equal environment assumption by suggesting that environmental influences occurring early in life play an important role in the development of behavioral traits in identical twins. Dr. Bernie Devlin and coworkers of the University of Pittsburgh and Carnegie-Mellon University showed that conditions within the womb may have a substantial effect on the concordance of subsequent scores on IQ tests for identical twins.7 Furthermore, Drs. Elisabeth Spitz of the Université de Metz and Michèle Carlier of the C.N.R.S. in Orléans have found that identical twins who develop in a single chorionic sac in the womb (monochorionic MZ twins) show different concordances for certain aptitudes when compared to identical twins who develop in separate sacs (dichorionic MZ twins).2 When the researchers corrected the calculations to take into account the high proportion of monochorionic MZ twins, they found that genetic differences could no longer account for the differences in performance on an IQ-type test.

Finally, Daniel O'Loughlin and coworkers of the University of Texas observed that the greater incidence of premature births among twins makes it difficult to extend specific conclusions drawn from the study of twins to conclusions about the role of genes in behavior in the overall population.5 The resultant lower birth weights and other complications associated with prematurity could result in enhanced similarities in twin development compared to ordinary siblings. Because of these developmental problems affecting twins, it may be unwarranted to conclude anything about the role of genes in behavior from genetic studies of identical twins.

In summary, there is not a large body of evidence regarding the equal environment assumption. Until comparatively recently, researchers in behavioral genetics ignored the complex environmental and developmental factors that affect estimates of the importance of genetic differences. As a result, few of the studies that support one or another perspective on the importance of these factors have yet been replicated. The rather belated recognition of these problems, the paucity of studies, and the contradictions between the studies that have been published have led Spitz and Carlier to conclude, with regard to the equal environment assumption, that "it is evident that a consensus does not exist" (translated from the French).2

Twins reared apart

The second class of twin studies requires finding identical twins who, at an early age, were separated from their families and placed in separate homes. These studies are affected by many of the same difficulties that beset the research on twins who have been raised by their biological parents. Ascertainment of subjects for these studies presents a serious problem. Except in those cases where twins are identified through a national registry, such as exists in Denmark, the twins involved do not represent a random sample. For instance, in the very large twin study conducted by Dr. Thomas Bouchard and his coworkers,4 new subjects were attracted by media coverage which emphasized the striking similarities between separated twins. The pairs of twins who volunteered for the study in response to the news coverage may represent a biased population. They may, for example, have a particular interest in 'being twins'. Or they may have responded because of their own striking similarities. In both cases, these twins might resemble each other more closely than would a more randomly ascertained subject group.

As in the case of twins raised together, the physical similarities between identical twins raised apart could be important in evoking similar societal responses. Maternal effects may be important as well. Not surprisingly, the study by Devlin et al. reported such effects in both separated twins and twins raised together.7

In addition to these problems common to both types of research, studies of MZ twins raised apart present their own problems. Critical to this work is the assumption that the separated twins have been raised in significantly different environments. If the environments created by the different homes were similar, it would be difficult to know whether a high concordance for a trait was due to the similar environments or to identical genes of the MZ twins.

This assumption that the environments of separated twins are dissimilar is not likely to be correct. In many cases, either the parents, relatives, or the adoption agencies attempt to place children in environments as similar to that of the original home as possible. The twins may be placed in the homes of relatives. They may live in the same town and even attend the same school. In seeking similar environments, adoption agencies often consider such factors as socio-economic status, religion, and cultural interests. We do not know how much these similarities in environment contribute to similarities in, for example, IQ test scores.

For many decades, researchers studying separated twins did not consider very seriously the possibility that similarities in the separated environments might affect their conclusions. However, in 1974, psychologist Dr. Leon Kamin of Northeastern University examined in detail the data from the major twin studies on the genetic influences on IQ. He found that, more often than not, separated twins did in fact grow up in very similar environments.8 As a result of Kamin's work, "...behavior geneticists had to sharpen their arguments, design new, more careful studies, [and] obtain fresh evidence," according to researcher Dr. Neil MacKintosh of the University of Cambridge.9

As a response to Kamin's critique, some of the more recent studies of the genetics of behavior have included methods for evaluating the similarities of environments into which adopted twins are placed. These studies use survey instruments that attempt to quantify available cultural and intellectual household resources.4,10 For example, a count is obtained of the number of books in an adoptive family's home. It is easy to laugh at such a simplistic measure. But the problem is enormously difficult. Is it even possible to develop quantitative measures of the intellectual impact of a particular family environment which has many intangible components? We do not know the answer to this question and do not envy behavioral geneticists in their task.

The detailed critical analysis of the environments of separated twins carried out by Kamin in the 1970s has not been repeated on the more recent, larger twin studies. The data required for such an analysis are usually not available to other researchers, perhaps because of reasonable concerns about the privacy of the participants in the studies. Whatever the reason, this lack of access to the data makes it difficult for us to evaluate the conclusions of such studies.

The fundamental difficulty with twin studies

Rather than providing "Nature's perfect experiment", the genetic identity of identical twins creates more problems for researchers than it solves. We have already noted that the environments experienced by identical twins may be much more similar than those experienced by DZ twins and ordinary siblings and that MZ twins face a greater risk of premature birth than do people in general. In addition, the genetic identity of identical twins may cause problems in the statistical analysis of the genetics of behavior.

Let us suppose that a behavioral trait is influenced by the action of several genes. If these genes operate independently of each other, the effects of the genes may be additive and standard statistical methods can be used for calculating the genetic basis of the differences in similarity between MZ cotwins and DZ cotwins.

However, it is now suspected that many behavioral traits depend on the interactions among many genes (as well as interactions with the environment). That is, the genes may interact synergistically, together having a far greater effect than would be expected from adding up the influences of individual genes. For example, suppose that there are 10 genes that contribute in synergistic fashion to a particular behavior. If one MZ cotwin has all 10 of the genes, so will his or her cotwin. However, if one DZ cotwin has all 10 genes, the chance that his or her cotwin will have the same combination is extremely small, less than one in a thousand. Even if the cotwin had 9 of those genes, without the synergistic effects of the 10th gene, the difference in this behavior between the two DZ twins might be very dramatic indeed. This means that any similarities in this behavior within a family would be much less likely to be due to genetics. Thus, a study of this behavior that focused on identical twins could drastically overestimate the importance of genetic differences in explaining the behavioral differences seen in the general population.
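
The "less than one in a thousand" figure can be checked with a short calculation. The sketch below assumes, as a simplification not spelled out in the text, that a DZ cotwin independently shares each of the 10 genes with probability one-half; the function name `prob_cotwin_has_all` is hypothetical.

```python
from fractions import Fraction


def prob_cotwin_has_all(n_genes: int, share_prob: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability that a DZ cotwin carries all n_genes variants,
    assuming each is shared independently with probability share_prob."""
    return share_prob ** n_genes


p_dz = prob_cotwin_has_all(10)
print(p_dz)                       # 1/1024
print(p_dz < Fraction(1, 1000))   # True
```

For MZ cotwins the corresponding probability is 1, which is the source of the contrast between the two kinds of twins.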


In order to quantify their results, behavior geneticists use data from twin studies to calculate what is called the heritability of a trait.3 There has been an enormous misunderstanding and misuse of this term, which is in part a consequence of the closeness in sound and origin of the words heritability and inherit. The term "inherit" in biology clearly refers to the inheritance of genes and thus has a deterministic characteristic to it: we obviously do inherit our genes from our parents. But the "heritability" of a trait is not, as is commonly thought, the percentage of the trait that is inherited. The New York Times committed this error in the headline of its article reporting the results of the Minnesota Twin Study of Bouchard and coworkers: "Major Personality Study Finds That Traits Are Mostly Inherited."11 Nor does heritability measure the degree to which a trait like height or IQ is genetically determined in an individual. Rather, heritability is defined as the proportion of the variation of that trait within a population living in a specific range of environments that is due to genetic differences among the people in that population.

This rather complex definition describes a term that is used by agricultural scientists in selective breeding experiments to predict and improve valuable traits in plants and animals raised or grown under precisely controlled conditions. Since people are neither selectively bred nor raised in controlled conditions, it is not really possible to obtain accurate estimates of heritabilities for human behavioral traits. In the words of Spitz and Carlier (translated from the French), this is because of "the extreme difficulty of interpreting heritability once it is estimated in our species, that is, in a situation where neither the genotypes nor the environments are controlled."2

It is a consequence of the definition of heritability that even if one could control the environment more than we can with humans, an estimate of the heritability of a trait would hold only for a particular range of environments and would not tell us anything about how much or in what direction that trait would change in a different environment. Furthermore, we are brought back to the question of whether we can even define environment. Defining a home or cultural environment is not as simple as quantifying an agricultural environment in terms of the amount of fertilizer added to soil, the depth of soil, or the amount of irrigation.
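
The dependence of heritability on the range of environments can be illustrated with a small calculation. The sketch below uses the simplest possible model, in which the total variance of a trait is the sum of a genetic variance and an environmental variance, with no covariance or interaction between them. That model, the function name `heritability`, and the numbers are assumptions made for illustration only.

```python
def heritability(var_genetic: float, var_environment: float) -> float:
    """Proportion of trait variance due to genetic differences,
    assuming total variance = genetic variance + environmental variance."""
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total


# Same hypothetical genetic variance, two different ranges of environments.
narrow = heritability(var_genetic=1.0, var_environment=1.0)
wide = heritability(var_genetic=1.0, var_environment=3.0)
print(narrow, wide)  # 0.5 0.25
```

The genetic variance is the same in both cases; only the spread of environments differs, yet the heritability falls from 50% to 25%.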

Let us for now ignore these problems and assume that heritabilities can be estimated for various traits in people. As a result of all the technical difficulties that we have mentioned, the heritability estimates appearing in the literature are probably all overestimates. For example, Devlin and his colleagues have estimated that including the effects of maternal environment reduces the estimate of the heritability of IQ from 60% to 48%. In addition, they note that their analyses do not "preclude other, unmodelled factors, such as cultural inheritance and interaction between genes and the environment, from having important effects on IQ."7 The recent findings from studies on dichorionic vs. monochorionic identical twins also appear to require a reduction in heritability estimates for certain traits. If the criticisms of the equal environment assumption or the concerns about placement of separated twins have validity, heritability estimates would have to be further reduced. Finally, as we have pointed out, researchers have often incorporated the contribution of gene-environment covariances into the genetic component of heritability estimates.

But, continuing to ignore the problems in estimating heritabilities, we now ask: what do these (reduced) estimates of heritability mean? What utility do these values have? Some researchers and some of the media believe that high heritabilities of such traits as IQ have social and political implications. The late psychologist Richard Herrnstein of Harvard University and political scientist Charles Murray of the University of Colorado, in their recent book "The Bell Curve",12 argue that the average intelligence in the United States is decreasing because lower class people with lower IQs are having too many children. They believe that women from the upper classes should be encouraged by new social programs to bear more children. Since, in their opinion, IQ is genetic and therefore unchangeable, they call for an end to what they regard as futile welfare and remedial education programs. In the legal arena, debates over genetic contributions to criminal behavior have caused lawyers and judges in the United States to wonder whether individuals who commit crimes have free will and are responsible for their actions.13

Behind such arguments is the assumption that a behavior with high heritability is fixed and unchangeable. But genetic does not mean fated. Even for those traits such as a disease caused by the malfunctioning of a single gene, an appropriate modification of the environment may be able to completely reverse the effect of the mutant gene. Phenylketonuria is a disease caused by a single gene mutation that results in the accumulation of a compound devastating to brain function. However, if detected by genetic screening techniques in babies, the potential severe mental retardation can be completely prevented by placing the children on a modified diet. Thus a trait (mental retardation) that under one condition manifests itself fully in the individuals who carry the altered gene disappears with a relatively simple change in the environment.

Behaviors like intelligence are much more complex than single gene diseases. Today, most behavioral geneticists believe that intelligence is the result of the action of many genes and the environment. Alfred Binet, who developed the original IQ test, knew that intelligence scores can be changed by modification of a child's educational environment. And with regard to the even more complex questions concerning genes and free will, we point out that the term environment includes the environment that an individual makes for him- or herself. Thus, questions of free will cannot be resolved by relying on genetic arguments.

Heritability estimates based on studies of twins have generated enormous confusion. Psychologist Scott Stoltenberg of the University of Michigan has suggested that this is partly due to the fact that the term heritability has both "technical and folk meanings."14 He goes on to suggest that where confusion is inevitable, it is the responsibility of the specialist to abandon "the term in favor of one without a widely understood folk meaning."

We believe that the term heritability is so entrenched that it will be impossible to change it. However, it is not impossible for scientists and journalists to explain carefully, in each article they write for the nonspecialist, what heritability means and what it does not mean. This can be done very briefly, as we do to conclude this article.

Heritability is a property of a population. It tells us how much of the variation in a population is due to genetic variation. It is often estimated by means of twin studies. But twins are special; what may be true for twins may not be true of the general population. Because heritability is a statistic, it gives no information about the role of genes or the environment in an individual. And its information about the population applies only for the existing range of environments. It tells us nothing about what would happen in a new environment. Moreover, since for such traits as IQ the heritability is estimated to be no more than 50%, there are almost certainly existing environments that will increase the IQ scores of many people. We may or may not feel as though we are free to change various aspects of our personality and behavior. But it is not heritability that stands in our way.


1 Bailey M & Pillard R. (1991) A genetic study of male sexual orientation. Archives of General Psychiatry 48:1089-1096.

2 Spitz E & Carlier M. (1996) La Méthode des jumeaux de 1875 à nos jours. Psychiatrie de l'enfant 39:137-159.

3 Falconer D S (1981) Introduction to Quantitative Genetics. Longman, London.

4 Bouchard T J Jr, Lykken D T, McGue M, Segal N L & Tellegen A. (1990) Sources of human psychological differences: the Minnesota study of twins reared apart. Science 250:223-228.

5 Ainslie R C, Olmstead K M & O'Loughlin D D. (1987) The early development context of twinship: some limitations of the equal environments hypothesis. American Journal of Orthopsychiatry 57:120-124.

6 Morris-Yates A, Andrews G, Howie P & Henderson S. (1990) Twins: a test of the equal environments assumption. Acta Psychiatrica Scandinavica 81:322-326.

7 Devlin B, Daniels M & Roeder K. (1997) The heritability of IQ. Nature 388:468-471.

8 Kamin L. (1974) The Science and Politics of I.Q. Erlbaum Associates, Potomac.

9 MacKintosh, N J (1995) Cyril Burt: Fraud or Framed? Oxford University Press, Oxford.

10 Rowe D C (1994) The Limits of Family Influence: Genes, Experience and Behavior. The Guilford Press, New York.

11 Goleman D. (1986) Major personality study finds that traits are mostly inherited. The New York Times Dec. 1:C1.

12 Herrnstein R J & Murray C (1994) The Bell Curve. Free Press, New York.

13 Blakeslee S. (1996) Genetic questions are sending judges back to classroom. The New York Times July 9:C1,C9.

14 Stoltenberg S F. (1997) Coming to terms with heritability. Genetica 99:89-96.