Chapter 1. Introduction

Table of Contents
1.1. Microsatellites
1.2. Evolution and population genetics of pines
1.3. Goals of this work

Molecular markers have become essential tools for conservation biology, evolutionary and population studies, and mapping projects (Queller et al. 1993, Jarne & Lagoda 1996). The ideal class of genetic marker would have many scorable and highly variable loci with codominant alleles, densely distributed throughout the genome. Microsatellite markers meet these requirements, and they have become a marker of choice for mapping, forensic investigations, and population analyses as well as in ecological studies. In addition to these applications, the excellent properties of microsatellites allow even more innovative approaches in future projects. This means, however, that we have to know how these tools work. Knowledge of such molecular-level properties as the mode of genetic transmission, the mutation rate, and the nature of the mutational process itself is critical to the proper interpretation of microsatellite markers in a population context.

1.1. Microsatellites

1.1.1. General characters of microsatellites

Microsatellites are sequences made up of a single sequence motif (1-6 bp) which is repeated many times side-by-side. Historically, the term microsatellite was used to describe only repeats of the dinucleotide motif CA/GT (Litt & Luty 1989, Weber & May 1989). It has now become the most common term for these tandem repeats of short motifs. They are also called simple sequences (Tautz 1989) and short tandem repeats (STRs) (Edwards et al. 1991). If these repeats are long enough and uninterrupted, they are excellent genetic markers due to their high level of polymorphism (Powell et al. 1996). Estimates of microsatellite mutation rates in Escherichia coli in in vivo systems are about 10⁻² events per locus per replication (Levinson & Gutman 1987b), in yeast 10⁻⁴ to 10⁻⁵ (Henderson & Petes 1992, Strand et al. 1993), and in Drosophila about 6 × 10⁻⁶ (Schug et al. 1997). Pedigree analysis in humans gave an estimate of 10⁻³ events per locus per generation (Weber & Wong 1993).

Microsatellites have been found in every organism studied so far. In the human genome poly(A)/poly(T) stretches are the most common repeat type (Stallings 1992). However, the poly(A)/poly(T) type is not suitable as a genetic marker because of its instability during PCR. The study of Beckmann and Weber (1992) showed that the most common dinucleotide repeat type in the human genome is CA/GT. Other mammalian genomes seem to have repeat compositions similar to that of the human genome. In plants GA/CT and AT repeats are the most common (Stallings 1992, Lagercrantz et al. 1993). In conifers the most common repeat type varies among species, but GA/CT and CA/GT seem to be the most common ones (Lagercrantz et al. 1993, Smith & Devey 1994, Pfeiffer et al. 1997, Scotti et al. 2000).

Microsatellites are generally assumed to be evenly distributed over genomes (e.g. Dietrich et al. 1996) but rare within coding regions (Hancock 1995). There are, however, some human diseases, such as fragile X syndrome and myotonic dystrophy, that are caused by expansions of polymorphic trinucleotide repeats in genes (e.g. Fu et al. 1991, Aslandis et al. 1992, Rubinsztein 1999). In conifers, it is known that nuclear microsatellite repeats are often embedded within repetitive DNA sequences (Smith & Devey 1994, Pfeiffer et al. 1997). The sequencing of the whole chloroplast genome of Pinus thunbergii (Wakasugi et al. 1994) has allowed the development of very useful and universal chloroplast microsatellite markers for conifers (e.g. Powell et al. 1995, Vendramin et al. 1996, Echt et al. 1998, Vendramin et al. 2000). In addition, microsatellites have also been found in the mitochondrial genome of conifers (Soranzo et al. 1999, Sperisen et al. 2001).

1.1.2. Microsatellite evolution

There are two potential mechanisms which can explain the high mutation rates of microsatellites. The first is recombination between DNA molecules by unequal crossing-over or by gene conversion (Smith 1976, Jeffreys et al. 1994). The second mechanism involves slipped-strand mispairing during DNA replication (Levinson & Gutman 1987a). Studies using yeast and E. coli as model organisms have shown that replication slippage seems to be the main mechanism generating length mutations in microsatellites (Levinson & Gutman 1987b, Henderson & Petes 1992). In replication slippage the nascent DNA strand dissociates from the template strand during the replication of the repeat area, and the nascent strand can reanneal out-of-phase with the template strand. When replication is continued, the eventual nascent strand will be longer or shorter than the template, depending on whether the looped-out bases have occurred in the template strand or the nascent strand. Microsatellites will then lose or gain one or a few repeats (Fig. 1). These kinds of small length changes are the most common mutational types in microsatellite loci, and have been detected in E. coli and yeast (Levinson & Gutman 1987b, Sia et al. 1997, Wierdl et al. 1997) as well as in humans (Weber & Wong 1993). If recombination were the major mechanism, mutations would be expected to give rise to a wider range of novel mutants.

Figure 1. Model of mutation process at microsatellite loci. a) slippage of the DNA polymerase during replication, b) misalignment of the template or the newly replicated strand and c) continuation of replication. DNA strands are represented by lines, repeat units by small boxes, and direction of replication by small arrows.
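
The slippage process described above can be illustrated with a toy simulation. The following is only a minimal sketch under simplifying assumptions that are not part of the model in Fig. 1: at most one slippage event per replication, exactly one repeat unit per event, and an equal chance that the loop forms in the nascent or the template strand. The function name replicate_with_slippage and the parameter values are hypothetical.

```python
import random

def replicate_with_slippage(n_repeats, slip_prob, rng, min_repeats=1):
    """Return the repeat count of the nascent strand after one replication.

    Toy assumptions: at most one slippage event per replication, involving
    exactly one repeat unit, with equal chance of the loop forming in the
    nascent strand (gain) or in the template strand (loss).
    """
    if rng.random() >= slip_prob:
        return n_repeats                      # faithful copy, no slippage
    if rng.random() < 0.5:
        return n_repeats + 1                  # loop in nascent strand: one repeat copied twice
    return max(min_repeats, n_repeats - 1)    # loop in template strand: one repeat skipped

# Follow one lineage through 1000 replications (arbitrary illustrative values).
rng = random.Random(1)
lengths = [20]
for _ in range(1000):
    lengths.append(replicate_with_slippage(lengths[-1], slip_prob=0.01, rng=rng))
print(min(lengths), max(lengths))
```

In this sketch every change is a single-repeat step, which mirrors the observation that small length changes are the most common mutational type.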

The length of the microsatellite repeats may have an effect on the mutation rate such that longer repeats are more polymorphic than shorter ones (Weber 1990, Chakraborty et al. 1997, Sia et al. 1997, Primmer et al. 1998, Ellegren 2000b). This is probably because the opportunity for a stable misaligned configuration is greater for longer repeat arrays. The second parameter that influences microsatellite stability is the purity of the repeat. Interrupted microsatellite repeats (due to insertion of bases or base substitution) seem to have lower mutation rates than perfect repeats. This might be due to the greater difficulty of forming slipped intermediates in the presence of sequence interruptions (e.g. Kunst et al. 1997, Petes et al. 1997). In addition to these parameters, it has been noticed that the sex of the mutating individual has an influence on the mutation process. In the barn swallow the mutation rate was almost twice as high in males as in females (Primmer et al. 1997). In humans, an excess of paternally transmitted mutations supported a male-biased mutation rate (Ellegren 2000a).

Many studies have shown that mutations at microsatellite loci involve more gains than losses of repeat units (Weber & Wong 1993, Talbot et al. 1995). Amos et al. (1996) and Primmer et al. (1996a) observed in their germline studies that significantly more gains than losses of repeat units occurred in humans and barn swallows, respectively. However, it is unclear whether this asymmetry in the distribution of mutations occurs only in hypervariable and long microsatellites, or in all types of microsatellites. The molecular mechanism resulting in this kind of upwardly biased mutation is still unclear.

Most microsatellite arrays are shorter than a few tens of repeat units, although a few large repeat arrays have also been found, e.g. in humans (Wilkie & Higgs 1992) and in the barn swallow (Primmer et al. 1996a). This strongly suggests that there must be size constraints restricting the expansion of repeat arrays. However, there is no direct evidence for selective constraints acting on allele length at microsatellite loci, although several mechanisms have been suggested. For instance, Primmer et al. (1996a) and Ellegren (2000a) suggested that repeat losses might be more common or involve larger deletions among long alleles than among shorter alleles. Because large alleles are strongly counter-selected at loci associated with genetic diseases, Samadi et al. (1998) suggested that selection might act as an upper truncating mechanism, imposing a ceiling on alleles with large repeat counts. Taylor et al. (1999a) suggested that interruptions were associated with repeat shortening, and thus restricted the expansion of microsatellite alleles.

The analysis of germline mutations of the parental genotypes of human families suggested that mutations are more common in heterozygous individuals whose two alleles differ greatly in repeat number (Amos et al. 1996). However, Ellegren (2000a) showed that in humans the size difference between an individual’s two alleles has no effect on the mutation rate. In addition, if the theory of Amos et al. (1996) were true, the mutation rate would be correlated with heterozygosity, and loci in larger populations would evolve faster than those in smaller ones. This phenomenon has not been observed at the population level, although Rubinsztein et al. (1995) noticed that human microsatellites were longer than their homologues in chimpanzee. However, it has been argued that microsatellites tend to be longer and thus more polymorphic in the species they were cloned from, due to selection during the cloning procedure (Ellegren et al. 1995, 1997).

In summary, the mutational process of microsatellites seems to be very complex. It is very likely that the process is heterogeneous, with differences between loci and alleles (see Ellegren 2000b).

1.1.3. Theoretical models of microsatellite mutations

To estimate population differentiation measures and genetic distances from microsatellite data, theoretical mutation models for the evolutionary processes of microsatellites are needed. Two theoretical models have been considered for microsatellites (Deka et al. 1991). In the infinite allele model (IAM, Kimura & Crow 1964) a mutation can involve any number of tandem repeats and always results in a new allelic state that did not previously exist in the population. However, as discussed above, slipped-strand mispairing is currently accepted as the main mechanism for microsatellite length variation. This mechanism mostly causes small changes in repeat numbers, such that alleles of similar lengths should be more closely related to each other than alleles of very different sizes. Alleles may also mutate towards allelic states that are already present in the population. The stepwise mutation model (SMM) (Kimura & Ohta 1978), developed for allozymes, provides a better description of these kinds of evolutionary processes. In addition to this model, Di Rienzo et al. (1994) described the two-phase model (TPM), where a limited proportion of mutations involve several repeats. Although rarely cited in the microsatellite literature, a K-allele model (KAM) could also be considered for microsatellites. Under this model, there are K possible allelic states, and any allele has a constant probability of mutating towards any of the other K–1 allelic states (Crow & Kimura 1970). Due to size constraints acting on microsatellite loci, the KAM seems to be more realistic than the IAM.
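
The difference between the four models is easiest to see when each is written as a rule for a single mutation event. The following is a minimal sketch, not an implementation taken from the cited papers. The function names, the proportion of single-step mutations (p_single), the maximum multi-step size (max_step) and the uniform distribution of multi-step sizes in the TPM are all assumptions made for illustration, and no lower bound on repeat number is enforced.

```python
import itertools
import random

def mutate_iam(new_states):
    """IAM: every mutation yields an allelic state never seen before."""
    return next(new_states)

def mutate_smm(repeats, rng):
    """SMM: gain or lose exactly one repeat unit, with equal probability."""
    return repeats + rng.choice((-1, 1))

def mutate_tpm(repeats, rng, p_single=0.9, max_step=5):
    """TPM: mostly one-step changes; a limited proportion span several repeats."""
    if rng.random() < p_single:
        step = 1
    else:
        step = rng.randint(2, max_step)   # uniform multi-step size (illustrative choice)
    return repeats + rng.choice((-1, 1)) * step

def mutate_kam(state, k, rng):
    """KAM: move to any of the other K-1 states with equal probability."""
    return rng.choice([s for s in range(k) if s != state])

rng = random.Random(42)
new_states = itertools.count(1000)   # hypothetical labels for novel IAM alleles
print(mutate_iam(new_states), mutate_smm(15, rng),
      mutate_tpm(15, rng), mutate_kam(3, 8, rng))
```

Only the SMM and TPM rules use the current repeat number, which is why allele size carries information about relatedness under these two models but not under the IAM or KAM.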

Different kinds of repeat number variance estimators based on the stepwise mutation model (SMM) have been developed for estimating phylogenetic relationships, genetic distances and population differentiation from microsatellite data [(δµ)², Goldstein 1995a,b; DSW, Shriver 1995; RST, Slatkin 1995]. These estimators are based on the following assumptions: (i) mutation results in a change of one repeat unit, (ii) the mutation rate is constant and independent of repeat length, (iii) there is no asymmetry in the distribution of mutations, and (iv) there are no allele size constraints. Significant discrepancies between known divergence times and microsatellite genetic distances (Deka et al. 1994, Garza et al. 1995, Valsecchi et al. 1997) imply that one or more of these basic assumptions may be wrong, or at least may not hold for all microsatellite loci (see Ellegren 2000). Some of the factors relevant to the evolution of microsatellites have been incorporated into mutation models, such as allele size constraints (Garza et al. 1995, Nauta & Weissing 1996, Feldman et al. 1997), the possibility of multistep mutations (Di Rienzo et al. 1994) and directionally biased changes in allele size (Kimmel & Chakraborty 1996). However, the dependence of the mutation rate on the repeat numbers and on the purity of the repeat sequence has not been taken into account in these models.
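
As an illustration of how such estimators use allele size, (δµ)² is the squared difference between the mean repeat numbers of two populations, averaged over loci. The sketch below assumes that each population is given as a list of per-locus dictionaries mapping repeat number to allele count; this data layout, the function names and the example counts are hypothetical.

```python
def mean_repeats(allele_counts):
    """Mean repeat number from a {repeat_number: count} mapping for one locus."""
    total = sum(allele_counts.values())
    return sum(size * n for size, n in allele_counts.items()) / total

def delta_mu_squared(pop_a, pop_b):
    """Squared difference in mean repeat number, averaged over loci."""
    per_locus = [(mean_repeats(a) - mean_repeats(b)) ** 2
                 for a, b in zip(pop_a, pop_b)]
    return sum(per_locus) / len(per_locus)

# Two loci, invented allele counts for two populations.
pop_a = [{10: 6, 11: 4}, {20: 10}]
pop_b = [{12: 5, 13: 5}, {21: 10}]
print(delta_mu_squared(pop_a, pop_b))
```

Because the measure depends only on mean allele size, it is sensitive to any violation of the assumptions listed above that shifts or bounds the allele size distribution.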

1.1.4. Testing models of microsatellite evolution

Theoretical mutation models like SMM and TPM may provide adequate measures if populations are relatively closely related, but these simple models become inadequate when divergence between populations and especially between species increases (Takezaki & Nei 1996). The main questions of interest when studying microsatellite evolution are whether replication slippage is the only mechanism contributing to size differences between alleles, whether the mutation distribution is symmetric, and whether there are allele size constraints.

There are several ways to study microsatellite evolution. First, theoretical studies attempt to model the process of microsatellite evolution by applying assumptions to a range of parameters considered to be important to the mutational process. After computer simulations, the resulting data can be compared to the observed distribution of allele frequencies and/or the heterozygosity of a locus (e.g. Deka et al. 1991, Shriver et al. 1993). Valdes et al. (1993) and Di Rienzo et al. (1994) used an alternative method where they compared the empirical and modeled allele frequency distributions. These studies have shown that the SMM and TPM can explain the evolutionary processes of microsatellites relatively well. There are still open questions as to whether all mutations at microsatellite loci involve changes of only one or two repeat units or whether mutations of larger effect also occur.
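
For heterozygosity, the comparison between models can also be made with the standard equilibrium expectations instead of simulations. The sketch below assumes a diploid population of constant effective size Ne at mutation-drift equilibrium, with θ = 4Neµ; the expected heterozygosity is θ/(1 + θ) under the IAM and 1 − 1/√(1 + 2θ) under the strict one-step SMM. The function names and parameter values are illustrative only.

```python
import math

def expected_h_iam(ne, mu):
    """Equilibrium heterozygosity under the IAM: theta / (1 + theta)."""
    theta = 4 * ne * mu
    return theta / (1 + theta)

def expected_h_smm(ne, mu):
    """Equilibrium heterozygosity under the one-step SMM: 1 - 1/sqrt(1 + 2*theta)."""
    theta = 4 * ne * mu
    return 1 - 1 / math.sqrt(1 + 2 * theta)

# Illustrative values: Ne = 1000 and a range of per-generation mutation rates.
for mu in (1e-5, 1e-4, 1e-3):
    print(mu, round(expected_h_iam(1000, mu), 4), round(expected_h_smm(1000, mu), 4))
```

For the same Ne and µ the SMM expectation is always the lower of the two, because stepwise mutations can recreate allelic states that already exist.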

It is possible to study the short-term evolution of microsatellites by analysing germline mutations (e.g. Zhang et al. 1994, Primmer et al. 1996a). Although the mutation rate of microsatellites is several orders of magnitude higher than that for nucleotide substitutions in non-coding DNA, spontaneous mutations are still quite rare in the genome. More mutations have been recorded from immortal cell lines (Weber & Wong 1993) and carcinomas, but it is possible that these kinds of somatic events may show elevated rates of mutations (Weber & Wong 1993). These short-term evolution studies have revealed that gains of repeats are more common than losses (Weber & Wong 1993, Amos et al. 1996, Primmer et al. 1996a), that multi-step changes may also occur (Weber & Wong 1993, Primmer et al. 1996a), that the mutation rate may differ between sexes (Primmer et al. 1998, Ellegren 2000a), and that the mutation rate might be positively correlated with the number of repeats (Primmer et al. 1996a).

Microsatellite allele sequencing both within and between species allows the analysis of past mutation events. Such studies have confirmed that changes in allele length are most often due to size alterations in the repeat area (e.g. Estoup et al. 1995a, Angers & Bernatchez 1997). Nevertheless, many studies have indicated that stepwise changes in repeat number are not the only mode of evolution of microsatellite alleles. For instance, Estoup et al. (1995a) noticed that irregular and composite repeat structures seemed to reduce the number of single-step mutations, so that the mutation process may be more similar to the IAM. Homoplasic microsatellite alleles may also differ in repeat composition and/or flanking sequences. Size homoplasy has been reported among microsatellite alleles from the same species (e.g. Blanquer-Maumont & Crouau-Roy 1995, Grimaldi & Crouau-Roy 1997, Viard et al. 1998, van Oppen et al. 1999, Makova et al. 2000), and between species (e.g. van Treuren et al. 1997, Primmer & Ellegren 1998, Colson & Goldstein 1999).

These studies of evolutionary processes have shown that (i) the mutation of repeat units depends on the allele size and purity; (ii) the mutation process is upwardly biased; and (iii) some constraints on allele length exist. It is very likely that these are allele- and locus-dependent processes (Ellegren 2000b). Theoretical mutation models, such as the SMM and TPM, may accurately represent the evolutionary processes of microsatellites when closely related populations are considered. However, over long evolutionary distances the mutation process seems to be more complex. Thus, theoretical mutation models that can more accurately represent the evolutionary processes of microsatellites are needed to obtain better estimates of population differentiation measures.

1.1.5. Potential problems associated with microsatellites

Microsatellites also have some drawbacks as markers. The first problem is the reduction or complete loss of amplification of some alleles due to base substitutions or indels within the priming site. These so-called null alleles will not necessarily be recognized when there is a product from the other allele. This can lead to serious underestimation of heterozygosity, compared with that expected on the basis of Hardy-Weinberg equilibrium (e.g. Callen et al. 1993, Paetkau & Strobeck 1995). This problem can be overcome by designing a new primer which does not include the site of the indel or base substitution. This can be very time-consuming and may not always be possible, for instance because of the base composition of the flanking sequences.
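
The size of the heterozygote deficit caused by a null allele can be worked out directly under Hardy-Weinberg proportions. The sketch below assumes random mating, that heterozygotes carrying the null allele are scored as homozygotes for their visible allele, and that null homozygotes give no product and are excluded from scoring. The function names and allele frequencies are invented for illustration.

```python
def true_heterozygosity(visible_freqs, null_freq):
    """Hardy-Weinberg heterozygosity counting the null as an ordinary allele."""
    return 1 - sum(p * p for p in visible_freqs) - null_freq ** 2

def apparent_heterozygosity(visible_freqs, null_freq):
    """Heterozygosity scored on gels when one allele fails to amplify."""
    # Only genotypes with two different visible alleles are scored as heterozygotes.
    het_visible = sum(visible_freqs) ** 2 - sum(p * p for p in visible_freqs)
    # Null homozygotes yield no PCR product and drop out of the sample.
    scorable = 1 - null_freq ** 2
    return het_visible / scorable

# Two visible alleles at frequency 0.4 each and a null allele at 0.2.
print(true_heterozygosity([0.4, 0.4], 0.2), apparent_heterozygosity([0.4, 0.4], 0.2))
```

With these invented frequencies about half of the true heterozygotes would be scored as homozygotes, which illustrates how even a moderately frequent null allele can produce a marked apparent deficit.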

There are also problems associated with the PCR process itself. Slippage of Taq polymerase during PCR and the tendency of Taq polymerase to add an additional dATP to PCR products can sometimes make allele scoring problematic (Ginot et al. 1996, Gill et al. 1997).

Microsatellite variation is based on length variation (in bp) of the amplified fragments. It is possible that two fragments of the same length are not derived from the same ancestral sequence, introducing the possibility of size homoplasy. Under the IAM there should not be any homoplasy, but the SMM and TPM can generate size homoplasy. If microsatellite loci evolve in a stepwise fashion, size homoplasy will depend on the mutation rate at the locus and the divergence time of the two populations. The degree of homoplasy will increase with the mutation rate and time of divergence (Estoup & Cornuet 1999). In addition, the selective size constraints that reduce the number of possible allelic states increase size homoplasy (Nauta & Weissing 1996). Size homoplasy can lead to underestimates of population subdivision and genetic divergence between populations and species (e.g. Estoup et al. 1995b, Viard et al. 1998, Taylor et al. 1999b). Size homoplasy is taken into account by several distance measures which are based on the SMM (Goldstein et al. 1995a,b, Slatkin 1995, Rousset 1996, Feldman et al. 1997).

Hedrick (1999) showed that measures of differentiation for highly polymorphic microsatellites using traditional F-statistics can be underestimates. The reason for this is the high within-population heterozygosity (He). FST measures the proportion of the total variation (HT) that lies between subpopulations, but does not specify the identity of the alleles involved (Hedrick 1999). When using microsatellites, populations can have nonoverlapping sets of alleles, and because under Hardy-Weinberg HT > He, the differentiation estimates can be underestimates.
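
A small numerical example makes this effect concrete. The sketch below assumes two equally sized subpopulations with known allele frequencies and computes FST as (HT − HS)/HT, where HS is the mean within-population heterozygosity (He in the text above). The function names and the allele frequencies are invented for illustration.

```python
def gene_diversity(freqs):
    """Expected heterozygosity, 1 - sum(p^2), from an {allele: frequency} mapping."""
    return 1 - sum(p * p for p in freqs.values())

def fst_two_pops(pop1, pop2):
    """Return (F_ST, H_S, H_T) for two equally sized subpopulations."""
    h_s = (gene_diversity(pop1) + gene_diversity(pop2)) / 2
    alleles = set(pop1) | set(pop2)
    pooled = {a: (pop1.get(a, 0) + pop2.get(a, 0)) / 2 for a in alleles}
    h_t = gene_diversity(pooled)
    return (h_t - h_s) / h_t, h_s, h_t

# Ten equally frequent alleles in each population, none of them shared.
pop1 = {size: 0.1 for size in range(10, 20)}
pop2 = {size: 0.1 for size in range(30, 40)}
print(fst_two_pops(pop1, pop2))

# For comparison: each population fixed for a different allele.
print(fst_two_pops({1: 1.0}, {2: 1.0}))
```

In the first case the populations share no alleles at all, yet FST is only about 0.05 because HS is already 0.9; in the second case, with no within-population variation, the same lack of overlap gives FST = 1.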

1.1.6. Applications of microsatellites

Microsatellites have become the preferred marker in many studies because of their high variability, ease and reliability of scoring and codominant inheritance. Microsatellite markers were first used for genetic mapping (e.g. Weissenbach et al. 1992) and as a diagnostic tool to detect human diseases (e.g. Murray et al. 1992). Nowadays microsatellites are regularly used in population and ecological studies. Microsatellites are excellent markers for studying gene flow, effective population size (Ne), dispersal and migration related issues, and parentage and relatedness (e.g. Taylor et al. 1994, Coulson et al. 1998, Ciofi & Bruford 1999, Goldstein et al. 1999, Luikart & England 1999).

Microsatellites can also be used to study the effects and level of inbreeding (Beaumont & Bruford 1999, Pemberton et al. 1999, Sweigart et al. 1999). Allozymes have been used to study mating systems in populations. Due to the low level of polymorphism, the estimation of individual inbreeding coefficients has been difficult. It has been possible to estimate only the average population inbreeding level. However, in many areas of ecological and evolutionary studies it is often important to know how much individuals differ in their inbreeding histories and to estimate the degrees of relatedness between individuals. The average heterozygosity of an individual measured from microsatellite data should realistically reflect the level of inbreeding. New advanced statistical methods have also enabled the use of microsatellite markers in such studies (Sweigart et al. 1999).
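
The simplest individual-level measure mentioned above, the average heterozygosity of an individual, can be computed as the proportion of typed loci at which the individual is heterozygous. The sketch below is only an illustration; the data layout (one pair of repeat numbers per locus, None for a locus that failed to amplify), the function name and the genotypes are hypothetical.

```python
def individual_heterozygosity(genotype):
    """Proportion of typed loci at which one individual is heterozygous.

    `genotype` is a list of (allele1, allele2) repeat-number pairs, one per
    locus; None marks a locus that could not be typed.
    """
    typed = [locus for locus in genotype if locus is not None]
    if not typed:
        raise ValueError("no typed loci")
    return sum(a != b for a, b in typed) / len(typed)

# Invented four-locus genotypes for two individuals.
outbred = [(12, 15), (20, 22), (8, 8), (14, 17)]
inbred = [(12, 12), (20, 20), (8, 8), None]
print(individual_heterozygosity(outbred), individual_heterozygosity(inbred))
```

With only a few loci such a value is a very coarse estimate, which is one reason why highly polymorphic markers and many loci are needed for individual-level inference.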