1.3. Mitochondrial DNA

Animal mitochondrial DNA (mtDNA) is a small (15-20 kb) circular molecule containing about 37 genes that code for 22 tRNAs, two rRNAs and 13 mRNAs; the mRNAs code for proteins mainly involved in electron transport and oxidative phosphorylation in the mitochondria. The mitochondrial genome is arranged very efficiently: it lacks introns and has only small intergenic spacers, and in some cases the reading frames even overlap. The control region is the primary non-coding region and is responsible for the regulation of heavy (H) and light (L) strand transcription and of H-strand replication (Figure 1).

As a molecular marker, mitochondrial DNA has many advantages. It evolves faster than nuclear DNA (Brown et al. 1982), probably due to inefficient replication repair (Clayton 1984). Different regions of the mitochondrial genome evolve at different rates (Saccone et al. 1991), allowing suitable regions to be chosen for the question under study. Mitochondrial DNA is maternally inherited in most species; exceptions include paternal leakage in mice (Gyllensten et al. 1991) and biparental inheritance in marine mussels (Zouros et al. 1992). Mitochondrial DNA is generally considered not to recombine (Hayashi et al. 1985), though some evidence of recombination events has recently been reported (Eyre-Walker et al. 1999, Hagelberg et al. 1999). Individuals are usually homoplasmic for one mitochondrial haplotype, though heteroplasmy has been reported in many species (e.g. perches, Nesbø et al. 1998; Drosophila, Volz-Lingenhöhl et al. 1992; bats, Wilkinson & Chapman 1991). These features mean that each molecule as a whole usually has a single genealogical history through maternal lineages.

Whether mitochondrial DNA can be considered a strictly neutral marker has been controversial (e.g. Rand & Kann 1996 and references therein). Support for neutrality comes from the high evolutionary rate of the molecule (Brown et al. 1979, Brown et al. 1982, Vawter & Brown 1986), the assumed uniformity of mtDNA substitution rates and the relaxed translation of mitochondrial mRNAs (Cann et al. 1984). Nevertheless, mitochondrial DNA evolution more likely follows the mildly deleterious or the nearly neutral model. Based on neutrality tests of molecular data from 14 studies, Fry (1999) suggested that within species there is an excess of rare haplotypes and that these haplotypes carry mildly deleterious mutations. Ballard and Kreitman (1995) point out in their review that selection on any part of the mtDNA influences polymorphism in the whole molecule, because the lack of recombination makes mitochondrial genomes particularly susceptible to genetic hitchhiking. The heterogeneous substitution rates along lineages and the relative excess of replacement polymorphism (i.e. nonsynonymous substitutions, which change the encoded amino acid) also support the idea that selection has a role in mtDNA polymorphism. Yet Ballard and Kreitman (1995) state that genetic drift may be the prevailing force in mitochondrial evolution.

Figure 1. Mitochondrial genomes of birds (a, b) and mammals and Xenopus (c). tRNA genes are identified by their 1-letter amino acid codes. The outer circle represents the heavy (H) strand and the inner circle the light (L) strand. The polarity of transcription and the transcribed strand are shown by arrowheads. Where no arrowhead is marked, the gene is transcribed from the H-strand with clockwise polarity. The regions used in this study are marked in dark grey. The genomes are redrawn from Desjardins and Morais (1990) and Mindell et al. (1998).

However, the value of mtDNA in phylogenetic studies is not severely diminished by the uncertainty about mitochondrial neutrality. Even if molecules differ in fitness, properly identified synapomorphies still allow the recognition of monophyletic clades. Knowledge of mtDNA neutrality is nevertheless essential for analyses involving genetic distance estimates and the molecular clock (Avise et al. 1987).
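As an illustration of the kind of genetic distance estimate referred to above, the following minimal Python sketch computes the uncorrected proportion of differing sites (p-distance) between two aligned sequences and applies the Jukes-Cantor correction for multiple hits, d = -3/4 ln(1 - 4p/3). The choice of the Jukes-Cantor model, the function names and the two short sequences are assumptions made for illustration only; they are not taken from the studies cited here.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences, invented for this example.
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTTCGTAA"

p = p_distance(seq_1, seq_2)
d = jukes_cantor_distance(p)
print(f"p-distance = {p:.3f}, Jukes-Cantor distance = {d:.3f}")
```

The corrected distance is always at least as large as the p-distance, because the correction accounts for sites that have changed more than once. Such corrections assume a particular substitution model, which is one reason why departures from neutrality matter for distance-based analyses.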

Whether or not mtDNA is strictly neutral, it is a sensitive indicator of population level processes. Analysis of mtDNA divergence can be used to reveal geographic clusters of related molecules (individuals) or matrilineal relationships within populations. It can also be used to trace historical events such as bottlenecks, or to analyse hybrid zones. In addition, mtDNA can be very useful in resolving phylogenetic relationships between closely related taxa (Moritz et al. 1987).

1.3.1. Avian mitochondrial DNA

The first complete sequence of an avian mitochondrial genome was published from chicken by Desjardins and Morais (1990). It showed highly conserved features when compared to other vertebrate mtDNAs. The protein genes are very similar to the homologous genes in mammals and amphibians and they are translated using the same genetic code. Guanine is relatively infrequent at the third position of codons. Several genes overlap and several end with an incomplete stop codon that is completed by polyadenylation (Quinn 1997).
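The completion of an incomplete stop codon by polyadenylation can be sketched in a few lines of Python. In this minimal example, a coding sequence that ends with a partial codon (T or TA) is padded with A residues, as the poly-A tail would do in the mature mRNA, and the resulting terminal codon is checked against the stop codons of the vertebrate mitochondrial genetic code (NCBI translation table 2). The function names and the example sequences are hypothetical and serve only to illustrate the principle.

```python
# Stop codons of the vertebrate mitochondrial genetic code (NCBI table 2).
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def complete_stop_codon(coding_seq):
    """Return the sequence as read in the mature mRNA.

    If the gene ends with a partial codon, A residues contributed by the
    poly-A tail are appended until the length is a multiple of three.
    """
    seq = coding_seq.upper()
    missing = (-len(seq)) % 3
    return seq + "A" * missing


def ends_with_stop(coding_seq):
    """True if the (completed) terminal codon is a stop codon."""
    return complete_stop_codon(coding_seq)[-3:] in STOP_CODONS


# Hypothetical gene ends, invented for this example.
for gene_end in ("ATGCCCT", "ATGCCCTA", "ATGCCCTAG", "ATGCCCC"):
    print(gene_end, complete_stop_codon(gene_end), ends_with_stop(gene_end))
```

A terminal T or TA becomes TAA after polyadenylation, whereas a gene ending in, for example, a single C would not acquire a stop codon in this way.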

Though many features are shared by all vertebrate mtDNAs, the avian genomes have some remarkable differences. First, the avian gene order is novel compared to mammalian and amphibian mitochondrial genomes (Figure 1). The ND5 gene (nicotinamide adenine dinucleotide dehydrogenase subunit 5) is followed by cytochrome b, tRNAThr and tRNAPro, ND6 and tRNAGlu in the 5’ → 3’ direction of the avian L-strand (Desjardins & Morais 1990, Desjardins & Morais 1991, Quinn & Wilson 1993). This rearrangement could have arisen through duplication via replication slippage followed by at least two independent deletion events (Quinn & Wilson 1993). Second, the L-strand replication origin that is found between tRNACys and tRNAAsn in other vertebrates is absent from the avian genome (Desjardins & Morais 1990, Desjardins & Morais 1991). In addition, COI (cytochrome oxidase I) has an unusual initiation codon, GTG instead of ATG, and there is evidence of a low incidence of thymine at silent positions within coding regions (Quinn 1997). Recently, Mindell et al. (1998) described another avian mtDNA gene order, present in four bird groups (Picidae, Cuculidae, suboscine Passeriformes and Falconiformes; Figure 1). This gene arrangement probably has multiple independent origins because it is found in quite divergent taxa.

1.3.2. Cytochrome b

Cytochrome b is one of the cytochromes involved in the electron transport in the respiratory chain of mitochondria. It contains eight transmembrane helices connected by intramembrane or extramembrane domains (Figure 2, Esposti et al. 1993). It is the only cytochrome coded by mitochondrial DNA.

Figure 2. Structure of the cytochrome b protein. The gene region used in this study corresponds to the shaded parts of the protein.

The cytochrome b gene is the most widely used gene in phylogenetic work for several reasons. Although it evolves slowly in terms of non-synonymous substitutions, the rate of evolution at silent positions is relatively fast (Irwin et al. 1991). Its wide use has given cytochrome b the status of a universal metric, in the sense that studies can be easily compared. Cytochrome b is thought to be variable enough for population level questions and conserved enough for clarifying deeper phylogenetic relationships. However, the gene is under strong evolutionary constraints, and some parts of it are more conserved than others because of functional restrictions (Meyer 1994). Most of the variable positions seem to be located within the coding regions for transmembrane domains or for the amino- and carboxy-terminal ends (Irwin et al. 1991).
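The distinction between silent (synonymous) and non-synonymous (replacement) differences can be made concrete with a short Python sketch. It builds the vertebrate mitochondrial codon table (NCBI translation table 2, in which TGA codes for tryptophan, ATA for methionine and AGA/AGG are stop codons) and classifies each differing codon pair in two aligned, in-frame coding sequences. The function name and the example sequences are hypothetical. As a simplification, the sketch counts differing codons rather than individual nucleotide substitutions, so it is not a substitute for established methods of estimating synonymous and non-synonymous rates.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial code), with codons in
# the order TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a, seq_b):
    """Count differing codons as synonymous or replacement.

    Both sequences must be aligned, in frame and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    counts = {"synonymous": 0, "replacement": 0}
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["replacement"] += 1
    return counts


# Hypothetical sequences, invented for this example:
# ATG/ATA (Met/Met), CTA/CTG (Leu/Leu), ACC/ATC (Thr/Ile).
example = classify_codon_differences("ATGCTAACC", "ATACTGATC")
print(example)
```

Note that ATG and ATA both code for methionine in the mitochondrial code, so the first difference is silent here, although it would be a replacement under the standard nuclear code.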

So far, cytochrome b has been the most prevalent source of sequence data in avian studies. Although the use of cytochrome b has some pitfalls, Moore and DeFilippis (1997) argue that it may nevertheless be the best choice for resolving relatively recent evolutionary history. The tendency of birds to have low sequence divergence rates at high taxonomic levels compared to other vertebrates makes cytochrome b a good choice as a marker.

Helm-Bychowski and Cracraft (1993) provided a good example of low divergence in a phylogenetic study of corvine passerine birds. Although some monophyletic groups could be identified at the family level, others remained unresolved, possibly because of a rapid radiation of lineages over a relatively short period of time. Similar results were obtained for cardueline finches, where not all branches of the phylogenetic tree could be convincingly resolved (Fehrer 1996). However, cytochrome b gene divergence suggested that some subspecies of tanagers (Ramphocelus; Hackett 1996) should be elevated to species status, and the taxonomy of swiftlets (Apodidae) based on the presence or absence of echolocating ability has been called into question (Lee et al. 1996). Cytochrome b sequences have been used successfully to identify taxonomic groups even at the subspecies level, for example in bluethroats (Luscinia svecica svecica and L. s. namnetum; Questiau et al. 1998) and common guillemots (Uria aalge; Friesen et al. 1996).

1.3.3. Control region of mtDNA

The mtDNA control region is the only large non-coding region in avian mitochondria, varying in length from 1044 bp in Cairina moschata (Liu et al. 1996) to 1227 bp in Gallus domesticus (Desjardins & Morais 1990). It contains the heavy-strand replication origin and the promoters for both L- and H-strand transcription (L’abbe et al. 1991). This region is divided into three domains (Figure 3) according to the criteria of Baker and Marshall (1997). The first domain at the 5’ end of the control region contains a C-stretch and the putative termination associated sequence (TAS). The C-stretch is characteristic of the 5’ terminus of the avian control region, being present in various forms at least in Anatidae, Phasianidae and Paridae (Quinn & Wilson 1993, Desjardins & Morais 1990, Marshall & Baker 1997, works II-IV). A C-stretch has also been found in the control region of the platypus (Ornithorhynchus anatinus; Janke et al. 1996) and the frog (Rana catesbeiana; Yoneyama 1987), but not at the 5’ terminus.

The central domain is the most conserved. It contains several structural elements that can be readily aligned even between different bird families. Three of these elements (F-, D- and C-boxes) were identified by Desjardins and Morais (1991) in Gallus domesticus and by Quinn and Wilson (1993) in Anser caerulescens. In addition, there are regions upstream from the F- and C-boxes that are highly homologous among bird families; these are designated E- and B-boxes after Southern et al. (1988). In the middle of the E-box and the D-box there are short regions (rebox and Mt-3, respectively) that are homologous to sequences able to bind nuclear transcription factors connected to the regulation of oxidative phosphorylation (Wallace 1993, Suzuki et al. 1991).

Usually the most variable part is the third domain at the 3’ end of the control region. This domain begins with conserved sequence block 1 (CSB-1). The mitochondrial transcription factor (mtTFA) probably binds to CSB-1 and mediates the transition from transcription to replication when another factor binds to the mtTFA-CSB-1 complex (Ghivizzani et al. 1994). The rest of the third domain seems to be free from functional constraints, and large interspecific insertions or deletions as well as intraspecific tandem repeats (loggerhead shrike Lanius ludovicianus, Mundy et al. 1996; Ciconiiformes, Berg et al. 1995) are concentrated in this region.

Figure 3. Control region alignment of one individual from each of four Parus species and the structural elements in the region. Identical nucleotide sites are shown by stars.
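The convention of marking identical sites with stars can be illustrated with a minimal Python sketch that produces such a marker line for a set of aligned sequences. The four short sequences are invented for the example and do not represent the Parus data; treating alignment gaps as non-identical is likewise an assumption of this sketch.

```python
def identity_line(aligned_seqs):
    """Return a string with '*' under columns where all sequences agree.

    Columns that contain a gap ('-') are never marked as identical.
    """
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    marks = []
    for column in zip(*aligned_seqs):
        bases = {base.upper() for base in column}
        identical = len(bases) == 1 and "-" not in bases
        marks.append("*" if identical else " ")
    return "".join(marks)


# Hypothetical aligned sequences, invented for this example.
alignment = [
    "ACGT-ACCT",
    "ACGTTACGT",
    "ACGATACCT",
    "ACGTTACCT",
]

for seq in alignment:
    print(seq)
print(identity_line(alignment))
```

The proportion of starred columns gives a quick impression of how conserved a given stretch of the alignment is, for example when comparing the central domain with the flanking domains.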

The control region has often been shown to evolve faster than the rest of the mitochondrial genome (e.g. hominoids; Horai et al. 1995) and to be highly variable in birds (Wenink et al. 1993). This variability has led to the expanding use of control region sequences to examine questions ranging from population structure to phylogenetic relationships. The region has already proven to be a powerful tool in elucidating the global population structure of shorebirds (Wenink et al. 1993, 1994, 1996) and fringilline finches (Marshall & Baker 1997, 1998), in revealing recent mixing of maternal lineages in snow geese (Quinn 1992), and in evaluating gene flow between social groups and populations in babblers (Edwards 1993). However, in some avian genera (e.g. gnatcatchers, Polioptila; Zink & Blackwell 1998 and towhees, Pipilo; Zink et al. 1998) mitochondrial coding genes have recently been shown to evolve as rapidly as the control region.

Fewer studies have dealt with control region sequences at higher taxonomic levels. Kidd and Friesen (1998) succeeded in resolving the phylogenetic relationships between three species of guillemots, but failed to resolve the branching order of the subspecies. Between closely related goose species (Anser), control region sequences have been shown to resolve phylogenetic relationships quite reliably (Ruokonen 2000, in press).