Phylogenetic analysis of mitochondrial DNA: Detection of mutations in patients with occipital stroke
Chapter 2. Review of the literature
The average number of base pair differences between two human mitochondrial genomes is estimated to range from 9.5 to 66 (Zeviani et al. 1998). The high mutation rate of mtDNA has resulted in the accumulation of a wide range of neutral, population-specific base substitutions. These substitutions have accumulated sequentially along radiating maternal lineages that diverged on approximately the same time scale as human populations colonized the different geographical regions of the world. Thus the women who migrated out of Africa into the different continents about 130,000 years before present (YBP) carried mtDNA mutations that today appear as high-frequency, population-specific mtDNA polymorphisms defining groups of related mtDNA haplotypes, or haplogroups (Torroni & Wallace 1994, Wallace 1995).
The D-loop is the most variable region of the mitochondrial genome, and its most polymorphic nucleotide sites are concentrated in two ‘hypervariable segments’, HVS-I and HVS-II (Wilkinson-Herbots et al. 1996). Population-specific, neutral mtDNA variants have been identified either by surveying mtDNA restriction site variants or by sequencing the hypervariable segments of the D-loop. Restriction analysis with 14 restriction endonucleases screens 15–20% of the mtDNA sequence for variation (Chen et al. 1995a). The great majority of mtDNA sequence data published to date are limited to HVS-I. Comparing the sequence variation in HVS-I with the restriction fragment length polymorphisms (RFLPs) distributed throughout the mtDNA serves two main purposes: first, to corroborate the reliability of the analysis of these variants, and second, to distinguish control-region variants that are phylogenetically associated with a particular haplotype, and thus relatively ancient and stable, from those that are recent and have been subject to repeated mutations or back mutations (Bandelt et al. 1995).
The coding and classification system for mtDNA haplogroups is based on the information provided by RFLPs and by the hypervariable segments of the control region. The principal clusters of polymorphisms, or haplogroups, are denoted by capital letters (Torroni et al. 1996a, Richards et al. 1998). About 76% of all African mtDNAs fall into haplogroup L, defined by a HpaI restriction site gain at bp 3592 (Chen et al. 1995a, Graven et al. 1995). In Asian populations, 77% of mtDNAs are encompassed within a super-haplogroup defined by a DdeI site gain at bp 10394 and an AluI site gain at bp 10397 (Ballinger et al. 1992, Torroni et al. 1993a, 1993b, Chen et al. 1995a, Wallace 1995). Essentially all Native American mtDNAs fall into four haplogroups, A–D (Torroni & Wallace 1994). Haplogroup A is defined by a HaeIII site gain at bp 663, B by a 9-bp deletion between bp 8271 and bp 8281, C by a HincII site loss at bp 13259, and D by an AluI site loss at bp 5176 (Torroni et al. 1993b, Torroni & Wallace 1994, Wallace 1995). Ten haplogroups encompass almost all mtDNAs in European populations (Torroni et al. 1996a).
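The terms "site gain" and "site loss" used above refer to base substitutions that create or destroy an enzyme's recognition sequence, so that the mtDNA is cut at that position in one sample but not in another. The Python sketch below illustrates the idea on short toy sequences, not real mtDNA; the enzyme table covers only four of the enzymes named in this section, and the function names and example sequences are hypothetical. Because these four recognition sequences (AluI AGCT, DdeI CTNAG, HaeIII GGCC, HpaI GTTAAC, with N meaning any base) are palindromic, scanning a single strand is sufficient here.

```python
import re

# Recognition sequences of four enzymes named in the text (N = any base).
RECOGNITION = {
    "AluI": "AGCT",
    "DdeI": "CTNAG",
    "HaeIII": "GGCC",
    "HpaI": "GTTAAC",
}

# Translate the nucleotide codes used above into regular-expression fragments.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_positions(seq: str, enzyme: str) -> list[int]:
    """Return 1-based start positions of every recognition site in seq."""
    pattern = "".join(IUPAC[base] for base in RECOGNITION[enzyme])
    # A zero-width lookahead also reports overlapping sites.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq.upper())]


def site_change(reference: str, variant: str, enzyme: str) -> dict[str, list[int]]:
    """Compare two aligned sequences (substitutions only) and report gained/lost sites."""
    ref_sites = set(site_positions(reference, enzyme))
    var_sites = set(site_positions(variant, enzyme))
    return {"gain": sorted(var_sites - ref_sites), "loss": sorted(ref_sites - var_sites)}


# A single C>A substitution creates an HpaI site (a "site gain").
print(site_change("CCGTTCACC", "CCGTTAACC", "HpaI"))  # {'gain': [3], 'loss': []}
# A single C>T substitution destroys an AluI site (a "site loss").
print(site_change("GGAGCTTT", "GGAGTTTT", "AluI"))    # {'gain': [], 'loss': [3]}
```

In a real survey the positions would be expressed relative to a reference mtDNA sequence; here they are simply 1-based offsets within the toy strings.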
Classical polymorphic markers (i.e. blood groups, protein electromorphs and HLA antigens) have suggested that Europe is a genetically homogeneous continent with a few outliers such as the Saami, Sardinians, Icelanders and Basques (Cavalli-Sforza et al. 1993, Piazza 1993). Analysis of mtDNA sequences has likewise shown a high degree of homogeneity among European populations: the genetic distances between them are much smaller than those between populations on other continents, especially Africa (Comas et al. 1997).
The mtDNA haplogroups of Europeans are identified by combining RFLP analysis of the coding region with sequencing of hypervariable segment I. About 99% of European mtDNAs fall into one of ten haplogroups, H, I, J, K, M, T, U, V, W or X, each defined by certain relatively ancient and stable polymorphic sites in the coding region (Torroni et al. 1996a).
In a phylogenetic analysis of European haplogroups (Torroni et al. 1996a), the first subdivision is based on the presence or absence of a DdeI site at bp 10394. This site is absent in haplogroups H, T, U, V, W and X. Haplogroup H, defined by the absence of an AluI site at bp 7025, is the most prevalent, comprising about half of all Europeans (Torroni et al. 1996a, Richards et al. 1998). Haplogroups J, T, K and U are also common and are shared by all European populations. Haplogroup J is defined by a BstNI site loss at bp 13704, and haplogroup T by a BamHI site gain at bp 13366. Haplogroups K and U both carry the transition 12308A>G, but haplogroup U lacks the DdeI site at bp 10394. The remaining haplogroups, I, V, W and X, are less common. Haplogroup V is defined by an NlaIII site loss at bp 4577, whereas haplogroups I, W and X are less clearly distinguished from one another: I and W both have an AvaII site gain at bp 8249, while I and X both lack a DdeI site at bp 1715. In addition, haplogroup W has a HaeIII site loss at bp 8994, and haplogroup I an AluI site gain at bp 10028 (Torroni et al. 1996a).
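The marker key described in this paragraph can be written as a simple decision procedure. The Python sketch below is a deliberately simplified, hypothetical encoding: each field of the assumed `RflpProfile` class records whether a site named above is present (or, for 12308A>G, whether the transition is present), the defaults describe an mtDNA carrying none of the defining changes, and the order of the tests is an arbitrary choice made for this illustration. It uses only the markers listed in the text, returns "unresolved" for anything it cannot place, and is no substitute for the combined RFLP and HVS-I analysis of Torroni et al. (1996a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RflpProfile:
    """Hypothetical marker profile; True means the site (or transition) is present."""
    ddeI_10394: bool = False   # DdeI site at bp 10394 (absent in H, T, U, V, W, X)
    aluI_7025: bool = True     # AluI site at bp 7025 (absent in H)
    bstNI_13704: bool = True   # BstNI site at bp 13704 (lost in J)
    bamHI_13366: bool = False  # BamHI site at bp 13366 (gained in T)
    a12308g: bool = False      # 12308A>G transition (K and U)
    nlaIII_4577: bool = True   # NlaIII site at bp 4577 (lost in V)
    avaII_8249: bool = False   # AvaII site at bp 8249 (gained in I and W)
    ddeI_1715: bool = True     # DdeI site at bp 1715 (absent in I and X)
    haeIII_8994: bool = True   # HaeIII site at bp 8994 (lost in W)
    aluI_10028: bool = False   # AluI site at bp 10028 (gained in I)


def classify_european(p: RflpProfile) -> str:
    """Simplified key using only the markers described in the text (illustrative)."""
    if not p.ddeI_10394:
        # First split: H, T, U, V, W and X lack the DdeI site at bp 10394.
        if not p.aluI_7025:
            return "H"
        if p.bamHI_13366:
            return "T"
        if p.a12308g:
            return "U"
        if not p.nlaIII_4577:
            return "V"
        if p.avaII_8249 and not p.haeIII_8994:
            return "W"
        if not p.ddeI_1715:
            return "X"
        return "unresolved"
    # Site present: by elimination, one of the remaining European haplogroups.
    if p.a12308g:
        return "K"  # K shares 12308A>G with U but retains the DdeI 10394 site
    if not p.bstNI_13704:
        return "J"
    if p.aluI_10028 and p.avaII_8249 and not p.ddeI_1715:
        return "I"
    return "unresolved"


print(classify_european(RflpProfile(aluI_7025=False)))                # H
print(classify_european(RflpProfile(ddeI_10394=True, a12308g=True)))  # K
```

Because the tests run in a fixed order, a profile lacking the AluI 7025 site is called H before any other marker is examined; real data with missing or conflicting markers would need more careful handling.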
Six of the European haplogroups (H, I, J, K, T and W) are essentially confined to European populations (Torroni et al. 1994, 1996a) and probably originated after the ancestral Caucasoids became genetically separated from the ancestors of modern Africans and Asians. Haplogroup U, although much more prevalent in Europe, is also found at low frequencies among the Japanese, the North African Berbers, the Ethiopians and the Senegalese (Ozawa 1995, Torroni et al. 1996a, Passarino et al. 1998, Macaulay et al. 1999).
The European haplogroups have recently been defined more precisely, and their phylogenetic tree has been partly reconstructed. In particular, haplogroup U has been enlarged to include haplogroup K as a subcluster (Hofmann et al. 1997, Richards et al. 1998, Macaulay et al. 1999).
It is estimated that haplogroups H, J, T and V may be of relatively recent origin, 8,000–30,000 years old (Torroni et al. 1996a), which supports the hypothesis that they arose after the genetic and geographical separation of the ancestral Caucasoids from the ancestors of modern Africans and Asians. Haplogroup U, in contrast, appears to be much older than the others, with an estimated age of 51,000–67,000 years (Torroni et al. 1996a), raising the possibility that it originated in Africa and subsequently expanded into the Middle East and Europe.
The oldest archaeological evidence of settlement in Finland dates back to approximately 9,000 YBP, but the origin of these people is unknown. They were probably small groups who lived by hunting, fishing and gathering and who came from Europe and from the southeast, possibly from the Ural Mountains area. They may have been the ancestors of the Saami people (Pitkänen 1994). Finland later received population influences from several other sources, but all the groups of settlers were small (Pitkänen 1994).
The major route of colonization was across the Gulf of Finland, but settlement via both western and eastern routes also occurred. Archaeological remains of the Suomusjärvi culture dating from around 6,000 YBP show evidence of contacts with the Volga region. Soon afterwards the Comb Ceramic Culture was introduced into Finland; this was associated with the arrival of new, Finno-Ugric-speaking settlers, to whom the native population, whose language is unknown, partly acculturated. The Battle-Axe Culture of Central and Northern Europe was introduced into Southwest Finland via the Baltic around 4,000 YBP and was absorbed into the native population. Finland thus became divided between two cultures: the Corded Ware (Battle-Axe) Culture adopted agriculture, whereas north of the Vaasa–Viipuri line the Comb Ceramic Culture retained its traditional hunting economy (Pitkänen 1994).
Finnish belongs to the Finno-Ugric group of languages, the largest non-Indo-European language group in Europe. Finno-Ugric languages are also spoken in Estonia, Karelia, Latvia and Hungary. The Saami languages and certain minor languages spoken in northern Russia are more distant members of the same group (Korhonen 1991).
Religious and linguistic barriers, geopolitical position and low population density have kept the Finns in relative local and national isolation; as a result, the spectrum of inherited diseases in Finland differs from that in neighbouring populations. Many recessive diseases are unique to Finland, whereas some diseases that are common elsewhere are rare there (Norio et al. 1973, de la Chapelle 1993, Peltonen et al. 1995).
Previous studies of Finnish mtDNA variation have shown high homogeneity and a clear Caucasoid pattern of polymorphisms (Vilkki et al. 1988, Pult et al. 1994, Sajantila et al. 1996, Torroni et al. 1996a), indicating a close relationship between the Finns and other Europeans. About 40% of Finns belong to haplogroup H, the most common haplogroup among Europeans, which comprises about half of most European populations (Torroni et al. 1996a, Richards et al. 1998). Other common haplogroups among the Finns are U (16%), J (14%) and T (6%). The remaining haplogroups (V, W, X, K, I and M) are less common, each with a frequency of 2–4% (Torroni et al. 1996a). The Saami, however, have a mitochondrial gene pool that is distinct from those of other European populations (Sajantila et al. 1995, Lahermo et al. 1996).
There is an apparent discrepancy between the lack of a linguistic relationship between the Finns and other Europeans and the similarity of their mtDNA gene pools. Conversely, the Saami speak a Finno-Ugric language yet are genetically different from the surrounding European populations, including the Finns. On the basis of this discrepancy, it has been suggested that a language shift took place in which an Indo-European-speaking population adopted a Finno-Ugric language from the Saami without any substantial exchange of genetic material (Sajantila & Pääbo 1995).
As late as the 12th century, only a small part of Finland had permanent settlement: the southwestern coast, the provinces of Häme, Southern Satakunta and Åland, and the area around Lake Ladoga in eastern Finland. By the 16th century settlement was continuous in southern Finland, and a string of villages extended along the Ostrobothnian coast and its river valleys, reaching the northern part of the Gulf of Bothnia. The major permanent colonization of Northern Ostrobothnia and Kainuu took place after the 16th century, when settlers, mainly from Savo, migrated there either under Crown orders or because of population growth and the adoption of agriculture. The Saami had to move north, but most of them became acculturated to the settlers (Pitkänen 1994).
DNA sequence variation can be used to construct a phylogenetic tree, or several alternative trees arranged in a network, that displays the evolutionary relationships between individual sequences. Such a tree traces the phylogenetic history of a given gene. Combined with a calibrated mutation rate for the DNA sequences under study, the structure of this gene tree can be used to estimate a time scale for events in human prehistory. Moreover, the geographical distribution of the lineages on a tree or network can reveal prehistoric movements from one region to another (Avise 1986).
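One common way of turning such a gene tree into a date, not described in the text itself, is the so-called rho statistic: the average number of mutations separating the sampled sequences of a cluster from its inferred founder haplotype, multiplied by a calibration giving the time per mutation. The Python sketch below assumes no homoplasy, so that the number of mutations on each path equals the number of sites at which a sequence differs from the founder. The founder sequence and the `years_per_mutation` calibration are inputs that must come from elsewhere, and the function names and the value in the example are hypothetical placeholders, not a published rate.

```python
def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def rho_age(sequences: list[str], founder: str,
            years_per_mutation: float) -> tuple[float, float]:
    """Return (rho, estimated age in years) for a cluster descending from founder."""
    if not sequences:
        raise ValueError("need at least one sequence")
    # Assumes no homoplasy: differences from the founder = mutations on the path.
    rho = sum(hamming(s, founder) for s in sequences) / len(sequences)
    return rho, rho * years_per_mutation


# Toy cluster; 10,000 years per mutation is a hypothetical calibration.
print(rho_age(["AAAA", "AAAT", "AATT", "AAAA"], "AAAA", 10_000.0))  # (0.75, 7500.0)
```

Repeated or reversed mutations at the same site would make the Hamming distance underestimate the true number of mutations, which is one reason the tree or network structure matters for dating.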
Traditional tree-building methods are unsatisfactory when applied to human mtDNA data because of homoplasy, that is, parallel mutations or reversals (Bandelt et al. 1995). With such data, maximum parsimony, maximum likelihood and distance methods almost invariably fail to produce nested sets of haplotypes; instead, the data exhibit incompatibility between pairs of characters. Excoffier and Smouse (1994) calculated that the number of equally parsimonious trees for an RFLP data set of just 56 haplotypes exceeded 10⁹.
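Incompatibility between pairs of characters can be checked directly. For two binary characters (each site or restriction site coded 0 or 1 across haplotypes), a standard result, often called the four-gamete test, is that both can be placed on a single tree without homoplasy if and only if at most three of the four state combinations 00, 01, 10 and 11 occur. The Python sketch below, with hypothetical function names and a toy 0/1 table, lists the incompatible character pairs by 0-based column index.

```python
from itertools import combinations


def compatible(char_a: str, char_b: str) -> bool:
    """Four-gamete test: compatible iff fewer than four state pairs occur."""
    return len(set(zip(char_a, char_b))) < 4


def incompatible_pairs(table: list[str]) -> list[tuple[int, int]]:
    """Return 0-based index pairs of mutually incompatible characters.

    table holds one 0/1 string per haplotype, one column per character.
    """
    n_chars = len(table[0])
    columns = ["".join(h[i] for h in table) for i in range(n_chars)]
    return [
        (i, j)
        for i, j in combinations(range(n_chars), 2)
        if not compatible(columns[i], columns[j])
    ]


# Characters 0 and 1 show all four state combinations, so they conflict;
# character 2 is compatible with both.
haplotypes = ["000", "100", "110", "011"]
print(incompatible_pairs(haplotypes))  # [(0, 1)]
```

Every such conflicting pair forces at least one parallel mutation or reversal on any tree, which is why tree-building methods applied to mtDNA data return many equally good but mutually inconsistent trees.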
Bandelt et al. (1995) argue that mtDNA data are best analyzed with a network constructed by a median algorithm. Rather than forcing a resolution of character conflicts that cannot be resolved, this approach retains them, giving a compact, intelligible representation of the plausible solutions. The unmodified median network, generated by partitioning the haplotypes character by character, is guaranteed to contain all the most parsimonious trees. High rates of homoplasy may even lead to a single haplotype being derived from the same ancestor independently along different routes.
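The Python sketch below illustrates the median idea for binary (0/1) characters. It does not follow the character-by-character partitioning described above; instead it assumes a closely related construction: it repeatedly adds the majority-rule median of every triple of sequences until no new sequence appears, then joins two sequences when no other sequence lies on a shortest path between them. The function names are hypothetical, and the brute-force search over all triples is only practical for small toy data.

```python
from itertools import combinations


def median(a: str, b: str, c: str) -> str:
    """Majority-rule median of three equal-length 0/1 strings."""
    # If x agrees with y or z it is the majority state; otherwise y == z.
    return "".join(x if x in (y, z) else y for x, y, z in zip(a, b, c))


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def median_closure(seqs: list[str]) -> set[str]:
    """Repeatedly add medians of all triples until no new sequence appears."""
    vertices = set(seqs)
    while True:
        new = {median(a, b, c) for a, b, c in combinations(sorted(vertices), 3)}
        new -= vertices
        if not new:
            return vertices
        vertices |= new


def median_network(seqs: list[str]) -> tuple[list[str], list[tuple[str, str]]]:
    """Vertices of the closure, joined when no other vertex lies between them."""
    vertices = sorted(median_closure(seqs))
    edges = []
    for u, v in combinations(vertices, 2):
        d = hamming(u, v)
        between = any(
            hamming(u, w) + hamming(w, v) == d for w in vertices if w not in (u, v)
        )
        if not between:
            edges.append((u, v))
    return vertices, edges


# Two incompatible characters: the four haplotypes form a square (a reticulation).
print(median_network(["00", "01", "10", "11"]))
# Three haplotypes whose median "111" is added as an inferred, unsampled node.
print(median_network(["110", "101", "011"]))
```

The first example shows how an unresolved character conflict appears as a cycle instead of being forced into one tree; the second shows how the network predicts an intermediate haplotype that was not in the input.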
An approach using networks rather than trees has many advantages (Bandelt et al. 1995). A median network generated from a table of binary data contains the same information as the table, but in a much more comprehensible form. Furthermore, the network can predict haplotypes and show where homoplasy is located, which sites have frequently undergone mutation, where a consensus sequence lies, whether recombination is likely to have occurred, where to look for sequencing errors, which haplogroups can be distinguished, and so on. Because the median network contains all the most parsimonious trees for the input data, it gives a more concise picture of the data than an exhaustive list of all maximum parsimony trees.