| Type XV collagen: Complete structures of the human COL15A1 and mouse Col15a1 genes, location of type XV collagen protein in mature and developing mouse tissues, and generation of mice expressing truncated type XV collagen | ||
|---|---|---|
| | Chapter 2. Review of the literature | |
Knowledge of the structure of collagen genes has a number of important applications. One of them is to provide the database needed for identifying mutations in collagen genes that cause human diseases; these are discussed in section 2.5. While screening for mutations in collagen genes, researchers often encounter normal variations, knowledge of which is fundamental to understanding the functional properties of the protein. In addition, some of these normal variations may even turn out to predispose to common diseases (for examples, see Kivirikko, 1993). Knowledge of gene structures across distant phyla can be used in evolutionary studies and in the identification of functionally important domains in the protein structure and within the regulatory regions. The genomic sequences also provide a necessary tool for many molecular biological studies, for example in gene regulation and in the elucidation of protein function by generating genetically modified animals (see 2.6.2.).
Collagen genes and their loci have been given names with the prefix COL, followed by an Arabic numeral denoting the collagen type, the letter A, and another Arabic numeral for the α-chain in question. The gene names are usually written in italics. Those encoding human polypeptides are written in capital letters, whereas lower case letters are used to distinguish the corresponding genes in mouse or chicken. The 34 collagen genes characterized to date, excluding the most recently identified collagen types XX-XXIII, are dispersed throughout the genome and are located on 15 human and 13 mouse chromosomes. The collagen genes in human and mouse, their chromosomal locations, and characteristic features are presented in Table 2 and discussed briefly below.
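The naming rule described above can be expressed as a short parsing routine. The following is a minimal sketch in Python; the function name `parse_collagen_gene` and the returned field names are hypothetical and chosen only for illustration, and the label "non-human" is a simplification for symbols that are not written entirely in capitals (mouse or chicken in the text above).

```python
import re

# Assumed pattern, taken from the rule above: COL + collagen type number
# + the letter A + alpha-chain number, in any letter case.
_SYMBOL = re.compile(r"col(\d+)a(\d+)", re.IGNORECASE)


def parse_collagen_gene(symbol):
    """Split a collagen gene symbol such as 'COL15A1' into its parts."""
    match = _SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a collagen gene symbol: {symbol!r}")
    # All-capital symbols denote human genes; any other casing is
    # treated here simply as non-human.
    species = "human" if symbol.isupper() else "non-human"
    return {
        "type": int(match.group(1)),
        "chain": int(match.group(2)),
        "species": species,
    }
```

For example, `parse_collagen_gene("COL15A1")` yields type 15, chain 1 and the human label, whereas `parse_collagen_gene("Col15a1")` yields the same type and chain with the non-human label.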
Table 2. Collagen genes and their chromosomal locations*.
| Gene | Exons | Size (kb) | Chromosome* | References |
|---|---|---|---|---|
| COL1A1 | 51 | 18 | 17q21.3-q22 | Chu et al., 1985; D'Alessio et al., 1988; Määttä et al., 1991; Westerhausen et al., 1991 |
| COL1A2 | 52 | 38 | 7q21.3-q22 | de Wet et al., 1987; Körkkö et al., 1998 |
| COL2A1 | 54 | 31 | 12q13-q14 | Ala-Kokko & Prockop, 1990 |
| Col2a1 | 54 | 28.9 | 15 | Metsäranta et al., 1991 |
| COL3A1 | 51 | 44 | 2q24.3-q31 | Chu & Prockop, 1993 |
| Col3a1 | 51 | 37.6 | 1 | Toman & de Crombrugghe, 1994 |
| COL4A1 | 52 | >100 | 13q34 | Soininen et al., 1989 |
| COL4A2 | 47 | >100 | 13q34 | Heikkilä & Soininen, 1996 |
| Col4a2 | 47 | >90 | 8 | Buttice et al., 1990 |
| COL4A3 | 52 | 250 | 2q34-q37 | Heidet et al., 2001 |
| COL4A4 | 48 | >113 | 2q35-q37 | Boye et al., 1998 |
| COL4A5 | 51 | 140 | Xq22 | Zhou et al., 1994 |
| COL4A6 | 46 | 425 | Xq22 | Oohashi et al., 1995; Zhang et al., 1996 |
| COL5A1 | 66 | 750 | 9q34.2-q34.3 | Takahara et al., 1995 |
| COL6A1 | 36 | 29 | 21q22.3 | Heiskanen et al., 1995; Saitta et al., 1991; Trikka et al., 1997 |
| COL6A2 | 36 | 30 | 21q22.3 | Saitta et al., 1991; Saitta et al., 1992 |
| COL7A1 | 118 | 31.1 | 3p21 | Christiano et al., 1994 |
| Col7a1 | 118 | 31 | 9 | Kivirikko et al., 1996 |
| COL9A1 | 38 | 90 | 6q12-q14 | Pihlajamaa et al., 1998 |
| COL9A2 | 32 | 15 | 1p32 | Pihlajamaa et al., 1998 |
| Col9a2 | 32 | 16 | 4 | Perälä et al., 1994 |
| COL9A3 | 32 | 23 | 20q13.3 | Paassilta et al., 1999 |
| COL10A1 | 3 | 6.2 | 6q21-q22 | Apte et al., 1992; Thomas et al., 1991 |
| Col10a1 | 3 | 7.2 | 10 | Apte & Olsen, 1993 |
| COL11A1 | 68 | >150 | 1p21 | Annunen et al., 1999 |
| COL11A2 | 66 | >28 | 6p21.2 | Lui et al., 1996; Vuoristo et al., 1995 |
| COL13A1 | 41/42 | 140 | 10q22 | Hägg et al., 1998; Tikka et al., 1991 |
| Col13a1 | 42 | 135 | 10 | Kvist et al., 1999 |
| COL15A1 | 42 | 145 | 9q21-q22 | see I |
| Col15a1 | 40 | 110 | 4 | see II |
| COL17A1 | 56 | 52 | 10q24.3 | Gatalica et al., 1997 |
| COL18A1 | 43 | 105 | 21q22.3 | Elamaa et al., personal communication |
| Col18a1 | 43 | >102 | 10 | Rehn et al., 1996 |
| COL19A1 | 51 | >250 | 6q12-q14 | Khaleduzzaman et al., 1997 |

\* The chromosomal locations of human and mouse genes were collected from the GeneCards and Mouse Genome databases, respectively. Only completely characterized genes are listed; thus some genes whose chromosomal locations are known are excluded.
Typically, genes encoding collagens span large genomic areas and consist of multiple exons that share some characteristics owing to the repeating Gly-X-Y unit structure (see Vuorio & de Crombrugghe, 1990; Chu & Prockop, 1993, for reviews). Accordingly, the genes encoding fibril-forming collagens are similar in structure, whereas those encoding non-fibril-forming collagens are more heterogeneous. The region encoding the triple-helical domain of the major fibril-forming collagens, types I-III, consists of 41-42 exons, the sizes of which are all multiples of 9 bp. Most exons are 54 bp in size, but they can also be multiples of 54 bp or combinations of 45- and 54-bp exons. Furthermore, each exon starts with a complete codon for glycine and therefore codes for a whole number of Gly-X-Y units. Because of the high evolutionary conservation among the fibrillar collagen genes, it has been proposed that the ancestral gene arose by amplification of a 54-bp exon unit. The genes encoding the minor fibrillar collagens, types V and XI, have a large number of 54-bp exons, thus supporting the hypothesis of a 54-bp ancestral exon, although their structures otherwise diverge considerably from those of the major ones, indicating a separate evolutionary pathway (Takahara et al., 1995; Vuoristo et al., 1995).
The triple helix-encoding regions of non-fibril-forming collagen genes do not reflect the 54-bp exon motif common in fibril-forming collagens, but contain 36- and 63-bp exons or exons of other sizes that are multiples of 9 bp, or slight deviations from these. The presence of imperfections in the Gly-X-Y sequences, together with the occurrence of split codons at the 5’- or 3’-ends of exons, some of them involving the first G residue of a Gly codon, further accounts for the variation in the exon sizes (see Vuorio & de Crombrugghe, 1990; Chu & Prockop, 1993, for reviews).
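The exon-size arithmetic above follows from the fact that one Gly-X-Y unit comprises three codons, i.e. 9 bp. The following minimal Python sketch makes the relationship explicit; the function name `gly_xy_units` is hypothetical, and the sketch assumes an exon that begins with a complete glycine codon and contains no split codons or imperfections in the repeat.

```python
BP_PER_UNIT = 9  # one Gly-X-Y unit = 3 codons x 3 bp


def gly_xy_units(exon_bp):
    """Return the number of whole Gly-X-Y units encoded by an exon.

    Returns None when the exon length is not a multiple of 9 bp, as is the
    case for exons with split codons or interrupted repeats.
    """
    units, remainder = divmod(exon_bp, BP_PER_UNIT)
    return units if remainder == 0 else None


# Exon sizes mentioned in the text: 36, 45, 54 and 63 bp, a combined
# 45 + 54 bp exon (99 bp) and a double 54-bp exon (108 bp).
for size in (36, 45, 54, 63, 99, 108):
    print(size, "bp:", gly_xy_units(size), "Gly-X-Y units")
```

A 54-bp exon thus corresponds to six Gly-X-Y units, or 18 amino acid residues, whereas the 36- and 63-bp exons of the non-fibril-forming collagen genes correspond to four and seven units, respectively.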
Type XIII collagen was the first collagen shown to be modified by alternative splicing (Pihlajaniemi et al., 1987), but subsequent results have indicated that the occurrence of variant transcripts is the rule rather than the exception in the collagen family. The mode of generation of the alternative transcripts varies from the use of alternative promoters to exon skipping and the utilization of internal splice sites (reviewed by Pihlajaniemi & Rehn, 1995). In most cases the alternative splicing affects the N- and C-terminal NC domains, with the exception of type XIII collagen, where both NC and COL domains are affected. Although the significance of these modifications is not fully understood, the tissue- and developmental stage-specific expression patterns of the variant forms reported, e.g., for collagen II (Sandell et al., 1991 and 1994; Lui et al., 1995a), collagen IX (Liu et al., 1993), collagen XI (Sugimoto et al., 1998; Iyama et al., 2001), and collagen XII (Böhme et al., 1995) have been suggested to confer different functional properties (see also 2.7.).
To ensure that the various collagen types are expressed at controlled rates in their specific locations in adult (see 2.1. and 2.2.) and developing tissues (see 2.7.), the coordinated function of a multiplicity of regulatory elements located in the core promoter areas, 5’-flanking sequences, and within introns is required. In addition, further modulation of collagen gene expression is provided by various cytokines and hormones (for a review, see Vuorio & de Crombrugghe, 1990). Recently, type XV collagen expression was reported to be enhanced by transforming growth factor-β (TGF-β) and reduced by tumor necrosis factor-α (TNF-α) and interleukin-1β (IL-1β) (Kivirikko et al., 1999).
Structurally, the collagen genes, like other genes, can be roughly divided into two categories based on the characteristics of their core promoter areas. These categories are “tissue-specific genes”, which have TATA boxes specifying the precise position of transcription initiation, and “housekeeping genes”, which lack TATA boxes but instead have a high GC content and multiple transcription start sites. The genes belonging to the latter category are transcribed widely in many tissues, but at low RNA levels. Of the collagen genes, those encoding the major fibrillar collagens, COL1A1 (Bornstein et al., 1987), COL2A1 (Metsäranta et al., 1991), and COL3A1 (Benson-Chanda et al., 1989), as well as COL10A1, which encodes the highly specialized collagen of hypertrophic chondrocytes (Apte & Olsen, 1993), and the downstream promoter initiating the synthesis of the cornea-specific transcript of collagen IX (Pihlajamaa et al., 1998) belong to the tissue-specific gene category. COL4A3-A4 (Momota et al., 1998), COL5A1 (Lee & Greenspan, 1995), COL7A1 (Christiano et al., 1994), COL9A2-A3 (Pihlajamaa et al., 1998; Paassilta et al., 1999), COL11A1 (Yoshioka et al., 1995), COL11A2 (Vuoristo et al., 1995), Col13a1 (Kvist et al., 1999), promoter 1 of Col18a1 (Rehn et al., 1996), and the downstream promoter of COL6A2 (Saitta et al., 1992) all belong to the housekeeping gene category. Furthermore, some collagen promoters, such as those of COL4A5-6 (Sugimoto et al., 1994) and promoter 2 of Col18a1 (Rehn et al., 1996), lack both TATA and GC boxes but contain CCAAT boxes. Others lack all of the above-mentioned proximal promoter elements; examples of these are COL6A1 (Bonaldo et al., 1993) and the upstream promoter of the cartilage-specific transcript of collagen IX (Pihlajamaa et al., 1998).
There are several ways to identify and characterize regulatory elements. As described in publications I and II, putative regulatory elements can be identified simply by sequencing the 5’-flanking areas of the genes and searching for binding sites for known transcription factors, the functional significance of which must be determined by other means. In several studies, hints provided by the phylogenetic conservation of critical regulatory elements have been utilized (collagens I, II, V and X) (Vikkula et al., 1992; Truter et al., 1993; Thomas et al., 1995; Antoniv et al., 2001, and see below). An important experimental system for studying elements conferring tissue specificity in intact animals is provided by transgenic mice, or lately also by nematodes, frogs, and zebrafish. Typically, a potential regulatory sequence is fused to a reporter gene, such as β-galactosidase, luciferase, or green fluorescent protein (GFP), and introduced into the mouse germline, and the expression of the reporter gene is monitored in tissues (for a review, see Hogan et al., 1994). This strategy has been used, for example, in the identification of the chondrocyte-specific elements in the first intron of the Col2a1 gene (Zhou et al., 1995; Zhou et al., 1998), in the identification of osteoblast-specific elements in the promoter of the Col1a1 gene (Rossert et al., 1996), and in the study of isoform specificity in the expression patterns of collagen XVIII, cle-1, in C. elegans (Ackley et al., 2001). Similarly, promoter efficacy can be studied in vitro in transient transfection assays using reporter gene constructs, which, when coupled with cotransfection, gel-shift, and footprinting assays or mutagenesis, reveal functional characteristics of the promoter, such as the cis-acting elements implicated in the gene regulation. This strategy has been successfully used, for example, in the identification of regulatory elements conferring the liver specificity of promoter 2 of the Col18a1 gene (Liétard et al., 2000).
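The sequence-based search for putative binding sites mentioned at the beginning of this section can be illustrated with a minimal Python sketch. It is not the procedure used in publications I and II; the function names and the three simplified consensus patterns below are assumptions made for illustration only. Real transcription factor binding sites are degenerate and are usually scored against position weight matrices, and, as noted above, a sequence match alone says nothing about functional significance.

```python
import re

# Simplified consensus motifs, assumed here purely for illustration.
MOTIFS = {
    "TATA box": "TATA[AT]A",
    "CCAAT box": "CCAAT",
    "GC box": "GGGCGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(flanking_seq):
    """List motif matches on both strands of a 5'-flanking sequence.

    Returns sorted (position, strand, name) tuples, where position is the
    0-based start of the site on the forward strand.
    """
    seq = flanking_seq.upper()
    rev = reverse_complement(seq)
    hits = []
    for name, pattern in MOTIFS.items():
        for match in re.finditer(pattern, seq):
            hits.append((match.start(), "+", name))
        for match in re.finditer(pattern, rev):
            # Convert the reverse-strand coordinate back to the forward strand.
            hits.append((len(seq) - match.end(), "-", name))
    return sorted(hits)
```

Scanning both strands matters because many elements function in either orientation; a CCAAT box on the reverse strand, for instance, appears as ATTGG in the forward sequence.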