Gene Duplication, Structural Rearrangements, and Genome Evolution
Summary
This chapter examines how genomes evolve through gene duplication, creating paralogs, gene families, and pseudogenes, and how structural rearrangements reshape chromosomes. Students also explore genomic imprinting, parent-of-origin effects, and uniparental disomy, phenomena in which inheritance departs from Mendelian expectations. After completing this chapter, students will understand the mechanisms driving genome evolution and non-Mendelian inheritance patterns.
Concepts Covered
This chapter covers the following 10 concepts from the learning graph:
- Gene Duplication
- Paralog
- Gene Family
- Pseudogene
- Segmental Duplication
- Polyploidy
- Chromosomal Rearrangement
- Genomic Imprinting
- Parent of Origin Effects
- Uniparental Disomy
Prerequisites
This chapter builds on concepts from:
- Chapter 4: Genome Structure and Chromatin Organization
- Chapter 5: Genetic Variation and Genome Diversity
Welcome to Genome Evolution!
Welcome, fellow investigators! In this chapter, we explore how genomes change over evolutionary time. Gene duplication is one of nature's most powerful creative tools, and imprinting reveals that sometimes it matters which parent a gene came from. Let's look at the evidence!
Introduction: The Evolving Genome
Genomes are not static blueprints. They are dynamic systems that change over generations through mutation, recombination, and selection. Among the most important mechanisms of genome evolution is gene duplication, the creation of an extra copy of a gene or genomic segment. Gene duplication provides raw material for evolutionary innovation because the redundant copy is free to accumulate mutations without compromising the function of the original gene.
This chapter examines the mechanisms and consequences of gene duplication, the large-scale rearrangements that reshape chromosomes, and a fascinating class of epigenetic phenomena, genomic imprinting, that violates classical Mendelian expectations about the equivalence of maternal and paternal alleles.
Gene Duplication: Creating Evolutionary Raw Material
Mechanisms of Gene Duplication
Gene duplication occurs when a segment of DNA containing one or more genes is copied, producing two identical copies where previously there was one. The most common mechanism is unequal crossing over during meiosis, in which misalignment of homologous chromosomes causes one recombination product to receive a duplicated segment while the other receives a deletion. Other mechanisms include retrotransposition (in which an mRNA transcript is reverse-transcribed and inserted elsewhere in the genome) and whole-genome duplication (discussed under polyploidy below).
Immediately after duplication, the two gene copies are identical in sequence. Over evolutionary time, however, they diverge through the accumulation of mutations. This divergence can lead to several distinct outcomes.
The fate of a duplicated gene depends on the mutations it accumulates:
- Conservation (redundancy): Both copies retain the original function. This is usually temporary, as neutral mutations will eventually differentiate them.
- Neofunctionalization: One copy acquires mutations that give it a new function while the other retains the original function. This is the classic model for evolutionary innovation.
- Subfunctionalization: Each copy loses part of the original function, so that both copies together perform the work originally done by one. For example, if the ancestral gene was expressed in both the liver and the brain, one copy might lose liver expression while the other loses brain expression, so both copies must be retained to cover the ancestral expression pattern.
- Nonfunctionalization (pseudogenization): One copy accumulates disabling mutations and becomes a pseudogene.
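The four fates above can be sketched as a toy classifier. This is a minimal illustration, assuming each gene copy is summarized as a set of retained functions (for example, tissues where it is expressed). The function name `classify_fate` and the set-based model are hypothetical simplifications, not a standard analysis method.

```python
def classify_fate(ancestral, copy_a, copy_b):
    """Classify the fate of a duplicated gene pair in a toy model.

    Each argument is a set of functions, such as tissues where the gene
    is expressed. The set representation is an illustrative assumption.
    """
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)

    # A copy with no remaining function is treated as a pseudogene.
    if not copy_a or not copy_b:
        return "nonfunctionalization"
    # Any function that the ancestor lacked counts as new.
    if (copy_a | copy_b) - ancestral:
        return "neofunctionalization"
    # Both copies still do everything the ancestor did.
    if copy_a == ancestral and copy_b == ancestral:
        return "conservation"
    # Each copy lost something, but together they cover the ancestral role.
    if copy_a != ancestral and copy_b != ancestral and (copy_a | copy_b) == ancestral:
        return "subfunctionalization"
    return "unclassified"


ancestor = {"liver", "brain"}
print(classify_fate(ancestor, {"brain"}, {"liver"}))  # subfunctionalization
print(classify_fate(ancestor, ancestor, set()))       # nonfunctionalization
```

Real duplicate pairs rarely fall cleanly into one category, so the "unclassified" branch is kept for mixed cases such as one copy losing a function while the other stays intact.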
Paralogs
A paralog is one of a pair (or set) of homologous genes within the same organism that arose by gene duplication. Paralogs are distinguished from orthologs, which are homologous genes in different species that arose by speciation. The distinction matters because paralogs within a genome may have diverged in function, while orthologs in different species often retain similar functions.
The human genome contains thousands of paralog pairs and families. The alpha-globin and beta-globin genes, for example, are paralogs that arose from an ancient gene duplication event roughly 500 million years ago. Each has since evolved to produce a globin subunit with distinct oxygen-binding properties, and they are regulated differently during development.
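One way to make the ortholog/paralog distinction concrete is to label each branching point of a gene tree with the event that produced it, then look up the most recent common ancestor of two genes. The sketch below assumes a tiny hand-built globin tree in which the alpha/beta duplication predates the human-mouse split. The node names and the `relationship` helper are hypothetical.

```python
# Toy gene tree: child -> parent. Node names are invented for illustration.
PARENT = {
    "human_alpha": "alpha_split",
    "mouse_alpha": "alpha_split",
    "human_beta": "beta_split",
    "mouse_beta": "beta_split",
    "alpha_split": "globin_duplication",
    "beta_split": "globin_duplication",
}

# Event that created each internal node.
EVENT = {
    "alpha_split": "speciation",
    "beta_split": "speciation",
    "globin_duplication": "duplication",
}


def ancestors(node):
    """Return the ancestors of a node, nearest first."""
    path = []
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def relationship(gene_a, gene_b):
    """Name the homology type from the event at the most recent common ancestor."""
    ancestors_b = set(ancestors(gene_b))
    for node in ancestors(gene_a):  # nearest ancestor is checked first
        if node in ancestors_b:
            return "paralogs" if EVENT[node] == "duplication" else "orthologs"
    raise ValueError("genes do not share an ancestor in this tree")


print(relationship("human_alpha", "human_beta"))   # paralogs
print(relationship("human_alpha", "mouse_alpha"))  # orthologs
```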
Gene Families
A gene family is a group of paralogous genes in a genome that share a common ancestor and have related sequences and functions. Gene families arise through successive rounds of gene duplication followed by divergence. Some gene families are small, with just a few members, while others are enormous.
The human genome contains several well-studied gene families that illustrate the creative power of duplication.
Notable examples of gene families include:
| Gene Family | Members (Human) | Function | Duplication Pattern |
|---|---|---|---|
| Globin family | ~14 genes | Oxygen transport | Tandem duplications over 500 Myr |
| Hox gene clusters | 39 genes in 4 clusters | Body axis patterning | Whole-cluster duplication |
| Olfactory receptors | ~800 genes (~400 functional) | Smell detection | Extensive tandem duplication |
| Immunoglobulin superfamily | >300 genes | Immune recognition | Domain duplication and divergence |
| Cytochrome P450 | ~57 genes | Drug/toxin metabolism | Ancient and recent duplications |
Think About It
The human olfactory receptor gene family has roughly 800 members, but about half are pseudogenes. What does this tell us about the evolutionary trajectory of the sense of smell in humans compared to, say, dogs, who have far fewer olfactory pseudogenes?
Pseudogenes
A pseudogene is a gene copy that has accumulated mutations, such as premature stop codons, frameshift insertions or deletions, or promoter deletions, that prevent it from producing a functional protein. Pseudogenes are sometimes described as "molecular fossils" because they reveal the evolutionary history of gene duplications even though they are no longer functional in the traditional sense.
Pseudogenes are classified into three types based on their origin:
- Duplicated (unprocessed) pseudogenes arose from gene duplication followed by accumulation of disabling mutations. They retain the intron-exon structure of the parent gene.
- Processed pseudogenes arose through retrotransposition of an mRNA molecule back into the genome. Because mRNA has been spliced, processed pseudogenes lack introns and often lack a promoter, rendering them transcriptionally dead from the start. They typically have a poly-A tail, a signature of their mRNA origin.
- Unitary pseudogenes are former functional genes that were inactivated by mutation without duplication. The single-copy GULO gene in humans, which encodes an enzyme required for vitamin C synthesis, is a unitary pseudogene; it was inactivated in our primate ancestors, which is why humans must obtain vitamin C from their diet.
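The three classes above differ in a few observable features, which can be sketched as a simple decision rule. This is a toy heuristic under stated assumptions: the three boolean inputs and the function name `classify_pseudogene` are hypothetical, and real annotation pipelines use much richer evidence (for instance, a retrocopy of a single-exon gene would defeat the intron test).

```python
def classify_pseudogene(has_introns, has_poly_a_tail, has_functional_paralog):
    """Assign a pseudogene class from three simplified features.

    Assumes the sequence is already known to be a nonfunctional gene copy.
    """
    # Retrotransposed copies come from spliced mRNA: no introns, poly-A tail.
    if not has_introns and has_poly_a_tail:
        return "processed"
    # A disabled copy sitting alongside a working parent gene.
    if has_functional_paralog:
        return "duplicated (unprocessed)"
    # No working copy remains anywhere in the genome.
    return "unitary"


# Human GULO: intron-containing, single copy, no functional paralog.
print(classify_pseudogene(has_introns=True, has_poly_a_tail=False,
                          has_functional_paralog=False))  # unitary
```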
The human genome contains an estimated 15,000 to 20,000 pseudogenes. While most were long considered "junk DNA," recent research suggests that some pseudogene transcripts may have regulatory functions, acting as decoys for microRNAs or influencing the expression of their parent genes.
Segmental Duplication and Polyploidy
Segmental Duplication
A segmental duplication is a block of DNA, typically 1 kilobase to several hundred kilobases in length, that is present in two or more copies in the genome with high sequence identity (>90%). Segmental duplications are distinct from tandem repeats in that they are much larger and their copies may lie either on the same chromosome (intrachromosomal) or on different chromosomes (interchromosomal). Approximately 5% of the human genome consists of segmental duplications.
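The size and identity thresholds in this definition can be checked with a few lines of code. The sketch below assumes the two copies have already been aligned to equal length with `-` marking gaps, and it ignores gap columns when computing identity, which is only one of several possible conventions. Both function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of matching bases over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == "-" or base_b == "-":
            continue  # assumption: gap columns are skipped
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_segmental_duplication(length_bp, identity_percent,
                             min_length_bp=1000, min_identity=90.0):
    """Apply the thresholds from the definition: at least 1 kb, above 90% identity."""
    return length_bp >= min_length_bp and identity_percent > min_identity


print(is_segmental_duplication(25_000, 97.5))  # True
print(is_segmental_duplication(400, 99.0))     # False, too short
```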
Segmental duplications are medically important because they predispose to structural rearrangements. When two duplicated segments are located in the same orientation on the same chromosome, misalignment during meiosis can cause unequal crossing over between the nonallelic copies, a process called nonallelic homologous recombination (NAHR), leading to deletions, duplications, and other rearrangements. These recurrent rearrangements are responsible for several genomic disorders, including DiGeorge syndrome (22q11.2 deletion), Prader-Willi syndrome, and Angelman syndrome.
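The misalignment described above can be traced with a chromosome modeled as a list of segment labels. In this minimal sketch, both homologs are assumed to carry the same segments in the same order, and the crossover is assumed to fall inside the misaligned repeat. The function name `unequal_crossover` and the labels `A`, `B`, `C`, and `SD` are hypothetical.

```python
def unequal_crossover(chromosome, repeat, upstream_index, downstream_index):
    """Return the (duplication, deletion) products of a misaligned crossover.

    The downstream repeat on one homolog pairs with the upstream repeat
    on the other homolog, and the exchange happens within that repeat.
    """
    if chromosome[upstream_index] != repeat or chromosome[downstream_index] != repeat:
        raise ValueError("both indexes must point at a copy of the repeat")
    if upstream_index >= downstream_index:
        raise ValueError("upstream_index must come before downstream_index")

    # Homolog 1 up to and including its downstream repeat,
    # joined to homolog 2 after its upstream repeat.
    duplication = chromosome[:downstream_index + 1] + chromosome[upstream_index + 1:]
    # Reciprocal product: homolog 2 up to its upstream repeat,
    # joined to homolog 1 after its downstream repeat.
    deletion = chromosome[:upstream_index + 1] + chromosome[downstream_index + 1:]
    return duplication, deletion


dup, dele = unequal_crossover(["A", "SD", "B", "SD", "C"], "SD", 1, 3)
print(dup)   # ['A', 'SD', 'B', 'SD', 'B', 'SD', 'C']
print(dele)  # ['A', 'SD', 'C']
```

Segment `B`, which sits between the two repeats, is gained on one product and lost on the other, which is why such rearrangements recur at the same breakpoints.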
Polyploidy
Polyploidy is the condition of having more than two complete sets of chromosomes. While polyploidy is rare in animals, it is extremely common in plants: an estimated 30-70% of flowering plant species are polyploid. Polyploidy can arise through two main mechanisms.
Autopolyploidy occurs when a single species undergoes whole-genome duplication, for example when cell division fails during meiosis and produces a diploid gamete instead of a haploid one. Fusion of two diploid gametes creates a tetraploid (4n) individual.
Allopolyploidy occurs when two different species hybridize and the hybrid undergoes genome doubling to restore fertility. The resulting allopolyploid contains complete chromosome sets from both parental species. Bread wheat (Triticum aestivum) is a well-known allohexaploid (6n) that arose through two separate hybridization and genome doubling events involving three ancestral grass species over approximately 500,000 years.
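The two-step origin of bread wheat can be followed by counting chromosome sets. The sketch below uses the conventional subgenome letters A, B, and D for wheat, which are not introduced in the text above, and represents each plant as a count of sets per subgenome. All function names are hypothetical, and meiosis is reduced to the rule that every subgenome must be present in an even number of copies.

```python
from collections import Counter


def make_gamete(genome):
    """Halve each subgenome count; fail if any set lacks a pairing partner."""
    if any(count % 2 for count in genome.values()):
        raise ValueError("unpaired chromosome sets: the plant is sterile")
    return Counter({sub: count // 2 for sub, count in genome.items()})


def hybridize(parent_1, parent_2):
    """Fuse one gamete from each parent."""
    return make_gamete(parent_1) + make_gamete(parent_2)


def double_genome(genome):
    """Whole-genome doubling gives every chromosome set a partner."""
    return Counter({sub: 2 * count for sub, count in genome.items()})


def format_genome(genome):
    """Write a genome as letters, e.g. Counter(A=2, B=2) -> 'AABB'."""
    return "".join(sub * genome[sub] for sub in sorted(genome))


hybrid = hybridize(Counter(A=2), Counter(B=2))          # AB, sterile
tetraploid = double_genome(hybrid)                      # AABB
hexaploid = double_genome(hybridize(tetraploid, Counter(D=2)))
print(format_genome(hexaploid), sum(hexaploid.values()))  # AABBDD 6
```

Calling `make_gamete(hybrid)` before doubling raises `ValueError`, which mirrors why genome doubling is needed to restore fertility.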
Polyploidy is considered a major mechanism of speciation in plants because the polyploid individual is immediately reproductively isolated from its diploid progenitor (crosses produce sterile triploid offspring). In vertebrates, two rounds of whole-genome duplication early in the vertebrate lineage (the "2R hypothesis") are thought to have generated the four Hox gene clusters and many other duplicated gene families that enabled the evolution of vertebrate complexity.
Diagram: Fates of Duplicated Genes
Type: Interactive diagram
sim-id: gene-duplication-fates
Library: p5.js
Status: Specified
An interactive diagram starting with a single ancestral gene that undergoes duplication (animated split into two copies). A timeline slider labeled "Evolutionary Time" allows users to advance through generations. At each time step, random mutations accumulate in both copies (shown as colored dots on the gene bar). Users select one of four fate pathways via buttons: (1) Conservation, where both copies stay functional; (2) Neofunctionalization, where one copy turns a new color representing a new function; (3) Subfunctionalization, where each copy loses a different expression domain (shown as colored tissue icons dimming); (4) Pseudogenization, where one copy accumulates a stop codon (shown as a red X) and grays out. Each pathway includes a brief text explanation and a real-world example (e.g., globin genes for neofunctionalization).
Chromosomal Rearrangements in Evolution
Types and Consequences
Chromosomal rearrangement is a broad term encompassing any event that changes the structure or order of DNA segments within or between chromosomes. While Chapter 5 introduced rearrangements as a form of genetic variation, here we consider their role in genome evolution over longer timescales.
The major types of chromosomal rearrangements, including inversions, translocations, deletions, duplications, and fusions/fissions, have distinct evolutionary consequences.
Inversions play a particularly important evolutionary role. A large inversion on a chromosome does not necessarily alter gene function, but in individuals heterozygous for the inversion, crossing over within the inverted region produces unbalanced gametes. This effectively suppresses recombination across the inverted region, maintaining blocks of co-adapted alleles. Inversions have been implicated in the evolution of sex chromosomes, where progressive inversions on the Y chromosome (or W chromosome in birds) have suppressed recombination between the sex chromosomes, leading to Y chromosome degeneration over evolutionary time.
Chromosome fusions have reshaped the karyotype (the number and appearance of chromosomes) of many lineages. Humans have 46 chromosomes while our closest relatives, chimpanzees, have 48. The difference is explained by the fusion of two ancestral chromosomes to form human chromosome 2, as evidenced by the presence of vestigial telomere sequences at the fusion site and a vestigial centromere.
Comparing karyotypes across species reveals a rich history of rearrangement:
- Human vs. chimpanzee: One chromosome fusion (chr 2), plus 9 pericentric inversions
- Human vs. mouse: Over 300 rearrangement events separating the two karyotypes
- Human vs. dog: Extensive chromosome fissions, with dog karyotype (2n = 78) reflecting many smaller chromosomes
Practical Tip
Comparative genomics uses synteny maps to track chromosomal rearrangements between species. Synteny refers to the conservation of gene order along a chromosome. When a block of genes appears in the same order in two species, those genes are syntenic, and we infer that no rearrangement has disrupted that region since the species diverged.
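A simple version of this comparison can be sketched by splitting one species' gene order into runs that are also contiguous in the other species. The example below assumes each gene appears once per species and treats a run in reversed order as a single block, since that is what an inversion produces. The gene names and both helper functions are hypothetical.

```python
def invert(gene_order, start, end):
    """Return a copy of gene_order with the slice [start:end] reversed."""
    return gene_order[:start] + gene_order[start:end][::-1] + gene_order[end:]


def collinear_blocks(order_a, order_b):
    """Split order_a into maximal runs that are adjacent in order_b.

    A run may follow order_b forward or backward, but not switch direction.
    """
    position_b = {gene: index for index, gene in enumerate(order_b)}
    shared = [gene for gene in order_a if gene in position_b]

    blocks = []
    current = []
    direction = 0  # 0 = not yet known, +1 = same order, -1 = reversed
    for gene in shared:
        if current:
            step = position_b[gene] - position_b[current[-1]]
            if step in (1, -1) and direction in (0, step):
                current.append(gene)
                direction = step
                continue
            blocks.append(current)  # adjacency broken: close the block
        current = [gene]
        direction = 0
    if current:
        blocks.append(current)
    return blocks


species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
species_b = invert(species_a, 2, 5)  # g3..g5 inverted in species B
print(collinear_blocks(species_a, species_b))
# [['g1', 'g2'], ['g3', 'g4', 'g5'], ['g6']]
```

The two block boundaries correspond to the two breakpoints of the inversion; an unrearranged region returns a single block.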
Genomic Imprinting and Parent-of-Origin Effects
What Is Genomic Imprinting?
Genomic imprinting is an epigenetic phenomenon in which the expression of a gene depends on which parent transmitted it. At an imprinted locus, one allele (either the maternally or paternally inherited copy) is silenced by DNA methylation and histone modifications established during gametogenesis, while the other allele is expressed. This means that, unlike the standard Mendelian expectation where both alleles contribute equally, an imprinted gene is functionally hemizygous: only one allele is active.
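The idea of functional hemizygosity can be shown with a very small model: the same two alleles give different expressed output depending on which parent transmitted each one. This is a sketch only, and the function name `expressed_alleles` and its `silenced_parent` argument are hypothetical.

```python
def expressed_alleles(maternal_allele, paternal_allele, silenced_parent=None):
    """Return the alleles that are transcribed at one locus.

    silenced_parent is "maternal", "paternal", or None for a
    non-imprinted gene where both alleles are expressed.
    """
    if silenced_parent is None:
        return [maternal_allele, paternal_allele]
    if silenced_parent == "maternal":
        return [paternal_allele]
    if silenced_parent == "paternal":
        return [maternal_allele]
    raise ValueError("silenced_parent must be 'maternal', 'paternal', or None")


# Same genotype (A/a), reciprocal transmission, maternal copy silenced.
print(expressed_alleles("A", "a", silenced_parent="maternal"))  # ['a']
print(expressed_alleles("a", "A", silenced_parent="maternal"))  # ['A']
```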
Approximately 100 to 200 genes in the mammalian genome are known to be imprinted. Many imprinted genes are involved in fetal growth, placental development, and postnatal behavior, consistent with the "parental conflict" hypothesis. This evolutionary model, proposed by David Haig, predicts that paternally expressed genes tend to promote fetal growth (benefiting the father's reproductive interest) while maternally expressed genes tend to restrain fetal growth (conserving maternal resources for future offspring).
Parent-of-Origin Effects
Parent-of-origin effects describe any situation where the phenotypic consequence of inheriting a particular allele depends on whether it was transmitted by the mother or the father. Genomic imprinting is the primary molecular mechanism underlying parent-of-origin effects, but other mechanisms, such as maternally deposited mRNAs and mitochondrial inheritance, can also produce parent-of-origin effects.
The most striking clinical examples of parent-of-origin effects involve two conditions that map to the same chromosomal region, 15q11-q13:
- Prader-Willi syndrome (PWS) results from loss of function of paternally expressed genes in the 15q11-q13 region. It is characterized by intellectual disability, obesity, short stature, and behavioral problems. PWS can arise from a deletion on the paternal chromosome 15, maternal uniparental disomy (where both chromosome 15s come from the mother), or an imprinting defect.
- Angelman syndrome (AS) results from loss of function of the maternally expressed UBE3A gene in the same region. It is characterized by severe intellectual disability, seizures, ataxia, and a characteristically happy demeanor. AS can result from a deletion on the maternal chromosome 15, paternal uniparental disomy, or a mutation in UBE3A.
The fact that deletions of the same chromosomal region cause different syndromes depending on the parent of origin is powerful evidence for genomic imprinting. It demonstrates that the maternal and paternal copies of these genes are not functionally equivalent.
| Syndrome | Affected Genes | Parent of Origin | Key Symptoms |
|---|---|---|---|
| Prader-Willi | Paternally expressed genes at 15q11-q13 | Paternal loss | Obesity, intellectual disability, hypotonia |
| Angelman | UBE3A (maternally expressed) at 15q11-q13 | Maternal loss | Seizures, ataxia, severe intellectual disability |
| Beckwith-Wiedemann | IGF2 (paternal), CDKN1C (maternal) at 11p15 | Variable | Overgrowth, macroglossia, increased cancer risk |
| Silver-Russell | IGF2 and H19 at 11p15, chr 7 | Variable | Growth restriction, body asymmetry |
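The logic linking the 15q11-q13 region to its two syndromes can be written as a short rule: what matters is how many intact copies of the region came from each parent, not the total. The sketch below is a simplification that covers deletions and uniparental disomy but does not model imprinting defects or point mutations in UBE3A. The function name `classify_15q` is hypothetical.

```python
def classify_15q(paternal_copies, maternal_copies):
    """Predict the outcome from intact 15q11-q13 copies inherited per parent."""
    # Paternally expressed genes are active only on paternal copies.
    paternal_genes_active = paternal_copies > 0
    # UBE3A is active only on maternal copies.
    ube3a_active = maternal_copies > 0

    if paternal_genes_active and ube3a_active:
        return "typical expression"
    if ube3a_active:
        return "Prader-Willi syndrome"
    if paternal_genes_active:
        return "Angelman syndrome"
    return "no intact copy of the region"


print(classify_15q(paternal_copies=0, maternal_copies=1))  # paternal deletion
print(classify_15q(paternal_copies=0, maternal_copies=2))  # maternal UPD
print(classify_15q(paternal_copies=2, maternal_copies=0))  # paternal UPD
```

The first two calls both return Prader-Willi syndrome and the third returns Angelman syndrome, showing that a deletion and the corresponding uniparental disomy lead to the same prediction in this model.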
Diagram: Genomic Imprinting at the 15q11-q13 Locus
Type: Interactive diagram
sim-id: genomic-imprinting-15q
Library: p5.js
Status: Specified
An interactive diagram showing the 15q11-q13 region with genes arranged linearly on two parallel bars representing the paternal (blue) and maternal (pink) chromosomes. Paternally expressed genes (SNRPN, NDN, MKRN3) are shown active (bright, with arrow) on the paternal chromosome and silenced (dimmed, with lock icon and methylation marks) on the maternal chromosome. The maternally expressed UBE3A gene is active on the maternal chromosome and silenced on the paternal. Users click buttons for three scenarios: (1) Normal (both chromosomes present), (2) Paternal deletion (paternal chromosome grays out, revealing Prader-Willi phenotype description), (3) Maternal deletion (maternal chromosome grays out, revealing Angelman phenotype description). A fourth button shows maternal UPD (both chromosomes turn pink), resulting in Prader-Willi because paternally expressed genes are absent.
Uniparental Disomy
Uniparental disomy (UPD) occurs when an individual inherits both copies of a chromosome (or chromosomal region) from the same parent, rather than one from each parent. UPD can be isodisomy (two identical copies of one parental homolog, which can unmask recessive alleles) or heterodisomy (one copy of each of the parent's two homologs).
UPD arises through several mechanisms, most commonly "trisomy rescue." If nondisjunction during meiosis produces a gamete carrying two copies of a chromosome, fertilization yields a trisomic zygote, and the embryo may randomly lose one of the three chromosomes to restore a diploid state. If the lost chromosome happens to be the one from the parent who contributed only a single copy, which is expected in about one of three random losses, the resulting individual will have two copies from the other parent.
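Trisomy rescue can be enumerated directly. The sketch below assumes each of the three chromosomes is equally likely to be lost, labels chromosomes only by parent of origin, and uses hypothetical function names.

```python
from fractions import Fraction


def trisomy_rescue_outcomes(trisomic):
    """List every diploid result of losing one chromosome from a trisomy.

    trisomic is a list of three parent-of-origin labels,
    e.g. ["mat", "mat", "pat"]. Each result is (remaining_pair, is_upd).
    """
    outcomes = []
    for lost in range(3):
        remaining = trisomic[:lost] + trisomic[lost + 1:]
        is_upd = remaining[0] == remaining[1]  # both from the same parent
        outcomes.append((tuple(remaining), is_upd))
    return outcomes


def upd_probability(trisomic):
    """Fraction of equally likely losses that leave uniparental disomy."""
    outcomes = trisomy_rescue_outcomes(trisomic)
    upd_count = sum(1 for _, is_upd in outcomes if is_upd)
    return Fraction(upd_count, len(outcomes))


print(upd_probability(["mat", "mat", "pat"]))  # 1/3
```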
The clinical significance of UPD depends on whether the affected chromosome contains imprinted genes. If it does, UPD can produce the same syndrome as a deletion. Maternal UPD of chromosome 15, for example, results in Prader-Willi syndrome because the child has no paternally expressed copies of the critical genes. Paternal UPD of chromosome 15 causes Angelman syndrome because the child has no maternally expressed UBE3A.
UPD of chromosomes without imprinted genes may have no phenotypic effect in heterodisomy but can cause autosomal recessive disorders in isodisomy by making the child homozygous for a recessive allele that the contributing parent carried in a single copy.
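The contrast between heterodisomy and isodisomy at a non-imprinted locus can be checked with a carrier parent. This is a minimal sketch in which a genotype is a pair of allele labels and `a` is assumed to be the recessive disease allele. Both function names are hypothetical.

```python
def upd_genotypes(parent_genotype, kind):
    """Return the child genotypes possible when both copies come from one parent."""
    first, second = parent_genotype
    if kind == "heterodisomy":
        # One copy of each of the parent's two homologs.
        return [(first, second)]
    if kind == "isodisomy":
        # Two identical copies of one homolog or of the other.
        return [(first, first), (second, second)]
    raise ValueError("kind must be 'heterodisomy' or 'isodisomy'")


def is_affected(genotype, recessive_allele="a"):
    """A recessive disorder appears only when every allele is the recessive one."""
    return all(allele == recessive_allele for allele in genotype)


carrier = ("A", "a")
for kind in ("heterodisomy", "isodisomy"):
    for child in upd_genotypes(carrier, kind):
        print(kind, child, is_affected(child))
```

Only the isodisomic `('a', 'a')` child is affected, even though the other parent contributed no recessive allele at all.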
Chapter Complete!
Wonderful work, researchers! You have now explored the major mechanisms of genome evolution, from gene duplication and its many fates to the remarkable world of genomic imprinting. These concepts connect molecular biology to evolution and clinical genetics, showing that inheritance is more complex and beautiful than Mendel ever imagined!
Chapter Summary
In this chapter, we examined the mechanisms that drive genome evolution and non-Mendelian inheritance:
- Gene duplication creates redundant gene copies through unequal crossing over, retrotransposition, or whole-genome duplication. Duplicated genes can undergo neofunctionalization, subfunctionalization, or pseudogenization.
- Paralogs are homologous genes within the same genome that arose by duplication, while orthologs arose by speciation.
- Gene families such as the globins, Hox genes, and olfactory receptors are products of successive duplication and divergence events.
- Pseudogenes are disabled gene copies classified as duplicated, processed, or unitary, and the human genome contains 15,000-20,000 of them.
- Segmental duplications (>1 kb, >90% identity) predispose to recurrent genomic disorders through nonallelic homologous recombination.
- Polyploidy (whole-genome duplication) is a major mechanism of speciation in plants, and ancient polyploidy events shaped the vertebrate genome.
- Chromosomal rearrangements including inversions, translocations, and fusions reshape karyotypes over evolutionary time and can drive speciation.
- Genomic imprinting causes parent-of-origin-dependent gene expression through DNA methylation and histone modifications, with Prader-Willi and Angelman syndromes as classic examples.
- Parent-of-origin effects violate the Mendelian assumption that maternal and paternal alleles are functionally equivalent.
- Uniparental disomy occurs when both copies of a chromosome come from one parent, causing imprinting disorders or unmasking recessive alleles.