Abstract
BACKGROUND: Many homeobox genes show remarkable conservation between divergent animal phyla. In contrast, the ARGFX (Arginine-fifty homeobox) locus was identified in the human genome but is not present in mouse or invertebrates. Here we ask when and how this locus originated and examine its pattern of molecular evolution.
RESULTS: Phylogenetic and phylogenomic analyses suggest that ARGFX originated by gene duplication from Otx1, Otx2 or Crx during early mammalian evolution, most likely on the stem lineage of the eutherians. ARGFX diverged extensively from its progenitor homeobox gene, and its exons have been functional and subject to purifying selection through much of the placental mammal radiation. Surprisingly, the coding region is disrupted in most mammalian genomes analysed, with human being the only mammal identified in which the full open reading frame is retained. Indeed, we describe a transcript from human testis that has the potential to encode the full deduced protein.
CONCLUSIONS: The unusual pattern of evolution suggests that the ARGFX gene may encode a functional RNA or alternatively it may have 'flickered' between functional and non-functional states in the evolutionary history of mammals, particularly in the period when many mammalian lineages diverged within a relatively short time span.
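The distinction drawn above between a retained open reading frame and a disrupted coding region can be illustrated with a minimal Python sketch. It is not the procedure used in the study; the function name `orf_disruptions` and the three checks it applies (frame length, start codon, premature stop codons) are assumptions chosen for illustration only.

```python
# Illustrative sketch only: a simple check for disruptions in a coding sequence.
# The function name and the criteria are assumptions, not the authors' method.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_disruptions(cds):
    """Return a list of reasons why `cds` is not an intact open reading frame."""
    seq = cds.upper().replace("-", "")  # drop alignment gaps, if any
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3 (possible frameshift)")
    if not seq.startswith("ATG"):
        problems.append("missing ATG start codon")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Any stop codon before the final codon truncates the deduced protein.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index + 1}")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    return problems
```

An empty list means the sequence passes all three checks; a sequence with an internal stop codon or a frameshifting indel would be flagged as a candidate pseudogene under these simplified criteria.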
Figure 1. Gene structure of human ARGFX. PCR primer positions and amplicons are shown relative to predicted human ARGFX gene structure [6]. Boxes indicate exons, drawn to scale; lines indicate introns, not drawn to scale. Numbers above boxes and beneath lines indicate the lengths of each exon and intron. The 5' and 3' untranslated regions are shown in white and the protein coding regions are shown in black, except for the homeodomain which is red.
Figure 2. ARGFX gene in vertebrates. The phylogenetic tree of mammals is based on [29-32]. Column 1: ARGFX is inferred to be a probable gene (√), possible pseudogene with disrupted coding region (ψ), secondarily completely lost (×) or not included in current genome data (?). Column 2: Minimal number of retrotransposed ARGFX pseudogenes based on current genome data. Column 3: Synteny with the human ARGFX genomic region is conserved (√) or not (×), or not sequenced (?).
Figure 3. Comparison of ARGFX, OTX1, OTX2 and CRX gene structures. Human TPRX1, DPRX, HESX1 and GSC gene structures were used as references. Exons are represented by boxes and introns by lines, with the length in nucleotides written above. The 5' and 3' untranslated regions are shown in white and the protein coding regions in black except for homeodomains which are shown in red. Human gene structures follow the NCBI gene annotation; tree shrew and megabat ARGFX intron positions were deduced by reference to retroposed pseudogenes.
Figure 4. Greater conservation of DNA sequences between human ARGFX, OTX1 and OTX2 genomic regions than with DPRX and TPRX1. Human GSC and HESX1 were also used as references, but no similarity was found. Genomic sequences from the last base pair of the upstream gene to the first base pair of the downstream gene for each locus (based on UCSC at http://genome.ucsc.edu/) were used, and compared using Shuffle-LAGAN [28] in mVISTA, which can detect sequence rearrangements. Coloured peaks (purple, coding; pink, intergenic; blue, transcribed non-coding) indicate regions of at least 30 bp and 30% similarity.
Figure 5. Phylogenetic relationship between ARGFX and other PRD class homeobox genes. Maximum likelihood phylogenetic tree constructed using complete deduced human ARGFX protein sequence and the most similar human homeodomain proteins. Bootstrap support values over 50% are shown. Essentially the same topology was recovered by Bayesian analysis except at weakly supported nodes, notably the position of VSX1.
Figure 6. Synteny and paralogy around the Otx gene family. Map positions of amphioxus Otx and its neighbouring genes are compared to their human orthologues, which map primarily to chromosomes 1, 2, 14, 11 and 19, not chromosome 3. Amphioxus genes are shown in their physical order, and are numbered as in amphioxus (B. floridae) genome assembly v. 1.0. GeneID 20 is amphioxus Otx. GeneID 22 and 23 are most likely two parts of a gene and are treated as one locus. Human orthologues are not necessarily in order. Amphioxus genes 2 and 19 (black boxes) do not have clear human homologues; phylogenetic relationships are not well resolved for amphioxus genes 8, 17 and 24 (grey boxes). Human orthologues of amphioxus gene 13 do not map to the five main chromosomal regions.
Figure 7. Conservation of DNA sequence between mammalian ARGFX genomic sequences. Only species with a high-quality genome assembly in this region were used. Sequences were aligned by the LAGAN program [17] in mVISTA. Length of the genomic region for each species is on the right. Macaque is missing a region between exon 2 and exon 4, accounting for higher similarity between human and marmoset than between human and macaque in this region. The higher sequence similarity in and around exons is clearly visible, indicative of selective constraints since divergence of the species shown. Coloured peaks (purple, coding; pink, intergenic; light blue, UTR) indicate regions of at least 50 bp and 50% similarity. (a) mVISTA plot using repeat-masked genomic sequences; (b) mVISTA plot using sequence with no masking of repeats.
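The threshold quoted in the caption (regions of at least 50 bp and 50% similarity) can be illustrated with a minimal sliding-window sketch in Python. This is not the mVISTA implementation; the function name `conserved_windows`, the treatment of gaps as mismatches and the one-base step size are assumptions made for illustration.

```python
# Illustrative sketch only: sliding-window identity over a pairwise alignment.
# Not the mVISTA algorithm; the name, gap handling and step size are assumptions.
def conserved_windows(aligned_a, aligned_b, window=50, min_identity=0.5):
    """Return (start, identity) for every window meeting the identity threshold.

    Both inputs must be rows of the same pairwise alignment, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(0, len(aligned_a) - window + 1):
        col_a = aligned_a[start:start + window]
        col_b = aligned_b[start:start + window]
        # Count identical, non-gap columns; gaps are scored as mismatches.
        matches = sum(1 for x, y in zip(col_a, col_b) if x == y and x != "-")
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits
```

Calling `conserved_windows(seq_a, seq_b)` with the defaults reports windows matching the 50 bp and 50% criterion; passing `window=30, min_identity=0.3` mirrors the looser threshold quoted for Figure 4.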
Abascal, ProtTest: selection of best-fit models of protein evolution. 2005.
Altschul, Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. 1997.
Booth, Annotation, nomenclature and evolution of four novel homeobox genes expressed in the human germ line. 2007.
Brudno, LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. 2003.
Brudno, Glocal alignment: finding rearrangements during alignment. 2003.
Cillo, HOX genes in human cancers.
Clapp, Evolutionary conservation of a coding function for D4Z4, the tandem DNA repeat mutated in facioscapulohumeral muscular dystrophy. 2007.
Del Bene, Cell cycle control by homeobox genes in development and disease. 2005.
Felsenstein, Confidence limits on phylogenies: an approach using the bootstrap. 1985.
Frith, Pseudo-messenger RNA: phantoms of the transcriptome. 2006.
Gal-Mark, Alternative splicing of Alu exons--two arms are better than one. 2008.
Garcia-Fernández, Archetypal organization of the amphioxus Hox gene cluster. 1994.
Guindon, PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. 2005.
Holland, Classification and nomenclature of all human homeobox genes. 2007.
Kappen, Evolution of a regulatory gene family: HOM/HOX genes. 1993.
Kriegs, Evolutionary history of 7SL RNA-derived SINEs in Supraprimates. 2007.
Manak, A class act: conservation of homeodomain protein functions. 1994.
Murphy, Molecular phylogenetics and the origins of placental mammals. 2001.
Nishihara, Retroposon analysis and recent geological data suggest near-simultaneous divergence of the three superorders of mammals. 2009.
Nunes, Homeobox genes: a molecular link between development and cancer. 2003.
Prasad, Confirming the phylogeny of mammals by use of large comparative sequence data sets. 2008.
Putnam, The amphioxus genome and the evolution of the chordate karyotype. 2008.
Ronquist, MrBayes 3: Bayesian phylogenetic inference under mixed models. 2003.
Saitou, The neighbor-joining method: a new method for reconstructing phylogenetic trees. 1987.
Schneider, Support patterns from different outgroups provide a strong phylogenetic signal. 2009.
Takatori, Comprehensive survey and classification of homeobox genes in the genome of amphioxus, Branchiostoma floridae. 2008.
Tamura, MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. 2007.
Thompson, The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. 1997.
Yang, PAML 4: phylogenetic analysis by maximum likelihood. 2007.
Zhang, Positive Darwinian selection after gene duplication in primate ribonuclease genes. 1998.
Zheng, Integrated pseudogene annotation for human chromosome 22: evidence for transcription. 2005.