As their name implies, ultraconserved elements (UCEs) are highly conserved regions of organismal genomes shared among evolutionarily distant taxa; for instance, birds share many UCEs with humans. UCEs were first described in a wonderful manuscript by Gil Bejerano et al. (2004) from David Haussler’s group and were subsequently identified in several classes of organisms outside the original group of taxa used to identify these genomic elements (Siepel et al. 2005). The 27-way vertebrate genome alignment (Miller et al. 2007) identified additional regions of high conservation.
We have discovered (see Citations) that we can collect data from UCEs and the DNA adjacent to UCE locations (flanking DNA), and that these data are useful for reconstructing the evolutionary history and population-level relationships of many organisms. Because UCEs are conserved across disparate taxa, UCEs are also
That's an extremely good question, and one to which we do not entirely know the answer (Dermitzakis et al. 2005). UCEs have been associated with gene regulation (Pennacchio et al. 2006) and development (Sandelin et al. 2004, Woolfe et al. 2004), and we generally assume that UCEs must be important by the very nature of their near-universal conservation across extremely divergent taxa. However, gene knockouts of UCE loci in mice resulted in viable, fertile offspring (Ahituv et al. 2007), suggesting that their role in the biology of the genome may be cryptic.
You can identify UCEs in organismal genome sequences by aligning several genomes to each other, scanning the resulting genome alignments for areas of very high (95-100%) sequence conservation, and filtering on user-defined criteria, such as length (e.g., Bejerano et al. 2004). If you want to use these regions as genetic markers, it is best to remove UCEs that appear to be duplicates of one another, which we loosely define as UCEs that occur in more than one spot within any of the genomes you aligned. The resulting loci are the highly conserved regions that we target for use as molecular markers.
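The scan-and-filter procedure above can be sketched in a few lines. This is a minimal illustration, not our production pipeline: it assumes a gap-padded multiple alignment held in memory as equal-length strings (one per genome), counts a column as conserved only when every genome has the same unambiguous base, and flags windows where at least 95% of columns are conserved. The window length of 100 columns is an illustrative default only. For the duplicate filter, `hit_counts` is a hypothetical input (locus, then genome, then number of matches) that you would build by searching each candidate region back against each aligned genome.

```python
from typing import Dict, List, Sequence, Tuple


def column_is_identical(column: Sequence[str]) -> bool:
    """True if every genome has the same unambiguous base (no gap, no N) here."""
    bases = {base.upper() for base in column}
    return len(bases) == 1 and not bases & {"-", "N"}


def find_conserved_regions(
    alignment: List[str], min_identity: float = 0.95, min_length: int = 100
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) alignment columns of highly conserved regions.

    A window of min_length columns qualifies when at least min_identity of its
    columns are identical across all genomes; overlapping windows are merged.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    identical = [1 if column_is_identical(col) else 0 for col in zip(*alignment)]
    regions: List[Tuple[int, int]] = []
    window_sum = sum(identical[:min_length])
    for start in range(length - min_length + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one.
            window_sum += identical[start + min_length - 1] - identical[start - 1]
        if window_sum / min_length >= min_identity:
            end = start + min_length
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the current region
            else:
                regions.append((start, end))
    return regions


def drop_duplicated_loci(hit_counts: Dict[str, Dict[str, int]]) -> List[str]:
    """Keep loci found exactly once in every aligned genome.

    hit_counts (hypothetical input) maps locus -> genome -> number of matches.
    """
    return sorted(
        locus
        for locus, per_genome in hit_counts.items()
        if all(count == 1 for count in per_genome.values())
    )
```

Sliding a running sum keeps the scan linear in alignment length, which matters when the input is a whole-genome alignment.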
From the resulting set of UCEs shared among a taxonomic group, we design sequence capture (AKA solution hybrid selection sensu Gnirke et al. 2009) probes that are similar in sequence to the UCE loci we are targeting. These probe sets differ in number and composition, depending on the types of questions we are asking and the taxa with which we are working. Once we design a probe set, we follow sequence capture protocols to enrich DNA libraries for the target UCEs, usually in multiplex. Following enrichment, we sequence the DNA enriched for UCEs using massively parallel sequencing.
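One common way to turn a target locus into capture probes is to tile fixed-length probes across it. The sketch below is a hypothetical illustration, not our exact design procedure: it assumes 120 nt probes (the length of the probe sets listed below) and a step of 60 nt, so interior bases are covered by two overlapping probes, and it adds a final probe anchored at the 3' end so the whole locus is covered. The `_p1`, `_p2` naming scheme is also an assumption.

```python
from typing import List, Tuple


def tile_probes(
    locus_id: str, sequence: str, probe_length: int = 120, step: int = 60
) -> List[Tuple[str, str]]:
    """Return (probe_name, probe_sequence) pairs tiled across one locus.

    With step = probe_length / 2, interior bases are covered by two probes.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    if len(sequence) < probe_length:
        return []  # locus too short for a single full-length probe
    starts = list(range(0, len(sequence) - probe_length + 1, step))
    last_start = len(sequence) - probe_length
    if starts[-1] != last_start:
        starts.append(last_start)  # anchor a final probe at the 3' end of the locus
    return [
        (f"{locus_id}_p{number}", sequence[start : start + probe_length])
        for number, start in enumerate(starts, start=1)
    ]
```

A real design would usually add further filters (for example, on GC content or repeat-masked bases) before synthesis; those are omitted here.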
The most complex part of using UCEs to understand evolutionary relationships, population structure, and population relationships is analyzing the DNA sequence data. We have created several software packages and we're working on tutorials to help get you started. Many of the steps, at this point, require that you are comfortable working with computer software on the command line. We encourage everyone interested to get the software and contribute to the effort of documenting, improving, and extending our computer code.
Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC. 2012. Ultraconserved Elements Anchor Thousands of Genetic Markers Spanning Multiple Evolutionary Timescales. Syst Biol. pmid: 22232343 doi: 10.1093/sysbio/sys004.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved Elements Are Novel Phylogenomic Markers that Resolve Placental Mammal Phylogeny when Combined with Species Tree Analysis. Genome Res 22: 746–754. pmid: 22207614 doi: 10.1101/gr.125864.111.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group of archosaurs. Biol Lett. pmid: 22593086 doi: 10.1098/rsbl.2012.0331.
Below are several probe designs that we have used in publications or that we are currently using in the lab. We are constantly evaluating the utility of given probe sets and probe designs, in addition to expanding the number of UCE loci we are targeting. We have several larger probe sets in the works, and we are also working on optimizing probe sets based on their capture success, phylogenetic utility, etc. Please check back for updates.
The linked FASTA file (ZIP archive) contains probe sequences (120 nt) designed for synthesis as part of an Agilent SureSelect or MycroArray MyBait target enrichment kit. We used these probes for our in silico analysis of the placental mammal phylogeny, our in vitro analysis of extant bird groups, and our in vitro analysis of the phylogenetic position of turtles. Because they are deposited in Dryad, all probes are available under a CC0 license and are thus freely available for you to use.
Note: We designed probes from UCEs by including flanking sequence from chickens. Because of the highly conserved nature of UCEs and their flanking sequence, we have found these probes work well across amniotes.
The linked FASTA file (ZIP archive) contains probe sequences (120 nt) designed for synthesis as part of an Agilent SureSelect or MycroArray MyBait target enrichment kit. We used these probes for our in silico analysis of the primate phylogeny, and the set of 2,560 probes targeting 2,386 loci is a subset of this larger probe set. All probes are available under a CC0 license and are thus freely available for you to use.
Note 1: We designed probes from UCEs by including flanking sequence from chickens. Because of the highly conserved nature of UCEs and their flanking sequence, we have found these probes work well across amniotes.
Note 2: Although this probe set has not yet been referenced in a publication, we have been using it for some time across a variety of taxa with much success.
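After unzipping one of the probe files, a quick sanity check is to count the probes, count the distinct loci they target, and confirm that every probe is 120 nt. The sketch below assumes, hypothetically, that each probe name ends in a suffix such as `_p1` appended to its locus name (for example `uce-1_p1`); adjust `separator` to match the headers actually in the file. The file name in the commented usage is a placeholder.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of each header."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        yield name, "".join(chunks)


def summarize_probes(
    records: Iterable[Tuple[str, str]], expected_length: int = 120, separator: str = "_p"
) -> Dict[str, object]:
    """Count probes and loci, and list probes whose length is unexpected."""
    probes_per_locus: Counter = Counter()
    wrong_length: List[str] = []
    for name, seq in records:
        locus = name.rsplit(separator, 1)[0]  # assumed naming: <locus><separator><n>
        probes_per_locus[locus] += 1
        if len(seq) != expected_length:
            wrong_length.append(name)
    return {
        "probes": sum(probes_per_locus.values()),
        "loci": len(probes_per_locus),
        "wrong_length": wrong_length,
    }


# Hypothetical usage; "probes.fasta" stands in for the unzipped file:
# with open("probes.fasta") as handle:
#     print(summarize_probes(read_fasta(handle)))
```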
Below are several software packages we have developed to help analyze data collected from UCE loci. All computer code is available under a flexible open-source license (BSD). We welcome all code contributions, whether they improve the code, fix bugs, improve usability, or improve the documentation, which is rather sparse at the moment. If you are interested in helping, please contact us on Twitter (@ultraconserved) and/or post an issue on GitHub for the respective package.
Note: All software packages are likely to contain bugs - use at your own risk.
Our main code repository for analyzing data collected from UCE loci. Contains command-line applications for assembling contigs from sequence data, finding which contigs align to UCEs, aligning UCE contigs, and preparing data for downstream analysis in MrBayes, RAxML, and cloudforest.
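To make the "finding which contigs align to UCEs" step concrete, here is a toy sketch of the idea. It is not how phyluce performs this step: it swaps in a simple shared k-mer test in place of a real aligner. A contig is assigned to a UCE locus when it shares at least `min_shared` k-mers with exactly one locus (checking both strands, since assembled contigs have arbitrary orientation), and any locus matched by more than one contig is dropped as a possible paralog, mirroring the duplicate filter described earlier. The values of `k` and `min_shared` are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Set

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def match_contigs_to_loci(
    contigs: Dict[str, str], loci: Dict[str, str], k: int = 21, min_shared: int = 10
) -> Dict[str, str]:
    """Map contig name -> UCE locus name for unambiguous, non-duplicated matches."""
    locus_kmers = {name: kmers(seq, k) for name, seq in loci.items()}
    matches: Dict[str, str] = {}
    for contig_name, contig_seq in contigs.items():
        # Include both strands of the contig.
        contig_kmers = kmers(contig_seq, k) | kmers(reverse_complement(contig_seq), k)
        hits = [
            locus
            for locus, locus_set in locus_kmers.items()
            if len(locus_set & contig_kmers) >= min_shared
        ]
        if len(hits) == 1:
            matches[contig_name] = hits[0]
    # Drop loci hit by more than one contig (possible paralogs).
    hits_per_locus = Counter(matches.values())
    return {c: locus for c, locus in matches.items() if hits_per_locus[locus] == 1}
```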
Report an issue with phyluce.
A program for demultiplexing massively parallel sequencing reads tagged with edit-distance or Hamming-distance sequence tags, although it is tailored to edit-distance tags (see Tags). It can demultiplex hundreds of sequence-tagged libraries at once.
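The core idea behind demultiplexing with edit-distance tags is that tags are designed to differ from one another by several edits, so a read whose tag carries a sequencing error is still closer to its true tag than to any other. The sketch below illustrates that idea only and does not reproduce splitaake's interface or algorithm. It assumes the tag sits at the very start of the read, compares that fixed-length prefix to each tag with Levenshtein distance, and leaves a read unassigned when no tag is within `max_distance` or when two tags tie. The tag sequences in any usage are hypothetical.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def assign_read(read: str, tags: Dict[str, str], max_distance: int = 1) -> Optional[str]:
    """Return the sample whose tag best matches the start of the read, or None."""
    if not tags:
        return None
    scored = sorted(
        (levenshtein(read[: len(tag)], tag), sample) for sample, tag in tags.items()
    )
    best_distance, best_sample = scored[0]
    if best_distance > max_distance:
        return None  # too far from every known tag
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # tie between two tags: ambiguous, leave unassigned
    return best_sample
```

Comparing a fixed-length prefix is a simplification: an insertion or deletion inside the tag also shifts the following base into the comparison, so a real tool may search a slightly longer window.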
Report an issue with splitaake.
A program for automated cleaning of fastq files from sequencing. Removes adapter contamination using scythe and trims reads for quality using sickle. Combines paired reads into a single "interleaved" fastq.gz file for use with velvet.
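An "interleaved" file simply alternates the two reads of each pair (read 1, then its mate, then the next read 1, and so on) in one file. The sketch below shows that step only; it is not illumiprocessor's code and does no adapter or quality trimming. It assumes each FASTQ record is exactly four lines and that the R1 and R2 files list pairs in the same order. `write_interleaved` further assumes gzipped inputs, and its file paths are whatever you supply.

```python
import gzip
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into four-line records (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record starting {record[0]!r}")
        yield record


def interleave(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> Iterator[str]:
    """Yield lines alternating one R1 record with its R2 mate."""
    for r1, r2 in zip_longest(fastq_records(r1_lines), fastq_records(r2_lines)):
        if r1 is None or r2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield from r1
        yield from r2


def write_interleaved(r1_path: str, r2_path: str, out_path: str) -> None:
    """Read two gzipped FASTQ files and write one interleaved, gzipped FASTQ."""
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2, gzip.open(
        out_path, "wt"
    ) as out:
        for line in interleave(r1, r2):
            out.write(line + "\n")
```

Streaming record by record keeps memory use flat no matter how large the run is.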
Report an issue with illumiprocessor.
A program started by Nick Crawford, with contributions from Brant Faircloth, for parallel computation of individual gene trees and estimation of a species tree from gene-tree data. Allows analysis and bootstrapping of large datasets. Uses Amazon Elastic MapReduce or multiprocessing to analyze data in parallel. An MPI version is coming soon.
Report an issue with cloudforest.
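Bootstrapping many loci parallelizes well because every (replicate, locus) pair is independent: a map step resamples one locus and infers its gene tree, and a reduce step groups the trees by replicate for species-tree estimation. The sketch below illustrates that map/reduce pattern and is not cloudforest's code. It assumes each locus is an in-memory alignment (taxon name to aligned sequence), uses a per-locus nonparametric bootstrap over alignment columns, and takes a hypothetical, user-supplied `infer_tree` function standing in for a real tree-inference program. `mapper` defaults to the built-in `map`; passing `multiprocessing.Pool.map` runs the tasks in parallel.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence (all the same length)
Task = Tuple[int, str, Alignment, int, Callable[[Alignment], object]]


def bootstrap_alignment(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement (one nonparametric bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def _run_task(task: Task) -> Tuple[int, str, object]:
    """Map step: resample one locus for one replicate and infer its gene tree."""
    rep, locus, alignment, task_seed, infer_tree = task
    replicate = bootstrap_alignment(alignment, random.Random(task_seed))
    return rep, locus, infer_tree(replicate)


def run_bootstrap(
    loci: Dict[str, Alignment],
    infer_tree: Callable[[Alignment], object],
    n_reps: int,
    seed: int = 0,
    mapper: Callable = map,
) -> Dict[int, Dict[str, object]]:
    """Return {replicate: {locus: gene tree}} for n_reps bootstrap replicates."""
    rng = random.Random(seed)
    # Each task carries its own seed, so results do not depend on execution order.
    tasks: List[Task] = [
        (rep, locus, loci[locus], rng.randrange(2**32), infer_tree)
        for rep in range(n_reps)
        for locus in sorted(loci)
    ]
    # Reduce step: group the per-locus gene trees by replicate.
    results: Dict[int, Dict[str, object]] = defaultdict(dict)
    for rep, locus, tree in mapper(_run_task, tasks):
        results[rep][locus] = tree
    return dict(results)
```

For parallel runs, something like `with multiprocessing.Pool() as pool: run_bootstrap(loci, my_tree_fn, 100, mapper=pool.map)` should work, where `my_tree_fn` is a hypothetical module-level function (worker processes must be able to pickle it, so a lambda will not do).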