Browsing by Author "Nekrutenko, Anton"

Item
Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach (2014)
Dickins, Benjamin; Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei; Paul, Ian M.; Blankenberg, Daniel; Stoler, Nicholas; Makova, Kateryna D.; Nekrutenko, Anton
Polymorphism discovery is a routine application of next-generation sequencing technology where multiple samples are sent to a service provider for library preparation, subsequent sequencing, and bioinformatic analyses. The decreasing cost and advances in multiplexing approaches have made it possible to analyze hundreds of samples at a reasonable cost. However, because of the manual steps involved in the initial processing of samples and handling of sequencing equipment, cross-contamination remains a significant challenge. It is especially problematic in cases where polymorphism frequencies do not adhere to diploid expectation, for example, heterogeneous tumor samples, organellar genomes, as well as during bacterial and viral sequencing. In these instances, low levels of contamination may be readily mistaken for polymorphisms, leading to false results. Here we describe practical steps designed to reliably detect contamination and uncover its origin, and also provide new, Galaxy-based, readily accessible computational tools and workflows for quality control. All results described in this report can be reproduced interactively on the web as described at http://usegalaxy.org/contamination.

Item
Enhancing pre-defined workflows with ad hoc analytics using Galaxy, Docker and Jupyter (2016)
Grüning, Björn; Rasche, Eric; Rebolledo-Jaramillo, Boris; Eberhard, Carl; Houwaart, Torsten; Chilton, John; Coraor, Nathan; Backofen, Rolf; Taylor, James; Nekrutenko, Anton
What does it take to convert a heap of sequencing data into a publishable result? First, common tools are employed to reduce primary data (sequencing reads) to a form suitable for further analyses (i.e., list of variable sites). The subsequent exploratory stage is much more ad hoc and requires development of custom scripts, making it problematic for biomedical researchers. Here we describe a hybrid platform combining common analysis pathways with exploratory environments. It aims at fully encompassing and simplifying the “raw data-to-publication” pathway and making it reproducible.

Item
Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers (PLoS, 2017)
Grüning, Björn; Rasche, Eric; Rebolledo-Jaramillo, Boris; Eberhard, Carl; Houwaart, Torsten; Chilton, John; Coraor, Nate; Backofen, Rolf; Taylor, James; Nekrutenko, Anton
What does it take to convert a heap of sequencing data into a publishable result? First, common tools are employed to reduce primary data (sequencing reads) to a form suitable for further analyses (i.e., the list of variable sites). The subsequent exploratory stage is much more ad hoc and requires the development of custom scripts and pipelines, making it problematic for biomedical researchers. Here, we describe a hybrid platform combining common analysis pathways with the ability to explore data interactively. It aims to fully encompass and simplify the "raw data-to-publication" pathway and make it reproducible.

Item
RNA-DNA differences in human mitochondria restore ancestral form of 16S ribosomal RNA (2013)
Bar-Yaacov, Dan; Avital, Gal; Levin, Liron; Richards, Allison L.; Hachen, Naomi; Rebolledo-Jaramillo, Boris; Nekrutenko, Anton; Zarivach, Raz; Mishmar, Dan
RNA transcripts are generally identical to the underlying DNA sequences. Nevertheless, RNA-DNA differences (RDDs) were found in the human nuclear genome and in plants and animals but not in human mitochondria. Here, by deep sequencing of human mitochondrial DNA (mtDNA) and RNA, we identified three RDD sites at mtDNA positions 295 (C-to-U), 13710 (A-to-U, A-to-G), and 2617 (A-to-U, A-to-G). Position 2617, within the 16S rRNA, harbored the most prevalent RDDs (>30% A-to-U and ~15% A-to-G of the reads in all tested samples). The 2617 RDDs were already present in the precursor polycistronic mitochondrial transcript. By using traditional Sanger sequencing, we identified the A-to-U RDD in six different cell lines and representative primates (Gorilla gorilla, Pongo pygmaeus, and Macaca mulatta), suggesting conservation of the mechanism generating such RDD. Phylogenetic analysis of more than 1700 vertebrate mtDNA sequences supported a thymine as the primate ancestral allele at position 2617, suggesting that the 2617 RDD recapitulates the ancestral 16S rRNA. Modeling U or G (the RDDs) at position 2617 stabilized the large ribosomal subunit structure in contrast to destabilization by an A (the pre-RDDs). Hence, these mitochondrial RDDs are likely functional.