In partial fulfillment of the Requirements for the Degree of
Master of Science in Biology
School of Biology
Juan Camilo Castro Gordillo
will defend his thesis
“Finding the needle in the haystack: Developing tools for genome detection in metagenomic datasets”
Wednesday, August 16, 2017
School of Biomedical Engineering (Whitaker Building), Room 1232
Dr. Kostas Konstantinidis
School of Civil and Environmental Engineering
Dr. I. King Jordan (Biological Sciences)
Dr. Frank Stewart (Biological Sciences)
Accurate detection of target microbial species in metagenomic datasets from environmental samples remains limited for two reasons. First, the limit of detection of current methods typically cannot be determined. Second, the frequency of false positives remains unknown; these arise when regions of the genome that are either too highly conserved to be diagnostic (e.g., rRNA genes) or prone to frequent horizontal genetic exchange (e.g., mobile elements) are not adequately identified.
Our framework, called imGLAD, maps reads against a reference genome and then calculates the likelihood that the genome is present using logistic feature classification. imGLAD achieves high accuracy for three reasons: it uses the sequence-discrete population concept to discriminate between metagenomic reads originating from the target organism and reads from co-occurring close relatives; it masks uninformative regions of the genome using the MyTaxa engine; and it models both sequencing breadth and depth to determine relative abundance and the limit of detection. We validated imGLAD by analyzing metagenomic datasets derived from spinach leaves inoculated with the enteric pathogen Escherichia coli O157:H7 and showed that its limit of detection is comparable to that of PCR-based approaches (~1 cell/gram).
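The idea of modeling sequencing breadth and depth with a logistic classifier can be illustrated with a minimal sketch. This is not the imGLAD implementation: the function names, the toy per-position coverage vector, and the coefficients below are all hypothetical, chosen only to show how breadth (fraction of reference positions covered by at least one read) and depth (mean reads per position) could be combined into a presence probability. In practice the coefficients would be fitted on training datasets rather than set by hand.

```python
import math


def breadth_and_depth(coverage):
    """Return (breadth, depth) for a list of per-position read counts.

    breadth: fraction of positions covered by at least one read.
    depth:   mean number of reads per position.
    """
    if not coverage:
        raise ValueError("coverage must be non-empty")
    covered = sum(1 for c in coverage if c > 0)
    breadth = covered / len(coverage)
    depth = sum(coverage) / len(coverage)
    return breadth, depth


def presence_probability(breadth, depth, intercept, w_breadth, w_depth):
    """Logistic model: probability that the target genome is present."""
    z = intercept + w_breadth * breadth + w_depth * depth
    return 1.0 / (1.0 + math.exp(-z))


# Toy per-position coverage over a (masked) reference genome; illustrative only.
coverage = [0, 2, 3, 0, 1, 1, 0, 4, 0, 1]
breadth, depth = breadth_and_depth(coverage)

# Hypothetical coefficients, NOT taken from imGLAD.
p = presence_probability(breadth, depth,
                         intercept=-4.0, w_breadth=8.0, w_depth=0.5)
print(f"breadth={breadth:.2f} depth={depth:.2f} P(present)={p:.3f}")
```

Under such a model, the limit of detection can be read as the smallest breadth and depth at which the predicted probability crosses a chosen decision threshold.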