
The Soeding lab at the MPI for Biophysical Chemistry in Goettingen is looking to fill postdoc positions (E13 TVoeD Bund, three years), starting ~ 01/2019 or later, in methods development for metagenomics.

Motivation: To predict the function of protein sequences in metagenomes, all common tools search for related sequences in reference databases, from which the functional annotation can be inferred. But many species found in metagenomics studies are not closely related to any organism with a well-annotated genome. Therefore, the fraction of protein sequences in metagenomic data that cannot be annotated using this "vertical" information transfer is often as high as 65% to 90%. This is the major obstacle to progress.

Project: We want to develop a new paradigm for function prediction based on the transfer of contextual, "horizontal" information. Building on our MMseqs2 software for fast sequence and profile searches [1] and sequence clustering [2], we will develop a very fast sequence search method that can find clusters of neighboring and co-transcribed genes. The basic idea of utilising genomic context is similar to that of well-known tools such as the STRING database. However, we are devising a novel statistical approach which, in combination with MMseqs2 and Linclust, will allow us to analyse huge numbers of genomes and metagenomes. Using iterative profile searches combined with horizontal information transfer, we will mine massive amounts of genomic and metagenomic data to learn functional modules of genes / proteins that will subsequently be used for improved annotation. This novel approach promises to greatly accelerate the rate of biological and biotechnological discoveries by deep mining of metagenomic and genomic sequence data.

Your profile: The project requires very good programming skills and an interest in writing efficient, fast code. Experience with C++ and AVX2 is not required; both can be learned on the job. Furthermore, an interest and some background in statistical approaches in bioinformatics will be highly useful. Women are particularly encouraged to apply. Applications from non-Germans are very welcome.

Our group: Our group of 3 postdocs, 6 PhD students, two master students and a PI develops statistical and computational methods for analyzing data from high-throughput biological experiments. You would join a team of 3 PhD students and a postdoc working on method development for metagenomics.

If you are interested, I would be happy to hear from you!
Johannes Soeding

[1] Steinegger, M., and Söding, J. (2017) MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnol. 35, 1026–1028.
[2] Steinegger, M., and Söding, J. (2018) Clustering huge protein sequence sets in linear time. Nature Commun. 9, 2542.