Sasse Lab Master's Thesis Project - HiWi Contract Possible
Toward a virtual cell — learning the post-transcriptional regulatory code
Current efforts to build "virtual cells" — deep learning models of how cells respond to perturbation — focus almost entirely on transcription: which genes are turned on. They largely ignore the post-transcriptional layer, where RNA-binding proteins (RBPs) decide whether each transcript is spliced, transported, stabilized, or translated. This layer matters: RBP dysregulation underlies neurodegenerative diseases (ALS, FTD, Fragile X) and contributes to several cancers, and the first generation of RNA-targeting therapeutics — splice-modulating drugs, mRNA medicines — is already in clinical use.
From an ML perspective, the problem is well defined. Hundreds of RBPs have been profiled by CLIP-seq, producing millions of binding sites in public datasets such as ENCODE, yet predicting RBP binding from RNA sequence alone remains an open and challenging problem, even with recent genomic foundation models like AlphaGenome. Building on our group's current modeling work, you will develop deep learning models (convolutional, transformer-based, or extensions of foundation architectures) to learn the sequence grammar of post-transcriptional regulation. You will then evaluate these models on downstream applications: identifying genetic variants that disrupt this layer and contribute to disease, and designing sequences for therapeutic applications such as mRNA vaccines. The specific path is flexible and can be shaped to your interests: nucleotide-resolution models across many proteins at once, cross-cell-type or cross-species generalization, methodological extensions toward more mechanistic and interpretable models, or integration of additional datasets.
Requirements: strong Python skills and experience with a deep learning framework (PyTorch preferred). For suitably qualified candidates, a HiWi contract (40 h/month, 6 months) is available.
Applications should be made to a.sasse@zmbh.uni-heidelberg.de

