
deepgenomics/REPRESS


REPRESS: A Deep Learning Model for Cell-Type-Specific Post-Transcriptional Gene Regulation

REPRESS (Regulatory Element PRediction of post-transcriptional Events using Sequence Signals) is a deep learning model that predicts cell-type-specific microRNA (miRNA) binding and mRNA degradation directly from RNA sequence.

Trained on millions of sites from AGO2-CLIP, miR-eCLIP, and Degradome-Seq data across multiple human and mouse cell types, REPRESS learns rich representations of post-transcriptional gene regulation. It captures both canonical and non-canonical miRNA binding, integrates sequence context and site multiplicity, and generalizes across tissues and modalities. The version of REPRESS released in this repository has 8,356,199 parameters. The AGO2-CLIP, miR-eCLIP, and Degradome-Seq cell lines used to train it are listed in the "Published Datasets used for model training" section below.

Key Features

  • Predicts miRNA binding and mRNA degradation at single-nucleotide resolution
  • Captures cell-type-specific regulatory signals using endogenous sequence features
  • Learns non-canonical targeting rules and contextual repression mechanisms
  • Outperforms prior methods on 7 orthogonal benchmarks, including:
    • Variant effect prediction on miRNA binding
    • Out-of-distribution generalization to reporter assay data
    • Discovery of novel repressive elements across transcripts
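Variant effect prediction and in-silico mutagenesis (ISM) both come down to comparing a model score on a reference sequence with the score on a mutated copy. The sketch below illustrates that idea only. `toy_score` is a hypothetical stand-in that counts an arbitrary hexamer, and none of the function names here are part of the REPRESS API; the repository's own ISM analysis lives in `repress/track_plots_and_ism_analysis/`.

```python
from typing import Callable, Dict, List

BASES = "ACGU"  # RNA alphabet


def toy_score(seq: str, motif: str = "GCACUU") -> float:
    """Hypothetical scorer (NOT REPRESS): counts occurrences of an arbitrary motif."""
    return float(sum(seq.startswith(motif, i) for i in range(len(seq))))


def variant_effect(ref_seq: str, pos: int, alt: str,
                   score_fn: Callable[[str], float]) -> float:
    """Score change caused by substituting `alt` at 0-based position `pos`."""
    if alt not in BASES:
        raise ValueError(f"unexpected base: {alt!r}")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return score_fn(alt_seq) - score_fn(ref_seq)


def in_silico_mutagenesis(ref_seq: str,
                          score_fn: Callable[[str], float]) -> List[Dict[str, float]]:
    """For every position, the score change for each of the three alternative bases."""
    return [
        {alt: variant_effect(ref_seq, i, alt, score_fn)
         for alt in BASES if alt != ref_seq[i]}
        for i in range(len(ref_seq))
    ]


if __name__ == "__main__":
    sequence = "AAGCACUUAA"
    for position, effects in enumerate(in_silico_mutagenesis(sequence, toy_score)):
        print(position, sequence[position], effects)
```

With a real model, `score_fn` would wrap a prediction call and reduce its output to a single number; how best to do that for REPRESS is not specified here.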

Applications

  • Functional annotation of non-coding variants
  • Mechanistic insights into miRNA and RBP interactions
  • Rational design of RNA therapeutics

Model Architecture

[Figure: REPRESS model architecture]

Prerequisites

  • Anaconda or Mamba (Python ≥3.8)
  • Optional (for GPU support):
    • NVIDIA GPU with a driver that supports CUDA ≥11.2
    • Note: The CUDA toolkit and cuDNN libraries are automatically installed in the Conda environment; no system-wide CUDA installation is needed.
  • Other Tools
    • wget (used to download initial data).
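Before creating the environment, it can help to confirm that the tools listed above are on the `PATH`. The following is a minimal sketch using only the Python standard library. The helper names are illustrative. Finding `nvidia-smi` only suggests that an NVIDIA driver is installed; it does not confirm that the driver supports CUDA ≥11.2.

```python
import shutil
import sys
from typing import Dict, Optional, Tuple


def check_prerequisites() -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    tools = ["mamba", "conda", "wget", "nvidia-smi"]
    return {tool: shutil.which(tool) for tool in tools}


def python_ok(min_version: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter is at least `min_version`."""
    return sys.version_info >= min_version


if __name__ == "__main__":
    found = check_prerequisites()
    for tool, path in found.items():
        print(f"{tool:12s} {path or 'not found'}")
    if not (found["mamba"] or found["conda"]):
        print("Neither mamba nor conda was found; one of them is required.")
    if not found["nvidia-smi"]:
        print("nvidia-smi not found; GPU support is optional.")
    print("Python >= 3.8:", python_ok())
```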

Steps to set up the environment

The following commands will create a Conda environment, install required packages, and set up the project. GPU support will be enabled automatically if compatible hardware and drivers are present.

Note: The commands below use mamba for faster dependency resolution. You may substitute mamba with conda if preferred.

# Create the environment from the environment.yml file
mamba env create -f environment.yml

# Activate the environment
mamba activate repress

# Install the project in editable mode
pip install -e .

# Make the build script executable and run it
chmod +x build.sh
./build.sh

Run Analyses

All data required for the analyses will be under the data/ subfolder of each analysis directory, and the resulting plots will be under the plots/ folder.

# Run REPRESS to generate track plots and ISM for UTRN example
python repress/track_plots_and_ism_analysis/generate_track_plots_and_ism.py

# Run REPRESS to generate degradome predictions for ENO2
python repress/degradome_prediction_analysis/generate_degradome_predictions.py

# Run REPRESS on Slutskin MPRA analysis
python repress/slutskin_mpra_analysis/repress_slutskin_mpra_analysis.py

# Run REPRESS to generate wild-type and mutant tracks with variant or oligo treatment
python repress/oligo_track_plots_analysis/generate_oligo_variant_tracks.py

# Run REPRESS on miRNA variant effect prediction analysis
python repress/variant_effect_prediction_analysis/get_variant_predictions.py
python repress/variant_effect_prediction_analysis/plot_variant_analysis.py
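To run every analysis in one go, the commands above can be wrapped in a small driver. This is a sketch, not part of the repository: `run_all` and `build_commands` are illustrative names, and it assumes the scripts are launched from the repository root inside the activated `repress` environment. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script paths copied from the commands above, in the same order
# (variant predictions must be generated before they are plotted).
ANALYSIS_SCRIPTS = [
    "repress/track_plots_and_ism_analysis/generate_track_plots_and_ism.py",
    "repress/degradome_prediction_analysis/generate_degradome_predictions.py",
    "repress/slutskin_mpra_analysis/repress_slutskin_mpra_analysis.py",
    "repress/oligo_track_plots_analysis/generate_oligo_variant_tracks.py",
    "repress/variant_effect_prediction_analysis/get_variant_predictions.py",
    "repress/variant_effect_prediction_analysis/plot_variant_analysis.py",
]


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """One argv list per analysis script."""
    return [[python, script] for script in ANALYSIS_SCRIPTS]


def run_all(repo_root: str = ".", dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    commands = build_commands()
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        if not (Path(repo_root) / cmd[1]).is_file():
            raise FileNotFoundError(f"{cmd[1]} not found under {repo_root}")
        subprocess.run(cmd, cwd=repo_root, check=True)
    return commands


if __name__ == "__main__":
    run_all(dry_run=True)  # switch to dry_run=False to actually execute
```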

Usage

Note: The following code must be executed from the repository root.

import os
from genome_kit import Genome
import matplotlib.pyplot as plt

from repress.plot_tracks import make_track_plot
from repress.model_wrapper import REPRESS

# ------------------------------------------------------------------
# 1. Load a REPRESS model and its associated cell-line metadata
# ------------------------------------------------------------------
#   - "repress/repress_model"       → directory in repo with weights
#   - "repress/cell_line_csv.csv"   → CSV mapping cell-line names to indices
model = REPRESS(
    path="repress/repress_model",
    cell_type_csv="repress/cell_line_csv.csv"
)

# ------------------------------------------------------------------
# 2. Set up reference genome and locate the region of interest
# ------------------------------------------------------------------
# Load GENCODE v29 reference genome annotation
genome = Genome("gencode.v29")

# Gene we care about
gene_name = "ENO2"

# Get the Gene object for ENO2
# (Genome.genes is an iterable of Gene objects)
gene = [g for g in genome.genes if g.name == gene_name][0]

# Choose a specific transcript (second isoform in the list)
transcript = gene.transcripts[1]

# Grab the first 3′-UTR interval of that transcript
utr3 = transcript.utr3s[0]

# ------------------------------------------------------------------
# 3. Choose the cell line context for the prediction
# ------------------------------------------------------------------
cell_line = "deg_A549"          # must match an entry in cell_line_csv

# ------------------------------------------------------------------
# 4. Run the model on the interval
# ------------------------------------------------------------------
# predict_interval returns a list (one score per input interval);
# we take the first element because we passed a single interval
prediction_score = model.predict_interval(
    interval=utr3,          # Interval(s) of interest
    genome=genome,          # Genome of the species of interest
    transcript=transcript,  # Transcript(s) that the interval(s) belong to
    cell_lines=cell_line    # Cell line(s) to include in the output (a single name here)
)[0]

# ------------------------------------------------------------------
# 5. Plot the degradome prediction tracks
# ------------------------------------------------------------------
fig, ax = make_track_plot(utr3, prediction_score, gene, genome, figsize=(8, 2),
                        scale_marker=0.4, custom_transcript=transcript,
                        vertical_markers=[6923337], vertical_markers_colors=["red"],
                        ylabels=["A549 degradome"], labelpad=40,)

os.makedirs("temporary_plots", exist_ok=True)
plt.savefig("temporary_plots/ENO2_degradome_track_plot.png", dpi=600, bbox_inches="tight")
plt.close()

[Figure: ENO2 degradome track plot]
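The `cell_line` value in step 3 must match an entry in `repress/cell_line_csv.csv`. The column layout of that file is not described in this README, so the sketch below makes no assumption about it: it simply returns every row in which some cell equals the requested name. `rows_mentioning` is an illustrative helper, not part of the `repress` package.

```python
import csv
from typing import List


def rows_mentioning(csv_path: str, name: str) -> List[List[str]]:
    """Return all CSV rows containing a cell exactly equal to `name`."""
    with open(csv_path, newline="") as handle:
        return [
            row for row in csv.reader(handle)
            if name in (cell.strip() for cell in row)
        ]
```

For example, `rows_mentioning("repress/cell_line_csv.csv", "deg_A549")` should return at least one row if the name used in the walkthrough is valid; an empty list suggests a typo in the cell-line name.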

Published Datasets used for model training

Datasets used for miRNA binding components of REPRESS.

| Name | GSE | Study ID | GSM ID | SRX ID | Protocol | Aligner | Type | Species | Notes | Source |
|---|---|---|---|---|---|---|---|---|---|---|
| BC1 | GSE32109 | PRJNA154855 | GSM796037, GSM796038 | SRX097115, SRX097116 | PAR-CLIP | bowtie1 | bcell | human | | Previously Published |
| BC3 | GSE32109 | PRJNA154855 | GSM796039, GSM796040 | SRX097117, SRX097118 | PAR-CLIP | bowtie1 | bcell | human | | Previously Published |
| BCBL | GSE43909 | PRJNA188176 | GSM1074233, GSM1074234 | SRX220837, SRX220838 | PAR-CLIP | bowtie1 | bcell | human | | Previously Published |
| DG75 | GSE43909 | PRJNA188176 | GSM1074231, GSM1074232 | SRX220835, SRX220836 | PAR-CLIP | bowtie1 | bcell | human | | Previously Published |
| HEK293 | GSE28859 | PRJNA153959 | GSM714646, GSM714647 | SRX058621, SRX058622 | PAR-CLIP | bowtie1 | kidney | human | | Previously Published |
| HeLa | GSE29943 | PRJNA140779 | GSM741173, GSM741174, GSM741175 | SRX083305, SRX083306, SRX083307 | PAR-CLIP | bowtie1 | cervix | human | | Previously Published |
| A2780 | GSE129076 | PRJNA529911 | GSM3693008, GSM3693009 | SRX5604726, SRX5604726 | PAR-CLIP | bowtie1 | ovary | human | | Previously Published |
| MCF7 | | PRJNA230871 | | SRX388831 | PAR-CLIP | bowtie1 | breast | human | | Previously Published |
| hESC | | PRJNA80179 | | SRX103431 | PAR-CLIP | bowtie1 | stem | human | | Previously Published |
| A673 | GSE80494 | PRJNA319049 | | SRX1716186, SRX1716187 | PAR-CLIP | bowtie1 | muscle | human | | Previously Published |
| 22RV1 | GSE137072 | PRJNA564505 | GSM4066540, GSM4066541, GSM4066542 | SRX6817511, SRX6817512, SRX6817513 | HITS-CLIP | bowtie2 | prostate | human | | Previously Published |
| HCT116 | GSE146688 | PRJNA611621 | GSM4404081, GSM4404082 | SRX7883128, SRX7883129 | eCLIP | bowtie2 | colon | human | | Previously Published |
| Huh7 | GSE73057 | PRJNA295996 | | | CLEAR-CLIP | star | liver | human | | Previously Published |
| A549 | | | | | miR-eCLIP | star | lung | human | | This Study |
| K562 | | | | | miR-eCLIP | star | bone | human | | This Study |
| HEK293T | | | | | miR-eCLIP | star | kidney | human | | Previously Published |
| Mouse ESC | GSE108795 | PRJNA428611 | GSM2913321, GSM2913322 | SRX3533994, SRX3533995 | PAR-CLIP | bowtie1 | stem | mouse | mouse embryonic stem cells | Previously Published |
| Mouse myotubes | GSE108795 | PRJNA428611 | GSM2913323, GSM2913324 | SRX3533996, SRX3533997 | PAR-CLIP | bowtie1 | muscle | mouse | cells differentiated from Myoblast → Myotubes | Previously Published |
| Mouse BMDM | GSE63199 | PRJNA266979 | GSM1543768 | SRX758007 | PAR-CLIP | bowtie1 | bone marrow | mouse | C57BL/6 mouse | Previously Published |
| Mouse retina | GSE165832 | PRJNA697981 | GSM5050738, GSM5050741 | SRX9980473, SRX9980474 | PAR-CLIP | bowtie1 | retina | mouse | CD-1 mice, P0 | Previously Published |
| Mouse brain | GSE129885 | PRJNA533090 | GSM3724178 | SRX5694011 | eCLIP | star | brain | mouse | C57BL/6 mouse, P0 | Previously Published |
| Mouse iNeuron | GSE140838 | PRJNA591126 | GSM4188573 | SRX7202046 | seCLIP | star | CNS | mouse | ESC, day 10 differentiation | Previously Published |
| Mouse liver | | | | | miR-eCLIP | star | liver | mouse | C57BL/6J mouse, 8-week-old | Previously Published |

Datasets used for mRNA degradation components of REPRESS.

| Cell Line | Num Reps | Protocol | Aligner | Type | Species | Source |
|---|---|---|---|---|---|---|
| A549 | 4 | Degradome-Seq | star | lung | human | This Study |
| HepG2 | 3 | Degradome-Seq | star | liver | human | This Study |
| iNeuron glutamatergic | 3 | Degradome-Seq | star | CNS | human | This Study |
| K562 | 3 | Degradome-Seq | star | bone | human | This Study |
| PHH | 6 | Degradome-Seq | star | liver | human | This Study |
| Yecuris liver | 3 | Degradome-Seq | star | liver | human | This Study |
| Mouse cortex | 3 | Degradome-Seq | star | brain | mouse | This Study |
| Mouse liver | 3 | Degradome-Seq | star | liver | mouse | This Study |
| Mouse cns e18 day4 | 3 | Degradome-Seq | star | CNS | mouse | This Study |
| Mouse cns e18 day11 | 3 | Degradome-Seq | star | CNS | mouse | This Study |

License

This material is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. If you use this software or data in your non-commercial work, please cite our paper [1].

For inquiries regarding commercial use please contact: [email protected]. Deep Genomics has filed one or more patent applications related to technical aspects of this work including PCT/IB2025/052017.

Citing this work

If you use REPRESS in your research, please cite:

@article{Kanuparthi2025.05.15.654105,
	author = {Kanuparthi, Bhargav and Pour, Sara E. and Findlay, Scott D. and Wagih, Omar and Gutierrez, Jahir M. and Gao, Rory and Wintersinger, Jeff and Lin, Junru and Gabra, Martino and Bohn, Emma and Lau, Tammy and Cole, Christopher B and Jung, Andrew and Celaj, Albi and Soares, Fraser and Gray, Rachel and Vaz, Brandon and Delfosse, Kate and Lodaya, Varun and Bhargava, Sakshi and Ly, Diane and Yusuf, Farhan and Kron, Ken and Hoffman, Greg and Gandhi, Shreshth and Frey, Brendan J.},
	title = {Sequence based prediction of cell type specific microRNA binding and mRNA degradation for therapeutic discovery},
	elocation-id = {2025.05.15.654105},
	year = {2025},
	doi = {10.1101/2025.05.15.654105},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2025/05/16/2025.05.15.654105},
	eprint = {https://www.biorxiv.org/content/early/2025/05/16/2025.05.15.654105.full.pdf},
	journal = {bioRxiv}
}
