About Plant uORF-pep Database
Upstream open reading frames (uORFs) are regulatory elements located in the 5' untranslated region (5' UTR) of eukaryotic mRNAs. They play critical roles in regulating translation of downstream main open reading frames (mORFs).
📊 Database Contents
This database provides systematic uORF identification across 30 plant species:
Arabidopsis (Arabidopsis thaliana)
Banana (Musa acuminata)
Barley (Hordeum vulgare)
Brachypodium (Brachypodium distachyon)
Cabbage (Brassica oleracea)
Chinese cabbage (Brassica rapa)
Cacao (Theobroma cacao)
Cassava (Manihot esculenta)
Chickpea (Cicer arietinum)
Cotton (Gossypium hirsutum)
Cowpea (Vigna unguiculata)
Cucumber (Cucumis sativus)
Maize (Zea mays)
Medicago (Medicago truncatula)
Melon (Cucumis melo)
Foxtail millet (Setaria italica)
Mungbean (Vigna radiata)
Pea (Pisum sativum)
Peach (Prunus persica)
Peanut (Arachis hypogaea)
Poplar (Populus trichocarpa)
Potato (Solanum tuberosum)
Radish (Raphanus sativus)
Rapeseed (Brassica napus)
Rice (Oryza sativa)
Sorghum (Sorghum bicolor)
Soybean (Glycine max)
Strawberry (Fragaria vesca)
Tomato (Solanum lycopersicum)
Wheat (Triticum aestivum)
🔬 Methods
uORFs were identified using a custom Python pipeline based on cDNA sequences from NCBI RefSeq. Key features:
- 5' UTR extraction: Directly from spliced cDNA sequences (no intron contamination)
- Start codon: Canonical AUG only
- Minimum uORF length: 18 nucleotides (encoding ≥6 amino acids)
- uORF identification: All three reading frames scanned with fixed-frame ORF detection
- Output: Each uORF includes nucleotide sequence and predicted peptide sequence
- Data sources: NCBI RefSeq (genome annotations), NCBI Gene (symbols & descriptions), GEO/SRA (Ribo-seq translation evidence)
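The scanning rules listed above can be illustrated with a minimal sketch. This is not the database's actual pipeline: the function name `find_uorfs`, the `UORF` record, and the choices to measure the 18-nucleotide minimum on the coding part (excluding the stop codon, so that 18 nt corresponds to 6 amino acids) and to require an in-frame stop codon inside the 5' UTR are assumptions made for illustration only.

```python
from typing import List, NamedTuple

# Standard genetic code, built in TCAG order (cDNA alphabet, so T instead of U).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


class UORF(NamedTuple):
    """Hypothetical record type; coordinates are 0-based, end-exclusive."""
    start: int
    end: int        # position just past the stop codon
    frame: int
    nt_seq: str     # start codon through stop codon
    peptide: str    # translated sequence without the stop


def find_uorfs(utr5: str, min_coding_nt: int = 18) -> List[UORF]:
    """Scan all three frames of a 5' UTR for ATG-initiated ORFs.

    Assumptions (not confirmed by the source): the minimum length excludes
    the stop codon, and the uORF must terminate inside the 5' UTR.
    """
    seq = utr5.upper().replace("U", "T")
    found: List[UORF] = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] != "ATG":
                pos += 3
                continue
            # Walk codon by codon in the same frame to the first stop codon.
            stop_at = next(
                (j for j in range(pos, len(seq) - 2, 3)
                 if seq[j:j + 3] in STOP_CODONS),
                None,
            )
            if stop_at is None:
                break  # no in-frame stop remains in this frame
            if stop_at - pos >= min_coding_nt:
                coding = seq[pos:stop_at]
                peptide = "".join(
                    CODON_TABLE.get(coding[k:k + 3], "X")  # X for ambiguous bases
                    for k in range(0, len(coding), 3)
                )
                found.append(UORF(pos, stop_at + 3, frame, seq[pos:stop_at + 3], peptide))
            pos = stop_at + 3  # resume after the stop; nested in-frame ATGs are skipped
    return found
```

Under these assumptions, a 5' UTR such as `CCATGGCTGCTGCTGCTGCTTAAG` yields a single uORF in frame 2 whose peptide is `MAAAAA`, while an ATG followed by fewer than six coding codons is discarded.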
🧬 Translation Evidence (Ribo-seq Validation)
To bridge the gap between computational prediction and biological reality, we integrated publicly available Ribo-seq data to annotate uORF-hosting genes with translation evidence.
- Data sources: Processed Ribo-seq datasets from GEO (GSE183264, GSE124115, GSE199932, GSE128680, GSE184725)
- Species validated: Arabidopsis (97.7%), Maize (66.5%), Tomato (62.4%), Medicago (51.1%), Rice (48.5%), Sorghum (81.6%)
- Validation method: Gene-level translation activity (TPM ≥ 1 in Ribo-seq, or Ribo-seq footprint overlap with annotated genes)
- Display: Genes with Ribo-seq support are marked with a "🔬 Ribo-seq validated" badge in search results
- Coverage: Overall, 60.0% of uORFs across the validated species have translation evidence
- Cross-species conservation: uORF peptides were compared across 10 representative plant species using DIAMOND (>50% identity, >80% coverage). 84,953 genes harbor conserved uORFs.
- Machine learning prediction: A Random Forest model (9 features, 79% accuracy) trained on experimentally validated Arabidopsis uORFs predicts translation probability for all 5.02 million uORFs. 234,060 genes have at least one uORF with >75% predicted translation probability.
- Novel candidate discovery: 6,628 conserved (≥2 species) but functionally unannotated uORF peptides (≥10 aa) were identified as high-value candidates for experimental validation. The top 100 candidates are ranked by conservation and ML probability.
- uORF-level validation: In Arabidopsis, 5,297 predicted uORFs (75.7%) matched experimentally translated uORFs (RiboTaper and CiPS; Wu et al. 2024, Plant Cell).
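As an illustration of the gene-level criterion described under "Validation method", the sketch below flags uORF-hosting genes whose Ribo-seq TPM is at least 1 and reports the supported fraction. It covers only the TPM branch of the criterion, not the footprint-overlap alternative, and the function name, input layout, and gene identifiers are hypothetical rather than taken from the database's code.

```python
from typing import Dict, Set, Tuple


def flag_translated_genes(
    ribo_tpm: Dict[str, float],
    uorf_genes: Set[str],
    min_tpm: float = 1.0,
) -> Tuple[Set[str], float]:
    """Return the uORF-hosting genes with Ribo-seq TPM >= min_tpm,
    plus the percentage of uORF-hosting genes they represent.

    Genes missing from the Ribo-seq table are treated as TPM 0
    (an assumption of this sketch).
    """
    supported = {g for g in uorf_genes if ribo_tpm.get(g, 0.0) >= min_tpm}
    percent = 100.0 * len(supported) / len(uorf_genes) if uorf_genes else 0.0
    return supported, percent


# Illustrative values only; these are not real measurements.
example_tpm = {"geneA": 5.2, "geneB": 0.4, "geneC": 1.0}
example_genes = {"geneA", "geneB", "geneC", "geneD"}
supported, percent = flag_translated_genes(example_tpm, example_genes)
```

With the illustrative values shown, `geneA` and `geneC` pass the threshold, so half of the four uORF-hosting genes would carry the badge.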
🔍 Features
- Search by gene symbol, NCBI ID, LOC ID, Ensembl ID, or TAIR ID
- Case-insensitive search with prefix matching
- Cross-species query support
- FASTA download per transcript
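Case-insensitive prefix matching, as listed above, can be sketched with a sorted index and binary search. How the site actually implements search is not stated here, so the `PrefixIndex` class, its `limit` parameter, and the sample identifiers are assumptions for illustration.

```python
import bisect
from typing import Iterable, List


class PrefixIndex:
    """Hypothetical in-memory index for case-insensitive prefix lookup."""

    def __init__(self, identifiers: Iterable[str]) -> None:
        # Store (lowercased key, original spelling), sorted by the key.
        self._entries = sorted((ident.lower(), ident) for ident in identifiers)

    def search(self, query: str, limit: int = 20) -> List[str]:
        q = query.strip().lower()
        if not q:
            return []
        # (q,) sorts before every (q, ...) tuple, so this is the first key >= q.
        start = bisect.bisect_left(self._entries, (q,))
        matches: List[str] = []
        for key, original in self._entries[start:]:
            if not key.startswith(q):
                break  # matches are contiguous in sorted order
            matches.append(original)
            if len(matches) >= limit:
                break
        return matches


# Illustrative identifiers only.
index = PrefixIndex(["AT1G01010", "AT1G01020", "at2g01008", "LOC100001"])
results = index.search("at1g")
```

Here `results` contains `AT1G01010` and `AT1G01020` in their original spelling, regardless of the case used in the query.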
📝 Citation
If you use this database in your research, please cite:
Plant uORF-pep Database: a comprehensive resource for upstream open reading frames in plants. (2026)