DREAM-High

  • Home
  • DREAM-High Course
  • Enrichment Activities
  • About DREAM-High
  • Contact Us
  • DREAM-High Summer 2024
  • News

Structure-based protein interaction networks

PrePPI: Predicting Protein-Protein Interactions (PPIs) on a proteome-wide scale

The PrePPI Algorithm

Using a Bayesian statistical framework, PrePPI integrates structural features to predict whether a given pair of proteins is likely to interact. Applying PrePPI to all pairwise combinations of human proteins produced a database of 800K high-confidence interactions. PrePPI scores are given as likelihood ratios.
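
To make "likelihood ratio" concrete, here is a minimal naive Bayes sketch of how evidence is commonly combined in a Bayesian framework. It assumes the evidence sources are conditionally independent, so their LRs multiply; the LR values and the prior odds are hypothetical illustrations, not PrePPI output, and PrePPI's actual scoring may differ in detail.

```python
import math

def combined_lr(lrs):
    """Naive Bayes combination: assuming conditionally independent evidence,
    the overall likelihood ratio is the product of the per-source LRs."""
    return math.prod(lrs)

def posterior_probability(lr, prior_odds):
    """Bayes' rule in odds form: posterior odds = LR * prior odds,
    then convert the odds back into a probability."""
    post_odds = lr * prior_odds
    return post_odds / (1.0 + post_odds)

# Hypothetical per-source LRs for one protein pair (illustrative only).
evidence = {"SM": 50.0, "EP": 2.0}
lr = combined_lr(evidence.values())
print(lr)  # 100.0

# Hypothetical prior odds: about 1 true interaction per 600 random pairs.
print(round(posterior_probability(lr, 1 / 600), 3))  # 0.143
```

Even a large LR yields a modest posterior probability when the prior odds are low, which is why proteome-wide predictors report LRs rather than raw probabilities.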

Proteome-wide analysis aims to comprehensively study the entire set of protein-protein interactions (PPIs) within a given organism or cellular system. By integrating diverse data, Systems Biology tools provide a framework for interpreting proteome-wide PPI data and extracting meaningful biological insights. Analysis of PPI networks facilitates the identification of protein hubs, pathway crosstalk, and emergent properties of cellular systems.
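
One simple way to find protein hubs is to rank proteins by degree, the number of interaction partners each has. The sketch below assumes an undirected edge list; the edges are a small illustrative set, not a curated dataset, and the `min_degree` cutoff is arbitrary. Real analyses typically use tools such as Cytoscape or NetworkX and may prefer other centrality measures.

```python
from collections import Counter

# Illustrative undirected PPI edge list (hypothetical, not a curated dataset).
edges = [
    ("HRAS", "EXOC8"),
    ("HRAS", "BCR"),
    ("HRAS", "PIK3CA"),
    ("HRAS", "SOS1"),
    ("SOS1", "GRB2"),
    ("GRB2", "VAV1"),
]

def degrees(edge_list):
    """Count how many interaction partners each protein has (node degree)."""
    deg = Counter()
    for a, b in edge_list:
        deg[a] += 1
        deg[b] += 1
    return deg

def hubs(edge_list, min_degree):
    """Return proteins whose degree is at least min_degree, highest first."""
    return [p for p, d in degrees(edge_list).most_common() if d >= min_degree]

print(degrees(edges)["HRAS"])  # 4
print(hubs(edges, 3))          # ['HRAS']
```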

Structure-based function annotation of signaling pathways

Overview of PI3K/AKT Signaling

The PI3K/AKT signaling pathway is an intracellular signaling cascade that regulates cellular processes including cell growth, proliferation, survival, and metabolism.

Click here for a short video introduction.

Resources

We will use the following biological databases and bioinformatics tools to gain systems-level and mechanistic insight into PI3K/AKT signaling.

SIGNOR: The SIGnaling Network Open Resource

SIGNOR is a web-based database that organizes information about causal relationships in biological signaling pathways. The data can be visualized as a network showing the flow of signaling information.
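
Because causal relationships are directed and signed (up-regulation or down-regulation), you can reason about the flow of signaling information by following paths and multiplying edge signs. The graph below is a hypothetical, simplified illustration in the spirit of a SIGNOR pathway, not an export from SIGNOR; intermediates such as RHEB are omitted.

```python
# Hypothetical, simplified causal graph: each edge is (target, sign),
# +1 = up-regulates, -1 = down-regulates. Intermediates (e.g. RHEB) omitted.
GRAPH = {
    "PIK3CA": [("PIP3", +1)],
    "PTEN": [("PIP3", -1)],
    "PIP3": [("AKT1", +1)],
    "AKT1": [("TSC2", -1)],
    "TSC2": [("MTOR", -1)],
}

def signed_paths(graph, source, target, path=None, sign=1):
    """Enumerate simple directed paths from source to target.
    The net effect of a path is the product of its edge signs."""
    path = (path or []) + [source]
    if source == target:
        return [(path, sign)]
    results = []
    for nxt, edge_sign in graph.get(source, []):
        if nxt not in path:  # skip nodes already on the path (avoids cycles)
            results.extend(signed_paths(graph, nxt, target, path, sign * edge_sign))
    return results

for p, s in signed_paths(GRAPH, "PTEN", "MTOR"):
    print(" -> ".join(p), "net effect:", "+" if s > 0 else "-")
# PTEN -> PIP3 -> AKT1 -> TSC2 -> MTOR net effect: -
```

Two inhibitory steps cancel, so in this toy graph PI3K activity is net activating for MTOR while PTEN is net inhibitory.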

UniProtKB: The UniProt Knowledgebase

UniProt is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. 

Protein Data Bank (PDB)

The Research Collaboratory for Structural Bioinformatics (RCSB) PDB is the US data center for the global PDB archive of experimentally determined 3D structures of large biological molecules (proteins, DNA, and RNA), an essential resource for research.

PrePPI: Predicting Protein-Protein Interactions

PrePPI leverages protein structure information from the PDB and the AlphaFold Protein Structure Database to make proteome-wide PPI predictions and to predict pathogen-host interactions.

PrePCI: Predicting Protein-Compound Interactions

PrePCI is similar to PrePPI, except that its PDB templates are complexes of proteins with small molecules. The PrePCI database contains predictions for 19,797 human proteins and 6.8 million chemical compounds.

Gene Set AI (GSAI)

GSAI is a web-based tool for the functional annotation of a set of genes or proteins. It uses Large Language Models (LLMs) to process and synthesize information from large amounts of data and literature.

g:Profiler GOSt (Gene Ontology Statistics)

g:Profiler GOSt is a web-based tool to perform functional enrichment analysis on an input gene/protein list. It maps the proteins to known functional information sources and detects statistically significantly enriched terms. 
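
Enrichment tools of this kind typically ask: given a background of N genes, K of which carry an annotation, how surprising is it that k of my n query genes carry it? A common answer is the hypergeometric upper-tail probability, sketched below. The numbers are hypothetical; g:Profiler also applies multiple-testing correction (by default its own g:SCS method), which this sketch omits, so its reported values will differ.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in the background,
    K of them annotated to a term, n genes in the query list,
    k query genes carrying the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when asked for more items than exist.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical numbers: 10 of 50 query genes share a term that is
# annotated to 200 of 20,000 background genes (0.5 expected by chance).
p = hypergeom_pvalue(k=10, n=50, K=200, N=20_000)
print(f"{p:.2e}")
```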

AlphaFold 3

AlphaFold 3 builds upon the AlphaFold 2 model to offer significant improvements in predicting complex biomolecular structures and interactions, including PPIs.

Watch the video "How Does AlphaFold Server Work?"

EXERCISES

A. The PI3K/AKT Signaling Pathway

Goal: Create and visualize protein pathways and interaction networks.


  • On the SIGNOR home page, under 'Pathway Browser', find 'PI3K/AKT Signaling' and click on the adjacent green network icon. Explore the information in the graph. For example, click on proteins (nodes) and edges (interactions). 
  • What information do you get from nodes and edges? What types of support are provided for high-scoring edges versus low-scoring edges?


B. Protein-protein interactions in pathways

Goal: Retrieve structural models for PPIs in the PI3K/AKT pathway. 


  • Find HRAS in the SIGNOR PI3K/AKT Signaling pathway diagram. HRAS is a small GTPase that contributes to the regulation of PI3K/AKT activity.
  • In the PrePPI database, query for HRAS (UniProt ID: P01112). What information is provided? To learn what a section contains, click on the '?' icon in the upper-right corner of its pane.
  • Examine some of the HRAS PPIs in the 'Interactions' table. Click on the column heading 'SM' to sort the table by Structural Modeling score, highest to lowest.
  • Interactions with SM > 27 are considered high-confidence predictions; however, 3D models are provided only for interactions with SM > 100. Click on a gray box in the SM column, e.g., the EXOC8 SM value of 997.3. The model of the predicted complex will appear in the 'Interaction Structure' viewer.
  • View several interaction models. Click on the 'Template' chain tabs: how does the predicted PPI complex compare with the PDB template used to build it?
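
The two SM thresholds above can be summarized as a small binning rule. This is a sketch of those thresholds only; EXOC8's SM value comes from the exercise, while the other partner names and scores are hypothetical.

```python
def classify_sm(sm):
    """Bin a PrePPI SM score using the thresholds described in this exercise:
    SM > 27 is high confidence; 3D models are shown only for SM > 100."""
    if sm > 100:
        return "high confidence, 3D model available"
    if sm > 27:
        return "high confidence, no 3D model"
    return "below high-confidence threshold"

# EXOC8's SM is from the exercise; PARTNER_A and PARTNER_B are hypothetical.
for partner, sm in [("EXOC8", 997.3), ("PARTNER_A", 45.0), ("PARTNER_B", 12.0)]:
    print(partner, sm, classify_sm(sm))
```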


C. Protein function annotation

Goal: Functionally annotate your protein with its PrePPI interactors. 


  • We will explore how a protein's predicted interactors may recapitulate its known functions and suggest new roles.
  • PrePPI database results for many of the proteins in the SIGNOR diagram are provided here, both as individual comma-separated values (CSV) files and as an aggregate Excel file. Download the file(s) you would like to work with.
  • Sort the interactions by SM, highest to lowest.  
  • Copy the gene names of the top 50 interactors. Submit them to the functional annotation servers GSAI and g:Profiler GOSt. Are the results from the two servers concordant? How do they differ? Do any of the annotations seem related to the PI3K/AKT Signaling pathway?
  • GSAI works well for obtaining an overall theme. g:Profiler GOSt returns many annotations from highly curated resources, from which a user can pick out the components of particular interest.
  • Re-sort the PrePPI interactions by 'Pred_score', highest to lowest. Pred_score combines the structure-based SM scores with non-structural evidence, e.g. 'EP', which reflects the extent to which two proteins have similar expression patterns. 
  • Submit the 50 top-ranking gene names to GSAI and g:Profiler GOSt. How do the results differ from using the genes ranked by SM?
  • EXTRA: Lists of experimentally observed HRAS PPIs from BioGRID with different evidence sources are available in this Excel file. The first sheet contains information about the experimental approaches. Which experiments are expected to provide PPIs that involve direct physical interaction between HRAS and its partners? Which are likely to provide information on indirect but functionally related PPIs? Submit the lists to g:Profiler GOSt to test your expectations.
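
If you prefer to script the sorting rather than do it in a spreadsheet, the sketch below pulls the top-ranked gene names from a CSV and measures how much the SM-ranked and Pred_score-ranked lists overlap (Jaccard index). 'SM' and 'Pred_score' are column names from this exercise; the gene-name column header 'Gene' is a hypothetical placeholder, so check the header row of your downloaded file. The toy data and scores are made up.

```python
import csv
import io

def top_genes(csv_text, score_column, n=50, gene_column="Gene"):
    """Return the gene names of the n rows with the highest score_column.
    gene_column='Gene' is a hypothetical header; adjust it to your file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r[score_column]), reverse=True)
    return [r[gene_column] for r in rows[:n]]

def jaccard(a, b):
    """Overlap of two gene lists as |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy stand-in for a downloaded PrePPI CSV (made-up names and scores).
demo = """Gene,SM,Pred_score
GENE_A,997.3,500.0
GENE_B,300.0,900.0
GENE_C,150.0,20.0
GENE_D,30.0,800.0
"""
by_sm = top_genes(demo, "SM", n=2)
by_pred = top_genes(demo, "Pred_score", n=2)
print(by_sm)    # ['GENE_A', 'GENE_B']
print(by_pred)  # ['GENE_B', 'GENE_D']
print("\n".join(by_sm))  # one name per line, e.g. for pasting into a web form
print(jaccard(by_sm, by_pred))
```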

D. Protein-compound interactions in pathways

Goal: Examine the interactions between proteins and the lipid PIP3.  


  • The signaling lipid PIP3 (orange box in the SIGNOR diagram) activates proteins by recruiting them to the membrane surface. As indicated in the diagram, PIP3 concentration is strictly regulated by the interplay of lipid kinases (PI3K) and lipid phosphatases (PTEN).
  • Many experiments, including structure determinations, represent the lipid PIP3 by its headgroup, the portion of the lipid that extends into the cytoplasm and binds its protein targets with high specificity.
  • Search the PrePCI database for proteins predicted to bind 4IP. '4IP' is the PDB Chemical Component identifier for the PIP3 headgroup.
  • Are any of the predicted targets members of the PI3K/AKT signaling pathway? 
  • The Excel or PrePCI-4IP.csv file contains the PrePCI predictions. Sort the predictions by 'LT-Scanner Score' (the structural component of the prediction) and perform functional annotation with GSAI and/or g:Profiler GOSt. Then sort by 'PrePCI LR' and do the same.
  • We hypothesize that proteins predicted to bind the PIP3 headgroup are involved in PIP3-related cellular processes.
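
One optional way (not part of the exercise) to quantify how differently the two sortings rank the same proteins is Spearman's rank correlation between 'LT-Scanner Score' and 'PrePCI LR'. The column names come from the exercise; the five score pairs below are made up. Spearman's rho is the Pearson correlation of the rank vectors, with tied values given their average rank; it is undefined if either column is constant.

```python
def average_ranks(values):
    """Rank values 1..n in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up scores for five proteins: LT-Scanner Score vs PrePCI LR.
lt_scanner = [0.91, 0.85, 0.40, 0.77, 0.10]
prepci_lr = [120.0, 300.0, 5.0, 80.0, 1.0]
print(round(spearman(lt_scanner, prepci_lr), 2))  # 0.9
```

A rho near 1 means the two sortings agree; lower values suggest the non-structural evidence in the LR is reshuffling the ranking.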

E. PrePPI predictions and AlphaFold 3 models

Goal: Calculate an AlphaFold 3 model for a PrePPI prediction. 


  • It is currently not feasible to run AlphaFold-based programs on all pairwise combinations of proteins, or of proteins and drugs. PrePPI and PrePCI, which are high-throughput coarse-grained predictors, can be used to select high-priority complexes for more computationally intensive AlphaFold 3 calculations.
  • Since AlphaFold was first introduced, calculating protein and protein complex structures with this family of algorithms has been dramatically simplified, allowing more of us to integrate these tools into our research.
  • When examining a PrePPI model in the Interaction Viewer, you may have noticed that PrePPI also specifies the region (domain) of a protein predicted to participate in the interaction. Ideally, we would calculate AlphaFold 3 models both for the protein domains specified by PrePPI and for the full-length sequences of the proteins.
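
A quick back-of-the-envelope calculation shows why exhaustive AlphaFold 3 runs are impractical. This sketch uses the 19,797 human proteins covered by PrePCI as a proxy for the proteome, counts only pairs of distinct proteins, and assumes a hypothetical cost of 10 GPU-minutes per job; real run times depend heavily on sequence length and hardware.

```python
from math import comb

N_PROTEINS = 19_797  # human proteins in the PrePCI database, used as a proxy

# Unordered pairs of distinct proteins: N choose 2 (homodimers excluded).
n_pairs = comb(N_PROTEINS, 2)
print(f"{n_pairs:,}")  # 195,950,706

# Hypothetical cost per AlphaFold 3 job (an assumption, not a benchmark).
MINUTES_PER_JOB = 10
gpu_years = n_pairs * MINUTES_PER_JOB / (60 * 24 * 365)
print(f"~{gpu_years:,.0f} GPU-years")
```

Thousands of GPU-years for a single pass is why a fast pre-filter such as PrePPI is used to shortlist candidates.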

 

  • Choose a PrePPI prediction from this list:


  1. HRAS (P01112) - EXOC8 (Q8IYI6, residues 176-288)
  2. HRAS (P01112) - BCR (P11274, residues 499-693)
  3. SOS1 (Q07889, residues 84-200) - HIST2H2BF (Q5QNW6)
  4. GRB2 (P62993, residues 56-156) - VAV1 (P15498, residues 659-784)


  • Each PPI has interesting implications. Collect the specified sequences from UniProtKB, e.g., the full-length P01112 sequence for HRAS and residues 176-288 of Q8IYI6 for EXOC8.
  • Following the instructions in “How does AlphaFold Server work?”, calculate AlphaFold 3 models for your PPI.
  • What metrics are provided to judge the quality of your AlphaFold 3 models? Are the proteins in contact? Do the PrePPI and AlphaFold 3 models have any commonalities?
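
Extracting a residue range by hand is error-prone, because UniProt numbers residues from 1 and both ends of a range such as 176-288 are inclusive. The sketch below assumes the current UniProt REST URL pattern for single-entry FASTA downloads; the helper names are hypothetical, and fetching requires network access. The resulting sequences are then entered into AlphaFold Server as shown in the video.

```python
import urllib.request

def uniprot_fasta_url(accession):
    """UniProt REST URL for a single UniProtKB entry in FASTA format."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"

def parse_fasta(text):
    """Join all non-header lines of a single-record FASTA string."""
    return "".join(line.strip() for line in text.splitlines()
                   if line and not line.startswith(">"))

def fetch_sequence(accession):
    """Download a UniProtKB entry and return its sequence (needs network)."""
    with urllib.request.urlopen(uniprot_fasta_url(accession)) as resp:
        return parse_fasta(resp.read().decode("utf-8"))

def residue_range(seq, start, end):
    """Residues start..end, 1-based and inclusive, as UniProt numbers them."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} outside sequence of length {len(seq)}")
    return seq[start - 1:end]

# Example (needs network):
# exoc8_region = residue_range(fetch_sequence("Q8IYI6"), 176, 288)
# print(len(exoc8_region))  # 113
```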

Copyright © 2025 DREAM-High - All Rights Reserved.
