GNNSeq: A Sequence-Based Graph Neural Network for Predicting Protein–Ligand Binding Affinity

Background/Objectives: Accurately predicting protein–ligand binding affinity is essential in drug discovery for identifying effective compounds. While existing sequence-based machine learning models for binding affinity prediction have shown potential, they lack accuracy and robustness in pattern recognition, which limits their generalizability across diverse and novel binding complexes. To overcome these limitations, we developed GNNSeq, a novel hybrid machine learning model that integrates a Graph Neural Network (GNN) with Random Forest (RF) and XGBoost.

Methods: GNNSeq predicts ligand binding affinity by extracting molecular characteristics and sequence patterns from protein and ligand sequences. The fully optimized GNNSeq model was trained and tested on subsets of the PDBbind dataset. The novelty of GNNSeq lies in its exclusive reliance on sequence features, a hybrid GNN framework, and an optimized kernel-based context-switching design. By relying exclusively on sequence features, GNNSeq eliminates the need for pre-docked complexes or high-quality structural data, allowing for accurate binding affinity predictions even when interaction-based or structural information is unavailable. The integration of GNN, XGBoost, and RF improves GNNSeq's performance through hierarchical sequence learning, handling of complex feature interactions, and variance reduction, forming a robust ensemble that improves predictions and mitigates overfitting. GNNSeq's unique kernel-based context-switching scheme optimizes model efficiency and runtime, dynamically adjusts feature weighting between sequence and basic structural information, and improves predictive accuracy and model generalization.

Results: In benchmarking, GNNSeq performed comparably to several existing sequence-based models and achieved a Pearson correlation coefficient (PCC) of 0.784 on the PDBbind v.2020 refined set and 0.84 on the PDBbind v.2016 core set. During external validation with the DUDE-Z v.2023.06.20 dataset, GNNSeq attained an average area under the curve (AUC) of 0.74, demonstrating its ability to distinguish active ligands from decoys across diverse ligand–receptor pairs. To further evaluate its performance, we combined GNNSeq with two additional specialized models that integrate structural and protein–ligand interaction features. When tested on a curated set of well-characterized drug–target complexes, the hybrid models achieved an average PCC of 0.89, with the top-performing model reaching a PCC of 0.97. GNNSeq was designed with a strong emphasis on computational efficiency, training on 5000+ complexes in 1 h and 32 min, with real-time affinity predictions for test complexes.

Conclusions: GNNSeq provides an efficient and scalable approach for binding affinity prediction, offering improved accuracy and generalizability while enabling large-scale virtual screening and cost-effective hit identification. GNNSeq is publicly available in a server-based graphical user interface (GUI) format.
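
For readers unfamiliar with the two metrics quoted in the abstract, here is a minimal sketch of how a Pearson correlation coefficient and a ROC AUC can be computed. It is not GNNSeq code: the function names `pearson` and `roc_auc` and the toy numbers are illustrative assumptions, and the AUC is computed with the simple pairwise definition rather than any particular library routine.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between predicted and measured values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def roc_auc(scores, labels):
    """Probability that a random active (label 1) outscores a random decoy (label 0).

    Ties between an active and a decoy count as half a win.
    """
    actives = [s for s, l in zip(scores, labels) if l == 1]
    decoys = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))

# Hypothetical toy data, for illustration only.
predicted_affinity = [1.0, 2.0, 3.0, 4.0]
measured_affinity = [2.0, 4.0, 6.0, 8.0]
screen_scores = [0.9, 0.8, 0.3, 0.1]
screen_labels = [1, 0, 1, 0]

pcc = pearson(predicted_affinity, measured_affinity)
auc = roc_auc(screen_scores, screen_labels)
```

A PCC near 1 means predictions rise and fall with the measured affinities. An AUC of 0.5 corresponds to random ranking of actives against decoys, and 1.0 to perfect separation.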

Tumor-derived erythropoietin acts as an immunosuppressive switch in cancer immunity

A protein identified nearly 40 years ago for its ability to stimulate the production of red blood cells plays a surprising, critical role in dampening the immune system’s response to cancer.

Blocking the activity of the protein turns formerly “cold,” or immune-resistant, liver tumors in mice into “hot” tumors teeming with cancer-fighting immune cells. When combined with an immunotherapy that further activates these immune cells against the cancer, the treatment led to complete regression of existing liver tumors in most mice. Treated animals lived for the duration of the experiment. In contrast, control animals survived only a few weeks.

“This is a fundamental breakthrough in our understanding of how the immune system is turned off and on in cancer,” said the senior author of the study, which was published in Science. “I could not be more excited about this discovery, and I hope treatments that target the mechanism we uncovered will quickly move forward to human trials.”

A new computational framework illuminates the hidden ecology of diseased tissues

To understand what drives disease progression in tissues, scientists need more than just a snapshot of cells in isolation—they need to see where the cells are, how they interact, and how that spatial organization shifts across disease states. A computational method called MESA (Multiomics and Ecological Spatial Analysis), detailed in a study published in Nature Genetics, is helping researchers study diseased tissues in more meaningful ways.

The work is a collaboration among researchers from MIT; Stanford University; Weill Cornell Medicine; the Ragon Institute of MGH, MIT, and Harvard; and the Broad Institute of MIT and Harvard, and was led by the Stanford team.

MESA brings an ecology-inspired lens to tissue analysis. It offers a pipeline to interpret spatial omics data—the product of cutting-edge technology that captures molecular information along with the location of cells in tissue samples. This data provides a high-resolution map of tissue “neighborhoods,” and MESA helps make sense of the structure of that map.

Brain’s ‘blue spot’ key to healthy aging, early Alzheimer’s detection

In a demographically diverse sample of healthy people, Cornell researchers found dramatic changes over the human lifespan in the brain’s “blue spot”—a tiny region involved in cognition and believed to be the first affected by neurodegenerative conditions including Alzheimer’s disease.

Using specialized MRI scans to measure the intensity of neuromelanin, a pigment that gives the locus coeruleus (LC) its blue color, the research team observed an inverted U-shaped curve that peaked in later middle age before dropping off sharply, a finding that helps characterize healthy aging patterns.

Maintaining a stronger blue signal after age 60 was associated with better cognitive performance, according to the study involving 134 participants aged 19 to 86. Because of the participants’ diversity, including about 40% who were non-white, the researchers also discovered higher peaks among Black participants and women, groups known to be more susceptible to Alzheimer’s.

Dual scalable annealing processors overcome capacity and precision limits

Combinatorial optimization problems (COPs) arise in various fields such as shift scheduling, traffic routing, and drug development. However, they are challenging to solve using traditional computers in a practical timeframe.

Alternatively, annealing processors (APs), which are specialized hardware for solving COPs, have gained significant attention. They are based on the Ising model, in which COP variables are represented as magnetic spins and constraints as interactions between spins. Solutions are obtained by finding the spin state that minimizes the energy of the system.

There are two types of Ising models, the sparsely-coupled model and the fully-coupled model. Sparsely-coupled models offer high scalability by allowing more spins, but require COPs to be transformed to fit the model. Fully-coupled models, on the other hand, allow any COP to be mapped directly without transformation, making them highly desirable.
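
The Ising formulation described above can be sketched in a few lines. The example below is not the processor's algorithm: it assumes the common energy convention E = -Σ J_ij s_i s_j (summed over pairs i < j), uses number partitioning as a hypothetical COP that maps onto a fully-coupled model, and finds the minimum both by brute force and by a plain software simulated-annealing loop. All function names and parameter values are illustrative.

```python
import itertools
import math
import random

def ising_energy(spins, J):
    """Energy E = -sum_{i<j} J[i][j] * s_i * s_j for spins in {-1, +1}."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def partition_couplings(numbers):
    """Map number partitioning onto a fully-coupled Ising model.

    (sum_i a_i s_i)^2 = sum_i a_i^2 + 2 * sum_{i<j} a_i a_j s_i s_j,
    so J[i][j] = -2 * a_i * a_j makes the energy equal to the squared
    difference between the two groups minus the constant sum_i a_i^2.
    """
    n = len(numbers)
    return [[-2.0 * numbers[i] * numbers[j] if i != j else 0.0
             for j in range(n)] for i in range(n)]

def brute_force_ground_state(J):
    """Exhaustively search all 2^n spin states (only viable for tiny n)."""
    return min(itertools.product((-1, 1), repeat=len(J)),
               key=lambda s: ising_energy(s, J))

def anneal(J, steps=5000, t_start=10.0, t_end=0.05, seed=0):
    """Single-spin-flip simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    n = len(J)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, J)
    best, best_energy = spins[:], energy
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / max(1, steps - 1))
        i = rng.randrange(n)
        # Energy change caused by flipping spin i (J is symmetric).
        delta = 2 * spins[i] * sum(J[i][j] * spins[j] for j in range(n) if j != i)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            spins[i] = -spins[i]
            energy += delta
            if energy < best_energy:
                best, best_energy = spins[:], energy
    return best, best_energy

numbers = [3, 1, 1, 2, 2, 1]          # hypothetical instance; total is 10
J = partition_couplings(numbers)
ground = brute_force_ground_state(J)  # spins +1 / -1 label the two groups
annealed, annealed_energy = anneal(J)
```

Every pair of spins is coupled here, which is why a problem like this fits a fully-coupled model directly, whereas a sparsely-coupled model would need the problem transformed first.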

Fungi dwelling on human skin may provide new antibiotics

University of Oregon researchers have uncovered a molecule produced by yeast living on human skin that showed potent antimicrobial properties against a pathogen responsible for a half-million hospitalizations annually in the United States.

It’s a unique approach to tackling the growing problem of antibiotic-resistant bacteria. With the global threat of drug-resistant infections, fungi inhabiting human skin are an untapped resource for identifying new antibiotics, said Caitlin Kowalski, a postdoctoral researcher at the UO who led the study.

As described in a paper published in Current Biology, the common skin fungus Malassezia gobbles up oil and fats on human skin to produce fatty acids that selectively eliminate Staphylococcus aureus. One out of every three people has Staphylococcus aureus harmlessly dwelling in their nose, but the bacteria can cause serious infections when given an opportunity such as an open wound, abrasion or cut. They’re the primary cause of skin and soft tissue infections known as staph infections.

First atomic map of potato pathogen reveals potential infection mechanism

Plants are susceptible to a wide range of pathogens. For the common potato plant, one such threat is Pectobacterium atrosepticum, a bacterium that causes stems to blacken and tissues to decay, and that often leads to plant death, resulting in significant agricultural losses each year.

In 2012, researchers isolated a new virus that infects and kills this bacterium—a bacteriophage named φTE (phiTE). Now, for the first time, scientists have uncovered the atomic structure of φTE, revealing a possible mechanism of infection that may be more complex than previously thought.

The study, published earlier this month in Nature Communications, is the result of a multidisciplinary collaboration between researchers from the Okinawa Institute of Science and Technology (OIST) and the University of Otago. It brings together expertise across several fields, including virology, protein engineering, biochemistry, and biophysics.

Light signature algorithm offers precise insight on viral proteins, brain disease markers and semiconductors

Researchers at Rice University have developed a new machine learning (ML) algorithm that excels at interpreting the “light signatures” (optical spectra) of molecules, materials and disease biomarkers, potentially enabling faster and more precise medical diagnoses and sample analysis.

“Imagine being able to detect early signs of diseases like Alzheimer’s or COVID-19 just by shining a light on a drop of fluid,” said Ziyang Wang, an electrical and computer engineering doctoral student at Rice who is a first author on a study published in ACS Nano. “Our work makes this possible by teaching computers how to better ‘read’ the signal of light scattered from tiny molecules.”

Every material or molecule interacts with light in a unique way, producing a distinct pattern, like a fingerprint. Optical spectroscopy, which entails shining a laser on a material to observe how light interacts with it, is widely used in chemistry, materials science and medicine. However, interpreting spectral data can be difficult and time-consuming, especially when differences between samples are subtle. The new algorithm, called Peak-Sensitive Elastic-net Logistic Regression (PSE-LR), is specially designed to analyze light-based data.
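
To give a sense of the family of methods the algorithm's name points to, here is a minimal sketch of ordinary elastic-net logistic regression applied to toy "spectra". It is not PSE-LR: the peak-sensitive component is not described here and is not reproduced, and the data, function names, and hyperparameters (`lam`, `alpha`, `lr`, `epochs`) are assumptions chosen for illustration. The model is fitted by proximal gradient descent, with the L1 part of the penalty handled by soft-thresholding.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net_logreg(X, y, lam=0.1, alpha=0.5, lr=0.5, epochs=500):
    """Minimise mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    The intercept b is left unpenalised.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        for j in range(d):
            # Gradient step on the smooth part (log-loss + L2), then L1 shrinkage.
            step = w[j] - lr * (grad_w[j] + lam * (1.0 - alpha) * w[j])
            w[j] = soft_threshold(step, lr * lam * alpha)
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return 1 if the predicted probability exceeds 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5 else 0

# Hypothetical five-channel spectra: class 1 has a peak in channel 2.
X = [
    [0.1, 0.2, 1.0, 0.2, 0.1],
    [0.2, 0.1, 0.9, 0.1, 0.2],
    [0.1, 0.1, 1.1, 0.2, 0.2],
    [0.1, 0.2, 0.1, 0.2, 0.1],
    [0.2, 0.1, 0.2, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.2, 0.2],
]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_elastic_net_logreg(X, y)
predictions = [predict(w, b, x) for x in X]
```

The L1 term pushes weights on uninformative channels toward zero, so the fitted weights themselves indicate which spectral channels separate the classes. That interpretability is a general property of elastic-net models, independent of how PSE-LR refines the idea.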