Genetic Prediction of Cancer Recurrence: Scientists Verify Reliability of Computer Models
In biomedical research, machine learning algorithms are often used to analyse data, for instance to predict cancer recurrence. However, it is not always clear whether these algorithms are detecting meaningful patterns or merely fitting random noise in the data. Scientists from HSE University, IBCh RAS, and Moscow State University have developed a test that makes it possible to tell the two apart. It could become an important tool for verifying the reliability of algorithms in medicine and biology. The study has been published on arXiv.
Machine learning methods help analyse complex biological data, for example to predict the likelihood of cancer recurrence based on gene expression, which reflects the activity levels of specific DNA regions within cells.
A team of scientists from HSE University, IBCh RAS, and Moscow State University has developed a test to assess how reliably a classifier distinguishes between different patient groups. In this case, the two groups were patients who experienced a recurrence of the disease and those who did not. A model performs correctly if it captures biologically meaningful differences between the groups. If the algorithm merely separates the data along random fluctuations, its accuracy may still appear deceptively high. The researchers focused on linear classifiers, one of the most widely used ML tools in biomedicine.
'We aimed to test whether randomly generated (synthetic) data could be separated by a linear classifier as effectively as real biological samples. To do this, we calculated an upper bound on the p-value, which indicates the likelihood that the model is merely "guessing." The lower this p-value, the more reliable the classifier,' explains Anton Zhiyanov, Research Fellow at the HSE Laboratory of Molecular Physiology.
The researchers conducted a series of experiments using synthetic data, allowing them to precisely control the degree of differences between classes. They then applied the new test to real-world medical models that predict the risk of breast cancer recurrence.
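The idea of comparing real labels against random ones can be illustrated with a short sketch. Note that this is not the authors' test: they derive an upper bound on the p-value, whereas the code below swaps in an ordinary permutation test as a stand-in. The function names (make_data, permutation_p_value and so on), the least-squares classifier, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def make_data(n_per_class, shift, seed=0):
    """Synthetic two-'gene' data; `shift` controls how far apart the classes are."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(0.0, 1.0, size=(n_per_class, 2))    # class 0 (no recurrence)
    x1 = rng.normal(shift, 1.0, size=(n_per_class, 2))  # class 1 (recurrence)
    X = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


def train_accuracy(X, y):
    """Fit a least-squares linear classifier and return its accuracy on the same data."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # add a bias column
    targets = np.where(y == 1, 1.0, -1.0)
    w, *_ = np.linalg.lstsq(A, targets, rcond=None)
    predictions = (A @ w > 0).astype(int)
    return float(np.mean(predictions == y))


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Share of label shufflings that are separated at least as well as the real labels."""
    rng = np.random.default_rng(seed)
    observed = train_accuracy(X, y)
    hits = sum(
        train_accuracy(X, rng.permutation(y)) >= observed for _ in range(n_perm)
    )
    # The +1 terms keep the estimate strictly positive.
    return (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    for shift in (0.0, 3.0):
        X, y = make_data(50, shift, seed=1)
        acc = train_accuracy(X, y)
        p = permutation_p_value(X, y, n_perm=200, seed=2)
        print(f"shift={shift}: accuracy={acc:.2f}, permutation p-value={p:.3f}")
```

The sketch scores the classifier on the data it was fitted to, so even shuffled labels typically yield an accuracy above 50%. A low p-value indicates that the real labels are separated better than random labellings of the same samples.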
The results showed that most classifiers failed to capture any meaningful differences between patients with and without recurrence. Further analysis revealed that 559 out of 570 models produced results consistent with random chance. This suggests that many algorithms may appear accurate, while in reality their predictions are driven by coincidences rather than genuine patterns.
However, the researchers also identified reliable models that reveal biologically meaningful patterns. One such model was a classifier that focused on the activity levels of the ELOVL5 and IGFBP6 genes. This algorithm was further tested on an independent data sample, confirming that differences in the expression of these genes are indeed linked to the risk of cancer recurrence.
Each point on the graph represents a patient, with the expression levels of two genes measured: IGFBP6 on the X-axis and ELOVL5 on the Y-axis. The orange dots represent patients with a recurrence, while the blue dots represent those without. In the first graph, these points (patients) are clearly separated by a straight line, representing a linear classifier. In the second graph, the points are randomly distributed, and the classifier fails to identify any patterns between gene expression and actual recurrence.
'Our test could become an important tool for verifying the reliability of algorithms in biology and medicine. It helps prevent false conclusions and emphasises models that truly identify important patterns, which is crucial for making decisions about patient treatment,' comments Alexander Tonevitsky, Professor at the HSE Faculty of Biology and Biotechnology.
The study was conducted with support from HSE University's Basic Research Programme within the framework of the Centres of Excellence project.