Group and Shuffle: Researchers at HSE University and AIRI Accelerate Neural Network Fine-Tuning

Researchers at HSE University and the AIRI Institute have proposed a method for quickly fine-tuning neural networks. Their approach processes the model's parameters in groups and then shuffles these groups in a carefully chosen order so that they can interact. The method outperforms alternatives in image generation and analysis, as well as in fine-tuning text models, all while requiring less memory and training time. The results were presented at the NeurIPS 2024 conference.
The larger the neural network, the more challenging it becomes to quickly adapt it to a new task. Retraining a model from scratch is a time-consuming and costly process. Therefore, developers seek cost-effective ways to adapt a model to a specific task while preserving the overall quality of the original.
One such approach is orthogonal fine-tuning, which multiplies the pretrained weights by a learned orthogonal matrix and, unlike other methods, preserves the essential features of the original model. To keep such matrices cheap, they are usually given a special structure, but the popular choices have drawbacks: block-diagonal matrices are limited in expressiveness, while butterfly matrices require extensive computation.
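The property that makes orthogonal fine-tuning attractive can be checked numerically. The sketch below is a minimal NumPy illustration, assuming the setup used by orthogonal fine-tuning (OFT) methods: a frozen weight matrix W is multiplied by a trainable orthogonal matrix Q, here parametrized through the Cayley transform of a skew-symmetric matrix. The function names and sizes are hypothetical. Since Q^T Q = I, the Gram matrix of the updated weights equals that of the original, so the lengths of the weight vectors (the columns of W) and the angles between them stay the same even though the weights themselves move.

```python
import numpy as np

def cayley(skew):
    """Map a skew-symmetric S to the orthogonal matrix (I + S)(I - S)^{-1}."""
    eye = np.eye(skew.shape[0])
    # I - S is always invertible: a real skew-symmetric matrix has purely imaginary eigenvalues.
    return (eye + skew) @ np.linalg.inv(eye - skew)

def orthogonal_update(weight, params):
    """Hypothetical OFT-style update W' = Q W, with Q built from unconstrained params."""
    skew = params - params.T           # turn any square matrix into a skew-symmetric one
    return cayley(skew) @ weight

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))        # frozen "pretrained" weight: 5 neurons as 8-dim columns
A = 0.1 * rng.standard_normal((8, 8))  # trainable parameters
W_new = orthogonal_update(W, A)

print(np.allclose(W_new.T @ W_new, W.T @ W))  # True: norms and pairwise angles preserved
print(np.allclose(W_new, W))                  # False: the weights themselves did change
```

A dense n x n orthogonal matrix has n(n-1)/2 free parameters, which is why practical methods restrict Q to a cheaper structure, such as block-diagonal matrices (as in OFT) or products of butterfly matrices (as in BOFT).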
Researchers at the HSE Faculty of Computer Science and the AIRI Institute have proposed a new way of constructing such matrices, which they call Group-and-Shuffle (GS). Instead of working with all the parameters at once, they divide them into small groups, process each group separately, and then shuffle them so that the groups exchange information. This structure is both flexible and efficient: it enables the model to adapt more precisely to the task while requiring fewer computations and less memory.
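The group-and-shuffle idea can be illustrated with a small NumPy sketch. It assumes a simplified form of a GS matrix, A = L P R, in which L and R are block-diagonal (each block mixes only one group of coordinates) and P is a fixed "perfect shuffle" permutation that sends one coordinate from every group into each new group. The paper's exact parametrization may differ, and all names here are hypothetical. With orthogonal blocks the product stays orthogonal, and a single shuffle between the two block-diagonal factors is enough to make the result dense, so every output depends on every input.

```python
import numpy as np

def random_orthogonal(size, rng):
    """Random orthogonal matrix: the Q factor of a Gaussian matrix's QR decomposition."""
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    return q

def block_diag(blocks):
    """Place square blocks on the diagonal; each block acts on one group of coordinates."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    start = 0
    for b in blocks:
        size = b.shape[0]
        out[start:start + size, start:start + size] = b
        start += size
    return out

def shuffle_matrix(num_groups, group_size):
    """Permutation matrix for a perfect shuffle: lay indices out as a
    (num_groups x group_size) grid and read them column by column, so each
    new group collects one coordinate from every old group."""
    n = num_groups * group_size
    order = np.arange(n).reshape(num_groups, group_size).T.reshape(-1)
    return np.eye(n)[order]

rng = np.random.default_rng(0)
k, b = 4, 4                                      # 4 groups of 4 coordinates, n = 16
L = block_diag([random_orthogonal(b, rng) for _ in range(k)])
R = block_diag([random_orthogonal(b, rng) for _ in range(k)])
P = shuffle_matrix(k, b)

A = L @ P @ R                                    # group, shuffle, group again
print(np.allclose(A.T @ A, np.eye(k * b)))       # True: still orthogonal
print(np.count_nonzero(np.abs(L @ R) > 1e-12))   # 64: without the shuffle, groups never mix
print(np.count_nonzero(np.abs(A) > 1e-12))       # 256: with it, the matrix is dense
```

For n = 16 split into 4 groups of 4, each orthogonal 4 x 4 block has 6 free parameters (4 * 3 / 2), so the two factors together need 2 * 4 * 6 = 48, compared with 120 (16 * 15 / 2) for an unrestricted 16 x 16 orthogonal matrix.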
Building on GS matrices, the researchers developed GSOFT, a new method for orthogonal fine-tuning of neural networks. Unlike previous approaches, GSOFT uses fewer parameters while maintaining training stability and quality, even with limited data. The team also introduced a two-sided version of the method, Double GSOFT, which multiplies the weights by orthogonal matrices on both sides at once, enhancing the model’s flexibility and accuracy.
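A fine-tuning layer built on such matrices might look like the following NumPy sketch. It assumes, as in other orthogonal fine-tuning methods, that the pretrained weight W stays frozen and only the small blocks defining the orthogonal factors are trained. The one-sided update is taken to be W' = Q_U W and the two-sided, Double-GSOFT-style update W' = Q_U W Q_V. The Cayley parametrization of each block, the extra "unshuffle" P^T that makes zero parameters give the identity, and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def cayley(p):
    """Orthogonal matrix (I + S)(I - S)^{-1} from the skew-symmetric part S of p."""
    s = p - p.T
    eye = np.eye(p.shape[0])
    return (eye + s) @ np.linalg.inv(eye - s)

def block_diag_orthogonal(params):
    """params has shape (groups, b, b); each group gets its own orthogonal b x b block."""
    groups, b, _ = params.shape
    out = np.zeros((groups * b, groups * b))
    for g in range(groups):
        out[g * b:(g + 1) * b, g * b:(g + 1) * b] = cayley(params[g])
    return out

def gs_orthogonal(left, right):
    """Hypothetical GS-style factor Q = P^T L P R: group (R), shuffle (P),
    group again (L), then undo the shuffle so zero parameters give Q = I."""
    groups, b, _ = right.shape
    n = groups * b
    order = np.arange(n).reshape(groups, b).T.reshape(-1)  # perfect shuffle
    P = np.eye(n)[order]
    return P.T @ block_diag_orthogonal(left) @ P @ block_diag_orthogonal(right)

def adapted_weight(W, u_left, u_right, v_left=None, v_right=None):
    """One-sided: Q_U W.  Two-sided (Double-GSOFT-style): Q_U W Q_V."""
    W_new = gs_orthogonal(u_left, u_right) @ W
    if v_left is not None:
        W_new = W_new @ gs_orthogonal(v_left, v_right)
    return W_new

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))   # frozen weight: 16 outputs, 8 inputs
zeros_u = np.zeros((4, 4, 4))      # output side: 4 groups of size 4
print(np.allclose(adapted_weight(W, zeros_u, zeros_u), W))  # True: starts at the pretrained weight

u_l, u_r = 0.1 * rng.standard_normal((2, 4, 4, 4))  # trainable blocks for Q_U
v_l, v_r = 0.1 * rng.standard_normal((2, 2, 4, 4))  # input side: 2 groups of size 4 for Q_V
W_double = adapted_weight(W, u_l, u_r, v_l, v_r)
# Orthogonal factors on either side leave the singular values of W unchanged
print(np.allclose(np.linalg.svd(W_double, compute_uv=False),
                  np.linalg.svd(W, compute_uv=False)))      # True
```

Starting from the identity means training begins exactly at the pretrained model. Because both factors are orthogonal, the singular values of W, and hence its spectral norm, are preserved throughout.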
'We discovered how to construct orthogonal matrices using only two special types of matrices, instead of five or six as required by previous methods. This saves computational resources and training time,' explains Nikolay Yudin, Research Assistant at the HSE Laboratory for Matrix and Tensor Methods in Machine Learning.
The researchers tested the approach on three types of tasks. When fine-tuning the RoBERTa language model, the method outperformed others while using a comparable number of parameters. In image generation, where the model needed to preserve the original features while adapting to the user’s request, GSOFT and Double GSOFT outperformed popular methods like LoRA and BOFT, all while using less memory and training time.

The authors also tested their approach on convolutional neural networks, which are commonly used for image and video analysis, for example in face recognition. The team even adapted GS matrices to networks that must be highly robust to noise and input distortions.
'We tested the method across various scenarios—from language and generative models to robust convolutional networks. In every case, it performed reliably while using fewer resources. This confirms that the method can be applied effectively to a variety of purposes,' comments Aibek Alanov, Senior Research Fellow at the Centre of Deep Learning and Bayesian Methods, AI and Digital Science Institute, HSE FCS, and leader of the Controllable Generative AI team at FusionBrain, AIRI.