
Paper Proposal: Hypernetwork Training with Genetic Algorithms

This is the paper proposal I wrote for the AAAI25 Undergraduate Consortium. I didn't get in, but I quite like the ideas and papers, so I think there's some value in putting this up here. I might work on this for fun, and I already have a few ideas for improving it.

If there's anything I've gotten wrong, or anything that sounds extremely stupid, please tell me.

Bonus: An interesting application of evolution by Sakana AI that (very) recently came out.

===================================================================

Abstract

This proposal explores the use of genetic algorithms to train hypernetworks: networks that generate the weights of another network. Hypernetworks are somewhat analogous to how nature works: a genotype (hypernetwork) produces proteins (weights) that are expressed in the phenotype (generated network) (Ha, Dai, and Le 2016; Stanley and Miikkulainen 2002). Similarly, genetic algorithms behave like natural selection, where fitness and robustness are rewarded while the inability to adapt extinguishes bloodlines. My work will focus on the combination of the two ideas: applying natural selection to hypernetwork training with the goal of generating networks capable of matching SOTA performance with fewer parameters.

Introduction

Devising methods of training models is a fundamental aspect of artificial intelligence. Because backpropagation has dominated this domain in recent years, hardware and software optimizations such as FlashAttention (Dao et al. 2022) have been built to accelerate training and inference for gradient-based methods.

However, the fact remains that backpropagation is not how biological systems perform parameter updates. Alternative approaches such as wake-sleep (Hinton et al. 1995), forward-forward (Hinton 2022), and various reinforcement learning algorithms have been explored, but their performance still proves inferior to backpropagation on most tasks. Genetic/evolutionary algorithms/strategies (GEASS) take inspiration from biological processes and provide a promising alternative approach to the optimization problem. In this project, I intend to explore different GEASS methods and adapt them to hypernetworks for weight exploration and fitness optimization.

Aside from being closer to human cognition and biological processes, the main advantage of GEASS lies in reduced computational cost, as demonstrated by Salimans et al. (2017), where Evolution Strategies achieved near-SOTA results with a 2/3 cost reduction. These methods are also highly parallelizable, since workers only need to exchange scalar fitness values, and they scale to thousands of devices (Salimans et al. 2017). This proposal begins with background on prior work in this domain, followed by potential experiments and the expected results. It concludes with a discussion and an outline of future work.

Background

My proposed work will focus on the intersection between hypernetworks and GEASS; specifically, it will explore the use of genetic algorithms for hypernetwork optimization. Previous work on the topic includes HyperNEAT (Stanley, D'Ambrosio, and Gauci 2009) and HyperNetworks (Ha, Dai, and Le 2016). To provide a brief overview of the ideas:

HyperNetworks (Ha, Dai, and Le 2016)

This paper employs a hypernetwork to generate weights for a Wide Residual Network (40-2) on CIFAR-10 image classification and an LSTM (1,000 units) for two language modeling tasks (Penn Treebank, Hutter Prize), where the generated models achieved comparable performance with a fraction of the parameter count. The hypernetwork was trained with backpropagation, with gradients propagated through the generated networks back into the hypernetwork itself.
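
To make the genotype/phenotype framing concrete, here is a minimal PyTorch sketch of a hypernetwork that emits the weights of a single linear layer from a learned embedding. This is an illustrative toy, not the layer-embedding architecture from the paper; the HyperLinear name and all sizes are my own placeholder choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """A linear layer whose weights are produced by a small hypernetwork.

    Toy sketch of the genotype/phenotype split; not the architecture
    used by Ha, Dai, and Le (2016).
    """
    def __init__(self, in_features, out_features, z_dim=8):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learned "genotype": a small embedding plus a generator MLP.
        self.z = nn.Parameter(torch.randn(z_dim))
        self.generator = nn.Sequential(
            nn.Linear(z_dim, 64),
            nn.ReLU(),
            nn.Linear(64, out_features * in_features + out_features),
        )

    def forward(self, x):
        # The hypernetwork emits the weights and bias of the "phenotype" layer.
        params = self.generator(self.z)
        weight = params[: self.out_features * self.in_features].view(
            self.out_features, self.in_features
        )
        bias = params[self.out_features * self.in_features :]
        return F.linear(x, weight, bias)

layer = HyperLinear(in_features=32, out_features=10)
out = layer(torch.randn(4, 32))  # gradients flow back into the generator
```

Since the generated weights are a differentiable function of the generator's parameters, the gradient flows through the phenotype layer back into the hypernetwork, which is the training setup described above.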

NeuroEvolution of Augmenting Topologies (NEAT) (Stanley and Miikkulainen 2002)

The technique employed in this paper evolves both the architecture and the weights by randomly mutating (adding) neurons and modifying connections. Network compactness is maintained by starting from a minimal network, and speciation is used to preserve genetic diversity. NEAT uses a direct encoding scheme, meaning the weights are explicitly and exactly defined (like model.named_parameters() in PyTorch). It was applied to the pole-balancing control problem (the classic RL benchmark) and converged significantly faster than the other methods it was compared against (3,600 evaluations, versus 3,800 and 12,600 for the next-best methods).
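
As an illustration of direct encoding, here is a toy sketch (my own simplification, not an implementation of NEAT) of a genome that lists every connection explicitly, together with the add-node mutation that splits an existing connection. Innovation numbers, crossover, and speciation are omitted.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Connection:
    src: int
    dst: int
    weight: float
    enabled: bool = True

@dataclass
class Genome:
    num_nodes: int
    connections: list = field(default_factory=list)

def mutate_add_node(genome: Genome) -> None:
    """Split a random enabled connection by inserting a new node."""
    enabled = [c for c in genome.connections if c.enabled]
    if not enabled:
        return
    conn = random.choice(enabled)
    conn.enabled = False
    new_node = genome.num_nodes
    genome.num_nodes += 1
    # NEAT sets the incoming weight to 1.0 and reuses the old weight on the
    # outgoing connection, so the new structure initially behaves the same.
    genome.connections.append(Connection(conn.src, new_node, 1.0))
    genome.connections.append(Connection(new_node, conn.dst, conn.weight))

g = Genome(num_nodes=3, connections=[Connection(0, 2, 0.5), Connection(1, 2, -0.3)])
mutate_add_node(g)
```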

HyperNEAT (Stanley, D'Ambrosio, and Gauci 2009)

To put it simply, HyperNEAT trains a network that organizes a network that generates a network. NEAT is used to evolve a Compositional Pattern Producing Network (CPPN), which generates the connectivity pattern of the final network by determining the weights and connections between nodes. The point is to replicate the concept of genotypes and phenotypes and to exploit the information embedded in the geometry of the input data. The final network is the one that actually performs the task. HyperNEAT uses indirect encoding to define the weights (describing connections rather than enumerating the nodes themselves, like describing a lizard instead of writing out its DNA).
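
Here is a rough sketch of the indirect-encoding idea, with the CPPN replaced by a hand-written function rather than one evolved by NEAT; the substrate layout, the choice of functions, and the expression threshold are all placeholder choices for illustration.

```python
import numpy as np

def cppn(x1, y1, x2, y2):
    # Compositions of simple functions (sine, Gaussian, linear) let a CPPN
    # express symmetry and repetition in the generated connectivity pattern.
    d = np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
    return np.sin(2.0 * x1 * y2) * np.exp(-d ** 2)

def generate_weight_matrix(src_coords, dst_coords, threshold=0.2):
    """Query the CPPN for every (source, target) pair of substrate nodes."""
    w = np.zeros((len(dst_coords), len(src_coords)))
    for j, (x1, y1) in enumerate(src_coords):
        for i, (x2, y2) in enumerate(dst_coords):
            val = cppn(x1, y1, x2, y2)
            # Only express connections whose magnitude exceeds a threshold.
            if abs(val) > threshold:
                w[i, j] = val
    return w

# A simple "substrate": 8 input nodes and 4 output nodes laid out on two lines.
src = [(x, 0.0) for x in np.linspace(-1, 1, 8)]
dst = [(x, 1.0) for x in np.linspace(-1, 1, 4)]
weights = generate_weight_matrix(src, dst)
```

Because the weight of every connection is a function of node coordinates rather than a separately stored number, the same small genotype can describe arbitrarily large phenotype networks.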

Novelty Search/Quality Diversity (NS/QD) (Conti et al. 2017)

In this paper, policy weights are perturbed by adding Gaussian noise, and candidates are then optimized for a) novelty (how different the resulting policy's behaviour is from an archive of previously seen behaviours, measured via kNN) and b) performance (the reward obtained from using the policy parameters). The combined techniques (NSR-ES/NSRA-ES) performed better than NS-ES alone in most tasks.
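
Below is a rough, self-contained sketch of that scoring scheme, with the environment replaced by a stand-in rollout function; the behaviour characterization, mutation scale, and reward/novelty mixing weight are placeholder choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def novelty(behaviour, archive, k=10):
    # Novelty = mean distance to the k nearest behaviours in the archive.
    dists = np.sort(np.linalg.norm(np.asarray(archive) - behaviour, axis=1))
    return dists[: min(k, len(dists))].mean()

def rollout(theta):
    """Placeholder rollout: returns (reward, behaviour characterization)."""
    reward = -np.sum(theta ** 2)   # stand-in for the environment return
    behaviour = theta[:2]          # stand-in for e.g. the agent's final (x, y)
    return reward, behaviour

theta = rng.standard_normal(16)
archive = [rollout(rng.standard_normal(16))[1] for _ in range(50)]

# ES-style mutation: perturb the parameters with Gaussian noise, then score
# each candidate with a weighted mix of reward and novelty.
w = 0.5
candidates = [theta + 0.02 * rng.standard_normal(theta.shape) for _ in range(20)]
scores = []
for cand in candidates:
    reward, behaviour = rollout(cand)
    scores.append(w * reward + (1 - w) * novelty(behaviour, archive))
best = candidates[int(np.argmax(scores))]
```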

Mutating Multi-Component Networks (MMCN) (Risi and Stanley 2019)

Like the previous technique, the weights here are perturbed by adding Gaussian noise. However, this model, which is based on World Models (Ha and Schmidhuber 2018), uses two components (a VAE and an RNN-based controller). The experiments involved mutating only the controller, mutating either component with equal probability, and mutating both. The resulting score on CarRacing-v0 (903 ± 72) was similar to World Models (906 ± 21) and DQN + Dropout (893 ± 41), albeit with higher variance.
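
A small sketch of those three mutation schemes, with plain arrays standing in for the VAE and controller parameters; the component names and noise scale are placeholders, not the paper's configuration.

```python
import random
import numpy as np

def mutate(params, scheme="either", sigma=0.01):
    """Perturb a chosen subset of parameter groups with Gaussian noise."""
    params = {k: v.copy() for k, v in params.items()}
    if scheme == "controller_only":
        targets = ["controller"]
    elif scheme == "either":
        targets = [random.choice(list(params))]  # one component, picked uniformly
    else:  # "both"
        targets = list(params)
    for k in targets:
        params[k] += sigma * np.random.randn(*params[k].shape)
    return params

parent = {"vae": np.zeros((4, 4)), "controller": np.zeros(8)}
child = mutate(parent, scheme="either")
```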

Approach

In this proposal, I view the genetic algorithm and the hypernetwork as separate components and will experiment with the Cartesian product of the two sets. For the algorithm, I plan to implement the techniques described above (NS/QD, MMCN, HyperNEAT) as well as experiment with the fitness function, which could potentially be designed to provide the network with more information about the environment.
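
To sketch what this could look like, here is a minimal toy version of the outer loop, assuming simple truncation selection and Gaussian mutation applied directly to the hypernetwork's parameters; the regression task, population size, and HyperNet architecture are placeholders, not the actual experimental setup.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(256, 16)                      # toy regression task
y = x @ torch.randn(16, 1)

class HyperNet(nn.Module):
    """Generates the weights of a 16 -> 1 linear "phenotype" from an embedding."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.z = nn.Parameter(torch.randn(z_dim))
        self.gen = nn.Linear(z_dim, 16 + 1)   # 16 weights + 1 bias

    def forward(self, inputs):
        params = self.gen(self.z)
        return F.linear(inputs, params[:16].unsqueeze(0), params[16:])

@torch.no_grad()
def fitness(hnet):
    # Fitness of the hypernetwork = negative loss of the network it generates.
    return -F.mse_loss(hnet(x), y).item()

@torch.no_grad()
def mutate(hnet, sigma=0.05):
    # Gaussian mutation of the hypernetwork's own parameters (the "genotype").
    child = copy.deepcopy(hnet)
    for p in child.parameters():
        p.add_(sigma * torch.randn_like(p))
    return child

population = [HyperNet() for _ in range(32)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    elite = population[:8]                    # truncation selection
    population = elite + [mutate(parent) for parent in elite for _ in range(3)]

print("best fitness:", fitness(population[0]))
```

In the actual experiments, the fitness would be the generated network's loss on the benchmark task, and the selection and mutation steps would be replaced by whichever GEASS variant (NS/QD, MMCN, HyperNEAT) is being tested.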

I plan on using these techniques to train a ResNet-50 for image classification and an LSTM for language modeling; if resources allow, I would also like to experiment on a GPT-2-sized model to observe how the algorithm behaves at scale. The experiments will mainly compare the number of parameters required to achieve similar performance.

Evaluation

Evaluation will be performed on standard benchmarks (ImageNet, CIFAR100, MNIST, OpenWebText) using the generated network weights. Results will be compared against equivalent networks trained with backpropagation and against the technique described in HyperNetworks (Ha, Dai, and Le 2016). Hyperparameters that can be shared will be fixed across all networks of the same type. Hypernetwork quality will be evaluated via the loss of the generated network.

Discussion

I expect to find that genetic algorithms can train hypernetworks to generate weights whose performance is comparable to SOTA backpropagation methods; based on past results, I anticipate that this approach will require fewer parameters and less compute to reach similar accuracy.

If successful, this work will show that pairing genetic algorithms with hypernetworks provides an efficient alternative to backpropagation for training neural networks at scale. It will also, hopefully, inspire a reemergence of biology-inspired design and broaden how the field approaches model optimization.

As with Parkinson's Law, training methods will consume as much energy as is available, regardless of efficiency. The potential benefit to society, therefore, will be the additional compute eked out of existing devices and GPUs rather than reduced energy consumption. Furthermore, the biological inspiration behind this approach could provide insights that bridge the gap between artificial and biological intelligence, potentially contributing to advances in neuroscience and cognitive science.

Avenues of future exploration include:

Conclusion

In summary, I propose the topic of "Hypernetwork Training with Genetic Algorithms": using genetic algorithms to train hypernetworks that generate networks matching backpropagation-trained baselines with fewer parameters and less compute.

References