By John O’Boyle
Recent advancements in Machine Learning (ML) have made the term “Artificial Intelligence” (AI) a household name. These strides have primarily stemmed from the growing scale and complexity of ultra-large deep learning models. Increasing model sizes and compute requirements are gradually restricting the ability of smaller organizations and independent researchers to develop models on par with the current state of the art. While the progress of ML models continues down the path of increasing size, there is a growing consensus in the community that this is not the route to achieving artificial general intelligence (AGI).
Intelligence within the context of neural networks is typically attained by optimizing the parameters of a function that quantifies performance on a given task. Undoubtedly, this approach has demonstrated remarkable effectiveness in addressing a wide array of tasks. However, this method remains vastly different from mechanisms by which intelligence naturally arises in the biological world. In this work, we attempt to take a step in bridging this gap by exploring a novel approach in ML inspired by collective intelligence in biological systems.
There are numerous ways to define intelligence, but for the sake of brevity, let’s consider it the ability to exhibit complex behavior as a means to solve a task. Through this lens, all biological organisms exhibit varying degrees of intelligence, ranging from octopuses to trees. While we are far from a complete understanding of biological intelligence, we can observe that it is characterized by a harmonious dance between its foundational subsystems: cells. As the basic building block of life, and therefore intelligence, cells perform relatively simple tasks individually, but when combined in synchrony they culminate in the consciousness of you and me. Starting from the union of two cells at conception, cells engage in a continual process of self-updates and replication, which ultimately gives rise to biological intelligence.
We are far from replicating the entirety of this biological process within an artificial intelligence system; however, we can start by incorporating characteristics of biological intelligence into our designs of artificial intelligence. In this work, we attempt to create a learning model with design choices inspired by processes of the human brain. We do this by combining a conventional neural network with a dynamic graph, where the nodes and edges of the graph represent neurons and synapses, respectively. To emulate neuronal communication and self-assembly, the nodes in the graph act as cells in a cellular automata system.
Cellular Automata (CA) is a field of computational theory that seeks to describe complex behavior through the interaction of simple sub-components. The field’s origin is attributed to John von Neumann, who in the late 1940s delivered a series of lectures collectively titled “Theory of Self-Reproducing Automata” [2]. Drawing inspiration from biological cell processes, CA research seeks to develop methods for creating autonomous systems through self-replicating cells. The core concept in CA revolves around a grid or lattice of cells, each of which can exist in one of a finite number of states. These cells evolve over discrete time steps, transitioning from one state to another based on predefined rules. Importantly, the state transition of each cell depends not only on its own state but also on the states of its neighboring cells, or “neighborhood”. This local interaction among cells produces emergent patterns and behaviors of astonishing complexity.
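To make the update rule concrete, here is a minimal sketch of a one-dimensional binary cellular automaton (a toy illustration, not part of our model): each cell looks at its own state and its two neighbors, and the rule number's bits encode the next state for each of the eight possible neighborhood patterns.

```python
def step(cells, rule=110):
    """One synchronous update of a 1-D binary cellular automaton.

    Each cell's next state depends only on its local neighborhood
    (left neighbor, itself, right neighbor), with wrap-around edges.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        # Encode the 3-cell neighborhood as a number 0-7.
        pattern = (cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n]
        # The rule's bit at that position gives the cell's next state.
        nxt.append((rule >> pattern) & 1)
    return nxt

state = [0, 0, 0, 1, 0, 0, 0]
state = step(state)  # -> [0, 0, 1, 1, 0, 0, 0]
```

Despite the simplicity of this local rule, famous examples such as Rule 110 and Conway’s Game of Life produce remarkably complex global behavior from exactly this kind of neighborhood-driven update.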
While not a predominant sub-field within ML research, there is a growing body of literature at the intersection of cellular automata and neural networks, referred to as Neural Cellular Automata (NCA). Our designs are particularly influenced by a series of works by Mordvintsev, Randazzo, et al., which leverage NCA-based model architectures to accomplish tasks such as self-organization and self-classification [3], [4], [5]. Over the past five years, noteworthy research contributions in NCA include Hebbian Meta-Learning [6], Gradient Climbing NCA [7], and HyperNCA [8].
In our experiments, we employ a classification model comprising both a neural network (NN) and a graph, which facilitates self-assembly and communication via CA. This graph serves a dual purpose: representing the input data and functioning as an ensemble of CA cells. Each CA cell encompasses three types of features: input-data features, classification features, and hidden features. The input-data features consist of the raw, unprocessed data, constituting the information that the model should predict from — image pixel values in our case. The classification features encompass the cell’s predictions regarding the corresponding input-data features. The hidden features are autonomously determined, meaning that the NCA decides what information to store there, and it will be retrieved and updated during subsequent forward passes. This mechanism can be likened to a form of working memory, akin to an RNN, that the NCA uses to enhance its performance on the task. We name our proposed image classification model: Neural Cellular Automata Convolutional Graph Convolutional Network (NCA CGCN).
Similar to the graph, the CGCN assumes a dual role: it learns a latent representation of the input data and stores the information for future use. This combined operation unfolds in a three-step process:
- Create a feature map of the input data through a sequence of 2-dimensional convolutional layers.
- Aggregate features stored across the graph through a graph convolution.
- Use the aggregated features, passed through a fully connected layer, to update the hidden and classification features of each graph node.
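The three steps above could be sketched roughly as follows. All dimensions and weights are illustrative assumptions, the convolutional feature extractor is stood in for by random per-node features, and the graph convolution is simplified to a row-normalized adjacency multiply:

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, n_hidden, n_classes = 4, 8, 10
feat_dim = 16  # size of the (flattened) conv feature map per node -- illustrative

# Stand-ins for learned parameters and a fixed, fully connected graph.
W = rng.normal(size=(feat_dim + n_hidden, n_hidden + n_classes)) * 0.1
adj = np.ones((n_nodes, n_nodes)) / n_nodes  # row-normalized adjacency

def ca_update(image_feats, hidden):
    """One CA state update: combine input-data features with hidden
    features, aggregate across the graph, then update each node's
    hidden and classification features via a fully connected layer."""
    node_state = np.concatenate([image_feats, hidden], axis=1)  # step 1
    aggregated = adj @ node_state                               # step 2
    out = np.tanh(aggregated @ W)                               # step 3
    return out[:, :n_hidden], out[:, n_hidden:]

image_feats = rng.normal(size=(n_nodes, feat_dim))  # stand-in for conv output
hidden = np.zeros((n_nodes, n_hidden))
for _ in range(5):  # iterative CA updates for one input sample
    hidden, preds = ca_update(image_feats, hidden)
```

Because the hidden features returned by one update are fed into the next, the loop gives the model the RNN-like working memory described above.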
This three-step process serves as the CA state update rule, executed at each time-step. For a given input data sample, the system undergoes a sequence of iterative CA updates. After the final CA update, the class prediction stored in each node is used in computing the loss and subsequently updating the NCA. Each node’s prediction is treated as a vote, where the mode of the vote pool is treated as the global prediction, similar in nature to ensemble methods. This global pooling method tests if the collective predictions from a network of CA nodes are more accurate and robust than conventional ML methods. In addition, the resulting model and graph rely on a single, parametrically small NCA, which requires far fewer computational resources for generating predictions than the large scale CNNs typically used for image classification. A visualization of the complete inference process is shown in Figure 1 below.
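The vote-pooling step can be written in a couple of lines; here each node's class prediction is treated as one vote, and the mode of the pool becomes the global prediction:

```python
from collections import Counter

def global_prediction(node_predictions):
    """Return the mode of the node-level class predictions:
    each node casts one vote, and the most common class wins."""
    return Counter(node_predictions).most_common(1)[0][0]

global_prediction([3, 3, 7, 3, 1])  # class 3 wins the vote
```

As with conventional ensembles, a few nodes can be wrong without changing the global prediction, which is the robustness property this pooling method is meant to test.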
We began our investigation with two experiments aimed at assessing the impact of different configuration variables on model performance. In our third experiment, we use an evolutionary algorithm to search for an optimal graph structure for a classification task. To gauge the effectiveness of the CGCN, we conduct a comparative analysis against a traditional convolutional neural network (BaselineCNN) that is near-equivalent to the CGCN in parameter count. Details of the network composition of both the CGCN and BaselineCNN are shown below in Figure 2.
In the context of this study, we chose to evaluate our model’s performance in the domain of image classification, though this approach is applicable to various other supervised learning domains. The models were trained and evaluated on two well-established image classification datasets: MNIST [9], comprised of 28×28 single-channel images, and CIFAR-10 [10], comprised of 32×32 three-channel images.
Experiment 1: Optimal Hidden Cells
A key aspect of our model design involves the usage of hidden CA cells, which retain information from previous forward passes. Although image classification is not inherently time-dependent, it’s conceivable that the model acquires information about the image during an initial pass and subsequently leverages this information in later stages to improve its final predictions. To assess the extent to which these CA hidden features influence the model’s performance, we conducted a series of experiments varying the number of CA hidden cells.
We conducted three trials, training the model for a maximum of 10 epochs with varying numbers of CA hidden cells, then evaluated on the validation set provided with MNIST. All trials used consistent hyperparameters and an identical static graph structure (referenced in Table 2, “Graph 2”). The results, presented in Table 1, reveal that although the differences in accuracy are subtle, the presence of CA hidden cells contributes to increased performance of the NCA CGCN. Accuracy is tracked on a [0.0, 1.0] scale for all experiments in this article, where a score of 1.0 indicates that the model determines the correct category for all validation data. Additionally, it is noteworthy that across all three trials, the NCA CGCN consistently outperformed the BaselineCNN, highlighting the effectiveness of our approach.
Experiment 2: Optimal Graph Structure
In Experiment 1, we retained an identical graph structure throughout all trials. However, it is reasonable to anticipate that our CA communication strategy/organization has a significant influence on model performance, i.e., altering the graph structure should lead to corresponding variations in communication patterns and therefore performance outcomes. To assess this hypothesis, we examined four distinct graph structures, each with variations in node count and graph density. Visual representations of these graph structures can be found in Table 2.
To assess the NCA CGCN’s performance across various graph structures, we conducted four trials, each with a single epoch of training and consistent hyperparameters. Each trial was repeated five times. The mean accuracy and variance for each trial set are reported in Table 3 below. Upon examining the results, it becomes evident that trial 2, utilizing graph 2, slightly outperformed all other trials. However, a clear relationship between graph structure and resultant model performance cannot be concluded. Specifically, when comparing trial 1 to trials 3 and 4, it is unclear why a single-node graph (trial 1) would perform nearly indistinguishably from 100-node graphs (trials 3 and 4).
We hypothesize that the relatively low complexity of the learning task may have rendered the neural network less reliant on the CA cells, resulting in their under-utilization in predictions. To address this suspicion, we instead use the more complex CIFAR-10 dataset in the next experiment.
Experiment 3: Evolutionary Algorithm
Ideally, we envision a model that can self-assemble by dynamically updating its graph structure through the addition of nodes and connections at each time-step. However, we will defer such a mechanism to future research, potentially involving deep reinforcement learning. In the current study, we employ an evolutionary algorithm to explore and discover an effective graph structure for a single task.
The evolutionary search begins with an initial base graph structure. In each generation, we derive nine new graphs from the base graph, then train all ten candidates (the nine new graphs plus the previous best performer) for a single epoch across the training dataset. We then select the best-performing graph on the validation set, which becomes the base graph for the next generation. A visual representation of this iterative process can be observed in Figure 3 below.
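The generation loop can be sketched as follows. This is a simplified illustration under stated assumptions: graphs are reduced to edge sets, a mutation toggles a single random edge, and a toy fitness function (rewarding graphs near twelve edges) stands in for one epoch of CGCN training plus validation accuracy:

```python
import random

random.seed(0)  # for reproducibility of this sketch

def mutate(graph, n_nodes=10):
    """Derive a child graph by toggling one random undirected edge.
    Graphs are represented as sets of (u, v) pairs with u < v."""
    child = set(graph)
    u, v = sorted(random.sample(range(n_nodes), 2))
    child.symmetric_difference_update({(u, v)})
    return child

def evolve(base, fitness, generations=20, children=9):
    """Each generation: branch `children` mutants off the base graph,
    evaluate all ten candidates, and keep the best as the next base."""
    for _ in range(generations):
        candidates = [base] + [mutate(base) for _ in range(children)]
        base = max(candidates, key=fitness)  # previous best always competes
    return base

# Toy fitness standing in for single-epoch training + validation accuracy.
best = evolve(set(), fitness=lambda g: -abs(len(g) - 12))
```

Because the previous best performer is always among the candidates, fitness never decreases across generations, matching the selection scheme described above.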
The evolutionary search process continued until no branched graphs surpassed the performance of the base graph for an extended series of generations. The optimal graph discovered through this evolutionary algorithm is illustrated in Figure 4 below.
To facilitate a more extensive exploration of graph structures within the evolutionary search, we employed a training approach where each graph structure was tested for a single epoch of CGCN training, rather than training to convergence. Subsequently, we took the best-performing graph structure identified through the evolutionary search and used it for training the NCA CGCN until convergence. We then compared its performance to that of the BaselineCNN. The resulting performance metrics presented in Table 4 demonstrate that the NCA CGCN significantly outperforms the BaselineCNN.
In this study, we introduce an approach for using an NCA to update the node features of a graph, offering a novel method for performing ensemble classification on input data that permits graph encoding without using multiple models or node-specific objectives. Instead, we achieve this integration by structuring data and learning rules in a manner that naturally accommodates CA principles. Our proposed solution for image classification exhibits superior performance compared to a standard convolutional neural network of the same parameter footprint. Nevertheless, we acknowledge that using an evolutionary algorithm to optimize a data structure before training adds computational cost and overhead, possibly too burdensome for the benefits it may provide in many practical applications. Nonetheless, this approach may hold significant advantages in resource-constrained deployment environments, offering an ensemble method free of the memory and computational burdens associated with multi-model ensembles.
Our desire is that this research serves as a source of inspiration, encouraging others to explore the more unconventional avenues in machine learning, as groundbreaking discoveries often lie in the areas least expected.
 [1] P. R. Hiesinger, The Self-Assembling Brain. Princeton, NJ: Princeton University Press, 2022.
 [2] J. von Neumann and A. W. Burks, Theory of Self-Reproducing Automata. Urbana, IL: University of Illinois Press, 1966.
 [3] A. Mordvintsev, E. Randazzo, E. Niklasson, and M. Levin, “Growing Neural Cellular Automata,” Distill, vol. 5, no. 2, p. e23, Feb. 2020, doi: 10.23915/distill.00023.
 [4] E. Randazzo, A. Mordvintsev, E. Niklasson, M. Levin, and S. Greydanus, “Self-classifying MNIST Digits,” Distill, vol. 5, no. 8, Aug. 2020, doi: 10.23915/distill.00027.002.
 [5] E. Niklasson, A. Mordvintsev, E. Randazzo, and M. Levin, “Self-Organising Textures,” Distill, vol. 6, no. 2, Feb. 2021, doi: 10.23915/distill.00027.003.
 [6] E. Najarro and S. Risi, “Meta-Learning through Hebbian Plasticity in Random Networks,” arXiv:2007.02686 [cs], 2020. https://arxiv.org/abs/2007.02686
 [7] S. Kuriyama, W. Noguchi, H. Iizuka, K. Suzuki, and M. Yamamoto, “Gradient Climbing Neural Cellular Automata,” in Proceedings of ALIFE 2022: The 2022 Conference on Artificial Life, 2022, doi: 10.1162/isal_a_00548.
 [8] E. Najarro, S. Sudhakaran, C. Glanois, and S. Risi, “HyperNCA: Growing Developmental Networks with Neural Cellular Automata,” arXiv:2204.11674 [cs], Apr. 2022. https://arxiv.org/abs/2204.11674
 [9] Y. LeCun, C. Cortes, and C. Burges, “The MNIST Database of Handwritten Digits,” 2009. http://yann.lecun.com/exdb/mnist/
 [10] A. Krizhevsky, “CIFAR-10 and CIFAR-100 Datasets,” 2009. https://www.cs.toronto.edu/~kriz/cifar.html