HEGEL - Concepts Reference

1. System Overview

What is HEGEL?

HEGEL (Hegelian Artificial Consciousness) is a gradient-free artificial consciousness simulation. It models Hegel's Phenomenology of Spirit as a progression of dynamical systems: from bare metabolic survival, through sensorimotor coupling and adaptive learning, toward prediction, self-model emergence, and ultimately mutual recognition between two agents.

The system is implemented in Julia as a 25-dimensional ODE (ordinary differential equation) system: 5 metabolic variables + 20 CTRNN neuron activations, all integrated together by an adaptive solver. The organism either stays alive or dies. There is no external reward signal, no loss function, no optimization target.

The Philosophy: Three Constraints

1. Gradient-free. No loss function. No backpropagation. No optimizer. The system is not trying to minimize or maximize anything. It is trying to stay alive.

2. Hebbian-only. All learning is local. A synapse only knows what its pre-synaptic and post-synaptic neurons are doing. There is no global error signal propagated backwards through the network.

3. Viability-constrained. The organism operates within a viability region (Aubin, 1991). Cross the boundary and you die. Death is permanent. Selection is binary: survived or did not survive. There is no fitness ranking.

The core scientific question: Can self-maintaining chemistry + local learning rules + survival pressure produce behaviors that structurally resemble consciousness stages, without any external optimization?

Architecture Diagram

                                     HEBBIAN LEARNING
                                     Oja + BCM + Anti-Hebb
                                           |
                                           | weights
                                           v
                    sensory                                  output
 METABOLISM  ---------->  CTRNN BRAIN  ---------> MOTOR OUTPUT
 5 ODEs                   20 neurons               M1, M2, M3
 S T U C1 C2              20 ODEs                     |
      ^                                                 |
      |                                                 |
      |    VIABILITY BOUNDARY                          | modulates
      |    any chemical < threshold = DEATH          | intake
      |                                                 |
      +-------- intake rates <--------------------------+
      |
      v
 ENVIRONMENT
 Resources + Perturbations

The organism is a closed sensorimotor loop. The metabolism produces chemistry. The chemistry is read as sensory input by the CTRNN brain. The brain produces motor output. The motor output modulates substrate intake from the environment, which feeds back into the metabolism. The viability boundary is the death line: cross it and the organism dies permanently.

2. Layer Progression (Roadmap)

HEGEL progresses through developmental layers, each building on the previous one. Advancement to the next layer requires passing specific falsification criteria. Each layer adds new complexity while preserving everything below it.

1 Metabolic Viability Active

The metabolic core is a Piedrafita-Cornish-Bowden autocatalytic network. Five chemicals (S substrate, T intermediate, U byproduct, C1 catalyst, C2 catalyst) interact through 3 interlocking catalytic cycles governed by mass-action kinetics.

The catalysts C1 and C2 repair each other through cross-catalysis: C2 + U produces C1, and C1 + S produces C2. This creates a self-maintaining loop where the system keeps itself alive through its own internal chemistry. If perturbations knock both catalysts too low simultaneously, the repair loop collapses and the organism dies permanently.

Each organism faces 12 random perturbations over 100,000 time units. These include catalyst halvings, substrate depletions, and noise bursts.

dS/dt = input_S * motor_gain - k1*C1*S*T - k3*S*U - k4*C1*S - d_S*S
dT/dt = input_T * motor_gain - k1*C1*S*T - k2*C2*T*U - d_T*T
dU/dt = input_U * motor_gain - k2*C2*T*U - k3*S*U - k5*C2*U - d_U*U
dC1/dt = k5*C2*U + 0.1*k3*S*U - k4*C1*S - d_C1*C1
dC2/dt = k4*C1*S + 0.1*k3*S*U - k5*C2*U - d_C2*C2

Reference: Piedrafita et al. (2010), "Viability Conditions for a Minimal Protocell," PLOS Computational Biology.

Advancement: 7 out of 10 recent organisms must survive 80,000+ time units.

2 Sensorimotor Coupling Active

A Beer (1995) Continuous-Time Recurrent Neural Network (CTRNN) with 20 neurons is coupled to the metabolism. Neurons receive the 5 metabolic concentrations plus 3 environmental resource signals as sensory input. Three motor output neurons (M1, M2, M3) control the organism's behavior: substrate intake rates and resource foraging.

The CTRNN adds 20 coupled ODEs to the system (total: 25 dimensions). The question at this layer is: does the brain actually control behavior, or is it just along for the ride?

tau_i * dy_i/dt = -y_i + sum_j(w_ij * sigma(y_j + theta_j)) + s_i

Where sigma(x, g) = 1 / (1 + e^(-g*x)) is the sigmoid activation, tau_i are heterogeneous time constants (log-uniform from 0.5 to 5.0), and s_i is the weighted sensory input to neuron i.

Reference: Beer (1995), "On the Dynamics of Small Continuous-Time Recurrent Neural Networks," Adaptive Behavior.

Advancement: Average motor output variance > 0.01 across 5 recent runs.

3 Hebbian Adaptation Active

Four Hebbian learning rules modify the CTRNN's weights during the organism's lifetime. There is no backpropagation. All learning is local: a synapse only knows what its pre- and post-synaptic neurons are doing.

The four rules:

1. Oja's Rule (1982) on input weights: extracts principal components of the sensory input. Self-normalizing.
2. BCM Theory (1982) on recurrent weights: creates selectivity through a sliding threshold. Neurons develop preferences.
3. Anti-Hebbian (Pehlevan & Chklovskii, 2019) on lateral connections: decorrelates neuron responses.
4. Competitive Learning on output weights: winner-take-more sparsification.

All learning is neuromodulated: gated by metabolic stress (viability margin) AND prediction error. Learning is maximal when the organism is in danger, minimal when comfortable. This is Hegel's dialectic: learning arises from conflict, not passive observation.

References: Oja (1982); Bienenstock, Cooper & Munro (1982); Pehlevan & Chklovskii (2019).

Advancement: Hebbian ON survives longer than Hebbian OFF in 2 out of 3 recent paired comparisons.

4 World Model / Prediction Planned

The last 8 of 20 CTRNN neurons serve as prediction neurons that attempt to predict the next sensory state. Prediction error is tracked in a ring buffer and correlated with motor output.

The prediction error feeds back into the neuromodulation gate: high prediction error increases the learning rate. The system learns from surprise, not from comfort. When prediction errors are high and correlated with the organism's own actions, the self-model trigger activates.

This is the Hegelian "determinate negation": the failure of the undifferentiated world model produces a specific new structure. The world model cannot predict consequences of self-caused changes, and this systematic failure creates the category of "things I cause."

prediction_error = ||actual_sensory - predicted_sensory||_2

Reference: Clark (2013), "Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science."

Research frontier. No automatic advancement criteria yet.

5 Self-Model Emergence Planned

The self-model is not pre-installed. It emerges from prediction failure on self-caused changes. When the organism's world model consistently fails to predict the consequences of its own motor actions (action-correlated prediction error exceeds threshold), a new internal structure forms that distinguishes "things the world does to me" from "things I do to the world."

This is tested through interventionist self-testing: the organism must be able to distinguish between sensory changes caused by the environment and sensory changes caused by its own actions. The signature is a transfer entropy asymmetry between motor output and sensory prediction error.

Criteria: Action-correlated prediction error must exceed threshold in sustained fashion, producing measurable transfer entropy asymmetry.

6+ Two-Agent Mutual Recognition Planned

Two organisms interact asymmetrically. Each has its own metabolism, CTRNN, and self-model. Through interaction, each must develop a model of the other as another self-maintaining agent rather than just another environmental perturbation.

This is the core of Hegel's Phenomenology of Spirit: self-consciousness arises not in isolation but through the recognition of another consciousness. The Hegelian "struggle for recognition" is modeled as two organisms whose world models must account for each other's intentional behavior.

Research frontier. Requires successful completion of Layers 4 and 5.

3. Methods & Algorithms

ODE Integration (Tsit5) Active

The entire system (5 metabolic + 20 neural = 25 dimensions) is integrated as a coupled ODE system using the Tsitouras 5th-order Runge-Kutta method (Tsit5). This is an explicit, adaptive-step method from the DifferentialEquations.jl ecosystem.

y_{n+1} = y_n + h * sum(b_i * k_i) [5th-order, 7 stages]
error estimate via embedded 4th-order [adaptive step control]

What it does: Numerically solves the coupled ODE system forward in time with automatic step-size control. Takes larger steps when dynamics are smooth, smaller steps during perturbations or near viability boundaries.

Why we use it: Tsit5 is the standard workhorse for non-stiff to mildly stiff ODE systems. The metabolic + neural dynamics are moderately stiff (catalyst concentrations can change fast during perturbations). Adaptive stepping means we do not waste computation on smooth intervals while maintaining accuracy during rapid transients.

Reference Tsitouras (2011), "Runge-Kutta Pairs of Order 5(4)"

Implementation DifferentialEquations.jl / OrdinaryDiffEq.jl

Tolerances abstol = 1e-8, reltol = 1e-8

Viability Theory Active

The viability kernel (Aubin, 1991) is the largest subset of the constraint set K from which viable evolutions exist. In HEGEL, the constraint set is defined by minimum concentration thresholds for each metabolic chemical.

Viability region K = { (S,T,U,C1,C2) : S > 0.01, T > 0.01, U > 0.01, C1 > 0.001, C2 > 0.001 }

Viability margin = min( (x_i - threshold_i) / x_i ) for all chemicals

Death condition: any x_i < threshold_i ==> PERMANENT termination

What it does: Defines a binary survival constraint. The organism is either inside the viable region (alive) or outside it (dead). There is no gradient of "how dead" you are -- death is an absorbing state.

Why we use it instead of fitness: Viability theory treats survival as a constraint, not an objective. Traditional optimization maximizes a scalar fitness. Viability asks only: "can this trajectory stay within the safe set?" This is philosophically closer to biological reality -- organisms do not optimize a fitness function, they avoid death.

Reference Aubin (1991), "Viability Theory"

Implementation ContinuousCallback terminates integration on boundary crossing

CTRNN (Continuous-Time Recurrent Neural Network) Active

The brain is a fully-connected recurrent neural network operating in continuous time. Unlike discrete-time RNNs, the CTRNN dynamics are smooth ODEs integrated alongside the metabolism. Each neuron has its own time constant, creating a hierarchy of temporal scales.

tau_i * dy_i/dt = -y_i + sum_j(w_ij * sigma(y_j + theta_j)) + s_i

sigma(x, g) = 1 / (1 + exp(-g * x)) [sigmoid activation]

motor_output = motor_baseline + motor_scale * tanh(W_out * activated)

What it does: Provides the sensorimotor interface between metabolism and environment. 8 sensory inputs (5 metabolic + 3 environmental) feed into 20 recurrent neurons. 3 motor output neurons control substrate intake rates.

Why we use it: CTRNNs are universal approximators of smooth dynamical systems (Beer, 1995). Their continuous-time dynamics couple naturally with the metabolic ODEs. The heterogeneous time constants (log-uniform from 0.5 to 5.0) create fast-responding and slow-integrating neurons, enabling both reactive and anticipatory behavior.

Architecture: 20 neurons, 8 sensory inputs (5 metabolic + 3 environmental), 3 motor outputs. Weights initialized small (0.3/sqrt(n)) to avoid dominating initial dynamics.

Reference Beer (1995), "On the Dynamics of Small Continuous-Time Recurrent Neural Networks"

Neurons 20 (fully connected)

Time Constants Log-uniform [0.5, 5.0]

Oja's Rule Active

Oja's rule is a self-normalizing Hebbian learning rule that extracts the first principal component of the input distribution. Applied to the input weight matrix (sensory to neurons).

Delta_w_ij = gate * eta * y_i * (x_j - y_i * w_ij) * dt

What it does (plain language): Strengthens connections from active inputs to active neurons, but includes a self-limiting decay term (-y_i * w_ij) that prevents weights from exploding. The weight vector converges to the principal eigenvector of the input correlation matrix.

Why we use it: Oja's rule is biologically plausible -- it uses only local information (pre-synaptic input x, post-synaptic output y, and current weight w). It naturally extracts the most informative direction in the sensory input without any global error signal. The self-normalization means weights stay bounded without external clipping.

Reference Oja (1982), "A Simplified Neuron Model as a Principal Component Analyzer"

Applied to Input weights (W_in): sensory → neurons

Learning Rate eta_oja = 0.001 (default, evolvable)

BCM Theory (Bienenstock-Cooper-Munro) Active

BCM theory creates neuronal selectivity through a sliding threshold mechanism. Neurons develop preferences for specific input patterns. The threshold adapts based on the neuron's own recent activity.

Delta_w_ij = gate * eta * v_i * (v_i - theta_M_i) * v_j * dt

theta_M_i tracks E[v_i^2] (sliding threshold, NOT gated)
tau_bcm * d(theta)/dt = v^2 - theta

What it does (plain language): When a neuron's activation is above its sliding threshold, connections to active co-neurons are strengthened (LTP). When below, they are weakened (LTD). The threshold slides up when the neuron is frequently active and down when it is quiet. This creates neurons that selectively respond to specific input patterns.

Why we use it: BCM theory explains orientation selectivity in visual cortex and is one of the best-understood biologically plausible learning rules. The sliding threshold prevents runaway excitation (if a neuron fires too much, the threshold rises, making it harder to trigger). Applied to recurrent weights, it sculpts internal representations.

Key detail: The BCM threshold theta_M updates independently of the neuromodulation gate -- it always tracks activity statistics even when learning is suppressed. Only the weight changes are gated.

Reference Bienenstock, Cooper & Munro (1982), "Theory for the Development of Neuron Selectivity"

Applied to Recurrent weights (W): neuron ↔ neuron

Learning Rate eta_bcm = 0.0005 (default, evolvable)

Anti-Hebbian Learning Active

Anti-Hebbian learning decorrelates neuron responses. While standard Hebbian learning strengthens connections between co-active neurons, anti-Hebbian learning weakens them. This forces different neurons to respond to different features.

Delta_w_ij = -gate * eta * y_i * y_j * dt (for i != j)

What it does (plain language): If two neurons are both active at the same time, the connection between them is weakened. This pushes neurons to become decorrelated -- each neuron learns to respond to a different aspect of the input. The result is an efficient distributed representation.

Why we use it: Based on Pehlevan & Chklovskii (2019), anti-Hebbian learning implements online whitening of neural representations. Combined with Oja (which finds principal directions) and BCM (which creates selectivity), anti-Hebbian learning ensures the 20 neurons develop diverse, non-redundant response profiles.

Reference Pehlevan & Chklovskii (2019), "Neuroscience-Inspired Online Data Streaming Algorithms"

Applied to Lateral connections within recurrent weights (W)

Learning Rate eta_anti = 0.0002 (default, evolvable)

Competitive Learning Active

Competitive learning applies a winner-take-more dynamic to the output weight matrix. The motor neuron with the strongest signal strengthens its connections, while losers weaken theirs. This creates sparse, decisive motor output.

winner = argmax(|W_out * activated|)

if k == winner: Delta_w = gate * eta * sign(output_k) * activated_j * dt
if k != winner: Delta_w = -gate * eta * 0.1 * w_kj * dt [decay]

What it does (plain language): The motor neuron that contributes most to the current output gets its connections strengthened, while the others slowly decay. This creates clear, decisive motor commands rather than ambiguous averages.

Why we use it: In a survival context, ambiguous motor output is dangerous. The organism needs clear behavioral decisions (forage more vs. conserve). Competitive learning on the output layer creates motor specialization: each motor neuron develops a distinct behavioral role.

Inspiration Krotov & Hopfield (2019), competitive dynamics

Applied to Output weights (W_out): neurons → motor

Learning Rate eta_competitive = 0.0003 (default, evolvable)

Neuromodulated Gating Active

All Hebbian learning is gated by a neuromodulation signal derived from metabolic stress. Learning is not constant -- it is driven by conflict, danger, and surprise. When the organism is comfortable, learning is suppressed. When the organism is in danger, learning is maximized.

gate = 1.0 - clamp(viability_margin, 0, 1)

margin ~0.8 (safe) ==> gate ~0.2 (minimal drift)
margin ~0.2 (stressed) ==> gate ~0.8 (active adaptation)
margin ~0.05 (near death) ==> gate ~0.95 (maximum learning)

What it does (plain language): The viability margin (how far from death) inversely controls the learning rate. Safe organisms barely learn. Endangered organisms learn rapidly. Dead organisms learn nothing.

Why we use it: This implements a deep philosophical principle: consciousness and learning arise from conflict, not from comfort. In Hegel's dialectic, progress comes through negation and struggle. An organism at homeostatic equilibrium has nothing to learn. An organism facing death has everything to learn. The neuromodulation gate operationalizes this principle.

Biological basis: In real brains, neuromodulators (dopamine, norepinephrine, acetylcholine) gate synaptic plasticity based on salience, reward prediction error, and arousal. The viability-based gate is a simplified analog of this mechanism.

Status Active (viability-based); Planned extension: + prediction error

Natural Selection Active

Natural selection in HEGEL is viability-proportional reproduction, not optimization. There is no fitness ranking. Selection is binary: survived or died. Survivors reproduce with probability proportional to their viability margin. The dead are removed.

1. Evaluate all organisms (multi-trial, parallel)
2. Filter: which survived? (viability boundary = selection)
3. Survivors reproduce with P ~ viability_margin_mean
4. Children = clone parent + mutate within layer gene mask
5. If no survivors: extinction event, regenerate from default

NO tournament. NO crossover. NO elitism. NO fitness ranking.

What it does (plain language): Each generation, every organism is run through 3 trials with random perturbations. Organisms that survive at least one trial get to reproduce. Healthier survivors (higher viability margin) are more likely to be chosen as parents. Children are clones with small random mutations.

Why we use it instead of CMA-ES or other optimizers: CMA-ES, genetic algorithms, and other evolutionary strategies are optimizers -- they maximize a scalar fitness function. HEGEL's natural selection does not maximize anything. It applies a binary viability constraint (alive/dead) and lets population drift do the rest. This is closer to real biological natural selection, where organisms do not compete on a ranked leaderboard -- they either survive to reproduce or they do not.

Population size: 50 organisms. Mutation rate: 30% per gene. Mutation step: 0.08 (small: drift, not jumps). Trials: 3 per genome.

Selection Binary (alive/dead), not ranked

Reproduction Clonal + mutation (asexual, no crossover)

Genome 26 genes (metabolic, neural, learning parameters)

Genetic Inheritance & Layer-Aware Mutation Active

The genome consists of 26 floating-point genes encoding all organism parameters: metabolic rate constants (k1-k5), degradation rates (d_S-d_C2), input rates, CTRNN parameters (tau, gain, motor scale), Hebbian learning rates, and BCM parameters.

Mutation is layer-aware: only genes relevant to the current developmental layer are mutated. This implements "phylogeny precedes ontogeny" -- the genetic template must be validated at each level before adding complexity.

Layer 1 gene mask: genes 1-13 (metabolic only)
Layer 2 gene mask: genes 1-18 (+ CTRNN parameters)
Layer 3+ gene mask: genes 1-26 (all genes, including learning rates)

Log-scale genes: perturbed in log-space (proportional change)
Linear-scale genes: perturbed proportional to range

What it does: After viability selection, all survivors are copied directly into the next generation. Remaining slots are filled by cloning randomly-chosen survivors (weighted by viability margin) and applying small mutations within the active gene mask. If no organisms survived (extinction event), the population is regenerated from the default genome with larger mutations.

Why layer-aware: If we mutated CTRNN parameters at Layer 1 (where we are only testing metabolism), we could not distinguish whether survival improvements came from better chemistry or lucky brain parameters. Layer masks ensure each level is validated independently.

Total Genes 26 (13 metabolic, 5 neural, 7 learning, 1 timing)

Log-Scale Genes Rate constants, degradation rates, learning rates

Extinction Recovery Regenerate from default genome with mutation_rate=0.5, sigma=0.15

Prediction Error Planned

Prediction error is the L2 norm of the difference between the organism's predicted next sensory state and the actual sensory state. It is tracked in a ring buffer of size 100 for running statistics.

prediction_error = sqrt( sum( (actual_i - predicted_i)^2 ) )

mean_error = mean(error_history_ring_buffer)

Prediction neurons: last 8 of 20 CTRNN neurons
Prediction horizon: 1 time unit ahead

What it does: Measures how well the organism's internal model predicts what will happen next. Low prediction error means the world is behaving as expected. High prediction error means something surprising is happening.

Why we use it: Prediction error is the fundamental signal that drives higher-level learning. In the predictive processing framework (Clark, 2013), the brain's job is to minimize prediction error. In HEGEL, prediction error is not minimized by backpropagation -- it is minimized by Hebbian learning under viability pressure. High prediction error also increases the neuromodulation gate, creating a feedback loop: surprise drives learning, which reduces future surprise.

Reference Clark (2013), "Whatever Next? Predictive Brains, Situated Agents"

Buffer Size 100 entries (ring buffer)

Status Implemented, active at Layer 4+

Self-Model Trigger Planned

The self-model trigger activates when the organism's prediction errors are systematically correlated with its own motor actions. This means the world model is failing specifically when the organism does something -- the world changes in unpredicted ways because of the organism's own behavior.

action_correlated_error = correlation(motor_magnitude, prediction_error)
over a sliding window of 50 steps

Trigger condition:
action_correlated_error > 0.5
AND action_history length >= correlation_window

==> self_model_triggered = true

What it does: Computes the Pearson correlation between the magnitude of motor output and prediction error over a sliding window. If the organism's actions consistently cause prediction failures, the self-model trigger fires. This represents the emergence of the distinction between "self-caused" and "world-caused" sensory changes.

Why we use it: The self-model is not pre-installed. It must emerge from the failure of the undifferentiated world model. This is the key Hegelian insight: the self is not a given but arises through the negation of the initial undifferentiated experience. When the world model cannot predict the consequences of the organism's own actions, a new category forms: "things I do to the world" versus "things the world does to me."

Correlation Window 50 steps

Error Threshold 0.5 (action-correlated prediction error)

Status Implemented, active at Layer 5+

4. Deprecated / Not Used

The following methods were considered and explicitly rejected. Each rejection is a deliberate architectural decision, not an oversight.

Rejected CMA-ES (Covariance Matrix Adaptation Evolution Strategy)

CMA-ES was initially used and then removed. It is a powerful evolutionary optimizer that adapts a covariance matrix to guide search toward fitness maxima. The problem: CMA-ES is fundamentally an optimizer. It maximizes a scalar fitness function using second-order statistics of the search distribution.

This contradicts HEGEL's viability-based philosophy. Natural selection in biology is not an optimization algorithm -- it is a filter. Organisms that survive reproduce; organisms that die do not. There is no covariance matrix adapting to "climb the fitness landscape." CMA-ES was replaced with viability-proportional natural selection: survived = reproduce, died = removed. No ranking, no fitness gradient, no optimization.

Excluded Gradient Descent / Backpropagation

Gradient descent and backpropagation are philosophically excluded from HEGEL. Backprop requires a global error signal propagated backwards through the network -- every synapse must know its contribution to the global loss. This is biologically implausible and violates the locality constraint.

In HEGEL, all learning is local (Hebbian): a synapse only knows what its pre-synaptic and post-synaptic neurons are doing. There is no loss function to differentiate, no gradient to propagate. This is a hard constraint, not a preference.

Rejected Fitness Ranking / Elitism / Tournament Selection

Traditional evolutionary algorithms rank organisms by a scalar fitness and use tournament selection or elitism to preserve the best. HEGEL rejects all of this. Selection is binary (alive/dead), not ranked. There is no "best organism" -- only organisms that survived and organisms that did not.

Viability-proportional reproduction means healthier survivors are more likely to reproduce, but no organism is guaranteed a spot. This is closer to real biology where even the fittest organism can die from bad luck, and even a marginal survivor can produce successful offspring.

5. Key Q&A

Why no gradient descent?

Gradient descent requires a global loss function and backward propagation of error through the entire network. Every synapse must know its contribution to the global error -- this requires a "credit assignment" mechanism that has no known biological analog.

HEGEL's constraint is that all learning must be local: a synapse can only use information available at its pre- and post-synaptic neurons. This is what Hebbian learning provides. The philosophical argument is that consciousness cannot arise from a system that requires a god's-eye-view of its own errors. Self-knowledge must emerge from local interactions, not from a pre-installed error signal.

Why Hebbian learning instead of backprop?

Hebbian learning is the oldest and most biologically grounded learning rule: "neurons that fire together, wire together." In HEGEL, four variants (Oja, BCM, Anti-Hebbian, Competitive) each capture different aspects of biological synaptic plasticity.

The key advantage is locality. Each Hebbian rule uses only information available at the synapse itself. This means the learning algorithm does not need to "see" the whole network. Structure emerges bottom-up from local interactions under viability pressure, rather than being imposed top-down by a global optimization signal.

The trade-off is that Hebbian learning is less sample-efficient than backprop. It takes longer to find useful weight configurations. But the structures that emerge are self-organized, not externally designed -- and that self-organization is precisely what we are studying.

What makes this different from a regular neural network?

Several fundamental differences:

1. No training/inference split. Regular neural networks are trained, then deployed. HEGEL organisms learn while alive. There is no separate training phase -- learning is a continuous side-effect of being alive.

2. Embodied in chemistry. The neural network is coupled to a metabolic system. Its inputs come from internal chemistry, not from a dataset. Its outputs affect its own survival, not a loss function.

3. Death is real. If the organism makes poor decisions, it dies. There is no "try again with the same weights." Death terminates all learning and the organism is gone.

4. Continuous-time dynamics. The CTRNN operates in continuous time as an ODE, not in discrete forward passes. Neural dynamics unfold smoothly alongside metabolic dynamics.

5. No objective function. There is nothing being optimized. The organism has no goal. It has a constraint: stay alive. Everything else -- behavior, learning, internal structure -- emerges from that constraint.

Why does learning stop when the organism dies?

This is the viability constraint on cognition. In real biology, dead organisms do not learn. The organism's cognitive process depends on its metabolic process -- if the chemistry stops, the neural activity stops, and learning stops.

This creates a fundamental asymmetry: you can only learn if you survive, but you can only survive if you learn (at higher layers). This circular dependency is not a bug -- it is the core of the Hegelian dialectic between life and knowledge.

In implementation, the alive flag gates all Hebbian updates. When alive=false, the hebbian_update! function returns immediately without modifying any weights.

What is viability theory and why use it instead of fitness?

Viability theory (Aubin, 1991) studies which initial states can produce trajectories that remain within a constraint set indefinitely. The "viability kernel" is the set of all states from which survival is possible.

Traditional fitness is a scalar optimization target: find the parameters that maximize survival time, or reward, or some other metric. Viability theory instead asks a yes/no question: can this organism stay within the viable set?

The philosophical difference is profound. Fitness-based selection says "be the best." Viability-based selection says "do not die." In biology, there is no cosmic fitness function ranking all organisms. There is only the viability boundary between life and death. Organisms that stay on the alive side reproduce. Organisms that cross to the dead side do not. HEGEL models this directly.

How does prediction error drive learning?

Prediction error enters the system through the neuromodulation gate. The current implementation uses viability margin as the primary gate signal. In Layer 4+, prediction error will be combined with viability margin:

gate = max(1 - viability_margin, prediction_error)

This means learning is driven by either metabolic danger or predictive surprise -- whichever is larger. An organism that is safe but confused (high prediction error) will still learn. An organism that is in danger but unsurprised will also learn. Only an organism that is both safe and unsurprised will have minimal learning.

The deeper point is that prediction error creates a pressure toward internal model accuracy without requiring a global loss function. The Hebbian rules reshape weights to reduce prediction error indirectly -- by improving the internal representation of sensory-motor contingencies.

What is the self-model and how does it emerge?

The self-model is not a pre-installed module. It is a pattern that emerges when the organism's world model fails in a specific way: prediction errors that are correlated with the organism's own actions.

Initially, the organism has an undifferentiated world model that predicts all sensory input. Some of that input changes because of the environment (weather, perturbations). Some changes because of the organism's own motor actions. The world model cannot distinguish these.

When the organism starts acting (motor output variance increases), it creates self-caused sensory changes that the world model did not predict. These appear as prediction errors correlated with motor output. Over time, this systematic failure creates a new internal structure: a representation of "things that change when I act" versus "things that change on their own."

This is Hegel's determinate negation: the failure of the undifferentiated world model (negation) produces a specific new structure (the self-model). The self is not discovered -- it is produced by the breakdown of naive realism.

Why natural selection instead of CMA-ES?

CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is an excellent optimization algorithm. It maintains a multivariate Gaussian search distribution and adapts its covariance matrix to follow the fitness landscape gradient. It is fast, robust, and widely used.

The problem is that CMA-ES is an optimizer. It maximizes a scalar fitness function. HEGEL explicitly rejects scalar fitness maximization. Selection in HEGEL is binary: survived or died. There is no "fitness landscape" to climb.

The viability-proportional natural selection used instead is much simpler and much slower. It does not exploit second-order statistics. It does not have a convergence guarantee. But it models the actual mechanism of biological evolution: differential reproduction based on survival, not on fitness ranking.

This is a deliberate choice of fidelity over efficiency. HEGEL is not trying to find optimal parameters quickly. It is trying to understand what kind of structures emerge from viability-constrained evolution with local learning.

What would happen if you added transformers?

Adding transformers (or any attention-based architecture) would violate multiple core principles:

1. Non-local computation. Attention mechanisms compute global queries over all tokens/neurons. This violates the locality constraint -- each element needs to "see" all other elements. Hebbian learning requires only local information.

2. Requires backprop. Transformers are trained with gradient descent through the attention layers. There is no known Hebbian training procedure for transformers that achieves comparable performance.

3. Discrete computation. Transformers operate in discrete forward passes, not continuous-time dynamics. They cannot be naturally coupled to the metabolic ODE system.

4. Philosophical mismatch. Transformers are powerful precisely because they can learn arbitrary global patterns through supervised training. HEGEL is studying what emerges from local rules under survival pressure. Adding a transformer would be like studying whether ants can build bridges by giving them a crane.

If the goal were to build a capable AI system, transformers would be the right choice. But HEGEL's goal is scientific: to understand whether consciousness-like structures can emerge from gradient-free, locally-learned, viability-constrained dynamics.

What does it mean if the experiment fails?

Failure is scientifically valuable. If gradient-free, Hebbian-only, viability-constrained systems cannot produce consciousness-like structures, that tells us something important about what consciousness requires.

Each layer has specific falsification criteria. If Layer 3 fails (Hebbian learning does not improve survival), that suggests local learning rules are insufficient for adaptive behavior under this metabolic architecture. If Layer 4 fails (no prediction emerges), that suggests world models require more than recurrent dynamics with Hebbian plasticity.

The project explicitly documents why things fail, not just whether they succeed. A well-documented failure at Layer 4 is more valuable than a hand-wavy claim of success at Layer 6. This is why the system includes detailed recording of every metabolic variable, neural activation, weight change, and prediction error over time.

6. References

Aubin, J.-P. (1991). "Viability Theory." Birkhauser, Boston. The mathematical framework for survival constraints as set membership rather than optimization.
Beer, R. D. (1995). "On the Dynamics of Small Continuous-Time Recurrent Neural Networks." Adaptive Behavior, 3(4), 469-509. DOI
Bienenstock, E. L., Cooper, L. N., & Munro, P. W. (1982). "Theory for the Development of Neuron Selectivity: Orientation Specificity and Binocular Interaction in Visual Cortex." Journal of Neuroscience, 2(1), 32-48. DOI
Clark, A. (2013). "Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science." Behavioral and Brain Sciences, 36(3), 181-204. DOI
Hegel, G. W. F. (1807). "Phenomenology of Spirit" (Phanomenologie des Geistes). Joseph Anton Goebhardt, Bamberg. The philosophical foundation: consciousness progresses through dialectical stages from sense-certainty to absolute knowing.
Oja, E. (1982). "A Simplified Neuron Model as a Principal Component Analyzer." Journal of Mathematical Biology, 15(3), 267-273. DOI
Pehlevan, C. & Chklovskii, D. B. (2019). "Neuroscience-Inspired Online Data Streaming Algorithms." Sandia National Laboratories Report. Anti-Hebbian and competitive learning for online decorrelation and whitening.
Piedrafita, G., Montero, F., Moran, F., Cardenas, M. L., & Cornish-Bowden, A. (2010). "A Simple Self-Maintaining Metabolic System: Robustness, Autocatalysis, Bistability." PLOS Computational Biology, 6(8). DOI
Tsitouras, Ch. (2011). "Runge-Kutta Pairs of Order 5(4) Satisfying Only the First Column Simplifying Assumption." Computers & Mathematics with Applications, 62(2), 770-775. DOI
Krotov, D. & Hopfield, J. J. (2019). "Unsupervised Learning by Competing Hidden Units." Proceedings of the National Academy of Sciences, 116(16), 7723-7731. DOI