Complex Log-Mean-Exp Networks

1. Core definition

A complex log-mean-exp unit (LME unit) is defined as

$$y = \frac{1}{\beta}\,\log\Big(\widetilde{\sum_{i=1}^n w_i \exp (\beta \, x_i)} \Big)$$

where

$x_i \in \mathbb{C}$ are the complex inputs.
$y$ is the unit’s output.
$w_i \in \mathbb{R}$ (or $\mathbb{C}$) are learned weights.
$\beta \in \mathbb{R}$ (or $\mathbb{C}$) is a learned sharpness parameter.
$\exp : \mathbb{C} \rightarrow L$, where $L = \{ (r, \theta) \mid r > 0,\ \theta \in \mathbb{R} \}$, is the lifted exponential: $\exp (u + i v) = (e^u, v)$.
$\log : L \rightarrow \mathbb{C}$ is the unwrapped complex logarithm: $\log \, (r, \theta) = \ln(r) + i \theta$.
The notation $\widetilde{(\cdot)}$ means: first form the complex sum $z = \sum_{i=1}^n w_i \exp(\beta \, x_i)$, then lift $z$ to $L$ (possible only when $z \neq 0$) by choosing its phase continuously along the parameter path (i.e. by phase unwrapping).

This mapping scales each input by $\beta$ and sends it through the exponential into $L$. It then multiplies each term by its weight and projects the results to $\mathbb{C}$ so that they can be summed linearly. Finally, it lifts the sum back to $L$ by phase unwrapping, maps it back analytically to $\mathbb{C}$ via the helical logarithm, and divides by $\beta$.

Note: $\exp$ and $\log$ form an analytic isomorphism between the complex plane and the Riemann surface of the logarithm $L$.
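The following minimal Python sketch (standard library only) evaluates the unit along a discretized path of $\beta$ values, with the inputs and weights held fixed. The name `lme_along_beta` and the nearest-branch unwrapping rule are illustrative assumptions, not part of the original definition. The lift is only tracked correctly if the true phase of the sum changes by less than $\pi$ between consecutive samples, and the sum must never be exactly zero, because `math.log(0)` raises.

```python
import cmath
import math


def lme_along_beta(xs, w, betas):
    """Evaluate a complex LME unit at each beta in a fine, zero-free path.

    xs: complex inputs, w: weights (real or complex), betas: nonzero samples
    of beta. The lift to L is tracked by nearest-branch phase unwrapping.
    """
    ys = []
    theta_prev = None
    for beta in betas:
        # Form the complex sum z = sum_i w_i exp(beta x_i) in C.
        z = sum(wi * cmath.exp(beta * xi) for wi, xi in zip(w, xs))
        phase = cmath.phase(z)  # principal argument in (-pi, pi]
        if theta_prev is None:
            theta = phase  # start the lift on the principal branch
        else:
            # Add the multiple of 2*pi that keeps the lifted phase continuous.
            k = round((theta_prev - phase) / (2 * math.pi))
            theta = phase + 2 * math.pi * k
        theta_prev = theta
        # Helical log of the lifted point (|z|, theta), divided by beta.
        ys.append(complex(math.log(abs(z)), theta) / beta)
    return ys


betas = [0.05 * k for k in range(1, 201)]  # beta from 0.05 up to 10
ys = lme_along_beta([2j], [1.0], betas)
print(ys[-1])  # approximately 2j
```

With a single input $x = 2i$ and weight 1, the unit should reproduce $x$ for every $\beta$. The principal logarithm would fail this check as soon as $2\beta > \pi$, whereas the unwrapped lift keeps the output at $2i$.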

2. Behavior

$\lim_{\beta\to+\infty} \, y \;=\; x_k$, where $k$ is the index attaining $\max_i \operatorname{Re}(x_i)$ (assuming the maximum is attained at a single index and $w_k \neq 0$).
$\lim_{\beta\to-\infty} \, y \;=\; x_k$, where $k$ is the index attaining $\min_i \operatorname{Re}(x_i)$ (under the same assumptions).
As $\beta\to 0$, $y \;=\; \frac{\sum_i w_i \, x_i}{\sum_i w_i} \,+\, \tfrac{1}{\beta}\,\log\big(\sum_i w_i\big) \,+\, O(\beta)$ (assuming $\sum_i w_i \neq 0$). So, provided $\lim_{\beta\to 0} \, \frac{1}{\beta}\,\log\big(\sum_i w_i\big) \,=\, 0$, which forces $\sum_i w_i \to 1$, we have $\lim_{\beta\to 0} \, y \;=\; \sum_i w_i \, x_i$, a weighted mean of the inputs.
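Assume the weights are fixed, with $W = \sum_i w_i \neq 0$. The small-$\beta$ behavior then follows from first-order Taylor expansions of $\exp$ and of $\log(1+u) = u + O(u^2)$. Near $\beta = 0$, the second factor below is close to 1, so its logarithm is the principal one:

$$\sum_i w_i \exp(\beta \, x_i) \;=\; W + \beta \sum_i w_i \, x_i + O(\beta^2) \;=\; W \Big(1 + \beta\,\frac{\sum_i w_i \, x_i}{W} + O(\beta^2)\Big)$$

$$y \;=\; \frac{1}{\beta}\log W + \frac{1}{\beta}\log\Big(1 + \beta\,\frac{\sum_i w_i \, x_i}{W} + O(\beta^2)\Big) \;=\; \frac{1}{\beta}\log W + \frac{\sum_i w_i \, x_i}{W} + O(\beta)$$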

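The sign of $\beta$ thus selects a soft weighted max (positive $\beta$) or min (negative $\beta$), and its magnitude sets the softness. Small $|\beta|$ approaches the weighted mean, and large $|\beta|$ approaches the hard extremum.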
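The role of $\beta$ is easy to see numerically for real inputs with positive weights. In that case the sum is a positive real number, the ordinary real logarithm applies, and no unwrapping is needed. The sketch below is a minimal Python version under exactly those assumptions. `lme_real` is a hypothetical name. Factoring out $m = \max_i \beta x_i$ before exponentiating (the usual log-sum-exp shift) is an implementation choice that avoids overflow for large $|\beta|$; it is not part of the original definition.

```python
import math


def lme_real(xs, w, beta):
    """Real-input LME unit with positive weights and nonzero beta.

    The sum is then positive, so math.log applies directly. The shift by
    m = max_i beta * x_i keeps every exponent <= 0 (no overflow).
    """
    m = max(beta * x for x in xs)
    s = sum(wi * math.exp(beta * x - m) for wi, x in zip(w, xs))
    return (m + math.log(s)) / beta


xs = [1.0, 3.0, 2.0]
w = [0.2, 0.5, 0.3]  # positive weights summing to 1
for beta in (-50.0, -1.0, 1e-4, 1.0, 50.0):
    print(beta, lme_real(xs, w, beta))
# beta = -50 gives a value near min(xs) = 1, beta = 50 a value near max(xs) = 3,
# and beta = 1e-4 gives roughly the weighted mean 0.2*1 + 0.5*3 + 0.3*2 = 2.3.
```

At finite $|\beta|$, the extremum is offset by $\ln(w_k)/\beta$. For example, $\beta = 50$ gives about $3 + \ln(0.5)/50 \approx 2.986$, and the offset shrinks as $|\beta|$ grows.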

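The function is smooth and complex-analytic away from the zeros of $\sum_i w_i \exp(\beta \, x_i)$. These zeros form a set of measure zero. As a function of any single input $x_j$ with $w_j \neq 0$, they are isolated points. In the full input space, they form a complex hypersurface (real codimension 2).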
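To see the isolated zeros concretely, fix every input except $x_j$ and write $c = \sum_{i \neq j} w_i \exp(\beta \, x_i)$. Assume $w_j \neq 0$, $\beta \neq 0$ and $c \neq 0$; if $c = 0$ there are no zeros at all, since $\exp$ never vanishes. Then

$$w_j \exp(\beta \, x_j) + c = 0 \iff x_j = \frac{1}{\beta}\Big(\operatorname{Log}\big(-\tfrac{c}{w_j}\big) + 2\pi i m\Big), \qquad m \in \mathbb{Z},$$

where $\operatorname{Log}$ is the principal logarithm. The zeros in $x_j$ therefore form a discrete lattice with spacing $2\pi i/\beta$.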

3. Technical considerations

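To keep the argument of the logarithm from getting too close to 0, the lifted sum can be passed through any smooth map that keeps its radius bounded away from zero, for example $\phi\,(r, \theta) = (r+\varepsilon, \theta)$. Such a map is smooth on $L$ but not complex-analytic, so it trades exact analyticity for numerical stability.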
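In code, $\phi$ only touches the radius of the lifted point, so it can be folded into the helical logarithm. The following is a minimal sketch, assuming $r = |z| \ge 0$ is the modulus of the complex sum and $\theta$ its unwrapped phase. The function name and the default $\varepsilon = 10^{-3}$ are hypothetical choices.

```python
import math


def regularized_lifted_log(r, theta, eps=1e-3):
    """Helical log applied after phi(r, theta) = (r + eps, theta).

    r >= 0 is |z| for the complex sum z, theta its unwrapped phase, and eps
    a hypothetical default. Unlike math.log(r), this is finite at r = 0.
    """
    return complex(math.log(r + eps), theta)


print(regularized_lifted_log(0.0, 1.5))    # real part ln(1e-3), about -6.908
print(regularized_lifted_log(2.0, -0.25))  # real part ln(2.001)
```

The real part is bounded below by $\ln \varepsilon$, and its derivative $1/(r+\varepsilon)$ is bounded above by $1/\varepsilon$. These bounds keep both the output and its gradients finite near a zero of the sum.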

The $w_i$ can also easily be parametrized so that $\lim_{\beta\to 0} \, \frac{1}{\beta}\,\log\big(\sum_i w_i\big) \,=\, 0$. For example, $w_i=\omega_i-\frac{\exp(-|\beta|^2 / \epsilon^2)}{n}\Big(\sum_{j}\omega_j-1\Big)$, where the $\omega_i$ are the true free weights and $\epsilon > 0$ sets the scale of $|\beta|$ below which the correction takes effect. Additionally, a regularization term can be added to the loss to keep the reparametrization from distorting the free weights (i.e. to keep $w_i$ close to $\omega_i$). This term encourages the $\omega_i$ to sum to 1 when $\beta$ is small; for example $\lambda \exp\big((|\beta|+\epsilon)^{-1} - |\beta|\big)\cdot\big|1 - \sum_i \omega_i\big|^2$. Its prefactor grows as $|\beta| \to 0$ and decays for large $|\beta|$.
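Summing the reparametrization over $i$ gives $\sum_i w_i = 1 + \big(\sum_j \omega_j - 1\big)\big(1 - e^{-|\beta|^2/\epsilon^2}\big)$. Since $1 - e^{-|\beta|^2/\epsilon^2} \approx |\beta|^2/\epsilon^2$ for small $|\beta|$, the quantity $\frac{1}{\beta}\log\sum_i w_i$ is $O(|\beta|)$ and vanishes in the limit. The Python sketch below checks this numerically. The function names and the example values of $\omega$, $\epsilon$ and $\lambda$ are hypothetical.

```python
import math


def reparam_weights(omega, beta, eps):
    """w_i = omega_i - exp(-|beta|^2 / eps^2) / n * (sum_j omega_j - 1)."""
    n = len(omega)
    shift = math.exp(-abs(beta) ** 2 / eps ** 2) / n * (sum(omega) - 1.0)
    return [o - shift for o in omega]


def sum_to_one_penalty(omega, beta, eps, lam):
    """lam * exp(1 / (|beta| + eps) - |beta|) * |1 - sum_i omega_i|^2."""
    gate = math.exp(1.0 / (abs(beta) + eps) - abs(beta))
    return lam * gate * abs(1.0 - sum(omega)) ** 2


omega = [0.7, 0.6, 0.2]  # free weights summing to 1.5, not 1
eps = 0.5
for beta in (1e-1, 1e-2, 1e-3, 1e-4):
    w = reparam_weights(omega, beta, eps)
    # (1/beta) * log(sum_i w_i) shrinks roughly in proportion to beta.
    print(beta, math.log(sum(w)) / beta)
print(sum(reparam_weights(omega, 5.0, eps)))  # ~1.5: no correction at large |beta|
```

With $\sum_j \omega_j = 1.5$ and $\epsilon = 0.5$, the printed ratio is close to $2\beta$, which matches $(\sum_j \omega_j - 1)\,|\beta|^2 / (\epsilon^2 \beta)$ for small positive $\beta$.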

The helical logarithm guarantees smoothness along the chosen parameter path: the phase of the lifted sum evolves continuously even when the complex sum circles the origin repeatedly, as long as exact zeros are avoided.
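The following sketch illustrates this with a hypothetical path: $\beta = 1$, weights $(2, 1)$ and inputs $(0, \ln 3 + i t)$. The sum is then $z(t) = 2 + 3e^{it}$, which winds around the origin once per $2\pi$ of $t$ while staying at distance at least 1 from it. Since $\operatorname{Im}(y) = \theta/\beta$, the lifted phase $\theta$ is exactly what the unit outputs. `unwrap` is an illustrative nearest-branch rule, not a prescribed algorithm.

```python
import cmath
import math


def unwrap(phases):
    """Lift principal phases in (-pi, pi] to a continuous sequence by adding,
    at each step, the multiple of 2*pi closest to the previous lifted value."""
    lifted = [phases[0]]
    for p in phases[1:]:
        k = round((lifted[-1] - p) / (2 * math.pi))
        lifted.append(p + 2 * math.pi * k)
    return lifted


# Hypothetical path: beta = 1, weights (2, 1), inputs (0, ln 3 + i t).
# The sum is z(t) = 2 + 3 exp(i t), which circles the origin once per 2*pi
# of t without passing through it (|z| >= 1 along the whole path).
beta, w = 1.0, (2.0, 1.0)
ts = [4 * math.pi * k / 400 for k in range(401)]
zs = [sum(wi * cmath.exp(beta * xi) for wi, xi in zip(w, (0.0, math.log(3) + 1j * t)))
      for t in ts]

principal = [cmath.phase(z) for z in zs]
lifted = unwrap(principal)

max_jump_principal = max(abs(b - a) for a, b in zip(principal, principal[1:]))
max_jump_lifted = max(abs(b - a) for a, b in zip(lifted, lifted[1:]))
print(max_jump_principal)  # close to 2*pi: the principal branch jumps
print(max_jump_lifted)     # small: the lifted phase moves continuously
print(lifted[-1])          # close to 4*pi after two turns around the origin
```

The principal phase jumps by nearly $2\pi$ each time $z$ crosses the negative real axis. The lifted phase instead increases smoothly to $4\pi$ after two turns.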

4. Conceptual significance

LME units can replace ordinary neurons (a weighted sum followed by an activation function), since the weights are part of the unit itself. This yields a particularly natural class of complex-valued neural networks:

LME units provide a learnable smooth generalization of both mean and extremum operations within a single algebraic form (and more, when $\beta$ is allowed to be complex).
It integrates real and complex geometry seamlessly: weights may remain real, while activations explore the full complex plane.

Because activations live in $\mathbb{C}$, each channel carries two real degrees of freedom (magnitude and phase). For the same width and number of learned weights, an LME layer therefore carries twice as many real numbers per channel as a real-valued layer. ReLU is irreversible and discards entire half-spaces. LME maps, by contrast, are analytic, so a layer with as many outputs as inputs is locally invertible wherever its complex Jacobian is nonsingular.
