Consider a 90$^\circ$ rotation of an image: $$\mathbf{x}' = T_{90^\circ}\,\mathbf{x}\:,$$ where $T_{90^\circ}$ is a $784 \times 784$ matrix that acts on vectorized images $\mathbf{x} \in \mathbb{R}^{784}$ via a matrix-vector product.
Contrast this with rotating a point $\mathbf{x} \in \mathbb{R}^2$ in the plane by an angle $\theta$ via $\tilde{T}_{\theta}\,\mathbf{x}$, where $$\tilde{T}_{\theta} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \:.$$
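As a quick illustration, here is a minimal numpy sketch (assuming only numpy) of rotating a point in the plane with $\tilde{T}_{\theta}$:

import numpy as np

theta = np.pi / 2  # 90 degrees
T_tilde = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
x = np.array([1., 0.])
print(T_tilde @ x)  # [~0., 1.]: the point (1, 0) rotates to (0, 1)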
Given an input $\mathbf{x} \in \mathbb{R}^n$, a G-invariant neuron takes the form $$\sigma(\mathbf{w}^T \mathbf{x} + b)\:, \qquad \mathbf{w} \in \mathbb{W}\:,$$ where $\mathbb{W} \subseteq \mathbb{R}^n$ is an invariant subspace.
Before looking into the details of the invariant subspace $\mathbb{W}$, let's first see how it affects the invariance properties of the G-invariant neuron.
The following cells compute $\mathbf{w}^T \mathbf{x}$ and $\mathbf{w}^T (T_{90^\circ}\mathbf{x})$ for a random $\mathbf{w} \in \mathbb{W}$.
from g_invariance import *
rotation_angle = 90 # Experiment with other multiples of 90 degrees.
show_dot_product(img_4, invariant_w)
show_dot_product(np.rot90(img_4, k=rotation_angle//90), invariant_w)
Hence, the output of the neuron $\sigma(\mathbf{w}^T \mathbf{x} + b)$ is the same for the original image and its rotated version (the same holds for rotation by any multiple of $90^\circ$).
This is because every $\mathbf{w} \in \mathbb{W}$ remains invariant under rotations by multiples of $90^\circ$, i.e., $\mathbf{w}^T = \mathbf{w}^T T_{90^\circ}$.
wT = invariant_w.T
wT_T90 = wT @ T90  # T90 is the 784 x 784 rotation matrix for vectorized 28 x 28 images
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
axes[0].imshow(wT.reshape((28, 28)), cmap='gray')
axes[0].set_title(r"$\mathbf{w} \in \mathbb{W}$ (unvectorized)")
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(wT_T90.reshape((28, 28)), cmap='gray')
axes[1].set_title(r"$\mathbf{w}^T ~T_{90^\circ}$ (unvectorized)")
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()
In what follows we describe the concepts needed to understand the invariant subspace $\mathbb{W}$.
We begin with a general description of groups and their close relation to symmetries.
Then, we show how to construct the Reynolds operator of any finite group by averaging all the transformations in the group, and we depict a key property of the Reynolds operator needed for our G-invariant neuron.
We construct $\mathbb{W}$ as the left eigenspace of the Reynolds operator and prove the invariance properties of any $\mathbf{w} \in \mathbb{W}$.
At the end, we give a step-by-step example of constructing a G-invariant neuron.
Step-by-step examples
A transformation group consists of a set of transformations $\mathcal{G}$ and a binary composition operation $\circ$ defined between any two elements of $\mathcal{G}$:
Yes. For example:
No. For example:
Let us add $T_{180^\circ}$ to $\overline{\mathcal{G}}_\text{flip}$ and check again.
Yes (check).
A group $\mathcal{G}$ is called Abelian or commutative if $A\circ B=B \circ A$ for all $A,B \in \mathcal{G}$.
Yes.
However, a group containing both rotations and flips is not Abelian. For example: $T_\text{h-flip} \circ T_{90^\circ} = T_\text{diag2-flip} \neq T_\text{diag1-flip} = T_{90^\circ} \circ T_\text{h-flip}$
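A quick numpy check of this counterexample (a sketch: we take np.rot90 for $T_{90^\circ}$ and np.fliplr for $T_\text{h-flip}$):

import numpy as np

img = np.random.rand(4, 4)
a = np.fliplr(np.rot90(img))  # T_h-flip ∘ T_90: rotate first, then flip
b = np.rot90(np.fliplr(img))  # T_90 ∘ T_h-flip: flip first, then rotate
print(np.allclose(a, b))      # False: the order of composition matters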
If $\mathcal{G}$ has a finite number of elements it is called a finite group, otherwise it is an infinite group.
$\mathcal{H}$ is called a subgroup of a group $\mathcal{G}$ (denoted $\mathcal{H} \leq \mathcal{G}$ ) if:
For us, the concept of a group matters insofar as its elements act as transformations on an input.
A (linear finite dimensional) group representation $\rho : G \to \text{GL}(m,\mathbb{R})$ associates each $g \in G$ to an invertible matrix $\rho(g) \in \mathbb{R}^{m \times m}$ that acts on $\mathbb{R}^m$.
The representation specifies how objects transform under the group, and can be considered a specification of the type of an object.
Set representation example: let $\pi \in \mathbb{S}_n$ be a permutation of $\{1,\ldots,n\}$. Define $\rho(\pi)$ as the (linear) group representation of $\pi$ that permutes the 3D points in the sequence ${\bf x} \in \mathbb{R}^{n \times 3}$ according to $\pi$.
Graph representation example: let $\pi \in \mathbb{S}_n$ be a permutation of $\{1,\ldots,n\}$. Define $\rho(\pi)$ as the (linear) group representation of $\pi$ that jointly permutes the rows and columns of an adjacency matrix ${\bf A} \in \mathbb{R}^{n\times n}$.
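Both representations are easy to write down in numpy. A minimal sketch (perm is just an example permutation):

import numpy as np

n = 4
perm = [2, 0, 3, 1]        # an example permutation pi of {0, ..., n-1}

# Set representation: rho(pi) reorders the n 3D points.
x = np.random.rand(n, 3)
x_perm = x[perm]

# Graph representation: rho(pi) jointly permutes rows and columns of A.
A = np.random.rand(n, n)
A_perm = A[perm][:, perm]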
In this tutorial, we will always operate on vectorized input such as the above.
$\text{GL}(n,\mathbb{R}) = \{A \in \mathbb{R}^{n \times n} ~:~ A \text{ is invertible}\}$ is a group of $n\times n$ invertible matrices with the binary operation defined as the matrix product.
Satisfies the group properties: closure (the product of two invertible matrices is invertible), associativity of the matrix product, identity ($I_n$), and inverses ($A^{-1} \in \text{GL}(n, \mathbb{R})$ for every $A$).
Other properties: the matrix product is not commutative in general, so $\text{GL}(n, \mathbb{R})$ is not Abelian for $n \geq 2$.
Every matrix $A \in \text{GL}(n, \mathbb{R})$ is associated to a bijective linear transformation $T_A: \mathbb{R}^n \to \mathbb{R}^n$ as follows:
$$T_A(\mathbf{x}) = A\mathbf{x}\:, \qquad \forall \mathbf{x} \in \mathbb{R}^n$$Any subgroup of $\text{GL}(n, \mathbb{R})$ is called a linear group.
Consider $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \in \text{GL}(2, \mathbb{R})$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$. Then the transformation $T_A: \mathbb{R}^2 \to \mathbb{R}^2$ transforms $\mathbf{x}$ as follows: $$T_A\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}$$
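The same computation in numpy (a minimal sketch):

import numpy as np

A = np.array([[0., 1.],
              [1., 0.]])
x = np.array([3., 7.])
print(A @ x)  # [7. 3.]: the coordinates are swapped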
Consider an input $\mathbf{x} \in \mathbb{R}^{3 m^2}$ representing a vectorized $m \times m$ RGB image.
We consider the general linear group, $\text{GL}(3m^2,\mathbb{R})$. Recall that any matrix $A \in \text{GL}(3m^2, \mathbb{R})$ is an invertible $3m^2 \times 3m^2$ matrix and acts on the vectorized image $\mathbf{x}$ via the matrix-vector product $A\mathbf{x}$.
Q: Is there a transformation in $\text{GL}(3m^2, \mathbb{R})$ equivalent to $T_{45^\circ}$ that rotates an $m\times m$ RGB image by $45^\circ$?
Q: Can the $45^\circ$ rotation and scaling $T^{(45 \text{ \& scale})}$ belong to a transformation group $G \subseteq \text{GL}(3 m^2,\mathbb{R})$?
A: No. In the definition of a group, if $A,B \in G$, then $A B \in G$. But that does not happen in this scenario:
A transformation $\bar{T}$ is $G$-invariant iff $\forall T \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$\bar{T} (T {\bf x}) = \bar{T} {\bf x}.$$
Lemma: Reynolds operator (Billik and Rota, 1960). The transformation $$\bar{T} = \frac{1}{|G|} \sum_{T \in G} T,$$ is $G$-invariant.
# Consider a grayscale 2x2 image. The vectorized image has dimension n = 4.
# Consider the rotation group over this image: {T_0, T_90, T_180, T_270}.
# All T_\theta are 4 x 4 invertible matrices.
# We will see later how to construct these transformation matrices.
T0 = np.array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1., 0.],
               [0., 0., 0., 1.]])
T90 = np.array([[0., 1., 0., 0.],
                [0., 0., 0., 1.],
                [1., 0., 0., 0.],
                [0., 0., 1., 0.]])
T180 = np.array([[0., 0., 0., 1.],
                 [0., 0., 1., 0.],
                 [0., 1., 0., 0.],
                 [1., 0., 0., 0.]])
T270 = np.array([[0., 0., 1., 0.],
                 [1., 0., 0., 0.],
                 [0., 0., 0., 1.],
                 [0., 1., 0., 0.]])
# Show that the transformations above are correct using an example 2x2 image.
img = np.random.rand(2, 2)
vectorized_img = img.reshape(-1)
img_rot90 = (T90 @ vectorized_img).reshape(2, 2)
img_rot180 = (T180 @ vectorized_img).reshape(2, 2)
img_rot270 = (T270 @ vectorized_img).reshape(2, 2)
fig, axes = plt.subplots(1, 4, figsize=(12, 9))
axes[0].imshow(img, cmap='gray')
axes[0].set_title(r'$T_{0^\circ} x$', fontsize=22)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_rot90, cmap='gray')
axes[1].set_title(r'$T_{90^\circ} x$', fontsize=22)
axes[1].set_xticks([])
axes[1].set_yticks([])
axes[2].imshow(img_rot180, cmap='gray')
axes[2].set_title(r'$T_{180^\circ} x$', fontsize=22)
axes[2].set_xticks([])
axes[2].set_yticks([])
axes[3].imshow(img_rot270, cmap='gray')
axes[3].set_title(r'$T_{270^\circ} x$', fontsize=22)
axes[3].set_xticks([])
axes[3].set_yticks([])
plt.show()
# Construct the Reynolds operator Tbar
Tbar = (T0 + T90 + T180 + T270)/4
# Show that Tbar @ T_\theta = Tbar for all T_\theta
display(Markdown(r'$\overline{T} = $'))
print(Tbar)
display(Markdown(r'$\overline{T} ~T_{0^\circ} = $'))
print(Tbar @ T0)
display(Markdown(r'$\overline{T} ~T_{90^\circ} = $'))
print(Tbar @ T90)
display(Markdown(r'$\overline{T} ~T_{180^\circ} = $'))
print(Tbar @ T180)
display(Markdown(r'$\overline{T} ~T_{270^\circ} = $'))
print(Tbar @ T270)
$\overline{T} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{0^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{90^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{180^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{270^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
Proof: Consider an arbitrary transformation $T_\dagger \in G$. Then $$\bar{T} \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T \in G} T \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T' \in G_\dagger} T' ,$$ where we define $G_\dagger = \{T \circ T_\dagger : T \in G \}$. By closure, $G_\dagger \subseteq G$, and since composition with $T_\dagger$ is invertible, $T \mapsto T \circ T_\dagger$ is a bijection from $G$ onto $G_\dagger$. Hence $G_\dagger = G$ and $\bar{T} \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T \in G} T = \bar{T}$. $\square$
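The key step $G_\dagger = G$ can also be checked numerically for the rotation group above (a sketch reusing the 4x4 matrices T0, T90, T180, T270):

# Compose every element of G with T90 on the right (this is G_dagger for T_dagger = T90).
G = [T0, T90, T180, T270]
G_dagger = [T @ T90 for T in G]
# G_dagger contains exactly the same matrices as G, only reordered.
print(all(any(np.allclose(Td, T) for T in G) for Td in G_dagger))  # True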
We call $\mathbb{W}$ a subspace of a vector space $\mathbb{V}$ if $\mathbb{W} \subseteq \mathbb{V}$ and: (i) $\mathbf{0} \in \mathbb{W}$; (ii) $\mathbf{x} + \mathbf{y} \in \mathbb{W}$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{W}$; (iii) $\lambda \mathbf{x} \in \mathbb{W}$ for all $\lambda \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{W}$.
No. $\mathbf{0} \notin \mathbb{W}$.
$\mathbb{W}$ is the subset of all vectors on this line. Is this a subspace?
Yes.
$\mathbb{W}$ is the subset of all vectors within this circle. Is this a subspace?
No. Consider any vector $\mathbf{v}$ on the boundary of the circle. Then 2$\mathbf{v}$ is not within the circle.
$\mathbb{W}$ is the subset of all vectors on this plane passing through origin. Is this a subspace?
Yes. For any two vectors $\mathbf{x}, \mathbf{y}$ in the plane, $\mathbf{x} + \mathbf{y}$ is in the plane, as is $\lambda \mathbf{x}$ for all $\lambda \in \mathbb{R}$.
Given a set of vectors $S = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, the span of $S$ is the set of all linear combinations of the vectors in $S$, i.e., $$ \text{span}(S) = \{\lambda_1 \mathbf{x}_1 + \ldots + \lambda_n \mathbf{x}_n ~|~ \lambda_1,\ldots,\lambda_n \in \mathbb{R}\} $$
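A minimal numpy sketch of testing span membership (via least squares: a vector lies in the span iff the best linear combination reproduces it exactly):

import numpy as np

# Columns of S_mat are the vectors x_1, x_2 (they span the xy-plane in R^3).
S_mat = np.array([[1., 0.],
                  [0., 1.],
                  [0., 0.]])
v = np.array([2., -3., 0.])  # in span(S)
u = np.array([0., 0., 1.])   # not in span(S)
for vec in (v, u):
    coeffs, *_ = np.linalg.lstsq(S_mat, vec, rcond=None)
    print(np.allclose(S_mat @ coeffs, vec))  # True, then False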
A set of vectors $B = \{\mathbf{b}_1, \ldots, \mathbf{b}_d\}$ is called a basis of a subspace $\mathbb{U}$ if: (i) the vectors in $B$ are linearly independent, and (ii) $\text{span}(B) = \mathbb{U}$.
Then $d$ is known as the dimension of subspace $\mathbb{U}$ and is unique.
Our goal is to learn a neuron with $$\sigma(\mathbf{w}^T (T \mathbf{x}) + b) = \sigma(\mathbf{w}^T \mathbf{x} + b)\:, \qquad \forall T \in G.$$
This means that the output of the neuron does not depend on the transformation of $G$ applied to the input.
(Special case) A function $f:\mathbb{R}^m \to \mathbb{R}^m$ (endomorphism) is $G$-equivariant iff $\forall T \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$f(T {\bf x}) = T f({\bf x}).$$
(General case) A function $f:\mathbb{R}^m \to \mathbb{R}^d$ is $G$-equivariant iff $\forall g \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$f(\rho_1(g)\, {\bf x}) = \rho_2(g)\, f({\bf x})\:,$$ where $\rho_1$ and $\rho_2$ are representations of $G$ acting on the input and output spaces, $$\rho_1 : G \to \text{GL}(m,\mathbb{R}) \quad \text{and} \quad \rho_2 : G \to \text{GL}(d,\mathbb{R}).$$
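For instance, with $G = \mathbb{S}_n$ and $\rho_1 = \rho_2$ the permutation representation, any elementwise nonlinearity is $G$-equivariant. A minimal numpy check (perm is just an example permutation):

import numpy as np

relu = lambda z: np.maximum(z, 0.)
x = np.random.randn(5)
perm = [4, 2, 0, 1, 3]
# f(rho(pi) x) == rho(pi) f(x): ReLU commutes with permuting the entries.
print(np.allclose(relu(x[perm]), relu(x)[perm]))  # True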
We will first focus on G-invariant representations.
A subspace $M \subseteq \mathbb{R}^d$, $d \geq 1$, is an invariant subspace of a linear transformation $\bar{T}:\mathbb{R}^d \to \mathbb{R}^d$ if $$ {\bf w}^T \bar{T} \in M, ~\forall {\bf w} \in M. $$
For instance, the left 1-eigenspace is an invariant subspace, defined as: $$\text{1-Left-Eig}(\bar{T}) = \{ {\bf w} \in \mathbb{R}^d ~|~ {\bf w}^T \bar{T} = {\bf w}^T \}.$$
For an eigenvalue $\lambda$, is $\text{$\lambda$-Left-Eig}(\bar{T})$ a subspace of $\mathbb{R}^d$?
$\mathbf{0} \in \text{$\lambda$-Left-Eig}(\bar{T})$
$\mathbf{u}, \mathbf{v} \in \text{$\lambda$-Left-Eig}(\bar{T}) \implies (\mathbf{u} + \mathbf{v})^T \bar{T} = \mathbf{u}^T \bar{T} + \mathbf{v}^T \bar{T} = \lambda \mathbf{u}^T + \lambda \mathbf{v}^T = \lambda (\mathbf{u} + \mathbf{v})^T \implies (\mathbf{u} + \mathbf{v}) \in \text{$\lambda$-Left-Eig}(\bar{T}).$
$\mathbf{u} \in \text{$\lambda$-Left-Eig}(\bar{T}) \implies \alpha \mathbf{u}^T \bar{T} = \lambda (\alpha \mathbf{u}^T) \implies \alpha \mathbf{u} \in \text{$\lambda$-Left-Eig}(\bar{T})$.
What is a basis of $\text{$\lambda$-Left-Eig}(\bar{T})$?
The Reynolds operator $\overline{T}$ is a projection (i.e., $\overline{T}^2 = \overline{T}$), thus all the eigenvalues of $\overline{T}$ are either 0 or 1.
You can verify this: $\overline{T}^2 = \frac{1}{|G|} \sum_{T \in G} \overline{T}\, T = \frac{1}{|G|} \sum_{T \in G} \overline{T} = \overline{T}$, since we saw in the proof of the lemma that $\overline{T}\, T = \overline{T}$ for every $T \in G$.
If $\mathbf{v}^T \neq \mathbf{0}$ is a left eigenvector of $\overline{T}$ with eigenvalue $\lambda$, then $\mathbf{v}^T \overline{T}^2 = (\lambda \mathbf{v}^T) \overline{T} = \lambda^2 \mathbf{v}^T$. But also, $\mathbf{v}^T \overline{T}^2 = \mathbf{v}^T \overline{T} = \lambda \mathbf{v}^T$. Thus, $\lambda \in \{0,1\}$.
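Both facts are easy to verify numerically for the 4x4 Reynolds operator constructed earlier (a quick sketch reusing Tbar):

# Tbar is idempotent: Tbar @ Tbar == Tbar.
print(np.allclose(Tbar @ Tbar, Tbar))  # True
# All eigenvalues are (numerically) 0 or 1.
eigvals = np.linalg.eigvalsh(Tbar)
print(np.all(np.isclose(eigvals, 0.) | np.isclose(eigvals, 1.)))  # True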
The left eigenspace of $\overline{T}:\mathbb{R}^n \to \mathbb{R}^n$ corresponding to the eigenvalue 1 is given by,
$$ \text{1-Left-Eig}(\overline{T}) = \{\mathbf{w}\in\mathbb{R}^n ~|~ \mathbf{w}^T \overline{T} = \mathbf{w}^T\} \subseteq \mathbb{R}^n $$with a basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_d\}$ comprising linearly independent left eigenvectors (corresponding to eigenvalue $1$) of $\overline{T}$.
In other words, $\text{1-Left-Eig}(\overline{T}) = \text{span}(\{\mathbf{v}_1, \ldots, \mathbf{v}_d\})$.
Proposition (G-invariant neuron):
(As proposed by Yarotsky (2018) and Kondor & Trivedi (ICML 2018).)
Let $\mathbb{W}$ denote the left eigenspace corresponding to the eigenvalue 1 of the Reynolds operator $\bar{T}$ for the group $G$.
Let ${\bf v}_1,\ldots,{\bf v}_k$ be the left eigenvectors of $\bar{T}$. That is, $$ {\bf v}_i^T \bar{T} = \lambda_i {\bf v}_i^T, \quad i =1,\ldots,k. $$
For any $\mathbf{w} \in \mathbb{W}$ and any transformed input ${\bf x}' = T {\bf x}$, $T \in G$,
$$\sigma({\bf w}^T {\bf x}' + b) = \sigma({\bf w}^T {\bf x} + b).$$In other words, the neuron output is the same for ${\bf x}'$ and ${\bf x}$.
Proof: Since $\mathbf{w} \in \mathbb{W}$, we have $\mathbf{w}^T = \mathbf{w}^T \overline{T}$. Then, for any $T \in G$, $$\mathbf{w}^T \mathbf{x}' = \mathbf{w}^T T \mathbf{x} = \mathbf{w}^T \overline{T}\, T\, \mathbf{x} = \mathbf{w}^T \overline{T}\, \mathbf{x} = \mathbf{w}^T \mathbf{x}\:,$$ where the third equality uses the lemma ($\overline{T}\, T = \overline{T}$) and the last uses $\mathbf{w}^T \overline{T} = \mathbf{w}^T$ again. Hence $\sigma(\mathbf{w}^T \mathbf{x}' + b) = \sigma(\mathbf{w}^T \mathbf{x} + b)$. $\square$
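A numerical illustration with the 2x2-image rotation group from before (a sketch: since Tbar is symmetric, its left and right eigenvectors coincide, so we take w from np.linalg.eigh):

eigvals, eigvecs = np.linalg.eigh(Tbar)
w = eigvecs[:, -1]        # eigenvector with eigenvalue 1
x = np.random.rand(4)     # a vectorized 2x2 image
print(np.isclose(w @ x, w @ (T90 @ x)))  # True: the dot product is invariant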
Image Task: MNIST handwritten digit classification task consisting of $28 \times 28$ images.
Our transformation group of interest is $\mathcal{G}_\text{rot} \equiv \{T({\theta})\}_{\theta\in \{0^\circ,90^\circ,180^\circ,270^\circ\}}$, where $T(\theta)$ rotates the image by $\theta$ degrees and $T({0^\circ})$ is the identity element.
Consider the MNIST dataset of handwritten digits acted upon by $\mathcal{G}_\text{rot}$:
Graph task: Graph classification task consisting of graphs with $n$ nodes.
We would like to obtain the Reynolds operator as an $n^2 \times n^2$ matrix operating on the vectorized adjacency matrix $\mathbf{x} \in \mathbb{R}^{n^2}$, rather than as a black-box function on $n \times n$ matrices.
The transformation groups over images are typically given as black-box functions. For example, $\mathcal{G}_\text{rot}$ is typically given via the numpy.rot90 function over an $n \times n$ image (grayscale for MNIST).
It is easy to construct the Reynolds operator $\overline{T}$ as a function over $n \times n$ images.
def Tbar_function(img):
    # Reynolds operator of the rotation group: {T_0, T_90, T_180, T_270}
    # img shape: (H, W)
    # Returns img transformed by the Reynolds operator.
    transformed_img = np.zeros_like(img)
    for rotation_deg in [0, 90, 180, 270]:
        transformed_img += np.rot90(img, k=rotation_deg // 90)
    return transformed_img / 4
img_4_Tbar = Tbar_function(img_4)
fig, axes = plt.subplots(1, 2, figsize=(12//2,9//2))
axes[0].imshow(img_4, cmap='gray')
axes[0].set_title(r'Original $28\times 28$ image', fontsize=16)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_4_Tbar, cmap='gray')
axes[1].set_title(r'Transformed by $\overline{T}$', fontsize=16)
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()
def construct_Tbar_matrix(image_shape):
    x_dim = np.prod(image_shape)  # Dimension of the vectorized image
    e = np.eye(x_dim)             # Standard basis
    Tbar_columns = []             # List of columns of Tbar
    for i in range(x_dim):
        # Transform e_i by Tbar_function to obtain column i of Tbar
        Tbar_column_i = Tbar_function(e[i].reshape(image_shape))
        # Append to the list
        Tbar_columns.append(Tbar_column_i.reshape(-1))
    # Stack all columns to construct Tbar
    return np.stack(Tbar_columns, axis=1)
# Construct Tbar for rotation group over 28 x 28 grayscale images (MNIST).
Tbar_matrix = construct_Tbar_matrix(img_4.shape)
# Check that Tbar as a black-box function and Tbar_matrix transform images the same way.
# - Tbar_function acts directly on images.
# - Tbar_matrix is a matrix that multiplies the vectorized image.
img_4_Tbar_matrix = (Tbar_matrix @ np.array(img_4).reshape(-1)).reshape(img_4.shape)
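# A quick numerical check (added as a sketch): the matrix form and the
# black-box function should agree up to floating point error.
assert np.allclose(img_4_Tbar_matrix, img_4_Tbar)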
fig, axes = plt.subplots(1, 3, figsize=(12, 9))
axes[0].imshow(img_4, cmap='gray')
axes[0].set_title(r'Original $28\times 28$ image', fontsize=16)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_4_Tbar, cmap='gray')
axes[1].set_title(r'Transformed by $\overline{T}$')
axes[1].set_xticks([])
axes[1].set_yticks([])
axes[2].imshow(img_4_Tbar_matrix, cmap='gray')
axes[2].set_title(r'Transformed by $\overline{T}$ as a matrix')
axes[2].set_xticks([])
axes[2].set_yticks([])
plt.show()
import itertools
import numpy as np
import math
# Five nodes
n = 5
# Adjacency matrix skeleton
A = np.zeros((n, n))
# Reynolds operator matrix to obtain a permutation-invariant representation for the entire graph
Tbar = np.zeros((n**2, n**2))
# Build the Reynolds operator matrix
for i in range(n**2):
    # e_i = (0, ..., 1, ..., 0) in matrix form
    A[i // n, i % n] = 1
    for idx in itertools.permutations(range(n)):
        idx = list(idx)
        # Jointly permute the rows and columns of matrix A
        myT = A[:, idx]
        myT = myT[idx, :]
        # Accumulate into the i-th row of Tbar
        Tbar[i, :] += myT.flatten()
    # Erase the one to prepare the next indicator vector
    A[i // n, i % n] = 0
# The Reynolds operator is the average (not the sum)
Tbar = Tbar / math.factorial(n)
print('Change the value of n and observe a pattern emerge (i.e., we can directly construct Tbar)')
print(f'The Reynolds operator Tbar is a {Tbar.shape[0]}x{Tbar.shape[1]} matrix:\n {Tbar}')
Change the value of n and observe a pattern emerge (i.e., we can directly construct Tbar)
The Reynolds operator Tbar is a 25x25 matrix:
[[0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2 ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 ...
 [0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2 ]]
(Rows indexed by diagonal entries of A have the value 0.2 at the five diagonal-entry columns; all other rows have the value 0.05 at the twenty off-diagonal columns.)
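As a sanity check, the lemma's property $\overline{T}\, \rho(\pi) = \overline{T}$ can be verified for a single permutation. A minimal sketch (pi is just an example; rho is built with a Kronecker product, which implements the joint row/column permutation on row-major vectorized matrices):

pi = [1, 0, 3, 2, 4]       # an example permutation of the n = 5 nodes
P = np.eye(n)[pi]          # n x n permutation matrix
rho = np.kron(P, P)        # n^2 x n^2 matrix acting on vectorized adjacency matrices
print(np.allclose(Tbar @ rho, Tbar))  # True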
Now we can find the left eigenvectors $\mathbf{v}_j \in \mathbb{R}^{n^2}$ of $\overline{T}$ with eigenvalue $\lambda_j =1$.
Recall that our neuron parameters live in the invariant subspace $\mathbb{W} = \text{span}(\{\mathbf{v}_j ~|~ \lambda_j = 1\})$.
# Eigenvalues and eigenvectors of the real symmetric matrix Tbar_matrix.
img_eigenvalues, img_eigenvectors = np.linalg.eigh(Tbar_matrix)
# Find the rank of Tbar (here, equivalent to the number of eigenvectors with eigenvalue 1).
img_rank = np.linalg.matrix_rank(np.diag(img_eigenvalues), hermitian=True)
print(f'Rank of Tbar_matrix of size {Tbar_matrix.shape[0]} x {Tbar_matrix.shape[1]}: {img_rank}')
# Eigenvectors with eigenvalue 1 (eigenvalues are arranged in ascending order).
img_one_eigenvectors = img_eigenvectors[:, -img_rank:]
np.save('basis.npy', img_one_eigenvectors)
Rank of Tbar_matrix of size 784 x 784: 196
Note that $196 = 784/4$: rotations by multiples of $90^\circ$ partition the $28 \times 28 = 784$ pixels into orbits of size four, and each orbit contributes one invariant basis direction. Let's visualize a few unvectorized eigenvectors:
fig, axes = plt.subplots(2, 4, figsize=(12, 9))
for i in range(8):
    img_eigenvector = img_one_eigenvectors[:, i].reshape((28, 28))
    axes[i//4][i%4].imshow(img_eigenvector, cmap='gray')
    axes[i//4][i%4].set_xticks([])
    axes[i//4][i%4].set_yticks([])
plt.tight_layout()
plt.show()
Recall that any vector in a subspace can be identified by a set of coefficients with respect to a basis of the subspace. In this case, the basis vectors are the eigenvectors.
Any $\mathbf{w} \in \mathbb{W}$ can be written as $$\mathbf{w} = \sum_{i=1}^{d} \alpha_i \mathbf{v}_i$$
for some $\alpha_i \in \mathbb{R}$.
Now let's visualize a random $\mathbf{w}$ (unvectorized), i.e., with random scalar values for the $\alpha_i$ in the equation above:
random_alpha = np.random.rand(196)
random_w_in_eigenspace = img_one_eigenvectors @ random_alpha
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
axes[0].imshow(random_w_in_eigenspace.reshape((28, 28)), cmap='gray')
axes[0].set_title(r"Random $\mathbf{w} \in \mathcal{W}$")
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(np.rot90(random_w_in_eigenspace.reshape((28, 28)), k=1), cmap='gray')
axes[1].set_title(r"$\mathbf{w}^T T_{90^\circ}$")
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()
Any $\mathbf{w} \in \mathbb{W}$ is invariant to $\mathcal{G}_\text{rot}$, i.e., to rotations by multiples of $90^\circ$.
# Tbar is the Reynolds operator of the adjacency matrix permutation group
graph_eigenvalues, graph_eigenvectors = np.linalg.eigh(Tbar)
print(f'eigenvalues={graph_eigenvalues}\n')
# Find the rank of Tbar (here, equivalent to the number of eigenvectors with eigenvalue 1).
graph_rank = np.linalg.matrix_rank(np.diag(graph_eigenvalues), hermitian=True)
print(f'Rank of Tbar of size {Tbar.shape[0]} x {Tbar.shape[1]}: {graph_rank}')
# Eigenvectors with eigenvalue 1 (eigenvalues are arranged in ascending order).
graph_one_eigenvectors = graph_eigenvectors[:, -graph_rank:]
eigenvalues=[-4.31809982e-017 -1.61747315e-017 -6.69789612e-032 -5.34118705e-034
 -7.58467252e-048 -1.69136529e-049 -2.77989923e-050 -4.57834487e-052
 -2.97584285e-064 -2.25538818e-067 -2.35127721e-069 -5.09828814e-080
 -1.20036095e-080 -3.07372861e-113 -6.49430519e-146  1.29973049e-145
  3.71107657e-096  9.04479804e-064  1.60555814e-052  1.77303730e-048
  3.55812649e-034  2.49873277e-016  2.84684090e-016  1.00000000e+000
  1.00000000e+000]

Rank of Tbar of size 25 x 25: 2
Note that for invariant graph representations, the invariant subspace $\mathbb{W}$ has only two basis vectors (i.e., it is two-dimensional): one supported on the diagonal entries of $\mathbf{A}$ and one on the off-diagonal entries, matching the two orbits of index pairs under joint permutation.
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
for i in range(2):
    graph_eigenvector = graph_one_eigenvectors[:, i].reshape((n, n))
    axes[i].imshow(graph_eigenvector, cmap='gray')
    axes[i].set_xticks([])
    axes[i].set_yticks([])
plt.tight_layout()
plt.show()
The neuron is defined as: $$ \sigma({\bf w}^T {\bf x} + b) \quad \text{ with } \mathbf{w} \in \mathbb{W} \subseteq \mathbb{R}^n\:, $$
show_dot_product(img_4, random_w_in_eigenspace)
img_4_rot90 = np.rot90(img_4)  # the same image rotated by 90 degrees
show_dot_product(img_4_rot90, random_w_in_eigenspace)
# Random adjacency matrix
A = np.random.rand(n,n)
random_graph_alpha = np.random.rand(2)
random_graph_w_in_eigenspace = graph_one_eigenvectors @ random_graph_alpha
show_dot_product(A,random_graph_w_in_eigenspace,size=n)
# Joint permutation of rows and columns
idx = [3,2,1,0,4]
TA = A[:,idx]
TA = TA[idx,:]
show_dot_product(TA,random_graph_w_in_eigenspace,size=n)
from g_invariance import *
plot_data(train_loader, valid_loader, in_domain_test_loader, rotated_test_loader)