Consider a 90$^\circ$ rotation of an image:
where $T_{90^\circ}$ is a $784 \times 784$ matrix that acts on vectorized images $\mathbf{x} \in \mathbb{R}^{784}$ via a matrix-vector product.
where $$\tilde{T}_{\theta} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \:.$$
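As a quick sanity check, here is a minimal numpy sketch (not part of the original cells) that applies $\tilde{T}_{90^\circ}$ to a point in $\mathbb{R}^2$:

import numpy as np

# The 2x2 rotation matrix tilde{T}_theta applied to a point in R^2 (theta = 90 degrees).
theta = np.deg2rad(90)
T_tilde = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
x = np.array([1., 0.])
print(T_tilde @ x)  # approximately [0., 1.]: the point (1, 0) is rotated to (0, 1)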
Given an input $\mathbf{x} \in \mathbb{R}^n$:
Before we look into the details of the invariant subspace $\mathbb{W}$, let's first see how it affects the invariance properties of the G-invariant neuron.
The following cells compute $\mathbf{w}^T \mathbf{x}$ and $\mathbf{w}^T (T_{90^\circ}\mathbf{x})$ for a random $\mathbf{w} \in \mathbb{W}$.
from g_invariance import *
rotation_angle = 90 # Experiment with other multiples of 90 degrees.
show_dot_product(img_4, invariant_w)
show_dot_product(np.rot90(img_4, k=rotation_angle//90), invariant_w)
Hence, the output of the neuron $\sigma(\mathbf{w}^T \mathbf{x} + b)$ is the same for the original image and its rotated version (the same holds for rotation by any multiple of $90^\circ$).
This is because all $\mathbf{w} \in \mathbb{W}$ remain invariant under rotations of $90^\circ$ multiples, i.e., $\mathbf{w}^T = \mathbf{w}^T T_{90^\circ}$.
wT = invariant_w.T
wT_T90 = wT @ T90
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
axes[0].imshow(wT.reshape((28, 28)), cmap='gray')
axes[0].set_title(r"$\mathbf{w} \in \mathbb{W}$ (unvectorized)")
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(wT_T90.reshape((28, 28)), cmap='gray')
axes[1].set_title(r"$\mathbf{w}^T ~T_{90^\circ}$ (unvectorized)")
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()
In what follows we describe the concepts needed to understand the invariant subspace $\mathbb{W}$.
We begin with a general description of groups and their close relation to symmetries.
Then, we show how to construct the Reynolds operator of any finite group by averaging all the transformations in the group, and we depict a key property of the Reynolds operator needed for our G-invariant neuron.
We construct $\mathbb{W}$ as the left eigenspace of the Reynolds operator and prove the invariance properties of any $\mathbf{w} \in \mathbb{W}$.
At the end, we give a step-by-step example of constructing a G-invariant neuron.
Step-by-step examples
A transformation group consists of a set of transformations $\mathcal{G}$ and a binary composition operation $\circ$ defined between any two elements of $\mathcal{G}$:
Yes. For example:
No. For example:
Let us add $T_{180^\circ}$ to $\overline{\mathcal{G}}_\text{flip}$ and check again.
Yes (check).
A group $\mathcal{G}$ is called Abelian or commutative if $A\circ B=B \circ A$ for all $A,B \in \mathcal{G}$.
Yes.
For example: $T_\text{h-flip} \circ T_{90^\circ} = T_\text{diag2-flip} \neq T_\text{diag1-flip} = T_{90^\circ} \circ T_\text{h-flip}$
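We can verify this non-commutativity numerically. The sketch below assumes the horizontal flip is np.fliplr and only checks that the two compositions give different results:

import numpy as np

# Verify non-commutativity on a small 3x3 example image.
img = np.arange(9).reshape(3, 3)
rot_then_flip = np.fliplr(np.rot90(img))   # T_h-flip composed with T_90 (rotate first, then flip)
flip_then_rot = np.rot90(np.fliplr(img))   # T_90 composed with T_h-flip (flip first, then rotate)
print(np.array_equal(rot_then_flip, flip_then_rot))  # False: the order of composition matters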
If $\mathcal{G}$ has a finite number of elements it is called a finite group, otherwise it is an infinite group.
$\mathcal{H}$ is called a subgroup of a group $\mathcal{G}$ (denoted $\mathcal{H} \leq \mathcal{G}$) if $\mathcal{H} \subseteq \mathcal{G}$ and $\mathcal{H}$ is itself a group under the composition operation of $\mathcal{G}$, i.e., $\mathcal{H}$ contains the identity and is closed under composition and inverses.
For us, a group matters insofar as its elements act as transformations on an input.
A (linear finite dimensional) group representation $\rho : G \to \text{GL}(m,\mathbb{R})$ associates each $g \in G$ to an invertible matrix $\rho(g) \in \mathbb{R}^{m \times m}$ that acts on $\mathbb{R}^m$.
The representation specifies how objects transform under the group, and can be considered a specification of the type of an object.
Set representation example: Let $\pi \in \mathbb{S}_n$ be a permutation of $\{1,\ldots,n\}$. Define $\rho(\pi)$ as the (linear) group representation of $\pi$ that permutes the 3D points in the sequence ${\bf x} \in \mathbb{R}^{n \times 3}$ according to $\pi$.
Graph representation example: Let $\pi \in \mathbb{S}_n$ be a permutation of $\{1,\ldots,n\}$. Define $\rho(\pi)$ as the (linear) group representation of $\pi$ that jointly permutes the rows and columns of an adjacency matrix ${\bf A} \in \mathbb{R}^{n\times n}$.
In this tutorial, we will always operate on vectorized input such as the above.
$\text{GL}(n,\mathbb{R}) = \{A \in \mathbb{R}^{n \times n} ~:~ A \text{ is invertible}\}$ is a group of $n\times n$ invertible matrices with the binary operation defined as the matrix product.
Satisfies group properties:
Other properties:
Every matrix $A \in \text{GL}(n, \mathbb{R})$ is associated to a bijective linear transformation $T_A: \mathbb{R}^n \to \mathbb{R}^n$ as follows:
$$T_A(\mathbf{x}) = A\mathbf{x}\:, \qquad \forall \mathbf{x} \in \mathbb{R}^n$$
Any subgroup of $\text{GL}(n, \mathbb{R})$ is called a linear group.
Consider $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \in \text{GL}(2, \mathbb{R})$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$. Then the transformation $T_A: \mathbb{R}^2 \to \mathbb{R}^2$ transforms $\mathbf{x}$ as follows: $$T_A\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}$$
Consider an input $\mathbf{x} \in \mathbb{R}^{3 m^2}$ representing a vectorized $m \times m$ RGB image.
We consider the general linear group, $\text{GL}(3m^2,\mathbb{R})$. Recall that any matrix $A \in \text{GL}(3m^2, \mathbb{R})$ is an invertible $3m^2 \times 3m^2$ matrix and acts on the vectorized image $x$ via the matrix-vector product $Ax$.
Q: Is there a transformation in $\text{GL}(3m^2, \mathbb{R})$ equivalent to $T_{45^\circ}$ that rotates an $m\times m$ RGB image by $45^\circ$?
Q: Can the combined $45^\circ$ rotation and scaling transformation $T^{(45^\circ \,\&\, \text{scale})}$ be an element of $\text{GL}(3 m^2,\mathbb{R})$?
A: No. By the definition of a group, if $A,B \in G$, then $A \circ B \in G$; but that does not hold in this scenario.
A transformation $\bar{T}$ is $G$-invariant iff $\forall T \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$\bar{T} \circ (T \circ {\bf x}) = \bar{T}\circ {\bf x}.$$
Lemma: Reynolds operator (Billik and Rota, 1960). The transformation $$\bar{T} = \frac{1}{|G|} \sum_{T \in G} T,$$ is $G$-invariant.
# Consider a grayscale 2x2 image. The vectorized image has dimension n = 4.
# Consider the rotation group over this image: {T_0, T_90, T_180, T_270}.
# All T_\theta are 4 x 4 invertible matrices.
# We will see later how to construct these transformation matrices.
T0 = np.array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
T90 = np.array([[0., 1., 0., 0.],
[0., 0., 0., 1.],
[1., 0., 0., 0.],
[0., 0., 1., 0.]])
T180 = np.array([[0., 0., 0., 1.],
[0., 0., 1., 0.],
[0., 1., 0., 0.],
[1., 0., 0., 0.]])
T270 = np.array([[0., 0., 1., 0.],
[1., 0., 0., 0.],
[0., 0., 0., 1.],
[0., 1., 0., 0.]])
# Show that the transformations above are correct using an example 2x2 image.
img = np.random.rand(2, 2)
vectorized_img = img.reshape(-1)
img_rot90 = (T90 @ vectorized_img).reshape(2, 2)
img_rot180 = (T180 @ vectorized_img).reshape(2, 2)
img_rot270 = (T270 @ vectorized_img).reshape(2, 2)
fig, axes = plt.subplots(1, 4, figsize=(12, 9))
axes[0].imshow(img, cmap='gray')
axes[0].set_title(r'$T_{0^\circ} x$', fontsize=22)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_rot90, cmap='gray')
axes[1].set_title(r'$T_{90^\circ} x$', fontsize=22)
axes[1].set_xticks([])
axes[1].set_yticks([])
axes[2].imshow(img_rot180, cmap='gray')
axes[2].set_title(r'$T_{180^\circ} x$', fontsize=22)
axes[2].set_xticks([])
axes[2].set_yticks([])
axes[3].imshow(img_rot270, cmap='gray')
axes[3].set_title(r'$T_{270^\circ} x$', fontsize=22)
axes[3].set_xticks([])
axes[3].set_yticks([])
plt.show()
# Construct the Reynolds operator Tbar
Tbar = (T0 + T90 + T180 + T270)/4
# Show that Tbar @ T_\theta = Tbar for all T_\theta
display(Markdown(r'$\overline{T} = $'))
print(Tbar)
display(Markdown(r'$\overline{T} ~T_{0^\circ} = $'))
print(Tbar @ T0)
display(Markdown(r'$\overline{T} ~T_{90^\circ} = $'))
print(Tbar @ T90)
display(Markdown(r'$\overline{T} ~T_{180^\circ} = $'))
print(Tbar @ T180)
display(Markdown(r'$\overline{T} ~T_{270^\circ} = $'))
print(Tbar @ T270)
$\overline{T} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{0^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{90^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{180^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
$\overline{T} ~T_{270^\circ} = $
[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]
Proof: Consider an arbitrary transformation $T_\dagger \in G$. Then $$\bar{T} \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T \in G} T \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T' \in G_\dagger} T' ,$$ where we define $G_\dagger = \{T \circ T_\dagger: \forall T \in G \}$. Since $G$ is closed under composition and every element is invertible, the map $T \mapsto T \circ T_\dagger$ is a bijection from $G$ onto itself, so $G_\dagger = G$ and hence $\bar{T} \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T \in G} T = \bar{T}$. $\blacksquare$
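A quick numerical check of $G_\dagger = G$ with the $4 \times 4$ rotation matrices and the Tbar computed above:

# Composing every T in G with a fixed T_dagger only permutes the elements of G,
# so the average (the Reynolds operator) is unchanged.
G_list = [T0, T90, T180, T270]
T_dagger = T90
G_dagger = [T @ T_dagger for T in G_list]
print(all(any(np.array_equal(TT, T) for T in G_list) for TT in G_dagger))  # True: G_dagger = G as a set
print(np.allclose(sum(G_dagger) / 4, Tbar))                                # True: the average is still Tbar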
We call $\mathbb{W}$ a subspace of a vector space $\mathbb{V}$ if $\mathbb{W} \subseteq \mathbb{V}$ and: (i) $\mathbf{0} \in \mathbb{W}$; (ii) $\mathbf{x} + \mathbf{y} \in \mathbb{W}$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{W}$; (iii) $\lambda \mathbf{x} \in \mathbb{W}$ for all $\lambda \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{W}$.
No. $\mathbf{0} \notin \mathbb{W}$.
$\mathbb{W}$ is the subset of all vectors on this line. Is this a subspace?
Yes.
$\mathbb{W}$ is the subset of all vectors within this circle. Is this a subspace?
No. Consider any vector $\mathbf{v}$ on the boundary of the circle. Then 2$\mathbf{v}$ is not within the circle.
$\mathbb{W}$ is the subset of all vectors on this plane passing through origin. Is this a subspace?
Yes. For any two vectors $\mathbf{x}, \mathbf{y}$ in the plane, $\mathbf{x} + \mathbf{y}$ is in the plane, as is $\lambda \mathbf{x}$ for all $\lambda \in \mathbb{R}$.
Given a set of vectors $S = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, the span of $S$ is the set of all linear combinations of the vectors in $S$, i.e., $$ \text{span}(S) = \{\lambda_1 \mathbf{x}_1 + \ldots + \lambda_n \mathbf{x}_n ~|~ \lambda_1,\ldots,\lambda_n \in \mathbb{R}\} $$
A set of vectors $B = \{\mathbf{b}_1, \ldots, \mathbf{b}_d\}$ is called a basis of a subspace $\mathbb{U}$ if the vectors in $B$ are linearly independent and $\text{span}(B) = \mathbb{U}$.
Then $d$ is known as the dimension of subspace $\mathbb{U}$ and is unique.
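A small numpy sketch of these definitions: the dimension of $\text{span}(S)$ equals the rank of the matrix whose columns are the vectors in $S$ (the example vectors below are arbitrary):

import numpy as np

x1 = np.array([1., 0., 0.])
x2 = np.array([0., 1., 0.])
x3 = x1 + x2                      # linearly dependent on x1 and x2
M = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(M))   # 2: {x1, x2} is a basis of span({x1, x2, x3}), so its dimension is 2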
Our goal is to learn a neuron $\sigma(\mathbf{w}^T \mathbf{x} + b)$ whose output is $G$-invariant.
This means that the output of the neuron does not depend on the transformation of $G$ applied to the input.
(Special case) A function $f:\mathbb{R}^m \to \mathbb{R}^m$ (endomorphism) is $G$-equivariant iff $\forall T \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$f(T {\bf x}) = T f({\bf x}).$$
$$\rho_1 :\mathbb{R}^m \to \mathbb{R}^m \quad \text{and} \quad \rho_2 :\mathbb{R}^d \to \mathbb{R}^d.$$
We will first focus on G-invariant representations.
A subspace $M \subseteq \mathbb{R}^d$, $d \geq 1$, is an invariant subspace of a linear transformation $\bar{T}:\mathbb{R}^d \to \mathbb{R}^d$ if $$ {\bf w}^T \bar{T} \in M, ~\forall {\bf w}^T \in M. $$
For instance, the left 1-eigenspace is an invariant subspace, defined as: $$\text{1-Left-Eig}(\bar{T}) = \{ {\bf w} \in \mathbb{R}^d ~|~ {\bf w}^T \bar{T} = {\bf w}^T \}.$$
For an eigenvalue $\lambda$, is $\text{$\lambda$-Left-Eig}(\bar{T})$ a subspace of $\mathbb{R}^d$?
$\mathbf{0} \in \text{$\lambda$-Left-Eig}(\bar{T})$
$\mathbf{u}, \mathbf{v} \in \text{$\lambda$-Left-Eig}(\bar{T}) \implies (\mathbf{u} + \mathbf{v})^T \bar{T} = \mathbf{u}^T \bar{T} + \mathbf{v}^T \bar{T} = \lambda (\mathbf{u} + \mathbf{v})^T \implies (\mathbf{u} + \mathbf{v}) \in \text{$\lambda$-Left-Eig}(\bar{T}).$
$\mathbf{u} \in \text{$\lambda$-Left-Eig}(\bar{T}) \implies \alpha \mathbf{u}^T \bar{T} = \lambda (\alpha \mathbf{u}^T) \implies \alpha \mathbf{u} \in \text{$\lambda$-Left-Eig}(\bar{T})$.
What is the basis of $\text{$\lambda$-Left-Eig}(\bar{T})$?
The Reynolds operator $\overline{T}$ is a projection (i.e., $\overline{T}^2 = \overline{T}$), thus all the eigenvalues of $\overline{T}$ are either 0 or 1.
You can verify this: $\overline{T}^2 = \overline{T}\, \frac{1}{|G|} \sum_{T \in G} T = \frac{1}{|G|} \sum_{T \in G} \overline{T}\, T = \frac{1}{|G|} \sum_{T \in G} \overline{T} = \overline{T}$, since we saw in the proof above that $\overline{T}\, T = \overline{T}$ (i.e., $G_\dagger = G$) for every $T \in G$.
If $\mathbf{v}^T \neq \mathbf{0}$ is a left eigenvector of $\overline{T}$ with eigenvalue $\lambda$, then $\mathbf{v}^T \overline{T}^2 = (\lambda \mathbf{v}^T) \overline{T} = \lambda^2 \mathbf{v}^T$. But also, $\mathbf{v}^T \overline{T}^2 = \mathbf{v}^T \overline{T} = \lambda \mathbf{v}^T$. Thus, $\lambda \in \{0,1\}$.
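Both facts can be checked numerically on the $4 \times 4$ Reynolds operator Tbar constructed above for the $2\times 2$ image example:

print(np.allclose(Tbar @ Tbar, Tbar))         # True: Tbar is a projection
print(np.round(np.linalg.eigvalsh(Tbar), 6))  # the eigenvalues are (numerically) 0, 0, 0, 1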
The left eigenspace of $\overline{T}:\mathbb{R}^n \to \mathbb{R}^n$ corresponding to the eigenvalue 1 is given by,
$$ \text{1-Left-Eig}(\overline{T}) = \{\mathbf{w}\in\mathbb{R}^n ~|~ \mathbf{w}^T \overline{T} = \mathbf{w}^T\} \subseteq \mathbb{R}^n $$
with the basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_d\}$ comprising linearly independent left eigenvectors (corresponding to eigenvalue $1$) of $\overline{T}$.
In other words, $\text{1-Left-Eig}(\overline{T}) = \text{span}(\{\mathbf{v}_1, \ldots \mathbf{v}_d\}$).
Proposition (G-invariant neuron):
As proposed by Yarotsky (2018) and Kondor & Trivedi (ICML 2018).
Let $\mathcal{W}$ denote the left eigenspace corresponding to the eigenvalue 1 of the Reynolds operator $\bar{T}$ for the group $G$.
Let ${\bf v}_1,\ldots,{\bf v}_k$ be the left eigenvectors of $\bar{T}$. That is, $$ {\bf v}_i^T \bar{T} = \lambda_i {\bf v}_i^T, \quad i =1,\ldots,k. $$
For any transformed input ${\bf x}' = T {\bf x}$, $T \in G$,
$$\sigma({\bf w}^T {\bf x}' + b) = \sigma({\bf w}^T {\bf x} + b).$$
In other words, the neuron output is the same for ${\bf x}'$ and ${\bf x}$.
Proof: Let $\mathbf{w} \in \mathcal{W}$, so that $\mathbf{w}^T \bar{T} = \mathbf{w}^T$. For any $T \in G$, using the Reynolds property $\bar{T} T = \bar{T}$, we have $\mathbf{w}^T T = (\mathbf{w}^T \bar{T}) T = \mathbf{w}^T (\bar{T} T) = \mathbf{w}^T \bar{T} = \mathbf{w}^T$. Hence $\mathbf{w}^T \mathbf{x}' = \mathbf{w}^T T \mathbf{x} = \mathbf{w}^T \mathbf{x}$, and therefore $\sigma(\mathbf{w}^T \mathbf{x}' + b) = \sigma(\mathbf{w}^T \mathbf{x} + b)$. $\blacksquare$
Image Task: MNIST handwritten digit classification task consisting of $28 \times 28$ images.
Our transformation group of interest is $\mathcal{G}_\text{rot} \equiv \{T({\theta})\}_{\theta\in \{0^\circ,90^\circ,180^\circ,270^\circ\}}$, where $T(\theta)$ rotates the image by $\theta$ degrees and $T({0^\circ})$ is the identity element.
Consider the MNIST dataset of handwritten digits acted upon by $\mathcal{G}_\text{rot}$:
Graph task: Graph classification task consisting of graphs with $n$ nodes.
We would like to obtain the Reynolds operator as an $n^2 \times n^2$ matrix operating on the vectorized adjacency matrix $\mathbf{x} \in \mathbb{R}^{n^2}$ instead.
The transformation groups over images are typically given as black-box functions. For example, $\mathcal{G}_\text{rot}$ is typically given via the numpy.rot90 function applied to an $n \times n$ image (grayscale for MNIST).
It is easy to construct the Reynolds operator $\overline{T}$ as a function over $n \times n$ images.
def Tbar_function(img):
    # Reynolds operator of the rotation group: {T_0, T_90, T_180, T_270}
    # img shape: (H, W)
    # Outputs img transformed by the Reynolds operator (the average over all rotations).
    transformed_img = np.zeros_like(img, dtype=float)
    for rotation_deg in [0, 90, 180, 270]:
        transformed_img += np.rot90(img, k=rotation_deg // 90)
    return transformed_img / 4
img_4_Tbar = Tbar_function(img_4)
fig, axes = plt.subplots(1, 2, figsize=(12//2,9//2))
axes[0].imshow(img_4, cmap='gray')
axes[0].set_title(r'Original $28\times 28$ image', fontsize=16)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_4_Tbar, cmap='gray')
axes[1].set_title(r'Transformed by $\overline{T}$', fontsize=16)
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()
def construct_Tbar_matrix(image_shape):
    x_dim = int(np.prod(image_shape))  # Dimension of the vectorized image
    e = np.eye(x_dim)                  # Standard basis
    Tbar_columns = []                  # List of columns of Tbar
    for i in range(x_dim):
        # Transform e_i by Tbar_function to obtain column i of Tbar
        Tbar_column_i = Tbar_function(e[i].reshape(image_shape))
        # Append to the list
        Tbar_columns.append(Tbar_column_i.reshape(-1))
    # Stack all columns to construct Tbar
    return np.stack(Tbar_columns, axis=1)
# Construct Tbar for rotation group over 28 x 28 grayscale images (MNIST).
Tbar_matrix = construct_Tbar_matrix(img_4.shape)
# Check that Tbar as a black-box function and Tbar_matrix transform images the same way.
# - Tbar_function acts directly on images.
# - Tbar_matrix is a matrix that multiplies the vectorized image.
img_4_Tbar_matrix = (Tbar_matrix @ np.array(img_4).reshape(-1)).reshape(img_4.shape)
fig, axes = plt.subplots(1, 3, figsize=(12, 9))
axes[0].imshow(img_4, cmap='gray')
axes[0].set_title(r'Original $28\times 28$ image', fontsize=16)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_4_Tbar, cmap='gray')
axes[1].set_title(r'Transformed by $\overline{T}$')
axes[1].set_xticks([])
axes[1].set_yticks([])
axes[2].imshow(img_4_Tbar_matrix, cmap='gray')
axes[2].set_title(r'Transformed by $\overline{T}$ as a matrix')
axes[2].set_xticks([])
axes[2].set_yticks([])
plt.show()
import itertools
import numpy as np
import math
# Five nodes
n = 5
# Adjacency matrix skeleton
A = np.zeros((n,n))
# Reynolds operator matrix to obtain a permutation-invariant representation for the entire graph
Tbar = np.zeros((n**2,n**2))
# Get Reynolds operator matrix
for i in range(n**2):
    # e_i = (0, ..., 1, ..., 0) in matrix form
    A[i // n, i % n] = 1
    for idx in itertools.permutations(list(range(n))):
        # Get the joint permutation of rows and columns of matrix A
        myT = A[:, idx]
        myT = myT[idx, :]
        # Accumulate the transformed e_i (since Tbar is symmetric, this row equals column i of Tbar)
        Tbar[i, :] += myT.flatten()
    # Erase the one and prepare for the next indicator vector
    A[i // n, i % n] = 0
# Reynolds operator is the average (not the sum)
Tbar = Tbar / math.factorial(n)
print('Change the value of n and observe a pattern emerge (i.e., we can directly construct Tbar)')
print(f'The Reynolds operator Tbar is a {Tbar.shape[0]}x{Tbar.shape[1]} matrix:\n {Tbar}')
Change the value of n and observe a pattern emerge (i.e., we can directly construct Tbar)
The Reynolds operator Tbar is a 25x25 matrix:
[[0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. 0.05 0.05 0.05 0.05 0.05 0. ]
 [0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 0. 0. 0. 0. 0. 0.2 ]]
Now we can find the left eigenvectors $\mathbf{v}_j \in \mathbb{R}^{n^2}$ of $\overline{T}$ with eigenvalue $\lambda_j =1$.
Recall that our neuron parameters live in the invariant subspace $\mathbb{W} = \text{span}(\{\mathbf{v}_j\}_{\forall j | \lambda_j = 1})$
# Eigenvalues and eigenvectors of a real symmetric matrix Tbar.
img_eigenvalues, img_eigenvectors = np.linalg.eigh(Tbar_matrix)
# Find the rank of Tbar (here, equivalent to the number of eigenvectors with eigenvalue 1).
img_rank = np.linalg.matrix_rank(np.diag(img_eigenvalues), hermitian=True)
print(f'Rank of Tbar_matrix of size {Tbar_matrix.shape[0]} x {Tbar_matrix.shape[1]}: {img_rank}')
# Eigenvectors with eigenvalue 1 (eigenvalues are arranged in ascending order).
img_one_eigenvectors = img_eigenvectors[:, -img_rank:]
np.savetxt('basis.npy', img_one_eigenvectors)  # Save the basis as plain text (loaded later with np.loadtxt).
Rank of Tbar_matrix of size 784 x 784: 196
Let's visualize a few unvectorized eigenvectors:
fig, axes = plt.subplots(2, 4, figsize=(12, 9))
for i in range(8):
    img_eigenvector = img_one_eigenvectors[:, i].reshape((28, 28))
    axes[i//4][i%4].imshow(img_eigenvector, cmap='gray')
    axes[i//4][i%4].set_xticks([])
    axes[i//4][i%4].set_yticks([])
plt.tight_layout()
plt.show()
Recall that any vector in a subspace can be identified by a set of coefficients corresponding to a basis of the subspace. In this case, the basis vectors are the eigenvectors.
Any $\mathbf{w} \in \mathbb{W}$ can be written as
$$\mathbf{w} = \sum_{i} \alpha_i \mathbf{v}_i$$
for some $\alpha_i \in \mathbb{R}$.
Now let's visualize a random $\mathbf{w}$ unvectorized, i.e., with random scalar values for the $\alpha_i$ in the equation above:
random_alpha = np.random.rand(196)
random_w_in_eigenspace = img_one_eigenvectors @ random_alpha
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
axes[0].imshow(random_w_in_eigenspace.reshape((28, 28)), cmap='gray')
axes[0].set_title(r"Random $\mathbf{w} \in \mathcal{W}$")
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(np.rot90(random_w_in_eigenspace.reshape((28, 28)), k=1), cmap='gray')
axes[1].set_title(r"$\mathbf{w}^T T_{90^\circ}$")
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()
Any $\mathbf{w} \in \mathbb{W}$ is invariant to $\mathcal{G}_\text{rot}$, i.e., to rotations by any multiple of $90^\circ$.
# Tbar is the Reynolds operator of the adjacency matrix permutation group
graph_eigenvalues, graph_eigenvectors = np.linalg.eigh(Tbar)
print(f'eigenvalues={graph_eigenvalues}\n')
# Find the rank of Tbar (here, equivalent to the number of eigenvectors with eigenvalue 1).
graph_rank = np.linalg.matrix_rank(np.diag(graph_eigenvalues), hermitian=True)
print(f'Rank of Tbar of size {Tbar.shape[0]} x {Tbar.shape[1]}: {graph_rank}')
# Eigenvectors with eigenvalue 1 (eigenvalues are arranged in ascending order).
graph_one_eigenvectors = graph_eigenvectors[:, -graph_rank:]
eigenvalues=[-4.31809982e-017 -1.61747315e-017 -6.69789612e-032 -5.34118705e-034 -7.58467252e-048
 -1.69136529e-049 -2.77989923e-050 -4.57834487e-052 -2.97584285e-064 -2.25538818e-067
 -2.35127721e-069 -5.09828814e-080 -1.20036095e-080 -3.07372861e-113 -6.49430519e-146
  1.29973049e-145  3.71107657e-096  9.04479804e-064  1.60555814e-052  1.77303730e-048
  3.55812649e-034  2.49873277e-016  2.84684090e-016  1.00000000e+000  1.00000000e+000]

Rank of Tbar of size 25 x 25: 2
Note that for invariant graph representations, the invariant subspace $\mathbb{W}$ has only two basis vectors (i.e., it is two-dimensional).
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
for i in range(2):
    graph_eigenvector = graph_one_eigenvectors[:, i].reshape((n, n))
    axes[i].imshow(graph_eigenvector, cmap='gray')
    axes[i].set_xticks([])
    axes[i].set_yticks([])
plt.tight_layout()
plt.show()
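As a sanity check (using n and graph_one_eigenvectors from the cells above), the indicator of the diagonal entries and the indicator of the off-diagonal entries of the adjacency matrix both lie in this two-dimensional invariant subspace, so they form an alternative basis for it:

diag_indicator = np.eye(n).flatten()
offdiag_indicator = (np.ones((n, n)) - np.eye(n)).flatten()
# Orthogonal projector onto the 1-eigenspace (the columns of graph_one_eigenvectors are orthonormal).
P = graph_one_eigenvectors @ graph_one_eigenvectors.T
print(np.allclose(P @ diag_indicator, diag_indicator))        # True: the diagonal indicator is in the subspace
print(np.allclose(P @ offdiag_indicator, offdiag_indicator))  # True: the off-diagonal indicator is in the subspace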
The neuron is defined as: $$ \sigma({\bf w}^T {\bf x} + b) \quad \text{ with } \mathbf{w} \in \mathbb{W} \subseteq \mathbb{R}^n\:, $$
show_dot_product(img_4, random_w_in_eigenspace)
show_dot_product(img_4_rot90, random_w_in_eigenspace)
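Putting the pieces together, here is the full neuron output $\sigma(\mathbf{w}^T \mathbf{x} + b)$ on the original and the rotated digit. This is a minimal sketch that uses a sigmoid as an example $\sigma$, an arbitrary bias $b$, and img_4, img_4_rot90, and random_w_in_eigenspace from the cells above:

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b = 0.1  # arbitrary bias
out_original = sigmoid(random_w_in_eigenspace @ np.asarray(img_4).reshape(-1) + b)
out_rotated = sigmoid(random_w_in_eigenspace @ np.asarray(img_4_rot90).reshape(-1) + b)
print(out_original, out_rotated)  # identical up to floating-point error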
# Random adjacency matrix
A = np.random.rand(n,n)
random_graph_alpha = np.random.rand(2)
random_graph_w_in_eigenspace = graph_one_eigenvectors @ random_graph_alpha
show_dot_product(A,random_graph_w_in_eigenspace,size=n)
# Joint permutation of rows and columns
idx = [3,2,1,0,4]
TA = A[:,idx]
TA = TA[idx,:]
show_dot_product(TA,random_graph_w_in_eigenspace,size=n)
from g_invariance import *
plot_data(train_loader, valid_loader, in_domain_test_loader, rotated_test_loader)
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        # We will later replace the first layer with a G-invariant layer.
        # Rest of the code remains the same.
        self.first_layer = nn.Linear(784, 50)
        self.mlp = nn.Sequential(
            self.first_layer,
            nn.ReLU(),
            nn.Linear(50, 10)
        )

    def forward(self, X):
        out = self.mlp(X)
        return F.log_softmax(out, dim=1)
net = MLP().to(device)
train(net, train_loader, valid_loader, lr=1e-2, momentum=0.5, epochs=10)
[Epoch 0] Validation loss: 2.367385
[Epoch 1] Validation loss: 0.369851
[Epoch 2] Validation loss: 0.299486
[Epoch 3] Validation loss: 0.267263
[Epoch 4] Validation loss: 0.243571
[Epoch 5] Validation loss: 0.223185
[Epoch 6] Validation loss: 0.208289
[Epoch 7] Validation loss: 0.195990
[Epoch 8] Validation loss: 0.184902
[Epoch 9] Validation loss: 0.176478
train_metrics = get_metrics(net, train_loader)
in_domain_test_metrics = get_metrics(net, in_domain_test_loader)
rotated_test_metrics = get_metrics(net, rotated_test_loader)
plot_metrics(train_metrics, in_domain_test_metrics, rotated_test_metrics)
Accuracy and confusion matrices for the standard MLP on the training, in-domain test, and rotated test data.
class GInvariantLayer(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # Load the left 1-eigenvectors of the Reynolds operator that we computed before.
        self.basis = torch.tensor(np.loadtxt('basis.npy'), dtype=torch.float32).T
        assert self.basis.shape[1] == input_dim
        self.coeffs = nn.Parameter(torch.Tensor(output_dim, self.basis.shape[0], 1))
        self.bias = nn.Parameter(torch.Tensor(output_dim))
        stdv = 1.0 / math.sqrt(output_dim)
        self.coeffs.data.uniform_(-stdv, stdv)
        self.bias.data.zero_()

    def forward(self, X):
        # Input shape: torch.Size([minibatch, input_dim])
        if self.basis.device != X.device:
            self.basis = self.basis.to(X.device)
        # Construct weight w \in \mathcal{W} (the left 1-eigenspace)
        # using the current learnable coefficients.
        # coeffs: (output_dim, n_basis, 1)
        # basis : (n_basis, basis_dim)
        # result after torch.mul : (output_dim, n_basis, basis_dim)
        # result after sum : (output_dim, basis_dim)
        weights = torch.mul(self.coeffs, self.basis)
        weights = weights.sum(dim=-2)
        # Output shape: torch.Size([minibatch, output_dim])
        out = X @ weights.T + self.bias
        return out
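A minimal sanity check of the layer (it assumes the basis.npy file saved earlier is present): the output should be unchanged, up to float32 round-off, when the input image is rotated by $90^\circ$:

layer = GInvariantLayer(784, 50)
x = torch.rand(1, 28, 28)
x_rot = torch.rot90(x, k=1, dims=(1, 2))
out = layer(x.reshape(1, -1))
out_rot = layer(x_rot.reshape(1, -1))
print(torch.allclose(out, out_rot, atol=1e-4))  # True (up to float32 round-off)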
class GInvariantMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Replace the first layer with a G-invariant layer.
        self.first_layer = GInvariantLayer(784, 50)
        self.mlp = nn.Sequential(
            self.first_layer,
            nn.ReLU(),
            nn.Linear(50, 10)
        )

    def forward(self, X):
        out = self.mlp(X)
        return F.log_softmax(out, dim=1)
g_invariant_net = GInvariantMLP().to(device)
train(g_invariant_net, train_loader, valid_loader, lr=1e-2, momentum=0.5, epochs=10)
[Epoch 0] Validation loss: 2.436452
[Epoch 1] Validation loss: 1.164346
[Epoch 2] Validation loss: 1.075125
[Epoch 3] Validation loss: 1.016847
[Epoch 4] Validation loss: 0.969406
[Epoch 5] Validation loss: 0.918552
[Epoch 6] Validation loss: 0.889736
[Epoch 7] Validation loss: 0.854602
[Epoch 8] Validation loss: 0.826470
[Epoch 9] Validation loss: 0.808643
train_metrics = get_metrics(g_invariant_net, train_loader)
in_domain_test_metrics = get_metrics(g_invariant_net, in_domain_test_loader)
rotated_test_metrics = get_metrics(g_invariant_net, rotated_test_loader)
plot_metrics(train_metrics, in_domain_test_metrics, rotated_test_metrics)
Extensions:
A Lie group is a continuous group whose elements can be written in terms of infinitesimal generators.
Note that in the above example the data consists of a sequence of $N$ points, defined as $((x_1,y_1,z_1),\ldots,(x_N,y_N,z_N))$:
The 3D rotation group, denoted SO(3), is the group of all rotations about the origin of three-dimensional Euclidean space $\mathbb{R}^{3}$ under the operation of composition.
A matrix exponential $\exp(A)$ is defined as $$\exp(A) =\sum _{k=0}^{\infty }{1 \over k!}A^{k}.$$
The exponential map defines composition of transformations through the matrix exponential:
Then, convince yourself (just look at the series definitions of sine and cosine) that $$T_{(0,0,1)}(\theta) =\begin{pmatrix}\cos \theta &-\sin \theta &0\\\sin \theta &\cos \theta &0\\0&0&1\end{pmatrix}=\exp(\theta L_{z}), \quad \text{with } L_{z}=\begin{pmatrix}0&-1&0\\1&0&0\\0&0&0\end{pmatrix}$$ the infinitesimal generator of rotations about the $z$-axis.
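A quick numerical check of this identity (using scipy.linalg.expm, which is not used elsewhere in this notebook):

import numpy as np
from scipy.linalg import expm

L_z = np.array([[0., -1., 0.],
                [1.,  0., 0.],
                [0.,  0., 0.]])
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta), 0.],
              [np.sin(theta),  np.cos(theta), 0.],
              [0.,             0.,            1.]])
print(np.allclose(expm(theta * L_z), R))  # True: exp(theta L_z) is the rotation about the z-axis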
Also note that $$\exp(A+B) =\sum _{k=0}^{\infty }{1 \over k!}(A+B)^{k} = I + A + B + \frac{A^2}{2} + \frac{A B}{2} + \frac{BA}{2} + \frac{B^2}{2} + \cdots, $$ which mixes the transformations generated by $A$ and $B$ (and reduces to the composition $\exp(A)\exp(B)$ when $A$ and $B$ commute).
Then, the rotation by angles $\boldsymbol{\theta} = (\theta_x,\theta_y,\theta_z)$ about the $x$-, $y$-, and $z$-axes is $$\begin{aligned}T(\boldsymbol{\theta})&=\exp {\bigl (}\theta_x L_{\mathbf {x} } + \theta_y L_{\mathbf {y} } + \theta_z L_{\mathbf {z} }{\bigr )}\\&=\exp \left(\left[{\begin{matrix}0&-\theta_z &\theta_y \\\theta_z &0&-\theta_x \\-\theta_y &\theta_x &0\end{matrix}}\right]\right).\end{aligned}$$
The group of rigid transformations in 3D space, SE(3), combines rotations and translations:
If $T \in \text{SO}(3)$ is a rotation and ${\bf t} \in \mathbb{R}^{3\times 1}$ is a translation, then the 3D rigid transformation is $$C = \begin{bmatrix}T&{\bf t}\\{\bf 0}&1\end{bmatrix} \in \text{SE}(3). $$
Rigid transformation of points: In this matrix representation, the transformation (group action) on 3D points is: $${\bf x}^\mathrm{T}=[x,y,z,1]$$ and then $$C {\bf x} = \begin{bmatrix}T {\bf x} + {\bf t}\\1\end{bmatrix}$$
Let $$G_{1}=\begin{bmatrix}0&0&0&1\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix},\quad G_{2}=\begin{bmatrix}0&0&0&0\\0&0&0&1\\0&0&0&0\\0&0&0&0\end{bmatrix},\quad G_{3}=\begin{bmatrix}0&0&0&0\\0&0&0&0\\0&0&0&1\\0&0&0&0\end{bmatrix},\quad G_{4}=\begin{bmatrix}0&0&0&0\\0&0&-1&0\\0&1&0&0\\0&0&0&0\end{bmatrix}, \quad G_{5}=\begin{bmatrix}0&0&1&0\\0&0&0&0\\-1&0&0&0\\0&0&0&0\end{bmatrix}, \quad G_{6}=\begin{bmatrix}0&-1&0&0\\1&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}.$$
A rigid transformation that rotates around the origin by $\theta = (\theta_x,\theta_y,\theta_z)$ angles (on the x,y,z-axis, respectively) and translates by ${\bf u}=(x,y,z)$ is then $$ T_{{\bf u},\theta} = \exp(x G_1 + y G_2+ z G_3 + \theta_x G_4 + \theta_y G_5 + \theta_z G_6 ). $$
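A short sketch (again relying on scipy.linalg.expm) that builds the generators $G_1,\ldots,G_6$ above and exponentiates them: a pure translation gives exactly $\begin{bmatrix} I & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix}$, and a general element produces a valid rotation block:

import numpy as np
from scipy.linalg import expm

# The se(3) generators G_1..G_6 defined above.
G1 = np.zeros((4, 4)); G1[0, 3] = 1                  # translation along x
G2 = np.zeros((4, 4)); G2[1, 3] = 1                  # translation along y
G3 = np.zeros((4, 4)); G3[2, 3] = 1                  # translation along z
G4 = np.zeros((4, 4)); G4[1, 2], G4[2, 1] = -1, 1    # rotation about x
G5 = np.zeros((4, 4)); G5[0, 2], G5[2, 0] = 1, -1    # rotation about y
G6 = np.zeros((4, 4)); G6[0, 1], G6[1, 0] = -1, 1    # rotation about z

# Pure translation: the translation generators are nilpotent, so exp gives [[I, t], [0, 1]].
t = np.array([1., 2., 3.])
C = expm(t[0]*G1 + t[1]*G2 + t[2]*G3)
print(np.allclose(C, np.block([[np.eye(3), t[:, None]], [np.zeros((1, 3)), np.ones((1, 1))]])))  # True

# Rotation + translation: the top-left 3x3 block is orthogonal with determinant 1.
C = expm(0.3*G1 - 0.2*G2 + 0.5*G3 + 0.1*G4 + 0.7*G5 - 0.4*G6)
R = C[:3, :3]
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # True True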
The group of similarity transforms in 3D space, Sim(3), has a nearly identical representation to SE(3), with an additional scale factor:
If $T \in \text{SO}(3)$ is a rotation, ${\bf t} \in \mathbb{R}^{3\times 1}$ is a translation, and $s \in \mathbb{R}$ is a scaling, then the 3D similarity transformation is $$S = \begin{bmatrix}T&{\bf t}\\{\bf 0}&s^{-1}\end{bmatrix} \in \text{Sim}(3). $$
Similarity transformation of points: In this matrix representation, the transformation (group action) on 3D points is: $${\bf x}^\mathrm{T}=[x,y,z,1]$$ and then $$S {\bf x} = \begin{bmatrix}T {\bf x} + {\bf t}\\s^{-1}\end{bmatrix} \cong \begin{bmatrix}s (T {\bf x} + {\bf t})\\1\end{bmatrix},$$ where $\cong$ denotes equality of homogeneous coordinates up to a nonzero scale factor (both sides represent the same 3D point).
Let $G$ be a group acting on 3D point clouds (one among SO(3), SE(3), Sim(3)). These are all Lie groups.
We have seen that any group element $g \in G$ can be written in terms of the exponential: $$ \forall g \in G: \quad \quad g = \exp{\Big(\sum_{i=1}^D \alpha_i A_i\Big)}, $$
where $\alpha_i \in \mathbb{R}$, $\forall i \in \{1,..D\}$, and $\{A_i\}_{i=1}^D$ are the generators (e.g., $L_{\bf x}$, $L_{\bf y}$, $L_{\bf z}$ in SO(3)).
The generators $\{A_i\}_{i=1}^D$ are the basis of a vector space called Lie algebra $\mathfrak{g}$.
A group element is most relevant in how it acts as a transformation on an input.
A group representation $$\rho: G \rightarrow \text{GL}(m)$$ associates every group element $g$ to an invertible matrix $\rho(g) \in \mathbb{R}^{m \times m}$ that acts on $\mathbb{R}^{m}$.
The representation satisfies $\rho(g_1 g_2)= \rho(g_1) \rho(g_2)$ for all $g_1, g_2 \in G$, and therefore also $\rho(g_1^{-1}) = \rho(g_1)^{-1}$.
A group representation describes how a group transformation acts on $\mathbb{R}^m$.
Therefore every group has multiple representations, each describing transformations on some vector space by invertible linear maps.
Mirroring the group representations, Lie groups have an associated representation of their Lie algebra.
A Lie algebra representation $d\rho: \mathfrak{g} \to \mathfrak{gl}(m)$ is a linear map from the Lie algebra to ${m\times m}$ matrices.
An important result in Lie Theory relates the representation of a Lie Group to the representation of its Lie Algebra $$ \begin{equation}\label{eq:lie_correspondance} \forall A \in \mathfrak{g}: \quad \rho(e^A) = e^{d\rho(A)} \end{equation} $$
We want to characterize all equivariant linear layers $W\in \mathbb{R}^{N_2\times N_1}$ that map from one vector space $V_1$ with representation $\rho_1$ to another vector space $V_2$ with representation $\rho_2$ for a matrix group $G$.
Being equivariant to every transformation in the group means that transforming the input $x \in V_1$ is the same as transforming the output $W x \in V_2$. $$\begin{equation}\tag{10} \label{eq:groupinv} \forall x \in V_1 \quad \forall g \in G: \quad \quad \rho_2(g) W x = W \rho_1(g) x. \end{equation}$$
Since this holds for all $x$, we have $$\rho_2(g)W\rho_1(g)^{-1}=W,$$ or more abstractly: $$\begin{equation}\tag{11} \label{eq:kro} \forall g\in G:\quad \left[\rho_2(g)\otimes \rho_1(g^{-1})^\top\right] \mathrm{vec}(W)=\mathrm{vec}(W),\end{equation}$$ where $\mathrm{vec}$ flattens the matrix into a vector and $\otimes$ on matrices is the Kronecker product.
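The Kronecker identity in Equation (11) can be checked numerically; with numpy's row-major flatten() playing the role of $\mathrm{vec}$, the formula holds as written (the sketch below uses random invertible matrices as stand-ins for $\rho_1(g)$ and $\rho_2(g)$, not an actual group representation):

import numpy as np

rng = np.random.default_rng(0)
rho1 = rng.standard_normal((3, 3))   # stand-in for rho_1(g), invertible with probability 1
rho2 = rng.standard_normal((2, 2))   # stand-in for rho_2(g)
W = rng.standard_normal((2, 3))      # an arbitrary linear map from V_1 to V_2

lhs = (rho2 @ W @ np.linalg.inv(rho1)).flatten()
rhs = np.kron(rho2, np.linalg.inv(rho1).T) @ W.flatten()
print(np.allclose(lhs, rhs))  # True: vec(rho_2 W rho_1^{-1}) = (rho_2 kron rho_1^{-T}) vec(W)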
The above result is described in Finzi et al. (2021).
Let's denote $$\rho(g) = \rho_2(g)\otimes \rho_1(g^{-1})^\top,$$ which is a representation of how $g$ acts on matrices mapping from $V_1\to V_2$.
Therefore, rewriting using $\rho(g)$ and using $v = \mathrm{vec}(W)$ we have $$\begin{equation}\tag{12} \forall g\in G:\quad \rho(g) v = v . \end{equation}$$
Since we have said $$ \forall g \in G: \quad \quad g = \exp{\Big(\sum_{i=1}^D \alpha_i A_i\Big)}, $$ Equation (12) (that is $\forall g\in G: \rho(g) v = v$) becomes $$ \tag{13} \forall g\in G:\quad \rho\Big(\exp{\Big(\sum_{i=1}^D \alpha_i A_i\Big)}\Big) v = v. $$
Which we can rewrite using $$ \forall A \in \mathfrak{g}: \quad \rho(e^A) = e^{d\rho(A)}$$ as $$ \forall \alpha_i: \quad \quad \exp{\Big(\sum_{i=1}^D \alpha_i d\rho(A_i)\Big)}v = v $$
Note that the $d\rho(A_i)$ are called the infinitesimal generators.
Taking the derivative with respect to $\alpha_i$ at $\alpha = 0$ we get a constraint for each infinitesimal generator: $$\forall i \in \{1, .. D\}: \quad \quad d\rho(A_i) v = 0. $$
These constraints are a necessary and sufficient condition (Finzi et al., 2021) and thus characterize all solutions (i.e., all G-equivariant layers).
We can collect all $D$ constraints into a single matrix $C$, which we can break down (via the SVD) into its nullspace, spanned by the columns of $Q \in \mathbb{R}^{m \times r}$, and its orthogonal complement, spanned by the columns of $P \in \mathbb{R}^{m \times (m-r)}$: $$ C v = \begin{bmatrix} d\rho(A_1) \\ d\rho(A_2) \\ \vdots\\ d\rho(A_D) \end{bmatrix} v = U \begin{bmatrix}\Sigma & 0 \\ 0 & 0 \\ \end{bmatrix} \begin{bmatrix} P^T \\ Q^T \\ \end{bmatrix} v = 0 $$
All symmetric solutions for $v$ must lie in the nullspace of $C$, that is $$ v = Q \beta $$
for any coefficients $\beta \in \mathbb{R}^{r \times 1}$.
This means that if we consider weights $W$ such that $$\text{vec}(W) = Q \beta$$ and only learn the coefficients $\beta$, then $W$ is $G$-equivariant, since by construction $$\rho_2(g) W x = W \rho_1(g) x , \qquad \forall x \in V_1 \: , \: \forall g \in G.$$ For more details, refer to our lecture on G-invariant neural networks.
We have seen a practical method to obtain equivariant layers $W$ (Finzi et al., 2021).
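As a minimal hand-rolled illustration of this recipe (a sketch, not Finzi et al.'s implementation), take $G=\text{SO}(2)$ acting on $V_1=V_2=\mathbb{R}^2$ by its standard representation; the single generator constraint already pins down all equivariant $2\times 2$ layers:

import numpy as np

# Lie algebra generator of SO(2); rho_1 = rho_2 = the standard representation on R^2.
J = np.array([[0., -1.],
              [1.,  0.]])
I2 = np.eye(2)
# Induced generator on vec(W) (row-major vec): d rho(J) = J kron I - I kron J^T.
C = np.kron(J, I2) - np.kron(I2, J.T)
# Nullspace of C via the SVD: its basis Q parametrizes all equivariant layers vec(W) = Q beta.
U, S, Vt = np.linalg.svd(C)
r = int(np.sum(S < 1e-10))         # nullspace dimension
Q = Vt[len(S) - r:, :].T           # shape (4, r); here r = 2
print('Number of free parameters:', r)
# Any W with vec(W) = Q beta commutes with every rotation, i.e., rho_2(g) W = W rho_1(g).
beta = np.random.rand(r)
W = (Q @ beta).reshape(2, 2)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(R @ W, W @ R))   # True: W is SO(2)-equivariant by construction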
Q: Can arbitrary functions $\phi:V_1 \rightarrow \mathbb{R}$, $\Phi:V_1 \rightarrow V_2$, where $V_1,V_2$ are some vector spaces, be made respectively invariant and equivariant?
A: Yes, arbitrary functions can be made invariant or equivariant by averaging over the group as we have seen in Murphy et al 2019: $$\begin{equation} \tag{14} \psi(X) = \frac{1}{|G|}\sum_{g\in G} \phi(\rho_1(g)X), \quad \Psi(X) = \frac{1}{|G|}\sum_{g\in G}\rho_2(g)^{-1} \Phi(\rho_1(g)X) \end{equation}$$ are $G$-invariant and equivariant, respectively.
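For example, here is a minimal sketch of Equation (14) for the finite group of $90^\circ$ rotations acting on small images, with an arbitrary (non-invariant) $\phi$:

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(16)                 # fixed random weights defining an arbitrary phi

def phi(img):                               # arbitrary function, not invariant
    return np.tanh(a @ img.flatten())

def psi(img):                               # group average of phi: G-invariant by construction
    return np.mean([phi(np.rot90(img, k)) for k in range(4)])

x = rng.standard_normal((4, 4))
print(phi(x), phi(np.rot90(x)))             # generally different
print(psi(x), psi(np.rot90(x)))             # identical (up to floating-point error)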
The challenge is that when the cardinality of $G$ is large or infinite (e.g., continuous groups such as rotations), then exact averaging is intractable.
In such cases, we can approximate the sum via heuristics or Monte Carlo (MC), thereby sacrificing the exact invariance/equivariance property for computational efficiency.
Puny et al., (2022), propose to replace the group average in Equation (14) with an average over a carefully selected subset $\mathcal{F}(X)\subset G$ while retaining both exact invariance & equivariance and expressive power.
$\mathcal{F}(X)\subset G$ is called a frame; it is defined as a set-valued function $\mathcal{F}:V_1 \rightarrow 2^{G} \setminus \emptyset$.
A frame is $G$-equivariant if $$\begin{equation}\tag{15} \mathcal{F}(\rho_1(g)X)= g \mathcal{F}(X), \quad \forall X\in V_1,\ g\in G,\end{equation}$$ where $g\mathcal{F}(X)=\{g h \ \vert \ h\in \mathcal{F}(X)\}$, and the equality should be understood as equality of sets.
How are equivariant frames useful?
Consider a scenario where an equivariant frame is easy to compute, and its cardinality, $|\mathcal{F}(X)|$, is not too large. Then averaging over the frame, denoted ${\left \langle \cdot \right \rangle}_\mathcal{F}$ is defined by
$$\begin{align} \tag{16} \label{e:inv} {\left \langle \phi \right \rangle}_\mathcal{F}(X) &= \frac{1}{|\mathcal{F}(X)|}\sum_{g\in \mathcal{F}(X)} \phi(\rho_1(g)^{-1} X) \\ \tag{17} \label{e:equi} {\left \langle \Phi \right \rangle}_\mathcal{F}(X) &= \frac{1}{|\mathcal{F}(X)|}\sum_{g \in \mathcal{F}(X)} \rho_2(g) \Phi(\rho_1(g)^{-1}X) \end{align} $$
Theorem 1 (Puny et al., 2022) Let $\mathcal{F}$ be a $G$-equivariant frame, and $\phi:V_1\rightarrow\mathbb{R}$, $\Phi:V_1\rightarrow V_2$ some functions. Then, ${\left \langle \phi \right \rangle}_\mathcal{F}$ is $G$-invariant, while ${\left \langle \Phi \right \rangle}_\mathcal{F}$ is $G$-equivariant.
Therefore we can replace the group averaging operator with an averaging operator over a small subset of the group elements (a frame). Averaging over a frame guarantees exact invariance or equivariance while often being simpler to compute.
Puny et al. (2022) define the frames and instantiate the frame-averaging framework for several groups.
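As a toy illustration (not one of the frames constructed in the paper), take $G=\{+1,-1\}$ acting on $\mathbb{R}^d$ by $\mathbf{x} \mapsto s\mathbf{x}$ and the frame $\mathcal{F}(\mathbf{x}) = \{\operatorname{sign}(x_1)\}$ (assuming $x_1 \neq 0$), which satisfies Equation (15); frame averaging then gives exact sign-invariance with a single evaluation of $\phi$:

import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)

def phi(x):                         # arbitrary function, not sign-invariant
    return np.tanh(a @ x) + x[1]

def frame_avg(x):                   # <phi>_F(x) = phi(g^{-1} x) for the single g in F(x)
    g = np.sign(x[0])
    return phi(x / g)               # g^{-1} = g since g is +1 or -1

x = rng.standard_normal(5)
print(phi(x), phi(-x))              # generally different
print(frame_avg(x), frame_avg(-x))  # identical: exact invariance from a frame of size 1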
Michael Bronstein has a great series of tutorials on graph representation learning.
Linear Algebra Basics, Deep Learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville, MIT Press, 2016
Course in the Theory of Groups, by Robinson, Derek J.S, Springer Graduate Texts in Mathematics, 1996.
Invariant Subspaces of Matrices with Applications, by Israel Gohberg, Peter Lancaster, Leiba Rodman, Society for Industrial and Applied Mathematics, 2006.
(Murphy et al., 2019) Ryan L. Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro, "Janossy Pooling: Learning Deep Permutation-invariant Functions for Variable-size Inputs", ICLR 2019.
(Mouli and Ribeiro, 2021) S. C. Mouli, Bruno Ribeiro, "Neural networks for learning counterfactual G-invariances from single environments". ICLR 2021.
(Yarotsky, 2021) Dmitry Yarotsky. "Universal approximations of invariant maps by neural networks." Constructive Approximation, 2021.
(Sabour et al., 2017) Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." NeurIPS 2017.
(Bahdanau et al., 2015) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." ICLR 2015.
(Zaheer et al., 2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and Alexander Smola. "Deep sets". NeurIPS 2017.
(Vinyals et al., 2016) Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. Order Matters: Sequence to Sequence for Sets. ICLR, 2016
(Lee et al., 2019) Lee, Juho, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, and Yee Whye Teh. "Set transformer: A framework for attention-based permutation-invariant neural networks." ICML 2019.
(Finzi et al., 2021) Marc Finzi, Max Welling, and Andrew Gordon Wilson. "A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups". ICML 2021.
(Puny et al., 2022) Omri Puny, Matan Atzmon, Heli Ben-Hamu, Edward J. Smith, Ishan Misra, Aditya Grover, and Yaron Lipman. "Frame Averaging for Invariant and Equivariant Network Design". ICLR 2022.