A Complete Beginner's Guide to
G-invariant and G-equivariant Neural Networks

Copyright: Bruno Ribeiro , S. Chandra Mouli and Beatrice Bevilacqua¶

Purdue University¶


If reusing this material, please keep copyright notice above¶


Preface¶

  • In this tutorial we consider the design of G-invariant and G-equivariant neural networks, which are neural networks that are invariant (or equivariant) to transformations (actions) of a transformation group.
    • We will start with G-invariances
    • We will then show that G-equivariances are a simple extension of G-invariances
  • Method and further readings
    • Our exposition follows Mouli & Ribeiro (ICLR 2021):
      • Mouli & Ribeiro (ICLR 2021) uses fundamental but easy-to-understand concepts:
        • Group theory: Reynolds operator
        • Linear algebra: Invariant subspaces, eigenvectors.
        • We also recommend reading Yarotsky (2018) to better understand how these (linear-algebra) invariances turn into maximally powerful neural networks.

All transformations in this tutorial will be automorphisms¶

  • All transformations used in G-invariant neural networks are endomorphisms
    • If the input is $x \in \Omega$, the transformation $T(x)$ is always an endomorphism: $T:\Omega \to \Omega$.
    • If the transformations are associated with a group (representations of a group), they will be automorphisms.
      • An automorphism is an endomorphism that has an inverse.
    • We will also restrict our attention to linear transformations:
      • $T(x+y) = T(x) + T(y)$ and $T(\alpha x) = \alpha T(x)$ for any scalar $\alpha \in \mathbb{R}$.
      • We recommend reading Murphy et al. (ICLR 2019) and Murphy et al. (ICML 2019) for an application of the Reynolds operator over nonlinear functions.
    • We advise the reader to avoid using intermediate spaces between the input and the output (e.g., image $\to$ point cloud $\to$ image)

Transformations for G-invariances are always endomorphisms (image example):¶

Consider a 90$^\circ$ rotation of an image:

where $T_{90^\circ}$ is a $784 \times 784$ matrix that acts on vectorized images $\mathbf{x} \in \mathbb{R}^{784}$ via a matrix-vector product.

  • Clearly, any invertible transformation in the space of vectorized images is invertible in the space of 2d images as well.

Note: Avoid image transformations via point clouds:¶

where $$\tilde{T}_{\theta} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \:.$$

  • We strongly advise against this approach

G-invariant neuron (Preview)¶

Standard Neuron vs G-invariant neuron¶

Given an input $\mathbf{x} \in \mathbb{R}^n$:

  • The structure of a G-invariant neuron is essentially the same as the standard neuron
  • Except that the weights $\mathbf{w}$ lie in an invariant subspace $\mathbb{W}$ of $\mathbb{R}^n$, as shown in the figure below

Before we look into the details of the invariant subspace $\mathbb{W}$, let's first see how it affects the invariance properties of the G-invariant neuron.

The following cells compute and visualize $\mathbf{w}^T \mathbf{x}$ and $\mathbf{w}^T (T_{90^\circ}\mathbf{x})$ for a random $\mathbf{w} \in \mathbb{W}$.

  • Note that $\mathbf{w}^T \mathbf{x}$, where $\mathbf{x}$ is the vectorized input, is equivalent to $\text{sum}(\text{unvectorized}(\mathbf{w}) \odot \text{unvectorized}(\mathbf{x}))$, where $\odot$ is the element-wise product (Hadamard product)
  • You will need the file g_invariance.py to run the next example: from g_invariance import *
    • The MNIST data will be saved in the folder data/MNIST
In [1]:
from g_invariance import *

rotation_angle = 90   # Experiment with other multiples of 90 degrees.
show_dot_product(img_4, invariant_w)
show_dot_product(np.rot90(img_4, k=rotation_angle//90), invariant_w)

Hence, the output of the neuron $\sigma(\mathbf{w}^T \mathbf{x} + b)$ is the same for the original image and its rotated version (same can be seen for rotation by any multiple of $90^\circ$).

This is because all $\mathbf{w} \in \mathbb{W}$ remain invariant under rotations of $90^\circ$ multiples, i.e., $\mathbf{w}^T = \mathbf{w}^T T_{90^\circ}$.

In [2]:
wT = invariant_w.T
wT_T90 = wT @ T90

fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
axes[0].imshow(wT.reshape((28, 28)), cmap='gray')
axes[0].set_title(r"$\mathbf{w} \in \mathbb{W}$ (unvectorized)")
axes[0].set_xticks([])
axes[0].set_yticks([])

axes[1].imshow(wT_T90.reshape((28, 28)), cmap='gray')
axes[1].set_title(r"$\mathbf{w}^T ~T_{90^\circ}$ (unvectorized)")
axes[1].set_xticks([])
axes[1].set_yticks([])

plt.show()

How can we construct the invariant subspace $\mathbb{W}$?¶

  • The invariant subspace $\mathbb{W}$ is the left eigenspace corresponding to the eigenvalue 1 of the Reynolds operator for the transformation group $\mathcal{G}$.
    • In the previous examples, the group was $\mathcal{G}_\text{rot} := \{T_{\theta}\}_{\theta \in \{0^\circ, 90^\circ, 180^\circ, 270^\circ\}}$ that rotates the image by $\theta$ degrees.

In what follows we describe the concepts needed to understand the invariant subspace $\mathbb{W}$.

We begin with a general description of groups and their close relation to symmetries.

  1. What is a group?
    1. Abelian or not
    2. Finite/Infinite groups
    3. General linear group

Then, we show how to construct the Reynolds operator of any finite group by averaging all the transformations in the group. We depict a key property of the Reynolds operator needed for our G-invariant neuron.

  1. Reynolds operator of a finite group
  2. Subspaces of a vector space $\mathbb{V}$
    1. Basis and dimension

We construct $\mathbb{W}$ as the left eigenspace of the Reynolds operator and prove the invariance properties of any $\mathbf{w} \in \mathbb{W}$.

  1. Eigenspace of a linear operator T

  2. Invariant subspaces

At the end, we give a step-by-step example of constructing a G-invariant neuron.

  1. Step-by-step examples

    1. For graph inputs
    2. For image inputs
  2. G-equivariant networks


Definition of a Transformation Group¶

  • Relationship between transformation group and symmetries:
    • If $A$ and $B$ are symmetry transformations of an object, then clearly $A\circ B$ will preserve the symmetry.
    • Hence, symmetry transformations form a group

A transformation group consists of a set of transformations $\mathcal{G}$ and a binary composition operation $\circ$ defined between any two elements of $\mathcal{G}$:

  1. (Closure) $\forall A,B \in \mathcal{G}, A \circ B \in \mathcal{G}$.
  2. (Associative) The binary operation is associative: $A \circ (B\circ C) = (A\circ B)\circ C$ for all $A,B,C \in \mathcal{G}$.
  3. (Identity) There exists an element $I \in \mathcal{G}$ with the property that $I \circ A = A = A \circ I$ for all $A \in \mathcal{G}$.
    • $I$ is unique and is called the identity of $\mathcal{G}$.
  4. (Inverse) To each $A \in \mathcal{G}$ there exists $A^{-1} \in \mathcal{G}$ with the property that $A \circ A^{-1} = I = A^{-1} \circ A$.
    • $A^{-1}$ is unique and is called the inverse of $A$.
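As a quick sanity check (a sketch, not part of the original notebook), the snippet below verifies these four group properties numerically for the planar rotations by multiples of $90^\circ$, represented as $2\times 2$ matrices.

import numpy as np

# Rotation by theta degrees as a 2x2 matrix.
def rot(theta_deg):
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

G = [rot(t) for t in (0, 90, 180, 270)]

def in_group(M):
    return any(np.allclose(M, A) for A in G)

# (Closure) every composition stays in the group.
assert all(in_group(A @ B) for A in G for B in G)
# (Associative) matrix multiplication is associative by definition.
# (Identity) rot(0) is the identity matrix.
assert np.allclose(G[0], np.eye(2))
# (Inverse) every element has its inverse inside the group.
assert all(in_group(np.linalg.inv(A)) for A in G)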

Example 1:¶

Are all the group properties satisfied?

Yes. For example:

  • $T_{180^\circ} \circ T_{270^\circ} = T_{90^\circ} \in \mathcal{G}_\text{rot}$
  • $T_{180^\circ}^{-1} = T_{180^\circ} \in \mathcal{G}_\text{rot}$
  • (check others)

Example 2:¶

Are all the group properties satisfied?

No. For example:

  • $T_\text{h-flip} \circ T_\text{v-flip} = T_{180^\circ} \notin \overline{\mathcal{G}}_\text{flip}$

Let us add $T_{180^\circ}$ to $\overline{\mathcal{G}}_\text{flip}$ and check again.

Are all the group properties satisfied?

Yes (check).

Types of Groups: Commutative (Abelian) or Noncommutative¶

A group $\mathcal{G}$ is called Abelian or commutative if $A\circ B=B \circ A$ for all $A,B \in \mathcal{G}$.

Example 1:¶

Is this a commutative (abelian) group?

Yes.

Example 2:¶

Are all the group properties satisfied? Yes.

Is this a commutative (abelian) group? No.

For example: $T_\text{h-flip} \circ T_{90^\circ} = T_\text{diag2-flip} \neq T_\text{diag1-flip} = T_{90^\circ} \circ T_\text{h-flip}$
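A quick numerical illustration (a sketch, not from the original notebook): composing a horizontal flip with a $90^\circ$ rotation in the two possible orders gives different results, which shows that a group containing both flips and rotations is not commutative.

import numpy as np

img = np.arange(9).reshape(3, 3)      # a small test "image"

a = np.fliplr(np.rot90(img))          # first rotate by 90 degrees, then flip horizontally
b = np.rot90(np.fliplr(img))          # first flip horizontally, then rotate by 90 degrees

print(np.array_equal(a, b))           # False: the two orders disagree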

Types of Groups: Finite or Infinite¶

If $\mathcal{G}$ has a finite number of elements it is called a finite group, otherwise it is an infinite group.

Infinite group example:¶

  • This is also a Lie group, which we will revisit later

Subgroup of a group¶

$\mathcal{H}$ is called a subgroup of a group $\mathcal{G}$ (denoted $\mathcal{H} \leq \mathcal{G}$ ) if:

  • $\mathcal{H} \subseteq \mathcal{G}$,
  • $\mathcal{H}$ is a group on its own.

An important group: The permutation group¶

  • A permutation group $G$ is a group whose elements are permutations of a given set $A$ and whose group operation is the composition of permutations in G
    • Given a set $A$, a permutation of $A$ is a function $f : A \to A$ which is a bijection. A permutation group of $A$ is a set of permutations of $A$ that forms a group under function composition.
    • We’ll focus specifically on the case when $A = \{1, \ldots, n\}$ for some fixed integer $n \geq 1$. This means each group element will permute this set.
      • For example if $A = \{1, 2, 3\}$ then a permutation $\pi$ might have $\pi(1) = 2$, $\pi(2) = 1$, and $\pi(3) = 3$.
  • Examples of permutation groups include:
    • the symmetric group $\mathbb{S}_n$ (of order $n!$),
      • The symmetric group $\mathbb{S}_n$ is the group of all permutations of the set $\{1, 2, \ldots, n\}$.
    • the alternating group $\mathbb{A}_n$ (of order $n!/2$ for $n\geq 2$),
    • the cyclic group $\mathbb{C}_n$ (of order $n$), and
    • the dihedral group $\mathbb{D}_n$ (of order $2n$).
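A small illustration (a sketch, not from the original notebook): enumerating $\mathbb{S}_3$ and composing two of its permutations.

import itertools

# The symmetric group S_3: all permutations of {0, 1, 2}; its order is 3! = 6.
S3 = list(itertools.permutations(range(3)))
print(len(S3))                                     # 6

# Composition of permutations: (pi o sigma)(i) = pi(sigma(i)).
pi, sigma = (1, 0, 2), (2, 0, 1)
composed = tuple(pi[sigma[i]] for i in range(3))
print(composed)                                    # (2, 1, 0), also an element of S_3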

Applications of the Symmetric Group¶

  • A 3D point cloud is a sequence of $m$ 3D points ${\bf x} \in \mathbb{R}^{m \times 3}$
    • Representations of a point cloud should not depend on the order of the $m$ points in ${\bf x}$.
  • An $n$-node graph is given by $n$ nodes and up to $n^2$ edges, encoded as an adjacency tensor ${\bf A} \in \mathbb{R}^{n\times n \times d}$, where $d \geq 1$ is the feature dimension (which may include edge and node attributes).
    • Representations of a graph should not depend on the order of the $n$ nodes in the adjacency tensor.
Group Actions¶
  • Above we see that we want our final representations to be invariant (equivariant) to the symmetric group $\mathbb{S}_n$.
    • But the symmetric group acts on integer values (i.e., permuting indices), not matrices or tensors.
    • When we say $\pi \in \mathbb{S}_n$ is a permutation, the application of this permutation to an object ${\bf x}$ is called the action of $\pi$ on ${\bf x}$, and written as $\pi \circ {\bf x}$.
    • $\pi \circ {\bf x}$ is formally defined through the concept of group representations (actions of a group on a space), as shown next.

Group Representations¶

For us, the concept of a group matters insofar as the group acts as a transformation on an input.

A (linear finite dimensional) group representation $\rho : G \to \text{GL}(m,\mathbb{R})$ associates each $g \in G$ to an invertible matrix $\rho(g) \in \mathbb{R}^{m \times m}$ that acts on $\mathbb{R}^m$.

  • The representation satisfies:
    • for the identity element $ e \in G$: $\rho(e) = I$ is the identity matrix.
    • $\forall g_1,g_2 \in G : \rho(g_1g_2) = \rho(g_1)\rho(g_2),$ and therefore also $\rho(g^{-1}) = \rho(g)^{-1}$.

The representation specifies how objects transform under the group, and can be considered a specification of the type of an object.

  • Set representation example: let $\pi \in \mathbb{S}_n$ be a permutation of $\{1,\ldots,n\}$. Define $\rho(\pi)$ as the (linear) group representation of $\pi$ that permutes the 3D points in the sequence ${\bf x} \in \mathbb{R}^{n \times 3}$ according to $\pi$.

    • Hence, $\text{vec}({\bf x}') = \rho(\pi) \text{vec}({\bf x})$ is a sequence of 3D points permuted according to the permutation given in $\pi$.
  • Graph representation example: let $\pi \in \mathbb{S}_n$ be a permutation of $\{1,\ldots,n\}$. Define $\rho(\pi)$ as the (linear) group representation of $\pi$ that jointly permutes the rows and columns of an adjacency matrix ${\bf A} \in \mathbb{R}^{n\times n}$.

    • Hence, $\text{vec}({\bf A}') = \rho(\pi) \text{vec}({\bf A})$ is ${\bf A}$ jointly permuted (rows and columns) according to the permutation given in $\pi$.
    • The math performed on $\text{vec}({\bf A})$ is much simpler than the equivalent math performed on the more common definition of the permutation of a matrix: ${\bf A}' = {\bf P}{\bf A}{\bf P}^T$.
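The following sketch (not part of the original notebook) checks this numerically: with numpy's row-major flattening as the $\text{vec}$ convention, one concrete realization of $\rho(\pi)$ is the Kronecker product ${\bf P} \otimes {\bf P}$ of the permutation matrix ${\bf P}$ with itself, and it agrees with the usual ${\bf P}{\bf A}{\bf P}^T$.

import numpy as np

n = 3
perm = np.array([2, 0, 1])           # an example permutation pi of {0, 1, 2}
P = np.eye(n)[perm]                  # its permutation matrix

A = np.random.rand(n, n)             # a random "adjacency matrix"

A_perm = P @ A @ P.T                 # the usual joint permutation of rows and columns

# The linear representation acting on the vectorized adjacency matrix
# (with row-major flattening, rho(pi) = P kron P).
rho_pi = np.kron(P, P)
print(np.allclose(rho_pi @ A.reshape(-1), A_perm.reshape(-1)))   # True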

Vectorize Example¶

  • Consider the tensor $\mathbf{J} \in \mathbb{R}^{m \times m \times 3}$ representing an $m \times m$ RGB image.

In this tutorial, we will always operate on vectorized input such as the above.

  • The vectorization is performed by some arbitrary bijective mapping.
    • E.g., For our image above, $\text{vec}: \mathbb{R}^{m \times m \times 3} \to \mathbb{R}^{3 m^2}$, where $\text{vec}$ is a bijective function. Then ${\bf x} = \text{vec}(\mathbf{J})$.
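For concreteness, a minimal sketch of one such bijective vectorization using numpy's reshape (the same reshape convention used in the notebook's code cells):

import numpy as np

m = 4
J = np.random.rand(m, m, 3)          # an m x m RGB image

x = J.reshape(-1)                    # vec(J): a bijection into R^{3 m^2}
print(x.shape)                       # (48,)

J_back = x.reshape(m, m, 3)          # the inverse of vec
print(np.array_equal(J, J_back))     # True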

An important group: The General Linear Group¶

$\text{GL}(n,\mathbb{R}) = \{A \in \mathbb{R}^{n \times n} ~:~ A \text{ is invertible}\}$ is a group of $n\times n$ invertible matrices with the binary operation defined as the matrix product.

Satisfies group properties:

  1. (Closure) If $A$ and $B$ are invertible, then the product $AB$ is invertible.
  2. (Associative) Matrix product is associative: $A(BC) = (AB)C$ for all $A, B, C \in \text{GL}(n, \mathbb{R})$.
  3. (Identity) Identity matrix $I_{n \times n}$ is invertible and satisfies $AI = A = IA$ for all $A \in \text{GL}(n, \mathbb{R})$.
  4. (Inverse) If $A$ is invertible, then the inverse $A^{-1}$ is invertible.

Other properties:

  • For $n > 1$, the group is nonabelian, i.e., $\exists A,B \in \text{GL}(n,\mathbb{R})$ such that $AB \neq BA$.
  • Infinite group

More about $\text{GL}(n, \mathbb{R})$¶

Every matrix $A \in \text{GL}(n, \mathbb{R})$ is associated to a bijective linear transformation $T_A: \mathbb{R}^n \to \mathbb{R}^n$ as follows:

$$T_A(\mathbf{x}) = A\mathbf{x}\:, \qquad \forall \mathbf{x} \in \mathbb{R}^n$$

  • Since $A$ is a square matrix of size $n\times n$, the domain and co-domain of $T_A$ are both $\mathbb{R}^n$.
  • $T_A$ is a linear transformation for all $A \in \text{GL}(n, \mathbb{R})$, i.e., for any vectors $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^n$ and any scalar $\alpha \in \mathbb{R}$,
    • $T_A(\mathbf{x} + \mathbf{x}')= T_A(\mathbf{x}) + T_A(\mathbf{x}')$,
    • $T_A(\alpha \mathbf{x}) = \alpha T_A(\mathbf{x})$
  • Since the matrix $A$ is invertible, $T_A$ is a bijective transformation.

Any subgroup of $\text{GL}(n, \mathbb{R})$ is called a linear group.

Example:¶

$$\text{GL}(2, \mathbb{R}) = \Bigg \{ \begin{bmatrix} a & b\\ c & d \end{bmatrix} ~:~ a, b, c, d \in \mathbb{R}, ~ad - bc \neq 0 \Bigg \} \:. $$

Consider $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \in \text{GL}(2, \mathbb{R})$ and $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$. Then the transformation $T_A: \mathbb{R}^2 \to \mathbb{R}^2$ transforms $\mathbf{x}$ as follows: $$T_A\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}$$

General Linear Transformation Group over Images¶

Consider an input $\mathbf{x} \in \mathbb{R}^{3 m^2}$ representing a vectorized $m \times m$ RGB image.

We consider the general linear group, $\text{GL}(3m^2,\mathbb{R})$. Recall that any matrix $A \in \text{GL}(3m^2, \mathbb{R})$ is an invertible $3m^2 \times 3m^2$ matrix and acts on the vectorized image $\mathbf{x}$ via the matrix-vector product $A\mathbf{x}$.

Example:¶

  • $T_{90^\circ}$ is an invertible $3m^2 \times 3m^2$ matrix.
  • $T_{90^\circ}$ permutes pixels of the vectorized image $\mathbf{x}$ corresponding to the $90^\circ$ rotation of the $m\times m$ RGB image.

Q: Is there a transformation in $\text{GL}(3m^2, \mathbb{R})$ equivalent to $T_{45^\circ}$ that rotates an $m\times m$ RGB image by $45^\circ$?

A: No. Note that $T_{45^\circ}$ is not invertible since anything at the corners of the image will be cut from the output.

Q: Can the combined $45^\circ$ rotation and scaling transformation $T^{(45^\circ \text{ \& scale})}$ belong to $\text{GL}(3 m^2,\mathbb{R})$?

A: No. In the definition of a group, if $A,B \in G$, then $A B \in G$, but that does not happen in this scenario:

Group-Invariant ($G$-Invariant) Transformation¶

  • This part of the tutorial is based on a few works, but our presentation focuses mostly on (Mouli & Ribeiro, ICLR 2021).

  • Let $G$ be a discrete group of transformations $T:\mathbb{R}^m \to \mathbb{R}^m$.
    • $T$ may be a group representation $\rho(g)$, $g \in G$.
  • A transformation $\bar{T}$ is $G$-invariant iff $\forall T \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$\bar{T} \circ (T \circ {\bf x}) = \bar{T}\circ {\bf x}.$$

    • Since the above is true for all ${\bf x}$, then $$\bar{T} \circ T = \bar{T}.$$
  • Lemma: Reynolds operator (Billik and Rota, 1960). The transformation $$\bar{T} = \frac{1}{|G|} \sum_{T \in G} T,$$ is $G$-invariant.

In [3]:
# Consider a grayscale 2x2 image. The vectorized image has dimension n = 4.
# Consider the rotation group over this image: {T_0, T_90, T_180, T_270}.
# All T_\theta are 4 x 4 invertible matrices.
# We will see later how to construct these transformation matrices.
T0 = np.array([[1., 0., 0., 0.],
               [0., 1., 0., 0.],
               [0., 0., 1., 0.],
               [0., 0., 0., 1.]])

T90 = np.array([[0., 1., 0., 0.],
                [0., 0., 0., 1.],
                [1., 0., 0., 0.],
                [0., 0., 1., 0.]])

T180 = np.array([[0., 0., 0., 1.],
                [0., 0., 1., 0.],
                [0., 1., 0., 0.],
                [1., 0., 0., 0.]])

T270 = np.array([[0., 0., 1., 0.],
                 [1., 0., 0., 0.],
                 [0., 0., 0., 1.],
                 [0., 1., 0., 0.]])
In [4]:
# Show that the transformation above are correct using an example 2x2 image.
img = np.random.rand(2, 2)
vectorized_img = img.reshape(-1)
img_rot90 = (T90 @ vectorized_img).reshape(2, 2)
img_rot180 = (T180 @ vectorized_img).reshape(2, 2)
img_rot270 = (T270 @ vectorized_img).reshape(2, 2)

fig, axes = plt.subplots(1, 4, figsize=(12, 9))
axes[0].imshow(img, cmap='gray')
axes[0].set_title(r'$T_{0^\circ} x$', fontsize=22)
axes[0].set_xticks([])
axes[0].set_yticks([])

axes[1].imshow(img_rot90, cmap='gray')
axes[1].set_title(r'$T_{90^\circ} x$', fontsize=22)
axes[1].set_xticks([])
axes[1].set_yticks([])

axes[2].imshow(img_rot180, cmap='gray')
axes[2].set_title(r'$T_{180^\circ} x$', fontsize=22)
axes[2].set_xticks([])
axes[2].set_yticks([])

axes[3].imshow(img_rot270, cmap='gray')
axes[3].set_title(r'$T_{270^\circ} x$', fontsize=22)
axes[3].set_xticks([])
axes[3].set_yticks([])
plt.show()
In [5]:
# Construct the Reynolds operator Tbar
Tbar = (T0 + T90 + T180 + T270)/4

# Show that Tbar @ T_\theta = Tbar for all T_\theta
display(Markdown('$\overline{T} = $'))
print(Tbar)

display(Markdown('$\overline{T} ~T_{0^\circ} = $'))
print(Tbar @ T0)

display(Markdown('$\overline{T} ~T_{90^\circ} = $'))
print(Tbar @ T90)

display(Markdown('$\overline{T} ~T_{180^\circ} = $'))
print(Tbar @ T180)

display(Markdown('$\overline{T} ~T_{270^\circ} = $'))
print(Tbar @ T270)

$\overline{T} = $

[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]

$\overline{T} ~T_{0^\circ} = $

[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]

$\overline{T} ~T_{90^\circ} = $

[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]

$\overline{T} ~T_{180^\circ} = $

[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]

$\overline{T} ~T_{270^\circ} = $

[[0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]
 [0.25 0.25 0.25 0.25]]

Proof: Consider an arbitrary transformation $T_\dagger \in G$. Then $$\bar{T} \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T \in G} T \circ T_\dagger = \frac{1}{\vert G \vert}\sum_{T' \in G_\dagger} T' ,$$ where we define $G_\dagger = \{T \circ T_\dagger: \forall T \in G \}$.

  • Now, in order to prove $\bar{T} \circ T_\dagger = \bar{T}$, we only need to show that $G_\dagger = G$.
  • Since groups are closed under compositions, we have $\forall T \in G$, $T \circ T_\dagger \in G$, and thus $G_\dagger \subseteq G$.
  • Finally, since $T_\dagger$ is a bijection and $T_a \circ T_\dagger = T_b \circ T_\dagger$ only if $T_a = T_b$ for any $T_a, T_b \in G$, it must be that $|G_\dagger| = |G|$.
  • Hence, $G_\dagger = G$.

Subspaces of a vector space $\mathbb{V}$¶

We call $\mathbb{W}$ a subspace of a vector space $\mathbb{V}$ if $\mathbb{W} \subseteq \mathbb{V}$ and:

  1. $\mathbf{0} \in \mathbb{W}$.
  2. $\mathbf{x} \in \mathbb{W} \implies \lambda \mathbf{x} \in \mathbb{W}$ for all $\lambda \in \mathbb{R}$.
  3. $\mathbf{x}, \mathbf{y} \in \mathbb{W} \implies \mathbf{x} + \mathbf{y} \in \mathbb{W}$.

Examples¶

$\mathbb{W}$ is the subset of all vectors on the line. Is this a subspace?

No. $\mathbf{0} \notin \mathbb{W}$.

Examples¶

$\mathbb{W}$ is the subset of all vectors on this line. Is this a subspace?

Yes.

Examples¶

$\mathbb{W}$ is the subset of all vectors within this circle. Is this a subspace?

No. Consider any vector $\mathbf{v}$ on the boundary of the circle. Then 2$\mathbf{v}$ is not within the circle.

Examples¶

$\mathbb{W}$ is the subset of all vectors on this plane passing through origin. Is this a subspace?

Yes. For any two vectors $\mathbf{x}, \mathbf{y}$ in the plane, $\mathbf{x} + \mathbf{y}$ is in the plane, as is $\lambda \mathbf{x}$ for all $\lambda \in \mathbb{R}$.

Basis and dimension of a subspace¶

Span of a set of vectors¶

Given a set of vectors $S = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, the span of $S$ is the set of all linear combinations of the vectors in $S$, i.e., $$ \text{span}(S) = \{\lambda_1 \mathbf{x}_1 + \ldots + \lambda_n \mathbf{x}_n ~|~ \lambda_1,\ldots,\lambda_n \in \mathbb{R}\} $$

Basis of a subspace¶

A set of vectors $B = \{\mathbf{b}_1, \ldots, \mathbf{b}_d\}$ is called a basis of a subspace $\mathbb{U}$ if:

  1. $\mathbb{U} = \text{span}(B)$
  2. $B$ is linearly independent, i.e., the only choice of $\lambda_1, \ldots, \lambda_d$ satisfying $\lambda_1 \mathbf{b}_1 + \ldots + \lambda_d \mathbf{b}_d = \mathbf{0}$ is $\lambda_1 = \cdots = \lambda_d = 0$.

Then $d$ is known as the dimension of subspace $\mathbb{U}$ and is unique.
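A quick way to check linear independence numerically (a sketch, not from the original notebook) is via the matrix rank: stack the vectors as columns; they are linearly independent iff the rank equals the number of vectors.

import numpy as np

# Two vectors in R^3, stacked as columns of B.
B = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])

# rank(B) == number of columns  =>  the columns are linearly independent,
# so they form a basis of their span, which has dimension 2.
print(np.linalg.matrix_rank(B) == B.shape[1])    # True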

$G$-Invariant Representations¶

  • Our goal is to learn a neuron with

    • Weights ${\bf w} \in \mathbb{W}$ on a subspace $\mathbb{W}$
    • Bias $b \in \mathbb{R}$
    • And activation function $\sigma:\mathbb{R} \to \mathbb{R}$.
    • Such that $$\sigma({\bf w}^T {\bf x} + b) = \sigma({\bf w}^T T {\bf x} + b) , \quad \forall T \in G. $$
  • This means that the output of the neuron does not depend on the transformation of $G$ applied to the input.

    • We call it a $G$-Invariant representation

Group-Equivariant ($G$-Equivariant) Transformation¶

  • Equivariance requires that transforming the input is the same as transforming the output
  • Let $G$ be a discrete group of transformations $T:\mathbb{R}^m \to \mathbb{R}^m$.
  • (Special case) A function $f:\mathbb{R}^m \to \mathbb{R}^m$ (endomorphism) is $G$-equivariant iff $\forall T \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$f(T {\bf x}) = T f({\bf x}).$$

    • (General case) A function $f:\mathbb{R}^m \to \mathbb{R}^d$ is $G$-equivariant iff $\forall g \in G$ and $\forall {\bf x} \in \mathbb{R}^m$ we have $$f(\rho_1(g) {\bf x}) = \rho_2(g) f( {\bf x}),$$ where
      $$\rho_1(g) :\mathbb{R}^m \to \mathbb{R}^m \quad \text{and} \quad \rho_2(g) :\mathbb{R}^d \to \mathbb{R}^d$$
      are representations of $G$ acting on the input and output spaces, respectively.
  • We will first focus on G-invariant representations.

    • G-equivariant representations will appear at the end of the tutorial

First: Understanding Invariant Subspaces¶

A subspace $M \subseteq \mathbb{R}^d$, $d \geq 1$, is an invariant subspace of a linear transformation $\bar{T}:\mathbb{R}^d \to \mathbb{R}^d$ if $$ {\bf w}^T \bar{T} \in M, ~\forall {\bf w} \in M, $$ where the row vector ${\bf w}^T \bar{T}$ is identified with a vector in $\mathbb{R}^d$.

For instance, the left 1-eigenspace is an invariant subspace, defined as: $$\text{1-Left-Eig}(\bar{T}) = \{ {\bf w} \in \mathbb{R}^d ~|~ {\bf w}^T \bar{T} = {\bf w}^T \}.$$

For an eigenvalue $\lambda$, is $\text{$\lambda$-Left-Eig}(\bar{T})$ a subspace of $\mathbb{R}^d$?

  • Yes.

$\mathbf{0} \in \text{$\lambda$-Left-Eig}(\bar{T})$

$\mathbf{u}, \mathbf{v} \in \text{$\lambda$-Left-Eig}(\bar{T}) \implies (\mathbf{u} + \mathbf{v})^T \bar{T} = \mathbf{u}^T \bar{T} + \mathbf{v}^T \bar{T} = \lambda (\mathbf{u} + \mathbf{v})^T \implies (\mathbf{u} + \mathbf{v}) \in \text{$\lambda$-Left-Eig}(\bar{T}).$

$\mathbf{u} \in \text{$\lambda$-Left-Eig}(\bar{T}) \implies \alpha \mathbf{u}^T \bar{T} = \lambda (\alpha \mathbf{u}^T) \implies \alpha \mathbf{u} \in \text{$\lambda$-Left-Eig}(\bar{T})$.

What is the basis of $\text{$\lambda$-Left-Eig}(\bar{T})$?

  • The set of linearly independent left eigenvectors corresponding to the eigenvalue $\lambda$.

Second: Reynolds Operator Eigenvalues¶

  • The Reynolds operator $\overline{T}$ is a projection (i.e., $\overline{T}^2 = \overline{T}$), thus all the eigenvalues of $\overline{T}$ are either 0 or 1.

    • You can verify this: since $\overline{T}\, T = \overline{T}$ for every $T \in G$ (shown earlier via $G_\dagger = G$), we have $\overline{T}^2 = \overline{T}\, \frac{1}{|G|} \sum_{T \in G} T = \frac{1}{|G|} \sum_{T \in G} \overline{T}\, T = \frac{1}{|G|} \sum_{T \in G} \overline{T} = \overline{T}$.

    • If $\mathbf{v}^T \neq \mathbf{0}$ is a left eigenvector of $\overline{T}$ with eigenvalue $\lambda$, then $\mathbf{v}^T \overline{T}^2 = (\lambda \mathbf{v}^T) \overline{T} = \lambda^2 \mathbf{v}^T$. But also, $\mathbf{v}^T \overline{T}^2 = \mathbf{v}^T \overline{T} = \lambda \mathbf{v}^T$. Thus, $\lambda \in \{0,1\}$.

  • Note:
    • Since each $T \in G$, $T:\mathbb{R}^n \to \mathbb{R}^n$, is invertible, it has $n$ non-zero eigenvalues (full rank).
    • But the average $\overline{T}$ may not be full rank, and can have zero eigenvalues!
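If you are running the notebook, here is a quick numerical check of these two facts with the $4\times 4$ Reynolds operator Tbar from In [5] (a sketch, not part of the original cells):

# Tbar is a projection, so Tbar @ Tbar == Tbar ...
print(np.allclose(Tbar @ Tbar, Tbar))            # True
# ... and therefore its eigenvalues are all 0 or 1.
print(np.round(np.linalg.eigvalsh(Tbar), 8))     # [0. 0. 0. 1.]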

Left 1-eigenspace of Reynolds operator¶

The left eigenspace of $\overline{T}:\mathbb{R}^n \to \mathbb{R}^n$ corresponding to the eigenvalue 1 is given by,

$$ \text{1-Left-Eig}(\overline{T}) = \{\mathbf{w}\in\mathbb{R}^n ~|~ \mathbf{w}^T \overline{T} = \mathbf{w}^T\} \subseteq \mathbb{R}^n $$

with a basis $\{\mathbf{v}_1, \ldots, \mathbf{v}_d\}$ consisting of linearly independent left eigenvectors (corresponding to eigenvalue $1$) of $\overline{T}$.

In other words, $\text{1-Left-Eig}(\overline{T}) = \text{span}(\{\mathbf{v}_1, \ldots, \mathbf{v}_d\})$.

  • Every $\mathbf{w} \in \text{1-Left-Eig}(\overline{T})$ can be expressed as a linear combination $\mathbf{w} = \alpha_1 \mathbf{v}_1 + \ldots + \alpha_d \mathbf{v}_d$.

$G$-Invariant Neuron¶

Proposition (G-invariant neuron):

  • As proposed by Yarotsky (2018) and Kondor & Trivedi (ICML 2018).

  • Let $\mathbb{W}$ denote the left eigenspace corresponding to the eigenvalue 1 of the Reynolds operator $\bar{T}$ for the group $G$.

  • Let ${\bf v}_1,\ldots,{\bf v}_k$ be linearly independent left eigenvectors of $\bar{T}$ with eigenvalue 1 (a basis of $\mathbb{W}$). That is, $$ {\bf v}_i^T \bar{T} = {\bf v}_i^T, \quad i =1,\ldots,k. $$

  • Consider a set of free parameters $\{\alpha_i\}_{i=1}^k$, $\alpha_i \in \mathbb{R}$, and a bias $b \in \mathbb{R}$.
  • Consider the neuron weights ${\bf w} = \sum_{i=1}^k \alpha_i {\bf v}_i \in \mathbb{W}$.
  • Consider a transformed input ${\bf x}' = T {\bf x}$, $T \in G$.
  • Consider an activation function $\sigma:\mathbb{R} \to \mathbb{R}$.
  • Then, the neuron is $G$-invariant:
    $$\sigma({\bf w}^T {\bf x}' + b) = \sigma({\bf w}^T {\bf x} + b).$$

In other words, the neuron output is the same for ${\bf x}'$ and ${\bf x}$.

Proof:

  • Since each ${\bf v}_i$ is a left eigenvector of $\bar{T}$ with eigenvalue $1$, we have ${\bf w}^T = \sum_{i=1}^{k} \alpha_i {\bf v}_i^T = \sum_{i=1}^{k} \alpha_i {\bf v}_i^T \bar{T} = {\bf w}^T \bar{T}$.
  • Hence, ${\bf w}^T {\bf x}' + b = {\bf w}^T \bar{T} {\bf x}' + b = {\bf w}^T \bar{T}\, T {\bf x} + b = {\bf w}^T \bar{T} {\bf x} + b = {\bf w}^T {\bf x} + b$, where the third equality uses $\bar{T}\, T = \bar{T}$ (the Reynolds property).
    • Which means the neuron output is the same for ${\bf x}'$ and ${\bf x}$.
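To see the proposition in action, here is a small numerical check (a sketch, not part of the original cells) using the $2\times 2$ image rotation group and the Reynolds operator Tbar from In [3] and In [5]; since this Tbar is symmetric, its left and right eigenvectors coincide and np.linalg.eigh can be used.

# Basis of the left 1-eigenspace of Tbar (Tbar is symmetric here).
eigvals, eigvecs = np.linalg.eigh(Tbar)
V1 = eigvecs[:, np.isclose(eigvals, 1.0)]

alpha = np.random.rand(V1.shape[1])      # free parameters alpha_i
w = V1 @ alpha                           # w = sum_i alpha_i v_i lies in W
b = 0.1

x = np.random.rand(4)                    # a random vectorized 2x2 image
for T in [T0, T90, T180, T270]:
    assert np.isclose(w @ x + b, w @ (T @ x) + b)   # same pre-activation for x and Tx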

$G$-Invariant Neuron¶

  • Note that the parameters are $\{\alpha_i\}_{i=1}^k$ and a bias $b$.
  • The number of parameters is at most the same as the original neuron (since $k\leq n$, the input dimension), but often much smaller.
  • The computation takes $k$ inner products, one element-wise product, and a sum (a little more expensive but GPUs can do this very efficiently).

Step-by-Step Example : Image and Graph inputs¶

  • Image Task: MNIST handwritten digit classification task consisting of $28 \times 28$ images.

    • Our transformation group of interest is $\mathcal{G}_\text{rot} \equiv \{T({\theta})\}_{\theta\in \{0^\circ,90^\circ,180^\circ,270^\circ\}}$, which rotates the image by $\theta$ degrees, with $T({0^\circ})$ as the identity element.

      • Here $\mathcal{G}_\text{rot} \leq \text{GL}(28^2,\mathbb{R})$ is a subgroup of the general linear group, and $T({\theta})$ is a linear representation of the rotation by angle $\theta$.
      • The input to our transformation group is the vectorized image $\mathbf{x} \in \mathbb{R}^{784}$.
    • Consider the MNIST dataset of handwritten digits acted upon by $\mathcal{G}_\text{rot}$:

  • Graph task: Graph classification task consisting of graphs with $n$ nodes

    • Our transformation group of interest is the joint permutation of adjacency matrices, $\mathcal{G}_\text{per} \equiv \{T(\pi)\}_{\pi \in \mathbb{S}_n}$,
    • where $\mathbb{S}_n$ is the symmetric group, the set of all permutations of the sequence $(1,\ldots,n)$.
    • The representation $T(\pi):\mathbb{R}^{n^2} \to \mathbb{R}^{n^2}$ takes the vectorized adjacency matrix of ${\bf A} \in \mathbb{R}^{n \times n}$ and outputs the vectorized jointly permuted adjacency matrix, with the same permutation applied to rows and columns.
    • The literature sometimes refers to $\mathbb{S}_n$ as the permutation group with respect to graphs.
      • This sometimes confuses readers, since $\mathbb{S}_n$ is the symmetric group.
      • The permutation group of a graph is $\mathcal{G}_\text{per} = \{T(\pi)\}_{\pi \in \mathbb{S}_n}$, built from the representation $T(\pi)$.

Step 1: Obtain Reynolds operator from a black-box (linear) transformation¶

The transformations in the group are often available only as black-box functions; we would like to obtain the Reynolds operator as an $n^2 \times n^2$ matrix operating on the vectorized input $\mathbf{x} \in \mathbb{R}^{n^2}$ instead.

  • For any $i = 1, \ldots, n^2$, the $i$-th column of $\overline{T}$ is given by $\overline{T}e_i$ where $e_i = (0, \ldots, 1, \ldots, 0)^T$ is the $i$-th standard basis vector.
    • In other words, we can obtain the $i$-th column of $\overline{T}$ by computing how the black-box version of $\overline{T}$ transforms the $i$-th standard basis.
  • Stack all the columns to obtain $\overline{T}$ as a matrix.

Step 1.1 Example (Image): Obtain Reynolds operator as a matrix¶

The transformation groups over images are typically given as black-box functions. For example, $\mathcal{G}_\text{rot}$ is typically given via the numpy.rot90 function over an $n \times n$ image (grayscale for MNIST).

It is easy to construct the Reynolds operator $\overline{T}$ as a function over $n \times n$ images.

In [6]:
def Tbar_function(img):
    # Reynolds operator of the rotation group: {T_0, T_90, T_180, T_270}
    # img shape: (H, W)
    # Outputs x transformed by the Reynolds operator. 
    
    transformed_img = np.zeros_like(img, dtype=float)  # accumulate in float to avoid integer truncation
    for rotationDeg in [0, 90, 180, 270]:
        transformed_img += np.rot90(img, k=rotationDeg//90)

    return transformed_img / 4 
In [7]:
img_4_Tbar = Tbar_function(img_4)
fig, axes = plt.subplots(1, 2, figsize=(12//2,9//2))
axes[0].imshow(img_4, cmap='gray')
axes[0].set_title(r'Original $28\times 28$ image', fontsize=16)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_4_Tbar, cmap='gray')
axes[1].set_title(r'Transformed by $\overline{T}$', fontsize=16)
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()

Example (images): The 90$^\circ$ rotation group¶

In [8]:
def construct_Tbar_matrix(image_shape):
    x_shape = np.prod(image_shape)  # Length of the vectorized image
    e = np.eye(x_shape)             # Standard basis
    Tbar_columns = []               # List of columns of Tbar
    
    for i in range(x_shape):
        # Transform e_i by Tbar_function to obtain column i of Tbar
        Tbar_column_i = Tbar_function(e[i].reshape(image_shape))
        
        # Append to the list
        Tbar_columns.append(Tbar_column_i.reshape(-1))
    
    # Stack all columns to construct Tbar
    return np.stack(Tbar_columns, axis=1)

# Construct Tbar for rotation group over 28 x 28 grayscale images (MNIST). 
Tbar_matrix = construct_Tbar_matrix(img_4.shape)

# Check that Tbar as a black-box function and Tbar_matrix transform images the same way.
# - Tbar_function acts directly on images. 
# - Tbar_matrix is a matrix that multiplies the vectorized image.

img_4_Tbar_matrix = (Tbar_matrix @ np.array(img_4).reshape(-1)).reshape(img_4.shape)
fig, axes = plt.subplots(1, 3, figsize=(12, 9))
axes[0].imshow(img_4, cmap='gray')
axes[0].set_title(r'Original $28\times 28$ image', fontsize=16)
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(img_4_Tbar, cmap='gray')
axes[1].set_title(r'Transformed by $\overline{T}$')
axes[1].set_xticks([])
axes[1].set_yticks([])
axes[2].imshow(img_4_Tbar_matrix, cmap='gray')
axes[2].set_title(r'Transformed by $\overline{T}$ as a matrix')
axes[2].set_xticks([])
axes[2].set_yticks([])
plt.show()

Step 1.2 Example (Graphs): Obtain Reynolds operator as a matrix¶

  • Our Reynolds operator is the average of all transformations that jointly permute the rows and columns of the adjacency matrix
  • The group is the permutation group $\mathcal{G}_\text{per}$ defined above
In [9]:
import itertools
import numpy as np
import math

# Five nodes
n = 5

# Adjacency matrix skeleton
A = np.zeros((n,n))

# Reynolds operator matrix to obtain a permutation-invariant representation for the entire graph
Tbar = np.zeros((n**2,n**2))

# Get Reynolds operator matrix
for i in range(n**2):
    # e_i = (0, ..., 1, ..., 0), i.e., the i-th standard basis vector, reshaped as an n x n matrix
    A[i // n, i % n] = 1
    for idx in itertools.permutations(list(range(n))):
        # Get the joint permutation of rows and columns of matrix A        
        myT = A[:,idx]
        myT = myT[idx,:]
        # Accumulate into row i of Tbar (equal to column i, since the Reynolds
        # operator of a permutation group is symmetric)
        Tbar[i,:] += myT.flatten()

    # Erase the one and prepare the next indicator vector
    A[i // n, i % n] = 0

# Reynolds operator is the average (not the sum)
Tbar = Tbar / math.factorial(n)

print('Change the value of n and observe a pattern emerge (i.e., we can directly construct Tbar)')
print(f'The Reynolds operator Tbar is a {Tbar.shape[0]}x{Tbar.shape[1]} matrix:\n {Tbar}')
Change the value of n and observe a pattern emerge (i.e., we can directly construct Tbar)
The Reynolds operator Tbar is a 25x25 matrix:
 [[0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.
  0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2 ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.
  0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2 ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.
  0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2 ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.
  0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2 ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.   0.05 0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.   0.05
  0.05 0.05 0.05 0.05 0.   0.05 0.05 0.05 0.05 0.05 0.  ]
 [0.2  0.   0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2  0.
  0.   0.   0.   0.   0.2  0.   0.   0.   0.   0.   0.2 ]]

Step 2: Left 1-Eigenspace of the Reynolds operator¶

Now we can find the left eigenvectors $\mathbf{v}_j \in \mathbb{R}^{n^2}$ of $\overline{T}$ with eigenvalue $\lambda_j =1$.

Recall that our neuron parameters live in the invariant subspace $\mathbb{W} = \text{span}(\{\mathbf{v}_j\}_{\forall j | \lambda_j = 1})$

Step 2.1 Example (images): The 90$^\circ$ rotation group¶

In [10]:
# Eigenvalues and eigenvectors of a real symmetric matrix Tbar.
img_eigenvalues, img_eigenvectors = np.linalg.eigh(Tbar_matrix)

# Find the rank of Tbar (here, equivalent to the number of eigenvectors with eigenvalue 1).
img_rank = np.linalg.matrix_rank(np.diag(img_eigenvalues), hermitian=True)

print(f'Rank of Tbar_matrix of size {Tbar_matrix.shape[0]} x {Tbar_matrix.shape[1]}: {img_rank}')

# Eigenvectors with eigenvalue 1 (eigenvalues are arranged in ascending order). 
img_one_eigenvectors = img_eigenvectors[:, -img_rank:]
np.savetxt('basis.npy', img_one_eigenvectors)
Rank of Tbar_matrix of size 784 x 784: 196

Step 2.1 (images): Eigenvectors of the Reynolds Operator¶

Let's visualize a few unvectorized eigenvectors:

In [11]:
fig, axes = plt.subplots(2, 4, figsize=(12, 9))
for i in range(8):
    img_eigenvector = img_one_eigenvectors[:, i].reshape((28, 28))
    axes[i//4][i%4].imshow(img_eigenvector, cmap='gray')
    axes[i//4][i%4].set_xticks([])
    axes[i//4][i%4].set_yticks([])
    plt.tight_layout()
plt.show()

Eigenvectors of the Reynolds Operator¶

Recall that any vector in a subspace can be identified by a set of coefficients corresponding to a basis of the subspace. In this case, the basis vectors are the eigenvectors.

Any $\mathbf{w} \in \mathcal{W}$ can be written as

$$\mathbf{w} = \alpha_1 \mathbf{v}_1 + \ldots + \alpha_{196} \mathbf{v}_{196}$$

for some $\alpha_i \in \mathbb{R}$.

Now let's visualize a random $\mathbf{w}$ unvectorized, i.e., with random scalar values for the $\alpha_i$ in the equation above:

In [12]:
random_alpha = np.random.rand(196)
random_w_in_eigenspace = img_one_eigenvectors @ random_alpha
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))

axes[0].imshow(random_w_in_eigenspace.reshape((28, 28)), cmap='gray')
axes[0].set_title(r"Random $\mathbf{w} \in \mathcal{W}$")
axes[0].set_xticks([])
axes[0].set_yticks([])
axes[1].imshow(np.rot90(random_w_in_eigenspace.reshape((28, 28)), k=1), cmap='gray')
axes[1].set_title(r"$\mathbf{w}^T T_{90^\circ}$")
axes[1].set_xticks([])
axes[1].set_yticks([])
plt.show()

Any $\mathbf{w} \in \mathcal{W}$ is invariant to $\mathcal{G}_\text{rot}$, i.e., rotations of $90^\circ$ multiples.

Step 2.2 Example (graphs): The permutation group¶

In [13]:
# Tbar is the Reynolds operator of the adjacency matrix permutation group

graph_eigenvalues, graph_eigenvectors = np.linalg.eigh(Tbar)

    
print(f'eigenvalues={graph_eigenvalues}\n')

# Find the rank of Tbar (here, equivalent to the number of eigenvectors with eigenvalue 1).
graph_rank = np.linalg.matrix_rank(np.diag(graph_eigenvalues), hermitian=True)

print(f'Rank of Tbar of size {Tbar.shape[0]} x {Tbar.shape[1]}: {graph_rank}')

# Eigenvectors with eigenvalue 1 (eigenvalues are arranged in ascending order). 
graph_one_eigenvectors = graph_eigenvectors[:, -graph_rank:]
eigenvalues=[-4.31809982e-017 -1.61747315e-017 -6.69789612e-032 -5.34118705e-034
 -7.58467252e-048 -1.69136529e-049 -2.77989923e-050 -4.57834487e-052
 -2.97584285e-064 -2.25538818e-067 -2.35127721e-069 -5.09828814e-080
 -1.20036095e-080 -3.07372861e-113 -6.49430519e-146  1.29973049e-145
  3.71107657e-096  9.04479804e-064  1.60555814e-052  1.77303730e-048
  3.55812649e-034  2.49873277e-016  2.84684090e-016  1.00000000e+000
  1.00000000e+000]

Rank of Tbar of size 25 x 25: 2

Note that for invariant graph representations, the invariant subspace $\mathbb{W}$ has only two basis vectors (dimension two).

  • For equivariant graph representations the math is similar, but one must consider $n^4\times n^4$ transformation matrices. We will not cover these for now. You can find the corresponding bases in (Maron et al. ICLR 2019).
In [14]:
fig, axes = plt.subplots(1, 2, figsize=(12//2, 9//2))
for i in range(2):
    graph_eigenvector = graph_one_eigenvectors[:, i].reshape((n, n))
    axes[i].imshow(graph_eigenvector, cmap='gray')
    axes[i].set_xticks([])
    axes[i].set_yticks([])
    plt.tight_layout()
plt.show()

Step 3: Construct a G-invariant neuron¶

The neuron is defined as: $$ \sigma({\bf w}^T {\bf x} + b) \quad \text{ with } \mathbf{w} \in \mathbb{W} \subseteq \mathbb{R}^n\:. $$

Step 3.1 Example (images): The 90$^\circ$ rotation group¶

  • Recall that for our images we have 196 basis vectors that span the invariant subspace $\mathbb{W}$.
  • Hence, our neuron has learnable parameters $\{\alpha_i\}_{i=1}^{196}$ and a bias $b \in \mathbb{R}$, i.e., 197 parameters in total.
  • Note that this is fewer than the $28 \times 28 + 1 = 785$ parameters of the standard neuron.

Standard Neuron vs G-invariant neuron¶

In [15]:
show_dot_product(img_4, random_w_in_eigenspace)

show_dot_product(img_4_rot90, random_w_in_eigenspace)

Step 3.2 Example (graphs): The permutation group¶

  • Recall that for graphs we have two basis vectors that span the invariant subspace $\mathbb{W}$.
  • Hence, our neuron has learnable parameters $\{\alpha_1,\alpha_2\}$ and a bias $b \in \mathbb{R}$.
  • Note that this is far fewer than the $n^2 + 1$ parameters of the standard neuron that takes a vectorized $n \times n$ adjacency matrix.
In [16]:
# Random adjacency matrix 
A = np.random.rand(n,n)

random_graph_alpha = np.random.rand(2)
random_graph_w_in_eigenspace = graph_one_eigenvectors @ random_graph_alpha

show_dot_product(A,random_graph_w_in_eigenspace,size=n)


# Joint permutation of rows and columns
idx = [3,2,1,0,4]
TA = A[:,idx]
TA = TA[idx,:]

show_dot_product(TA,random_graph_w_in_eigenspace,size=n)

Step 4: Stack G-invariant neurons to obtain a G-invariant MLP¶

Image experiments using the MNIST dataset¶

In [17]:
from g_invariance import *
In [18]:
plot_data(train_loader, valid_loader, in_domain_test_loader, rotated_test_loader)

Image experiment: Standard MLP¶

  • The standard MLP is not invariant to transformations (e.g., $90^\circ$ rotations)
  • Now we will train a standard MLP on upright images.
  • And then test its capabilities:
    • With an in-distribution test, where the test images are also upright
    • With an extrapolation (out-of-distribution) test using rotated images
In [19]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        
        # We will later replace the first layer with a G-invariant layer.
        # Rest of the code remains the same.
        self.first_layer = nn.Linear(784, 50)
        
        self.mlp = nn.Sequential(
                self.first_layer,
                nn.ReLU(),
                nn.Linear(50, 10)
        )
                
    def forward(self, X):
        out = self.mlp(X)
        return F.log_softmax(out, dim=1)

    
        
net = MLP().to(device)
In [20]:
train(net, train_loader, valid_loader, lr=1e-2, momentum=0.5, epochs=10)
[Epoch 0] Validation loss: 2.367385
[Epoch 1] Validation loss: 0.369851
[Epoch 2] Validation loss: 0.299486
[Epoch 3] Validation loss: 0.267263
[Epoch 4] Validation loss: 0.243571
[Epoch 5] Validation loss: 0.223185
[Epoch 6] Validation loss: 0.208289
[Epoch 7] Validation loss: 0.195990
[Epoch 8] Validation loss: 0.184902
[Epoch 9] Validation loss: 0.176478
In [21]:
train_metrics = get_metrics(net, train_loader)
in_domain_test_metrics = get_metrics(net, in_domain_test_loader)
rotated_test_metrics = get_metrics(net, rotated_test_loader)

plot_metrics(train_metrics, in_domain_test_metrics, rotated_test_metrics)

Accuracy and confusion matrices for the standard MLP on the training, in-distribution test, and rotated (out-of-distribution) test data.


Image experiment: G-invariant MLP¶

  • Our G-invariant MLP is invariant to transformations of the $90^\circ$ rotation group.
  • Now we will train the G-invariant MLP on upright images.
  • And then test its capabilities:
    • With an in-distribution test, where the test images are also upright
    • With an extrapolation (out-of-distribution) test using rotated images
In [22]:
class GInvariantLayer(nn.Module):
    
    def __init__(self, input_dim, output_dim):
        super().__init__()
        
        # Load the left 1-eigenvectors of the Reynolds operator that we computed before.
        self.basis = torch.tensor(np.loadtxt('basis.npy'), dtype=torch.float32).T
        assert self.basis.shape[1] == input_dim
        
        self.coeffs = nn.Parameter(torch.Tensor(output_dim, self.basis.shape[0], 1))
        self.bias = nn.Parameter(torch.Tensor(output_dim))
        
        stdv = 1.0/math.sqrt(output_dim)
        self.coeffs.data.uniform_(-stdv, stdv)
        self.bias.data.zero_()
    

    def forward(self, X):
        # Input shape: torch.Size([minibatch, input_dim])
        
        if self.basis.device != X.device:
            self.basis = self.basis.to(X.device)

        # Construct weight w \in \mathcal{W} (the left 1-eigenspace)
        #      using the current learnable coefficients.
        # coeffs: (output_dim, n_basis, 1)
        # basis  : (n_basis, basis_dim)
        # result after torch.mul : (output_dim, n_basis, basis_dim)
        # result after sum : (output_dim, basis_dim)
        weights = torch.mul(self.coeffs, self.basis)
        weights = weights.sum(dim=-2)

        # Output shape: torch.Size([minibatch, output_dim])
        out = X @ weights.T + self.bias

        return out
In [23]:
class GInvariantMLP(nn.Module):
    def __init__(self):
        super().__init__()
        
        # Replace the first layer with a G-invariant layer.
        self.first_layer = GInvariantLayer(784, 50)
        
        self.mlp = nn.Sequential(
                self.first_layer,
                nn.ReLU(),
                nn.Linear(50, 10)
        )
                
    def forward(self, X):
        out = self.mlp(X)
        return F.log_softmax(out, dim=1)

        
g_invariant_net = GInvariantMLP().to(device)
In [24]:
train(g_invariant_net, train_loader, valid_loader, lr=1e-2, momentum=0.5, epochs=10)
[Epoch 0] Validation loss: 2.436452
[Epoch 1] Validation loss: 1.164346
[Epoch 2] Validation loss: 1.075125
[Epoch 3] Validation loss: 1.016847
[Epoch 4] Validation loss: 0.969406
[Epoch 5] Validation loss: 0.918552
[Epoch 6] Validation loss: 0.889736
[Epoch 7] Validation loss: 0.854602
[Epoch 8] Validation loss: 0.826470
[Epoch 9] Validation loss: 0.808643
In [25]:
train_metrics = get_metrics(g_invariant_net, train_loader)
in_domain_test_metrics = get_metrics(g_invariant_net, in_domain_test_loader)
rotated_test_metrics = get_metrics(g_invariant_net, rotated_test_loader)

plot_metrics(train_metrics, in_domain_test_metrics, rotated_test_metrics)

Extensions:

  1. (Invariance to multiple groups) What if we want to be invariant to more than one group: $\mathcal{G}_1, \ldots, \mathcal{G}_m$?
    1. Create a group $\mathcal{G}_\text{all} = \{\rho(\pi \circ {\bf a}) : {\bf a} \in \mathcal{G}_1\times \cdots \times \mathcal{G}_m,\ \pi \in \mathbb{S}_m\}$
    2. Find the invariant subspace of $\mathcal{G}_\text{all}$
    3. If $\rho(\pi \circ {\bf a})$ is invariant to all permutations $\pi \in \mathbb{S}_m$, please see the answer in (Mouli & Ribeiro, 2021)
  2. (Extrapolations are about being as invariant as we can be without contradicting the training data) Given a set of transformation groups $\{\mathcal{G}_1, \ldots, \mathcal{G}_m\}$ we can use the training data to learn:
    1. Which subset of groups we must be sensitive to.
    2. And which subset of groups we can be invariant to without contradicting the training data.
    • Answer in (Mouli & Ribeiro, 2021)

G-equivariant Neurons¶

  • Consider an input in $\mathbb{R}^m$.
  • And the output in $\mathbb{R}^k$.
  • Let ${\bf W} \in \mathbb{R}^{k \times m}$ be the neuron parameters
  • Equivariance requires that transforming the input is the same as transforming the output: $$ \forall {\bf x} \in \mathbb{R}^m,\ \forall g \in G: \quad \rho_2(g) {\bf W} {\bf x} = {\bf W} \rho_1(g) {\bf x}, $$ where $\rho_1:G \to \mathbb{R}^{m \times m}$ and $\rho_2:G \to \mathbb{R}^{k \times k}$.
  • Since the above is true for all ${\bf x}$, then $$\rho_2(g) {\bf W} \rho_1(g)^{-1} = {\bf W},$$ or equivalently $$\forall g \in G, \qquad \big(\rho_2(g) \otimes \rho_1(g^{-1})^T\big) \text{vec}({\bf W}) = \text{vec}({\bf W}),$$ where vec flattens the matrix into a vector, and so the whole transformation $\rho_{12}(g) = \rho_2(g) \otimes \rho_1(g^{-1})^T$ is a representation of how $g$ acts on matrices mapping the input space to the output space.
  • Note: Special operators acting on matrices (and tensors):
    • $\oplus$ is the direct sum which concatenates the matrices on the diagonal $$X \oplus Y =\begin{bmatrix} X & 0 \\ 0 & Y\\ \end{bmatrix}$$ and facilitates multiple representations which are acted upon separately.
    • $\otimes$ is the Kronecker product, where for matrices ${\bf A}$ and ${\bf B}$ is defined as $$\mathbf {A} \otimes \mathbf {B} =\begin{bmatrix}a_{11}\mathbf {B} &\cdots &a_{1n}\mathbf {B} \\\vdots &\ddots &\vdots \\a_{m1}\mathbf {B} &\cdots &a_{mn}\mathbf {B} \end{bmatrix}.$$
      • Example (from Wikipedia) $${\begin{bmatrix}1&2\\3&4\\\end{bmatrix}}\otimes {\begin{bmatrix}0&5\\6&7\\\end{bmatrix}}={\begin{bmatrix}1{\begin{bmatrix}0&5\\6&7\\\end{bmatrix}}&2{\begin{bmatrix}0&5\\6&7\\\end{bmatrix}}\\3{\begin{bmatrix}0&5\\6&7\\\end{bmatrix}}&4{\begin{bmatrix}0&5\\6&7\\\end{bmatrix}}\\\end{bmatrix}}={\begin{bmatrix}1\times 0&1\times 5&2\times 0&2\times 5\\1\times 6&1\times 7&2\times 6&2\times 7\\3\times 0&3\times 5&4\times 0&4\times 5\\3\times 6&3\times 7&4\times 6&4\times 7\\\end{bmatrix}}={\begin{bmatrix}0&5&0&10\\6&7&12&14\\0&15&0&20\\18&21&24&28\end{bmatrix}}.$$

Group-Equivariant ($G$-Equivariant) Neurons¶

  • Note that the equation $$\forall g \in G, \qquad \big(\rho_2(g) \otimes \rho_1(g^{-1})^T\big) \text{vec}({\bf W}) = \text{vec}({\bf W})$$ has a familiar form.
    • We can rewrite the above equation as $$T {\bf x} = {\bf x},$$ where $T = \rho_2(g) \otimes \rho_1(g^{-1})^T$ and ${\bf x} = \text{vec}({\bf W}).$
  • The above is the equation required for a $G$-invariant representation.
    • We can again use the Reynolds operator to find $\bar{T}$ such that, $\forall {\bf x} \in \mathbb{R}^{mk}$ and $\forall T \in \{\rho_2(g) \otimes \rho_1(g^{-1})^T : g \in G\}$, $$\bar{T}(T {\bf x}) = \bar{T} {\bf x}.$$
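To make this concrete, here is a minimal numpy sketch (not part of the original notebook) for the $2\times 2$ image rotation group from In [3], taking $\rho_1 = \rho_2 = T_\theta$ and using numpy's row-major flattening as the $\text{vec}$ convention.

# Reynolds operator of the representation rho_12(g) = rho_2(g) (x) rho_1(g^{-1})^T,
# acting on vec(W) for 4x4 weight matrices W.
group    = [T0, T90, T180, T270]
inverses = [T0, T270, T180, T90]        # T90^{-1} = T270, T180^{-1} = T180, etc.

Tbar_equiv = np.zeros((16, 16))
for T, T_inv in zip(group, inverses):
    Tbar_equiv += np.kron(T, T_inv.T)
Tbar_equiv /= len(group)

# The 1-eigenvectors of this Reynolds operator give G-equivariant weight matrices W.
vals, vecs = np.linalg.eigh(Tbar_equiv)          # Tbar_equiv is symmetric here
W_basis = vecs[:, vals > 0.5]

W = (W_basis @ np.random.rand(W_basis.shape[1])).reshape(4, 4)
for T in group:
    assert np.allclose(T @ W, W @ T)             # rho_2(g) W = W rho_1(g) for all g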

Lie Groups¶

A Lie group is a continuous group that can be analyzed through its infinitesimal generators

  • Lie theory provides a way of analyzing continuous groups in terms of their infinitesimal generators
  • The Lie algebra $\mathfrak{g}$ of a Lie group $G$ (a continuous group that forms a smooth manifold) is the tangent space at the identity, $\mathfrak{g} := T_{\text{id}} G \subseteq \mathbb{R}^{m\times m}$, which is a vector space of infinitesimal generators of group transformations from $G$.
  • The exponential map $\exp : \mathfrak{g} \to G$ maps back to the Lie group and can be understood through the series $$\exp(A) = \sum_{t=0}^\infty \frac{A^t}{t!}.$$
  • A classic example is the rotation group $G = \text{SO}(m)$, with matrices $R \in \mathbb{R}^{m\times m}$ satisfying $R^T R = I$ and $\det(R) = 1$.

Representations of Point Clouds¶

  • A point cloud is an example of a set input:
    • An object in $\mathbb{R}^3$ is a collection of points, represented by a subset $O \subsetneq \mathbb{R}^3$

Note that in the above example the data consists of a sequence of $N$ points, defined as $((x_1,y_1,z_1),\ldots,(x_N,y_N,z_N))$:

  • The order in which we list the points is irrelevant. Thus the representation of the sequence should be invariant to permutations of its elements.
    • The sequence is permutation invariant
  • The $(x,y,z)$ values can be jointly rotated around an axis and the object remains the same.
    • The points are jointly rotation invariant
  • The $(x,y,z)$ values can be jointly translated and the object remains the same.
    • The points are jointly translation invariant

3D transformations:¶

  • Transformation of an object: A single mapping $T : O \to O$ which maps the coordinates of points in the object from their original to their final configurations.
  • Rigid Body (Isometric) Transformation: A mapping of each point in $O \subsetneq \mathbb{R}^3$ such that $T : \mathbb{R}^3 \to \mathbb{R}^3$ is called a rigid body transformation if it satisfies the following two properties:
    • The length is preserved: $\Vert T(v) - T(u)\Vert = \Vert v - u \Vert$, for all $u,v \in \mathbb{R}^3$.
    • The cross product is preserved: $T(v \times u) = T(v) \times T(u),$ for all $v, u \in \mathbb{R}^3$.
    • These imply:
      • It preserves the Euclidean distance between every pair of points
      • Angles between vectors are preserved (inner product is preserved)
      • Orthogonal vectors are transformed to orthogonal vectors
      • Orientation is preserved (right-handed coordinate frames are transformed to right-handed coordinate frames)

Rotation Transformations in 3D space (SO(3))¶

The 3D rotation group, denoted SO(3), is the group of all rotations about the origin of three-dimensional Euclidean space $\mathbb{R}^{3}$ under the operation of composition.

  • The group of all $3\times 3$ orthogonal matrices $T$ is denoted O(3)
    • $T^{\mathrm {T} }T=T T^{\mathrm {T} }=I,$
    • Orthogonal matrices preserve length of vectors, since for $\mathbf{v} \in \mathbb{R}^3,$ we have ${\mathbf{v}}^{\mathrm{T}}{\mathbf{v}}=(T{\mathbf{v}})^{\mathrm{T}}(T{\mathbf{v}})={\mathbf{v}}^{\mathrm{T}} T^{\mathrm{T}} T{\mathbf{v}}.$
    • The Special Orthogonal 3D group (SO(3)) is the subgroup of O(3) whose elements additionally satisfy $\det T = 1$ (any orthogonal matrix satisfies $(\det T)^2 = 1$, i.e., $\det T = \pm 1$).
  • SO(3) has continuous rotations $T_\theta$, where $\theta \in [0,360)$ degrees.
  • SO(3) is a Lie group (smooth/differentiable manifold)
  • Every rotation can be represented uniquely by an orthogonal matrix $T$ with unit determinant.
  • For instance, a counterclockwise rotation about a positive axis by angle $\theta$ is given by $$\begin{alignedat}{1}T_{(1,0,0)}(\theta )&={\begin{bmatrix}1&0&0\\0&\cos \theta &-\sin \theta \\[3pt]0&\sin \theta &\cos \theta \\[3pt]\end{bmatrix}},\\[6pt]T_{(0,1,0)}(\theta )&={\begin{bmatrix}\cos \theta &0&\sin \theta \\[3pt]0&1&0\\[3pt]-\sin \theta &0&\cos \theta \\\end{bmatrix}},\\[6pt]T_{(0,0,1)}(\theta )&={\begin{bmatrix}\cos \theta &-\sin \theta &0\\[3pt]\sin \theta &\cos \theta &0\\[3pt]0&0&1\\\end{bmatrix}}.\end{alignedat}$$
  • Other rotation matrices can be obtained from the above three rotations using matrix multiplication.

Exponential Map of SO(3)¶

  • A matrix exponential $\exp(A)$ is defined as $$\exp(A) =\sum _{k=0}^{\infty }{1 \over k!}A^{k}.$$

  • The exponential map expresses rotations, and their composition, through the matrix exponential of a few generator matrices:

    • Let $$L_{\mathbf {x} }=\left[{\begin{matrix}0&0&0\\0&0&-1\\0&1&0\end{matrix}}\right],\quad L_{\mathbf {y} }=\left[{\begin{matrix}0&0&1\\0&0&0\\-1&0&0\end{matrix}}\right],\quad L_{\mathbf {z} }=\left[{\begin{matrix}0&-1&0\\1&0&0\\0&0&0\end{matrix}}\right].$$

Then, convince yourself that (just look at the series definition of sine and cosine) $$T_{(0,0,1)}(\theta) =\begin{pmatrix}\cos \theta &-\sin \theta &0\\\sin \theta &\cos \theta &0\\0&0&1\end{pmatrix}=\exp(\theta L_{z}).$$

Also note that $$\exp(A+B) =\sum _{k=0}^{\infty }{1 \over k!}(A+B)^{k} = I + A + B + \frac{A^2}{2} + \frac{A B}{2} + \frac{BA}{2} + \frac{B^2}{2} + \cdots, $$ which combines the transformations $A$ and $B$ into a single transformation (in general $\exp(A+B)=\exp(A)\exp(B)$ only when $A$ and $B$ commute).

Then, a rotation by $\boldsymbol{\theta} = (\theta_x,\theta_y,\theta_z)$ is $$\begin{aligned}T(\boldsymbol{\theta})&=\exp {\bigl (}\theta_x L_{\mathbf {x} } + \theta_y L_{\mathbf {y} } + \theta_z L_{\mathbf {z} }{\bigr )}\\&=\exp \left(\left[{\begin{matrix}0&-\theta_z &\theta_y \\\theta_z &0&-\theta_x \\-\theta_y &\theta_x &0\end{matrix}}\right]\right).\end{aligned}$$
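
The identity $T_{(0,0,1)}(\theta)=\exp(\theta L_z)$ is easy to verify numerically. The sketch below (a toy check using scipy.linalg.expm, with the generators hard-coded as above) also confirms that exponentiating an arbitrary combination of the generators yields a valid rotation matrix.

```python
import numpy as np
from scipy.linalg import expm

L_x = np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)
L_y = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float)
L_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)

theta = 0.7
T_z = np.array([[np.cos(theta), -np.sin(theta), 0],
                [np.sin(theta),  np.cos(theta), 0],
                [0, 0, 1]])
print(np.allclose(expm(theta * L_z), T_z))       # True: exp(theta L_z) is the z-rotation

# A general element exp(theta_x L_x + theta_y L_y + theta_z L_z) is still a rotation
R = expm(0.2 * L_x + 0.5 * L_y - 0.3 * L_z)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # True True
```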

3D Rigid Transformation (SE(3))¶

The group of rigid transformations in 3D space, SE(3), combines rotations and translations:

  • If $T \in \text{SO}(3)$ is a rotation and ${\bf t} \in \mathbb{R}^{3\times 1}$ is a translation, then the 3D rigid transformation is $$C = \begin{bmatrix}T&{\bf t}\\{\bf 0}&1\end{bmatrix} \in \text{SE}(3). $$

  • Rigid transformation of points: In this matrix representation, the transformation (group action) on 3D points is: $${\bf x}^\mathrm{T}=[x,y,z,1]$$ and then $$C {\bf x} = \begin{bmatrix}T {\bf x} + {\bf t}\\1\end{bmatrix}$$

  • Rigid transformation of direction vectors: If ${\bf x}$ is a direction vector, we define ${\bf x}^\mathrm{T}=[x,y,z,0]$ and the translation is ignored.

Exponential Map of SE(3)¶

  • Let $$G_{1}=\begin{bmatrix}0&0&0&1\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix},\quad G_{2}=\begin{bmatrix}0&0&0&0\\0&0&0&1\\0&0&0&0\\0&0&0&0\end{bmatrix},\quad G_{3}=\begin{bmatrix}0&0&0&0\\0&0&0&0\\0&0&0&1\\0&0&0&0\end{bmatrix},\quad G_{4}=\begin{bmatrix}0&0&0&0\\0&0&-1&0\\0&1&0&0\\0&0&0&0\end{bmatrix}, \quad G_{5}=\begin{bmatrix}0&0&1&0\\0&0&0&0\\-1&0&0&0\\0&0&0&0\end{bmatrix}, \quad G_{6}=\begin{bmatrix}0&-1&0&0\\1&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}.$$

  • A rigid transformation that rotates around the origin by $\theta = (\theta_x,\theta_y,\theta_z)$ angles (on the x,y,z-axis, respectively) and translates by ${\bf u}=(x,y,z)$ is then $$ T_{{\bf u},\theta} = \exp(x G_1 + y G_2+ z G_3 + \theta_x G_4 + \theta_y G_5 + \theta_z G_6 ). $$
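
The same check works for SE(3): exponentiating a linear combination of the six generators above produces a valid rigid-transformation matrix, which acts differently on points (last homogeneous coordinate 1) and on direction vectors (last coordinate 0). A minimal sketch, with the coefficients chosen arbitrarily:

```python
import numpy as np
from scipy.linalg import expm

def G(i, j, val):
    """4x4 matrix with a single nonzero entry (helper to build the generators)."""
    M = np.zeros((4, 4))
    M[i, j] = val
    return M

G1, G2, G3 = G(0, 3, 1), G(1, 3, 1), G(2, 3, 1)              # translations
G4 = G(2, 1, 1) + G(1, 2, -1)                                 # rotation about x
G5 = G(0, 2, 1) + G(2, 0, -1)                                 # rotation about y
G6 = G(1, 0, 1) + G(0, 1, -1)                                 # rotation about z

C = expm(0.5 * G1 - 0.2 * G2 + 1.0 * G3 + 0.3 * G4 + 0.1 * G5 + 0.7 * G6)

R = C[:3, :3]                                                 # rotation block
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # True True
print(np.allclose(C[3], [0, 0, 0, 1]))                        # bottom row of an SE(3) matrix

p = np.array([1.0, 2.0, 3.0, 1.0])   # a point: affected by rotation and translation
d = np.array([1.0, 2.0, 3.0, 0.0])   # a direction vector: translation is ignored
print(C @ p)
print(np.allclose((C @ d)[:3], R @ d[:3]))                    # True
```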

Similarity Transformations in 3D space (Sim(3))¶

  • Similarity transformations combine a rigid transformation with a scaling.
  • The group of similarity transforms in 3D space, Sim(3), has a nearly identical representation to SE(3), with an additional scale factor:

    • If $T \in \text{SO}(3)$ is a rotation, ${\bf t} \in \mathbb{R}^{3\times 1}$ is a translation, and $s \in \mathbb{R}$ is a scaling, then the 3D similarity transformation is $$S = \begin{bmatrix}T&{\bf t}\\{\bf 0}&s^{-1}\end{bmatrix} \in \text{Sim}(3). $$

    • Similarity transformation of points: In this matrix representation, the transformation (group action) on 3D points is: $${\bf x}^\mathrm{T}=[x,y,z,1]$$ and then $$S {\bf x} = \begin{bmatrix}T {\bf x} + {\bf t}\\s^{-1}\end{bmatrix} \cong \begin{bmatrix}s (T {\bf x} + {\bf t})\\1\end{bmatrix},$$ where $\cong$ denotes equality of homogeneous coordinates up to a nonzero scale factor: both sides describe the same transformed point, with coordinates $s(T{\bf x}+{\bf t})$.
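
In code, the last homogeneous coordinate carries the inverse scale, and normalizing by it recovers the scaled rigid transform $s(T\mathbf{x}+\mathbf{t})$. A small sketch with arbitrary toy values:

```python
import numpy as np

theta, s = 0.4, 2.0
T = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])                       # a rotation in SO(3)
t = np.array([1.0, -2.0, 0.5])                  # translation
S = np.block([[T, t[:, None]],
              [np.zeros((1, 3)), np.array([[1.0 / s]])]])   # Sim(3) element

x = np.array([0.3, 0.7, -1.0, 1.0])             # homogeneous point [x, y, z, 1]
y = S @ x
y = y / y[3]                                    # normalize the homogeneous coordinate
print(np.allclose(y[:3], s * (T @ x[:3] + t)))  # True: scaled rigid transform of x
```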

Lie group¶

Let $G$ be one of the above groups acting on 3D point clouds (SO(3), SE(3), or Sim(3)). These are examples of Lie groups.

We have seen that any group element $g \in G$ can be written in terms of the exponential: $$ \forall g \in G: \quad \quad g = \exp{\Big(\sum_{i=1}^D \alpha_i A_i\Big)}, $$

where $\alpha_i \in \mathbb{R}$, $\forall i \in \{1,\ldots,D\}$, and $\{A_i\}_{i=1}^D$ are the generators (e.g., $L_{\bf x}$, $L_{\bf y}$, $L_{\bf z}$ in SO(3)).

The generators $\{A_i\}_{i=1}^D$ are the basis of a vector space called Lie algebra $\mathfrak{g}$.

What matters most about a group element is how it acts as a transformation on an input.

A group representation $$\rho: G \rightarrow \text{GL}(m)$$ associates every group element $g$ to an invertible matrix $\rho(g) \in \mathbb{R}^{m \times m}$ that acts on $\mathbb{R}^{m}$.

The representation satisfies $\rho(g_1 g_2)= \rho(g_1) \rho(g_2)$ for all $g_1, g_2 \in G$, and therefore also $\rho(g_1^{-1}) = \rho(g_1)^{-1}$.

A group representation describes how a group transformation acts on $\mathbb{R}^m$.

Therefore every group has multiple representations, each describing transformations on some vector space by invertible linear maps.

Lie algebra¶

Mirroring the group representations, Lie groups have an associated representation of their Lie algebra.

A Lie algebra representation $d\rho: \mathfrak{g} \to \mathfrak{gl}(m)$ is a linear map from the Lie algebra to ${m\times m}$ matrices.

An important result in Lie Theory relates the representation of a Lie Group to the representation of its Lie Algebra $$ \begin{equation}\label{eq:lie_correspondance} \forall A \in \mathfrak{g}: \quad \rho(e^A) = e^{d\rho(A)} \end{equation} $$

G-Equivariant Layers¶

We want to characterize all equivariant linear layers $W\in \mathbb{R}^{N_2\times N_1}$ that map from one vector space $V_1$ with representation $\rho_1$ to another vector space $V_2$ with representation $\rho_2$ for a matrix group $G$.

Being equivariant to every transformation in the group means that transforming the input $x \in V_1$ is the same as transforming the output $W x \in V_2$. $$\begin{equation}\tag{10} \label{eq:groupinv} \forall x \in V_1 \quad \forall g \in G: \quad \quad \rho_2(g) W x = W \rho_1(g) x. \end{equation}$$

Since this holds for every $x \in V_1$, we have $$\rho_2(g)W\rho_1(g)^{-1}=W,$$ or more abstractly: $$\begin{equation}\tag{11} \label{eq:kro} \forall g\in G:\quad \big(\rho_2(g)\otimes \rho_1(g^{-1})^\top\big) \mathrm{vec}(W)=\mathrm{vec}(W),\end{equation}$$ where $\mathrm{vec}$ flattens a matrix into a vector and $\otimes$ denotes the Kronecker product of matrices.

The above result is derived in Finzi et al. (2021).

Let's denote $$\rho(g) = \rho_2(g)\otimes \rho_1(g^{-1})^\top,$$ which is a representation of how $g$ acts on matrices mapping from $V_1\to V_2$.

Therefore, rewriting using $\rho(g)$ and using $v = \mathrm{vec}(W)$ we have $$\begin{equation}\tag{12} \forall g\in G:\quad \rho(g) v = v . \end{equation}$$
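
These equations are easy to check numerically. The minimal NumPy sketch below (not part of the tutorial code) takes $G=\mathrm{SO}(2)$ acting identically on input and output ($\rho_1=\rho_2$) and a weight matrix $W$ that is a rotation-plus-scaling, hence equivariant by construction; note that with NumPy's row-major flattening, the identity $\mathrm{vec}(AXB)=(A\otimes B^\top)\,\mathrm{vec}(X)$ is the one that makes Equation (11) hold as written.

```python
import numpy as np

def rot(th):
    return np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])

g = rot(0.9)                         # a group element of SO(2); rho_1(g) = rho_2(g) = g
W = 1.7 * rot(0.3)                   # a rotation+scaling: commutes with every rotation

# Eq. (10): transforming the output equals transforming the input
x = np.random.randn(2)
print(np.allclose(g @ (W @ x), W @ (g @ x)))          # True: W is equivariant

# Eq. (11): with row-major vec (NumPy's default flatten), vec(AXB) = (A kron B^T) vec(X)
rho = np.kron(g, np.linalg.inv(g).T)                  # rho(g) = rho_2(g) kron rho_1(g^{-1})^T
print(np.allclose(rho @ W.flatten(), W.flatten()))    # True: vec(W) is a fixed point of rho(g)
```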

G-Equivariant Layers conditions¶

Since every group element can be written as $$ \forall g \in G: \quad \quad g = \exp{\Big(\sum_{i=1}^D \alpha_i A_i\Big)}, $$ Equation (12) (that is, $\forall g\in G: \rho(g) v = v$) becomes $$ \tag{13} \forall g\in G:\quad \rho\Big(\exp{\Big(\sum_{i=1}^D \alpha_i A_i\Big)}\Big) v = v. $$

Using the correspondence between Lie group and Lie algebra representations, $$ \forall A \in \mathfrak{g}: \quad \rho(e^A) = e^{d\rho(A)},$$ we can rewrite this as $$ \forall \alpha_1,\ldots,\alpha_D \in \mathbb{R}: \quad \quad \exp{\Big(\sum_{i=1}^D \alpha_i d\rho(A_i)\Big)}v = v. $$

Note that the $d\rho(A_i)$ are called the infinitesimal generators.

Taking the derivative with respect to $\alpha_i$ at $\alpha_1 = \cdots = \alpha_D = 0$, we get one constraint per infinitesimal generator: $$\forall i \in \{1, \ldots, D\}: \quad \quad d\rho(A_i) v = 0. $$

These constraints are necessary and sufficient (Finzi et al., 2021) and thus characterize all solutions (i.e., all $G$-equivariant layers).

G-Equivariant Layers solutions¶

We can stack all $D$ constraints into a single matrix $C$, whose singular value decomposition splits $\mathbb{R}^m$ into the nullspace of $C$, spanned by the columns of $Q \in \mathbb{R}^{m \times r}$, and its orthogonal complement, spanned by the columns of $P \in \mathbb{R}^{m \times (m-r)}$: $$ C v = \begin{bmatrix} d\rho(A_1) \\ d\rho(A_2) \\ \vdots\\ d\rho(A_D) \end{bmatrix} v = U \begin{bmatrix}\Sigma & 0 \\ 0 & 0 \\ \end{bmatrix} \begin{bmatrix} P^T \\ Q^T \\ \end{bmatrix} v = 0 $$

All solutions $v$ of the symmetry constraints must lie in the nullspace of $C$, that is, $$ v = Q \beta $$

for any coefficients $\beta \in \mathbb{R}^{r \times 1}$.

This means that if we consider weights $W$ such that $$\text{vec}(W) = Q \beta$$ and only learn the coefficients $\beta$, then $W$ is $G$-equivariant, since by construction $$\rho_2(g) W x = W \rho_1(g) x , \qquad \forall x \in V_1 \: , \: \forall g \in G.$$ For more details, refer to our lecture on G-invariant neural networks.
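
As a concrete, deliberately tiny instance of this recipe, the sketch below computes the equivariant basis $Q$ for $G=\mathrm{SO}(2)$ acting on $V_1=V_2=\mathbb{R}^2$ through its defining representation. There is a single generator $A$, and with row-major vectorization the infinitesimal generator acting on $\mathrm{vec}(W)$ is $d\rho(A)=A\otimes I - I\otimes A^\top$ (i.e., $v \mapsto \mathrm{vec}(AW-WA)$). This is a hand-rolled illustration of the construction, not the EMLP implementation of Finzi et al. (2021).

```python
import numpy as np

A = np.array([[0., -1.],
              [1.,  0.]])                        # generator of SO(2)
I = np.eye(2)

# Constraint matrix C: d_rho(A) acting on vec(W) (row-major), i.e. vec(AW - WA)
C = np.kron(A, I) - np.kron(I, A.T)

# Nullspace of C via SVD: right-singular vectors with (numerically) zero singular values
U, sigma, Vt = np.linalg.svd(C)
r = np.sum(sigma < 1e-10)                        # dimension of the nullspace
Q = Vt[len(sigma) - r:].T                        # columns of Q span the nullspace
print("nullspace dimension:", r)                 # 2: rotations and scalings

# Any W = unvec(Q beta) is SO(2)-equivariant
beta = np.random.randn(r)
W = (Q @ beta).reshape(2, 2)
g = np.array([[np.cos(0.6), -np.sin(0.6)],
              [np.sin(0.6),  np.cos(0.6)]])      # an arbitrary rotation
print(np.allclose(g @ W, W @ g))                 # True: rho_2(g) W = W rho_1(g)
```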

Invariant and equivariant functions¶

We have seen a practical method to obtain equivariant layers $W$ (Finzi et al., 2021).

Q: Can arbitrary functions $\phi:V_1 \rightarrow \mathbb{R}$, $\Phi:V_1 \rightarrow V_2$, where $V_1,V_2$ are some vector spaces, be made respectively invariant and equivariant?

A: Yes, arbitrary functions can be made invariant or equivariant by averaging over the group, as we have seen in Murphy et al. (2019): $$\begin{equation} \tag{14} \psi(X) = \frac{1}{|G|}\sum_{g\in G} \phi(\rho_1(g)X), \quad \Psi(X) = \frac{1}{|G|}\sum_{g\in G}\rho_2(g)^{-1} \Phi(\rho_1(g)X) \end{equation}$$ are $G$-invariant and $G$-equivariant, respectively.

The challenge is that when the cardinality of $G$ is large or infinite (e.g., continuous groups such as rotations), then exact averaging is intractable.

In such cases, we can approximate the sum via heuristics or Monte Carlo (MC), thereby sacrificing the exact invariance/equivariance property for computational efficiency.
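
For example, for $G=\mathrm{SO}(2)$ acting on a 2D point cloud, the toy sketch below replaces the exact group average of Equation (14) with a Monte Carlo estimate over random rotations; the function $\phi$ and the sample size are arbitrary choices of ours, and the resulting $\psi$ is only approximately invariant, with the error shrinking as the number of samples grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def rot(th):
    return np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])

def phi(X):
    """An arbitrary, non-invariant function of a 2D point cloud."""
    return np.sum(np.tanh(X @ np.array([1.3, -0.7])))

def mc_group_average(phi, X, n_samples=10_000):
    """Monte Carlo estimate of (1/|G|) sum_g phi(rho(g) X) for G = SO(2)."""
    thetas = rng.uniform(0.0, 2 * np.pi, size=n_samples)
    return np.mean([phi(X @ rot(th).T) for th in thetas])

X = rng.normal(size=(5, 2))
g = rot(1.2)                                                     # a fixed rotation of the input
print(phi(X), phi(X @ g.T))                                      # different: phi is not invariant
print(mc_group_average(phi, X), mc_group_average(phi, X @ g.T))  # approximately equal
```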

Puny et al. (2022) propose to replace the group average in Equation (14) with an average over a carefully selected, input-dependent subset $\mathcal{F}(X)\subset G$, while retaining both exact invariance/equivariance and expressive power.

Equivariant frame¶

$\mathcal{F}(X)\subset G$ is called a frame, and it is defined as a set-valued function $\mathcal{F}:V_1 \rightarrow 2^{G} \setminus \{\emptyset\}$.

A frame is $G$-equivariant if $$\begin{equation}\tag{15} \mathcal{F}(\rho_1(g)X)= g \mathcal{F}(X), \quad \forall X\in V_1,\ g\in G,\end{equation}$$ where $g\mathcal{F}(X)=\{g h \ \vert \ h\in \mathcal{F}(X)\}$, and the equality should be understood as equality of sets.

Frame averaging: Input-conditional Reynolds operator¶

How are equivariant frames useful?

Consider a scenario where an equivariant frame is easy to compute and its cardinality $|\mathcal{F}(X)|$ is not too large. Then the average over the frame, denoted ${\left \langle \cdot \right \rangle}_\mathcal{F}$, is defined by

$$\begin{align} \tag{16} \label{e:inv} {\left \langle \phi \right \rangle}_\mathcal{F}(X) &= \frac{1}{|\mathcal{F}(X)|}\sum_{g\in \mathcal{F}(X)} \phi(\rho_1(g)^{-1} X) \\ \tag{17} \label{e:equi} {\left \langle \Phi \right \rangle}_\mathcal{F}(X) &= \frac{1}{|\mathcal{F}(X)|}\sum_{g \in \mathcal{F}(X)} \rho_2(g) \Phi(\rho_1(g)^{-1}X) \end{align} $$ Note that, unlike the full group average in Equation (14), the frame average applies $\rho_1(g)^{-1}$ to the input, so that the frame equivariance in Equation (15) exactly cancels any transformation of $X$.

Theorem 1 (Puny et al., 2022) Let $\mathcal{F}$ be a $G$-equivariant frame, and $\phi:V_1\rightarrow\mathbb{R}$, $\Phi:V_1\rightarrow V_2$ some functions. Then, ${\left \langle \phi \right \rangle}_\mathcal{F}$ is $G$-invariant, while ${\left \langle \Phi \right \rangle}_\mathcal{F}$ is $G$-equivariant.

Therefore we can replace the group averaging operator with an averaging operator over a small subset of the group elements (a frame). Averaging over a frame guarantees exact invariance or equivariance while often being simpler to compute.

Puny et al. (2022) define frames and instantiate the frame averaging framework for several groups.
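
To make this concrete, here is a minimal sketch of the simplest possible frame: for the translation group acting on a point cloud, $\mathcal{F}(X)$ contains the single translation by the centroid of $X$ (in the spirit of the centroid-based frames used by Puny et al. (2022) for the translation part of E(3); the function names below are ours). The frame average of Equation (16) then reduces to evaluating $\phi$ on the centered point cloud, which is exactly translation invariant.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame(X):
    """Equivariant frame for the translation group: the single translation by the centroid.
    Check: frame(X + t) = frame(X) + t, i.e. F(rho(g)X) = g F(X)."""
    return [X.mean(axis=0)]

def frame_average(phi, X):
    """Eq. (16): average of phi(rho(g)^{-1} X) over g in F(X).
    Here rho(g)^{-1} X is X shifted by minus the frame translation."""
    return np.mean([phi(X - t) for t in frame(X)])

def phi(X):
    """An arbitrary, non-invariant function of the point cloud."""
    return np.sum(np.sin(X))

X = rng.normal(size=(7, 3))
t = np.array([10.0, -3.0, 2.5])                       # an arbitrary translation

print(phi(X), phi(X + t))                             # different: phi is not invariant
print(np.isclose(frame_average(phi, X),
                 frame_average(phi, X + t)))          # True: exact translation invariance
```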


Study Guide:¶

  • Michael Bronstein has a great series of tutorials on graph representation learning.

  • Linear Algebra Basics: Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, MIT Press, 2016

    • Chapter 2.1 (Scalars, Vectors, Matrices and Tensors)
    • Chapter 2.2 (Multiplying Matrices and Vectors)
    • Chapter 2.3 (Identity and Inverse Matrices)
    • Chapter 2.4 (Linear Dependence and Span)
    • Chapter 2.5 (Norms)
    • Chapter 2.6 (Special Kinds of Matrices and Vectors)
    • Chapter 2.7 (Eigendecomposition)
    • Chapter 2.8 (Singular Value Decomposition)
  • A Course in the Theory of Groups, by Derek J. S. Robinson, Springer Graduate Texts in Mathematics, 1996.

    • Chapter 1.1 (Binary Operations, Semigroups, and Groups)
    • Chapter 1.2 (Examples of Groups) Groups of Matrices, of Linear Transformations, of Isometries, of Permutations
    • Chapter 1.3 (Subgroups and Cosets) Subgroups, Intersections, Subgroup Generation, Hasse Diagrams, Normal Subgroups
  • Invariant Subspaces of Matrices with Applications, by Israel Gohberg, Peter Lancaster, Leiba Rodman, Society for Industrial and Applied Mathematics, 2006.

    • Chapter 1.1 (Invariant Subspaces: Definitions and Examples)
    • Chapter 1.2 (Eigenvalues and Eigenvectors)

References¶

(Murphy et al., 2019) Ryan L. Murphy, Balasubramaniam Srinivasan, Vinayak Rao, and Bruno Ribeiro, "Janossy Pooling: Learning Deep Permutation-invariant Functions for Variable-size Inputs", ICLR 2019.

(Mouli and Ribeiro, 2021) S. C. Mouli, Bruno Ribeiro, "Neural networks for learning counterfactual G-invariances from single environments". ICLR 2021.

(Yarotsky, 2018) Dmitry Yarotsky, "Universal approximations of invariant maps by neural networks," Constructive Approximation, 2021.

(Sabour et al., 2017) Sara Sabour, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." NeurIPS 2017.

(Bahdanau et al., 2015) Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." ICLR 2015.

(Zaheer et al., 2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, and Alexander Smola. "Deep sets". NeurIPS 2017.

(Vinyals et al., 2016) Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. Order Matters: Sequence to Sequence for Sets. ICLR, 2016

(Lee et al., 2019) Lee, Juho, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, and Yee Whye Teh. "Set transformer: A framework for attention-based permutation-invariant neural networks." ICML 2019.

(Finzi et al., 2021) Marc Finzi, Max Welling, and Andrew Gordon Wilson. "A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups". ICML 2021.

(Puny et al., 2022) Omri Puny, Matan Atzmon, Heli Ben-Hamu, Edward J. Smith, Ishan Misra, Aditya Grover, and Yaron Lipman. "Frame Averaging for Invariant and Equivariant Network Design". ICLR 2022.
