Table of Contents
  1. Convolutional Networks
    1.1. The convolution operation
    1.2. Motivation
    1.3. Pooling
    1.4. Variants of the basic convolution function
    1.5. Data types
    1.6. Efficient convolution algorithms
    1.7. Deep learning history

The book has not yet been published; these notes are for study reference only. The original text is at http://www.iro.umontreal.ca/~bengioy/dlbook/

Convolutional Networks

The convolution operation

Discrete convolution can be viewed as multiplication by a matrix with constrained entries: a Toeplitz matrix in one dimension, a doubly block circulant matrix in two dimensions.
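As a quick check of this correspondence, here is a minimal numpy/scipy sketch (the helper `conv_as_toeplitz` is illustrative, not from the book) showing that a 1-D full convolution equals multiplication by a Toeplitz matrix built from the kernel:

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_as_toeplitz(kernel, input_len):
    """Build the (input_len + k - 1) x input_len Toeplitz matrix whose
    product with an input vector equals the full convolution."""
    k = len(kernel)
    first_col = np.concatenate([kernel, np.zeros(input_len - 1)])
    first_row = np.concatenate([[kernel[0]], np.zeros(input_len - 1)])
    return toeplitz(first_col, first_row)

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, -1.0, 2.0])

T = conv_as_toeplitz(w, len(x))
assert np.allclose(T @ x, np.convolve(x, w, mode="full"))
```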

Motivation

  • Sparse interactions: achieved by making the kernel smaller than the input.
  • Parameter sharing: using the same parameter for more than one function in a model.
  • Equivariant representations: if the input changes, the output changes in the same way, $f(g(x))=g(f(x))$. Convolution is equivariant to translation, but not to some other transformations such as changes in scale or rotation (see the sketch after this list).

    Using convolution imposes an infinitely strong prior probability distribution over the parameters of a layer: the function the layer should learn contains only local interactions and is equivariant to translation.
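A minimal numpy sketch (not from the notes) verifying translation equivariance: convolving a shifted input gives the shifted output, $f(g(x))=g(f(x))$, where $g$ is a circular shift and $f$ a circular convolution (circular so that shifts are exact rather than edge-truncated):

```python
import numpy as np

def circ_conv(x, w):
    # circular 1-D convolution: output[i] = sum_j w[j] * x[(i - j) mod n]
    n = len(x)
    return np.array([sum(w[j] * x[(i - j) % n] for j in range(len(w)))
                     for i in range(n)])

x = np.random.randn(8)
w = np.array([0.5, -1.0, 0.25])
shift = lambda v, s: np.roll(v, s)

# shifting then convolving equals convolving then shifting
assert np.allclose(circ_conv(shift(x, 3), w), shift(circ_conv(x, w), 3))
```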

The use of convolution constrains the class of functions that the layer can represent. If the necessary function does not have these properties, then using a convolutional layer will cause the model to have high training error.

Matrix multiplication by a fixed weight matrix can only be applied to inputs of one fixed shape, but the same convolution kernel can be applied to inputs of varying size.
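For example, in this small sketch (illustrative only) one 3-tap kernel handles inputs of several lengths, whereas a fixed weight matrix would need a separate matrix per input size:

```python
import numpy as np

w = np.array([1.0, 2.0, 1.0])
for n in (5, 9, 13):                      # inputs of different lengths
    x = np.random.randn(n)
    y = np.convolve(x, w, mode="valid")   # output length n - 3 + 1
    print(n, "->", y.shape)
```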

Pooling

A typical CNN layer consists of three stages: 1. convolution, 2. nonlinear activation (sometimes called the detector stage), 3. pooling.

Pooling helps make the representation approximately invariant to small translations of the input.
Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is.

Invariance: we do not care about a feature's exact location, only whether the feature is present.

Using pooling is an infinitely strong prior that the function the layer learns must be invariant to small translations.
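A minimal numpy sketch (not from the notes) of this invariance: max pooling over a window of width 3 (stride 1) applied to detector responses, before and after a one-step shift of the input:

```python
import numpy as np

def max_pool(x, width=3):
    # sliding-window max pooling with stride 1
    return np.array([x[i:i + width].max() for i in range(len(x) - width + 1)])

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])   # feature detected at index 2
x_shift = np.roll(x, 1)                         # same feature at index 3

print(max_pool(x))        # [1. 1. 1. 0.]
print(max_pool(x_shift))  # [0. 1. 1. 1.]
```

Half of the pooled outputs are unchanged by the one-step shift; with larger pooling regions the fraction of unchanged outputs grows.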

Variants of the basic convolution function

The convolution used in neural networks differs from the convolution operation in the mathematical literature:

  • it usually refers to many convolutions applied in parallel, extracting many kinds of features;
  • the input is a grid of vector-valued observations (multi-channel) instead of real values;
  • it is not guaranteed to be commutative.

Three special cases of zero-padding (see the sketch after this list):

  • valid convolution: the kernel is contained entirely within the image; output size $(m-k+1)\times(m-k+1)$.
  • same convolution: enough zero-padding is added to keep the output the same size as the input; output size $m\times m$.
  • full convolution: enough zero-padding is added for every pixel to be visited $k$ times in each direction; output size $(m+k-1)\times(m+k-1)$.
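A minimal 1-D numpy sketch (not from the notes) of the three zero-padding regimes, for a length-$m$ input and length-$k$ kernel:

```python
import numpy as np

x = np.random.randn(7)     # m = 7
w = np.random.randn(3)     # k = 3

print(np.convolve(x, w, mode="valid").shape)  # (5,)  m - k + 1
print(np.convolve(x, w, mode="same").shape)   # (7,)  m
print(np.convolve(x, w, mode="full").shape)   # (9,)  m + k - 1
```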

Tiled convolution: a set of $t$ kernels is cycled through as we move through space, a compromise between a convolutional layer ($t=1$) and a locally connected layer.
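A minimal 1-D sketch (illustrative, not from the book) of tiled convolution with $t=2$ kernels, where the kernel index cycles with the output position:

```python
import numpy as np

def tiled_conv1d(x, kernels):
    t, k = kernels.shape
    out_len = len(x) - k + 1
    # output position i uses kernel i mod t
    return np.array([kernels[i % t] @ x[i:i + k] for i in range(out_len)])

x = np.random.randn(8)
kernels = np.random.randn(2, 3)          # t = 2 kernels of width 3
print(tiled_conv1d(x, kernels).shape)    # (6,)
```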

Data types

Efficient convolution algorithms

Parallel computation.

Using the Fourier transform can accelerate convolution: pointwise multiplication in the frequency domain corresponds to convolution in the signal domain.
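A minimal numpy sketch (not from the notes) of FFT-based convolution; for large inputs this route is typically faster than direct convolution:

```python
import numpy as np

x = np.random.randn(64)
w = np.random.randn(9)

n = len(x) + len(w) - 1                    # full-convolution output length
# zero-pad both signals to length n, multiply spectra, transform back
fft_conv = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(w, n), n)

assert np.allclose(fft_conv, np.convolve(x, w, mode="full"))
```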

Devising faster ways of performing convolution, or approximating convolution without harming the accuracy of the model, is an active area of research.

In commercial applications, efficient deployment (inference) is typically more important than efficient training.

Deep learning history

Convolutional networks were the first really successful deep networks.
