Table of Contents
  1. Convolutional Networks
    1.1. The convolution operation
    1.2. Motivation
    1.3. Pooling
    1.4. Variants of the basic convolution function
    1.5. Data types
    1.6. Efficient convolution algorithms
    1.7. Deep learning history

The book has not yet been published; these notes are for study reference only. The original text is at http://www.iro.umontreal.ca/~bengioy/dlbook/

Convolutional Networks

The convolution operation

Discrete convolution can be viewed as multiplication by a matrix with constrained entries: a Toeplitz matrix in one dimension, a doubly block circulant matrix in two dimensions.
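As a quick check of this correspondence, here is a minimal numpy/scipy sketch (the helper `conv_as_toeplitz` is illustrative, not from the book) showing that a 1-D full convolution equals multiplication by a Toeplitz matrix built from the kernel:

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_as_toeplitz(kernel, input_len):
    """Build the (input_len + k - 1) x input_len Toeplitz matrix whose
    product with an input vector equals the full convolution."""
    k = len(kernel)
    first_col = np.concatenate([kernel, np.zeros(input_len - 1)])
    first_row = np.concatenate([[kernel[0]], np.zeros(input_len - 1)])
    return toeplitz(first_col, first_row)

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, -1.0, 2.0])

T = conv_as_toeplitz(w, len(x))
assert np.allclose(T @ x, np.convolve(x, w, mode="full"))
```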

Motivation

  • Sparse interactions: achieved by making the kernel smaller than the input.
  • Parameter sharing: using the same parameter for more than one function in a model.
  • Equivariant representations: if the input changes, the output changes in the same way, $f(g(x))=g(f(x))$. Convolution is equivariant to translation, but not to some other transformations such as changes in scale or rotation (see the sketch after this list).

    Using convolution imposes an infinitely strong prior probability distribution over the parameters of a layer: the function the layer should learn contains only local interactions and is equivariant to translation.
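A minimal numpy sketch (not from the notes) verifying translation equivariance: convolving a shifted input gives the shifted output, $f(g(x))=g(f(x))$, where $g$ is a circular shift and $f$ a circular convolution (circular so that shifts are exact rather than edge-truncated):

```python
import numpy as np

def circ_conv(x, w):
    # circular 1-D convolution: output[i] = sum_j w[j] * x[(i - j) mod n]
    n = len(x)
    return np.array([sum(w[j] * x[(i - j) % n] for j in range(len(w)))
                     for i in range(n)])

x = np.random.randn(8)
w = np.array([0.5, -1.0, 0.25])
shift = lambda v, s: np.roll(v, s)

# shifting then convolving equals convolving then shifting
assert np.allclose(circ_conv(shift(x, 3), w), shift(circ_conv(x, w), 3))
```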

The use of convolution constrains the class of functions that the layer can represent. If the necessary function does not have these properties, then using a convolutional layer will cause the model to have high training error.

Matrix multiplication by a fixed weight matrix can only be applied to inputs of one fixed shape, but the same convolution kernel can be applied to inputs of varying size.
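For example, in this small sketch (illustrative only) one 3-tap kernel handles inputs of several lengths, whereas a fixed weight matrix would need a separate matrix per input size:

```python
import numpy as np

w = np.array([1.0, 2.0, 1.0])
for n in (5, 9, 13):                      # inputs of different lengths
    x = np.random.randn(n)
    y = np.convolve(x, w, mode="valid")   # output length n - 3 + 1
    print(n, "->", y.shape)
```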

Pooling

A typical CNN layer consists of three stages: 1. convolution, 2. nonlinear activation (sometimes called the detector stage), 3. pooling.

Pooling helps make the representation approximately invariant to small translations of the input.
Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is.

Invariance: we do not care about a feature's exact location, only whether the feature is present.

Using pooling is an infinitely strong prior that the function the layer learns must be invariant to small translations.
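A minimal numpy sketch (not from the notes) of this invariance: max pooling over a window of width 3 (stride 1) applied to detector responses, before and after a one-step shift of the input:

```python
import numpy as np

def max_pool(x, width=3):
    # sliding-window max pooling with stride 1
    return np.array([x[i:i + width].max() for i in range(len(x) - width + 1)])

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])   # feature detected at index 2
x_shift = np.roll(x, 1)                         # same feature at index 3

print(max_pool(x))        # [1. 1. 1. 0.]
print(max_pool(x_shift))  # [0. 1. 1. 1.]
```

Half of the pooled outputs are unchanged by the one-step shift; with larger pooling regions the fraction of unchanged outputs grows.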

Variants of the basic convolution function

The convolution used in neural networks differs from the convolution operation in the mathematical literature:

  • it usually refers to many convolutions applied in parallel, extracting many kinds of features;
  • the input is a grid of vector-valued observations (multi-channel) instead of real values;
  • it is not guaranteed to be commutative.

Three special cases of zero-padding (see the sketch after this list):

  • valid convolution: the kernel is contained entirely within the image; output size $(m-k+1)\times(m-k+1)$.
  • same convolution: enough zero-padding is added to keep the output the same size as the input; output size $m\times m$.
  • full convolution: enough zero-padding is added for every pixel to be visited $k$ times in each direction; output size $(m+k-1)\times(m+k-1)$.
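A minimal 1-D numpy sketch (not from the notes) of the three zero-padding regimes, for a length-$m$ input and length-$k$ kernel:

```python
import numpy as np

x = np.random.randn(7)     # m = 7
w = np.random.randn(3)     # k = 3

print(np.convolve(x, w, mode="valid").shape)  # (5,)  m - k + 1
print(np.convolve(x, w, mode="same").shape)   # (7,)  m
print(np.convolve(x, w, mode="full").shape)   # (9,)  m + k - 1
```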

Tiled convolution: a set of $t$ kernels is cycled through as we move through space, a compromise between a convolutional layer ($t=1$) and a locally connected layer.
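A minimal 1-D sketch (illustrative, not from the book) of tiled convolution with $t=2$ kernels, where the kernel index cycles with the output position:

```python
import numpy as np

def tiled_conv1d(x, kernels):
    t, k = kernels.shape
    out_len = len(x) - k + 1
    # output position i uses kernel i mod t
    return np.array([kernels[i % t] @ x[i:i + k] for i in range(out_len)])

x = np.random.randn(8)
kernels = np.random.randn(2, 3)          # t = 2 kernels of width 3
print(tiled_conv1d(x, kernels).shape)    # (6,)
```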

Data types

Efficient convolution algorithms

Parallel computation.

Using the Fourier transform can accelerate convolution: pointwise multiplication in the frequency domain corresponds to convolution in the signal domain.
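A minimal numpy sketch (not from the notes) of FFT-based convolution; for large inputs this route is typically faster than direct convolution:

```python
import numpy as np

x = np.random.randn(64)
w = np.random.randn(9)

n = len(x) + len(w) - 1                    # full-convolution output length
# zero-pad both signals to length n, multiply spectra, transform back
fft_conv = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(w, n), n)

assert np.allclose(fft_conv, np.convolve(x, w, mode="full"))
```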

Devising faster ways of performing convolution, or approximating convolution without harming the accuracy of the model, is an active area of research.

In commercial applications, efficient deployment (inference) is typically more important than efficient training.

Deep learning history

Convolutional networks were the first really successful deep networks.
