Machine Learning and Computer Vision: A mini tour among concepts and applications
Felice Andrea Pellegrino (1), Walter Vanzella (2)
December 13, 2017
(1) University of Trieste, Trieste
(2) Glance Vision Technologies Srl, Trieste
Sources
• Course: Li Fei-Fei, Andrej Karpathy, Justin Johnson, and Serena Yeung. CS231n: Convolutional Neural Networks for Visual Recognition. Stanford University, 2017. http://cs231n.stanford.edu/
• Book: Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org [freely available online]
Introduction
Definitions
• Machine Learning: the science of getting computers to act without being explicitly programmed.
  • Learn from data
  • Make predictions from “similar” data
• Computer Vision: the science of getting machines to “see”. Computer Vision is concerned with the automatic extraction, analysis and understanding of useful information from a single image or a sequence of images.
Examples: Detection
Face detection, (Viola and Jones, 2001)
Examples: Recognition
SIFT and object recognition, (Lowe, 1999)
Right whale recognition, Kaggle Challenge, https://www.kaggle.com/c/noaa-right-whale-recognition
Examples: Segmentation
(Mnih and Hinton, 2010)
(Ronneberger et al., 2015)
Examples: Classification
ImageNet (www.image-net.org)
• 14M images
• 22K categories
  • Animal (fish, bird, mammal, invertebrate)
  • Plant (tree, flower, vegetable)
  • Instrumentation (utensil, appliance, tool, musical instrument)
  • …
• annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• object detection within an image may be obtained by applying image classification to many sub-images.
Image Classification: the semantic gap
What the computer sees
Class?
• dog
• hammer
• guitar
• cat
• dolphin
• …
Image Classification: the challenges
• Illumination
• Occlusion
• Deformation
• Background
• Intraclass variation
A summer project
Solutions i
(Marr, 1982)
(Marr, 1982)
Solutions ii
“Pictorial structure” (Fischler and Elschlager, 1973)
• ad-hoc detectors, tailored for the different parts
• ad-hoc relation between parts
• detection is formulated as an optimization problem:
  • maximize the local matching
  • minimize the springs’ stretching
A data-driven solution: nearest neighbor
A simple idea:
• store all the N pairs of training images and labels $(I_j, y_j)$, $j = 1, \dots, N$
• define a distance between images $d(I_1, I_2) : \mathcal{I} \times \mathcal{I} \to \mathbb{R}_+$
• when presented with a test image $I_{\text{test}}$, output the label $y_{j^*}$, where
$$j^* = \arg\min_{j = 1, \dots, N} d(I_{\text{test}}, I_j)$$
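The nearest-neighbor rule can be sketched in a few lines of Python. This is only an illustration under simple assumptions: images are flat lists of pixel values, the distance is the sum of absolute pixel differences, and the names `l1_distance` and `nearest_neighbor_label`, as well as the toy data, are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences between corresponding pixels."""
    return sum(abs(p - q) for p, q in zip(a, b))


def nearest_neighbor_label(train_images, train_labels, test_image,
                           distance=l1_distance):
    """Return the label of the stored image closest to test_image."""
    # "Training" was just storing the pairs; all the work happens at test time.
    best_j = min(range(len(train_images)),
                 key=lambda j: distance(test_image, train_images[j]))
    return train_labels[best_j]


# Toy data: three 4-pixel "images" with made-up labels.
train_images = [[0, 0, 0, 0], [255, 255, 255, 255], [0, 255, 0, 255]]
train_labels = ["dark", "bright", "striped"]

predicted = nearest_neighbor_label(train_images, train_labels, [10, 240, 5, 250])
print(predicted)  # striped
```

Any other distance function with the same two-argument signature can be passed through the `distance` parameter.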
Distances
Examples of distances ($I_j^p$ is the p-th pixel of image $I_j$)
• L1 distance (sum of absolute differences of corresponding pixels)
$$d(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$$
• L2 distance (Euclidean distance)
$$d(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$$
Training and predicting
training ≡ collecting examples
• prediction is computationally demanding
  • need to compare the test instance to all the collected examples
Example of results
CIFAR-10 dataset
• 10 labels
• 50K training images
• 10K test images
First 10 neighbors of the test sample (leftmost column)
• perceptually similar
• class is often wrong
Problems may arise
• pixel-wise distances usually perform poorly
• perceptually similar images may represent very different objects
• perceptually different images may represent the same class of objects
• the distance itself may not explain perceptual differences:
Figure panels: original, translated, messed up, darkened
Same L2 distance from the original image.
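A tiny numeric illustration of this effect, using made-up 4-pixel "images" rather than the pictures shown on the slide: a uniform darkening and a single corrupted pixel look very different, yet they can sit at exactly the same L2 distance from the original. The function name `l2_distance` and the pixel values are assumptions for the example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two images given as flat pixel lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


original = [100, 100, 100, 100]
darkened = [98, 98, 98, 98]        # every pixel slightly darker
corrupted = [100, 100, 100, 96]    # a single pixel changed more strongly

d_darkened = l2_distance(original, darkened)    # sqrt(4 * 2**2) = 4.0
d_corrupted = l2_distance(original, corrupted)  # sqrt(4**2)     = 4.0
print(d_darkened, d_corrupted)
```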
Linear classifier
• N images, each containing D pixels
• K classes
• the training set is
$$T = \left\{ (x_i, y_i),\ x_i \in \mathbb{R}^D,\ y_i \in \{1, \dots, K\},\ i = 1, \dots, N \right\}$$
• define a function $f : \mathbb{R}^D \to \mathbb{R}^K$
$$f(x_i, W, b) = W x_i + b$$
• $W$: weights matrix
• $b$: bias
• $f_j$ (the j-th component of $f$) represents the score of class j
• predicted label for test image $x_{\text{test}}$ is
$$\arg\max_j f_j(x_{\text{test}}, W, b)$$
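As a minimal sketch of the score function and the argmax prediction, the snippet below uses D = 4 pixels and K = 3 classes. The parameters are hand-picked for illustration, not learned, the helper names `linear_scores` and `predict` are hypothetical, and classes are indexed from 0 rather than from 1 as in the text.

```python
def linear_scores(W, b, x):
    """f(x, W, b) = W x + b, with W given as a list of K rows of length D."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def predict(W, b, x):
    """Index j of the class with the highest score f_j."""
    scores = linear_scores(W, b, x)
    return max(range(len(scores)), key=lambda j: scores[j])


# Hand-picked parameters, for illustration only.
W = [[0.2, -0.5, 0.1, 2.0],
     [1.5, 1.3, 2.1, 0.0],
     [0.0, 0.25, 0.2, -0.3]]
b = [1.1, 3.2, -1.2]
x = [56, 231, 24, 2]

scores = linear_scores(W, b, x)   # approximately [-96.8, 437.9, 60.75]
label = predict(W, b, x)          # class index 1 has the highest score
print(scores, label)
```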
Interpretation
Training a linear classifier
• need a principled way to select $\theta = (W, b)$
• choose $\theta$ in order to minimize a properly defined loss:
  • define the loss as
$$J(\theta) = \underbrace{\frac{1}{N} \sum_i L(f(x_i, \theta), y_i)}_{\text{empirical loss}} + \underbrace{\lambda R(\theta)}_{\text{regularization loss}}$$
  • many possible choices for $L$ and $R$, e.g.
$$L(f(x_i, \theta), y_i) = \sum_{j \neq y_i} \overbrace{\max\bigl(0,\ \underbrace{f_j(x_i, \theta)}_{\text{score of wrong class}} - \underbrace{f_{y_i}(x_i, \theta)}_{\text{score of true class}}\bigr)}^{\text{positive if wrong class has higher score than true class}}$$
$$R(\theta) = \sum_k \sum_l W_{kl}^2$$
• training becomes an optimization problem:
$$\min_\theta J(\theta)$$
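The loss above can be evaluated directly once the class scores are known. The sketch below assumes the scores $f(x_i, \theta)$ have already been computed for two samples. The function names, the toy scores and the value of λ are all hypothetical, and classes are indexed from 0.

```python
def hinge_loss(scores, true_class):
    """Sum over wrong classes j of max(0, f_j - f_y)."""
    return sum(max(0.0, s - scores[true_class])
               for j, s in enumerate(scores) if j != true_class)


def l2_regularizer(W):
    """R(theta): sum of the squared entries of the weight matrix."""
    return sum(w * w for row in W for w in row)


def total_loss(all_scores, labels, W, lam):
    """J(theta) = mean per-sample loss + lam * R(theta)."""
    empirical = sum(hinge_loss(s, y) for s, y in zip(all_scores, labels)) / len(labels)
    return empirical + lam * l2_regularizer(W)


all_scores = [[3.0, 5.0, 1.0],   # sample 1: a wrong class beats the true class 0
              [2.0, 1.0, 4.0]]   # sample 2: the true class 2 has the top score
labels = [0, 2]
W = [[1.0, -1.0], [0.5, 0.0], [0.0, 2.0]]

J = total_loss(all_scores, labels, W, lam=0.1)
print(J)  # about 1.625 = (2 + 0) / 2 + 0.1 * 6.25
```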
A step forward: features
• So far:
input image → Classifier → label
• Feature-based approach:
input image → Feature extractor → feature vector → Classifier → label
Histogram of oriented gradients
Histogram of oriented gradients (HoG) (Dalal and Triggs, 2005)
• Outline:
  • image divided into a grid of cells
  • HoG computed for each cell
  • the descriptor is the input of a linear classifier (SVM)
• Many parameters to be tuned:
  • spatial band of filters
  • # of angular bins
  • # of spatial bins
  • hyperparameters of the SVM
• Some heuristics:
  • histogram smoothing
  • local histogram normalization
Deformable part model
Deformable part model (Felzenszwalb et al., 2008)
• a data-driven version of (Fischler and Elschlager, 1973)
• objects, parts and relationships between parts are learned from data
• multiscale:
  • “root object” at coarse scale
  • parts at fine scale
• each part contributes to the final score
• many parameters to be tuned
Bag of words
Extraction of the visual vocabulary
Training/inference
• aim: find a “standardized” description of the image content
• local descriptors are extracted from the training set:
  • local patches
  • SIFT (Lowe, 2004)
  • …
• clustering produces “visual words”
• each class is characterized by a histogram of visual words
• classification becomes a “comparison of histograms”
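The histogram step can be sketched as follows, assuming the visual vocabulary (the cluster centres) has already been computed. The 2-D descriptors, the two-word vocabulary and the function names are hypothetical; real descriptors such as SIFT are much higher-dimensional.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def word_histogram(descriptors, vocabulary):
    """Count how often each visual word occurs among an image's descriptors."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest_word(d, vocabulary)] += 1
    return histogram


vocabulary = [[0.0, 0.0], [10.0, 10.0]]               # two made-up visual words
descriptors = [[1.0, 0.0], [9.0, 11.0], [0.0, 2.0]]   # descriptors of one image

histogram = word_histogram(descriptors, vocabulary)
print(histogram)  # [2, 1]
```

Two images can then be compared through their histograms, for instance with one of the distances introduced earlier.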
Convolution
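• most local image features, for instance the directional derivatives, can be detected by linear spatial filtering.
• convolution operator (with this index convention it is, strictly speaking, a cross-correlation, which equals a convolution with the flipped kernel $h$):
$$g(i, j) = \sum_{k, l} f(i + k, j + l)\, h(k, l)$$
Figure panels:
• original
• vertical Prewitt filtered, with kernel $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$
• horizontal Prewitt filtered, with kernel $\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$
• gradient magnitude $\sqrt{\left(\frac{\partial I}{\partial x}\right)^2 + \left(\frac{\partial I}{\partial y}\right)^2}$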
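The filtering formula and the two Prewitt kernels can be tried on a small synthetic image. This is a sketch under simplifying assumptions: the image is a list of rows, only positions where the kernel fits entirely inside the image are computed, and the kernel is anchored at its top-left corner. The names `filter2d`, `PREWITT_X` and `PREWITT_Y` are chosen for the example.

```python
import math


def filter2d(image, kernel):
    """g(i, j) = sum over k, l of f(i + k, j + l) * h(k, l), without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + k][j + l] * kernel[k][l]
                 for k in range(kh) for l in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]    # responds to vertical edges
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]    # responds to horizontal edges

# 4x5 test image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 9, 9, 9] for _ in range(4)]

gx = filter2d(image, PREWITT_X)   # [[27, 27, 0], [27, 27, 0]]
gy = filter2d(image, PREWITT_Y)   # all zeros: there is no horizontal edge
magnitude = [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
             for row_x, row_y in zip(gx, gy)]
print(magnitude)
```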
A bunch of keywords
Convolutional Neural Networks
• In 2012, a Convolutional Neural Network (CNN) won the ImageNet challenge by a wide margin:
Imagenet classification with deep convolutional neural networks, (Krizhevsky et al., 2012)
• a CNN is an Artificial Neural Network (ANN) having a particular architecture, developed specifically for dealing with images
• nowadays, CNNs are employed even for dealing with information other than images
• in the last few years, the term “Deep Learning” has become popular to identify the approach to machine learning based on ANNs having many layers (“deep”)
Artificial Neural Networks
A step backward: Artificial Neural Networks I
• a network of artificial neurons (McCulloch and Pitts, 1943)
• biological analogy (with caution)
• the output of a neuron is obtained by applying an activation function to the weighted sum of its inputs
A step backward: Artificial Neural Networks II
The perceptron Frank Rosenblatt (1928-1971)
• first trainable Artificial Neural Network: Rosenblatt’s Perceptron (Rosenblatt, 1958)
• basic procedure:
  • design the architecture (# layers, topology of connections)
  • choose the (possibly different) activation functions
  • learn the weights (based on the provided examples)
Architectures
Fully connected, 1 hidden layer Fully connected, 2 hidden layers
• left:
  • 4 + 2 = 6 neurons (input layer is disregarded)
  • 3 × 4 + 4 × 2 weights (w) and 4 + 2 biases (b) ⇒ 26 parameters
• right:
  • 4 + 4 + 1 = 9 neurons
  • 3 × 4 + 4 × 4 + 4 × 1 weights (w) and 4 + 4 + 1 biases (b) ⇒ 41 parameters
• a typical CNN has millions of parameters (e.g. VGG has 138M parameters)
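The parameter counts for the two fully connected networks can be reproduced with a short helper. The function name `count_parameters` is hypothetical, and the input size of 3 is read off the 3 × 4 term in the counts.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected net.

    layer_sizes lists the number of units per layer, input layer first.
    """
    weights = sum(n_in * n_out
                  for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])   # one bias per non-input neuron
    return weights + biases


left = count_parameters([3, 4, 2])       # 3*4 + 4*2 + (4 + 2)
right = count_parameters([3, 4, 4, 1])   # 3*4 + 4*4 + 4*1 + (4 + 4 + 1)
print(left, right)  # 26 41
```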
Activation functions
• output of a neuron
$$f\left(\sum_i w_i x_i + b\right)$$
• $f(\cdot) : \mathbb{R} \to \mathbb{R}$ is the activation function
• many possible choices, e.g.
  • sigmoid
$$f(x) = \frac{1}{1 + e^{-x}}$$
  • hyperbolic tangent
$$f(x) = \tanh(x)$$
  • rectified linear unit (ReLU)
$$f(x) = \max\{0, x\}$$
• ReLU is currently preferred because
  • it is fast to compute
  • it empirically speeds up convergence (see the comparison with the hyperbolic tangent in Krizhevsky et al., 2012)
Figure panels: sigmoid; hyperbolic tangent; ReLU; ReLU vs hyperbolic tangent (Krizhevsky et al., 2012)
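A single neuron with the three activation functions can be written as below. The weights, inputs and bias are arbitrary illustrative values, and the function names are chosen for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def neuron(weights, inputs, bias, activation):
    """Apply the activation function to the weighted sum of the inputs."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(s)


weights, inputs, bias = [0.5, -1.0], [2.0, 3.0], 0.5   # weighted sum s = -1.5

out_sigmoid = neuron(weights, inputs, bias, sigmoid)    # between 0 and 0.5
out_tanh = neuron(weights, inputs, bias, math.tanh)     # negative
out_relu = neuron(weights, inputs, bias, relu)          # clipped to 0.0
print(out_sigmoid, out_tanh, out_relu)
```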
Training as an optimization problem
• N instances, each of dimension D, and K classes
• the training set is
$$T = \left\{ (x_i, y_i),\ x_i \in \mathbb{R}^D,\ y_i \in \{1, \dots, K\},\ i = 1, \dots, N \right\}$$
• let the output of the network be
$$f(x, \theta) : \mathbb{R}^D \to \mathbb{R}^K$$
• define the loss as
$$J(\theta) = \frac{1}{N} \sum_i L(f(x_i, \theta), y_i) + \lambda R(\theta)$$
• many possible choices for $L$ and $R$, e.g.
$$L(f(x_i, \theta), y_i) = \sum_{j \neq y_i} \max\bigl(0,\ f_j(x_i, \theta) - f_{y_i}(x_i, \theta) + \Delta\bigr)$$
$$R(\theta) = \sum_k \sum_l W_{kl}^2$$
• minimize the loss by a proper choice of $\theta = (W, b)$
• iterative minimization
  • the instances are presented repeatedly to the network
  • each cycle of presentation is called an “epoch”
  • the weights and biases are adapted in such a way to modify the actual output toward the desired output
• the most popular training methods are based on the so-called back-propagation
Weight update by gradient descent
• at each step of iteration:
  • compute an estimate $g$ of the gradient $\nabla J(\theta)$
  • update the weight vector:
$$\theta \leftarrow \theta - \epsilon g$$
• $\epsilon$ is called the learning rate and is one of the most important learning parameters
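The update rule can be exercised on a toy problem. The sketch assumes a hypothetical quadratic loss $J(\theta) = (\theta_0 - 3)^2 + (\theta_1 + 1)^2$, whose gradient is known in closed form. The function names and the learning rate value are illustrative.

```python
def gradient_descent(grad, theta, learning_rate, steps):
    """Repeat theta <- theta - learning_rate * g, with g = grad(theta)."""
    for _ in range(steps):
        g = grad(theta)
        theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
    return theta


def grad_J(theta):
    """Gradient of the toy loss (theta0 - 3)^2 + (theta1 + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta = gradient_descent(grad_J, [0.0, 0.0], learning_rate=0.1, steps=100)
print(theta)  # close to the minimizer [3, -1]
```

With this loss, a learning rate above 1.0 makes the iterates diverge, which is one reason the parameter needs careful tuning.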
Stochastic Gradient Descent
• the gradient of the loss function is (the regularization term is left out here)
$$\nabla J(\theta) = \frac{1}{N} \nabla \sum_i L(f(x_i, \theta), y_i)$$
• at each iteration, computing $\nabla J(\theta)$ requires evaluating the whole training set → very slow for large N
• Stochastic Gradient Descent (SGD): at each iteration
  • sample a minibatch of m examples $\{x_1, \dots, x_m\}$ and the corresponding $y_i$
  • estimate the gradient as
$$g = \frac{1}{m} \nabla \sum_i L(f(x_i, \theta), y_i)$$
  • apply the update
$$\theta \leftarrow \theta - \epsilon g$$
• Size of minibatch:
  • m = 1: gradient direction is unreliable (based on a single example)
  • m = N: precise gradient direction, but slow computation
  • typical choices for m are 32, 64, 128, to exploit the GPU
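A minimal sketch of minibatch SGD on a hypothetical one-parameter least-squares problem: the data follow $y = 2x$ and the per-example loss is $(\theta x - y)^2$. The function name, the data and the hyperparameter values are assumptions made for the example, and the random generator is seeded so that the run is repeatable.

```python
import random


def sgd(xs, ys, theta, learning_rate, batch_size, iterations, seed=0):
    """Minibatch SGD for the per-example loss (theta * x - y)^2."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # Sample a minibatch of example indices without replacement.
        batch = rng.sample(range(len(xs)), batch_size)
        # Gradient estimate: average of the per-example gradients.
        g = sum(2.0 * (theta * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        theta -= learning_rate * g
    return theta


xs = [i / 10 for i in range(1, 101)]   # 0.1, 0.2, ..., 10.0
ys = [2.0 * x for x in xs]             # noise-free targets, true theta = 2

theta = sgd(xs, ys, theta=0.0, learning_rate=0.01, batch_size=32, iterations=200)
print(theta)  # close to 2.0
```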
Improved weight update
• simple update:
$$\theta \leftarrow \theta - \epsilon g$$
• momentum:
$$\begin{cases} v \leftarrow \alpha v - \epsilon g \\ \theta \leftarrow \theta + v \end{cases}$$
• global adaption of learning rate (“annealing”):
  • step decay: reduce the learning rate by some factor every few epochs.
  • exponential decay:
$$\epsilon = \epsilon_0 e^{-kt}$$
• per-parameter learning rate adaption:
  • AdaGrad
  • RMSProp
  • Adam (the most recommended, to date)
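The momentum update and the exponential decay can be combined in a short sketch. The toy loss $J(\theta) = \theta^2$ and the values chosen for $\alpha$, $\epsilon_0$ and $k$ are assumptions for illustration. The per-parameter methods (AdaGrad, RMSProp, Adam) are not shown.

```python
import math


def momentum_step(theta, v, g, learning_rate, alpha):
    """v <- alpha * v - learning_rate * g, then theta <- theta + v."""
    v = alpha * v - learning_rate * g
    return theta + v, v


def exponential_decay(eps0, k, t):
    """Learning rate eps0 * exp(-k * t) at iteration t."""
    return eps0 * math.exp(-k * t)


# Minimize the toy loss J(theta) = theta^2, whose gradient is 2 * theta.
theta, v = 5.0, 0.0
for t in range(200):
    eps = exponential_decay(eps0=0.1, k=0.01, t=t)
    theta, v = momentum_step(theta, v, g=2.0 * theta, learning_rate=eps, alpha=0.9)
print(theta)  # close to the minimizer 0
```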
Computing the gradient ∇J(θ)
• how can we actually compute the gradient $\nabla J(\theta)$?
• the ANN can be thought of as the composition of many functions:
$$\text{output} = H(\text{input}) = [h_n \circ h_{n-1} \circ \dots \circ h_0](\text{input})$$
• apply the chain rule:
$$(f \circ g)'(x) = f'(g(x))\, g'(x)$$
“the derivative of a composition of functions is the product of the derivatives”
Backpropagation
Example of backpropagation for a neuron with sigmoid activation function: $\sigma = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$
• forward step (from input to output, green): sets the “state” and output of the network corresponding to a particular input
• backward step (reverse order, red): by applying the chain rule, computes the derivative of the output w.r.t. the inputs and weights for the given state
• with the initial weights: $s = 2 \times (-1) + (-3) \times (-2) + (-3) = 1 \Rightarrow (1 + e^{-s})^{-1} = 0.731$
• after adding the computed gradients to the weights: $s = (2 - 0.2) \times (-1) + (-3 - 0.39) \times (-2) + (-3 + 0.2) = 2.18 \Rightarrow (1 + e^{-s})^{-1} = 0.898$ ← increased
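The numbers in this example can be reproduced with a few lines of code. The sketch uses the weights and inputs from the example ($w = (2, -3, -3)$, $x = (-1, -2)$) and computes the gradient of the output with respect to the weights. The function names are hypothetical, and the unit step along the gradient is inferred from the figures 0.2, 0.39 and 0.2 in the second bullet.

```python
import math


def forward(w, x):
    """Sigmoid neuron with w = [w0, w1, w2], x = [x0, x1]; w2 acts as bias."""
    s = w[0] * x[0] + w[1] * x[1] + w[2]
    return 1.0 / (1.0 + math.exp(-s))


def backward(w, x):
    """Gradient of the output w.r.t. the weights, via the chain rule."""
    out = forward(w, x)
    d_s = out * (1.0 - out)                # d sigma / d s
    return [d_s * x[0], d_s * x[1], d_s]   # d s / d w = (x0, x1, 1)


w, x = [2.0, -3.0, -3.0], [-1.0, -2.0]

before = forward(w, x)                          # s = 1, output about 0.731
grad = backward(w, x)                           # about [-0.2, -0.39, 0.2]
w_new = [wi + gi for wi, gi in zip(w, grad)]    # unit step along the gradient
after = forward(w_new, x)                       # about 0.898, i.e. increased
print(before, grad, after)
```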
Convolutional Neural Networks
Convolutional Neural Networks
Example of CNN: a sequence of convolutional layers, pool layers and a final fully connected layer
Convolution Layers i
slide over
all spatial locations
• the 28× 28× 1 “image” on the right is called activation map
• each element of the activation map is obtained by computing $w^T x + b$ where $w$ and $b$ are the weights and bias of the filter, and $x$ is a 5 × 5 × 3 chunk of the image
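The spatial size of an activation map follows from the usual output-size formula, sketched below. The 32 × 32 × 3 input is an assumption: it is not stated in the text, but it is consistent with a 5 × 5 × 3 filter producing a 28 × 28 map when there is no padding and the stride is 1. The function name is hypothetical.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output size along one spatial dimension of a convolution layer."""
    return (input_size - filter_size + 2 * padding) // stride + 1


side = conv_output_size(32, 5)      # 28: a 5x5 filter slid over a 32x32 input
params_per_filter = 5 * 5 * 3 + 1   # 75 weights (the vector w) plus one bias b

# With 3x3 filters, stride 1 and padding 1, the spatial size is preserved.
same_side = conv_output_size(32, 3, stride=1, padding=1)
print(side, params_per_filter, same_side)  # 28 76 32
```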
Convolution Layers ii
slide 6 filters over
all spatial locations
• typically, a bank of filters is employed
• using e.g. 6 filters, leads to a 28× 28× 6 activation map
Convolution Layers iii
Figure: CONV, ReLU → CONV, ReLU → CONV, ReLU → …
• a ConvNet is a sequence of convolution layers, interspersed with activation functions
Convolution Layers iv
Benefits due to convolutional layers:
• sparse connectivity
• parameter sharing
• equivariance to translation
$$f(T(x)) = T(f(x))$$
“activation maps of translated image are translated activation maps of the original image”
Sparse connectivity (top) vs full connectivity (bottom)
Parameter sharing (top) vs individual parameters (bottom)
Pooling layer
Pooling principle Max pooling
• pooling layer downsamples the volume spatially, independently in each depth slice of theinput volume
• the most common pooling operator is max
• also used: average pooling
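Max pooling on a single depth slice can be sketched as follows. The 2 × 2 window with stride 2 and the sample values are assumptions for illustration, and `max_pool` is a hypothetical helper name.

```python
def max_pool(depth_slice, size=2, stride=2):
    """Downsample one depth slice by taking the maximum over each window."""
    height, width = len(depth_slice), len(depth_slice[0])
    return [[max(depth_slice[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, width - size + 1, stride)]
            for i in range(0, height - size + 1, stride)]


depth_slice = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

pooled = max_pool(depth_slice)
print(pooled)  # [[6, 8], [3, 4]]
```

Replacing `max` with a mean over the window would give average pooling.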
AlexNet
Imagenet classification with deep convolutional neural networks, (Krizhevsky et al., 2012)
• ILSVRC 2012 winner (15.3% top five error)
Convolutional kernels learned from the first convolutional layer of AlexNet
Feature visualization i
• aim: understanding how a ConvNet works
  • are the activation maps and layers “tuned” to specific patterns?
  • which input pattern activates a specific activation map of a specific layer?
• “Deconvnet” (Zeiler and Fergus, 2014) maps features to pixels (while a convolutional network does the opposite)
A deconvnet layer (left) attached to a convnet layer (right), (Zeiler and Fergus, 2014)
Feature visualization ii
Layers 1-3 Layers 4-5
Random subsets of feature maps and corresponding image patches
• layer 2: low level features
• layer 3: textures (mainly)
• layer 4: more class-specific
• layer 5: entire objects
Feature visualization iii
input image → Low-level features → Mid-level features → High-level features → Classifier → label
Case studies
Case study: VGGNet
• systematic evaluation of many structures
• winner:
  • only 3 × 3 convolution kernels
  • stride 1
  • pad 1
  • only 2 × 2 max pool layers
VGGNet (Simonyan and Zisserman, 2014)
Case study: GoogLeNet
GoogLeNet (Szegedy et al., 2015)
• ILSVRC 2014 winner (6.7% top five error)
• inception modules:
  • “network within a network”
  • filters with different spatial support in the same layer
• auxiliary intermediate classifiers (ignored at inference time)
Figure: Inception module
Case study: ResNet
ResNet (He et al., 2016)
ILSVRC 2015 winner (3.6% top five error)
Training deep networks
• network depth is of crucial importance
• results on the ImageNet competition seem to suggest that “the deeper, the better” BUT
  • when the net is “too deep” a degradation phenomenon occurs
  • the degradation is NOT caused by overfitting
• deeper structures are inherently more difficult to train
• “proof”:
  • take a shallow network
  • add some layers having an equal amount of inputs and outputs (i.e. layers that could implement an identity mapping)
  • train both the networks on the same data
  • the deeper network should produce no higher training error than its shallower counterpart, but experiments with current solvers show that this is not the case
Learning the residuals
• let $H(x)$ be the whole transformation from top to bottom
• instead of learning $H(x)$, write
$$H(x) = x + F(x)$$
and learn the residuals $F(x)$
• ResNet blocks learn variations from the identity
Typical blocks. With a single layer the authors did not observe advantages.
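A residual block can be sketched with plain fully connected layers. This is a simplification: the blocks in ResNet use convolutions and apply a further nonlinearity after the addition. The helper names and the weights are hypothetical. With all weights set to zero the residual $F(x)$ vanishes and the block reduces to the identity, which is the point of the formulation.

```python
def linear(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, W1, b1, W2, b2):
    """H(x) = x + F(x), where F is two linear layers with a ReLU in between."""
    f = linear(W2, b2, relu(linear(W1, b1, x)))
    return [xi + fi for xi, fi in zip(x, f)]


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
no_bias = [0.0, 0.0]

identity_out = residual_block(x, zeros, no_bias, zeros, no_bias)   # equals x

eye = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
out = residual_block(x, eye, no_bias, half, no_bias)   # [1.5, -2.0]
print(identity_out, out)
```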
Results
• thin curves: training error, bold curves: validation error.
• for ResNet, deeper networks show better performance, as desired
• ResNet-34 requires 3.6 billion FLOPs
• VGG-19 requires 19.6 billion FLOPs
Transfer learning
Transfer learning I
input image → Low-level features → Mid-level features → High-level features → Classifier → label
• features may be significant per se (especially features at lower levels)
• idea: take a pre-trained network and retrain the classifier only
Transfer learning II
1. train on ImageNet
2. for a small dataset: treat the CNN as a fixed feature extractor and train the classifier only
3. for a medium-sized dataset: use the old weights as initialization and retrain a bigger portion of the net
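The difference between steps 2 and 3 comes down to which layers receive gradient updates. The sketch below illustrates this in plain Python under simplifying assumptions: each layer is reduced to a flat weight list, the gradients are dummy values, and the helper names `trainable_mask` and `sgd_step` are hypothetical rather than taken from any framework.

```python
def trainable_mask(n_layers, n_retrain):
    """True for the last n_retrain layers (to be updated), False for frozen layers."""
    return [i >= n_layers - n_retrain for i in range(n_layers)]

def sgd_step(weights, grads, mask, lr=0.1):
    """One gradient step that leaves frozen layers at their pre-trained values."""
    return [
        [w - lr * g for w, g in zip(layer_w, layer_g)] if train else list(layer_w)
        for layer_w, layer_g, train in zip(weights, grads, mask)
    ]

# Hypothetical 4-layer net: three feature layers followed by a classifier.
pretrained = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
grads = [[1.0, 1.0] for _ in pretrained]  # dummy gradients, for illustration only

# Small dataset: fixed feature extractor, only the classifier is updated.
small_data = sgd_step(pretrained, grads, trainable_mask(4, 1))

# Medium-sized dataset: old weights as initialization, last three layers retrained.
medium_data = sgd_step(pretrained, grads, trainable_mask(4, 3))
```

How many layers to unfreeze is a judgment call that depends on the dataset size and on how similar the new data is to the pre-training data; the values 1 and 3 above are arbitrary.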
Example: transfer learning with VGG
• transfer learning based on the VGG net
• took the output of the last hidden layer (4096 elements)
• trained a linear Support Vector Machine
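The last step, training a linear SVM on fixed features, can be sketched as follows. This is an illustration only: the three-dimensional toy vectors stand in for the 4096-element VGG activations, the labels are made up, and the trainer is a basic sub-gradient descent on the regularized hinge loss, not necessarily the solver used for the example on the slide.

```python
def train_linear_svm(features, labels, epochs=50, lr=0.1, reg=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w at every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Margin violated: move the hyperplane away from this sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Sign of the decision function, as a +1 / -1 label."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Toy stand-ins for feature vectors produced by the frozen network.
features = [[2.0, 0.1, 0.0], [1.5, 0.3, 0.2], [0.1, 1.8, 0.1], [0.2, 2.2, 0.3]]
labels = [1, 1, -1, -1]

w, b = train_linear_svm(features, labels)
predictions = [predict(w, b, x) for x in features]
```

Only `w` and `b` are learned here; the network that produced the features is never touched, which is what makes this the "fixed feature extractor" variant of transfer learning.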
CVPR course
• Course on Computer Vision and Pattern Recognition, starting in September 2018
SEE YOU IN SEPTEMBER!
References i
Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, pages 886–893. IEEE, 2005.
Li Fei-Fei, Andrej Karpathy, Justin Johnson, and Serena Yeung. CS231n: Convolutional Neural Networks for Visual Recognition. Stanford University, 2017. http://cs231n.stanford.edu/.
Pedro Felzenszwalb, David McAllester, and Deva Ramanan. A discriminatively trained, multiscale, deformable part model. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.
Martin Fischler and Robert Elschlager. The representation and matching of pictorial structures. IEEE Trans. Comput., 1:67–92, 1973.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
References ii
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
David G Lowe. Object recognition from local scale-invariant features. In Computer Vision, 1999. The proceedings of the seventh IEEE International Conference on, volume 2, pages 1150–1157. IEEE, 1999.
David G Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
David Marr. Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman and Company, 1982.
References iii
Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.
Volodymyr Mnih and Geoffrey E Hinton. Learning to detect roads in high-resolution aerial images. In European Conference on Computer Vision, pages 210–223. Springer, 2010.
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
Frank Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
References iv
Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–I. IEEE, 2001.
Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pages 818–833. Springer, 2014.